From social to natural and applied sciences, overall scientific output has been growing worldwide – it doubles every nine years. Traditionally, researchers solve a problem by conducting
new experiments. With the ever-growing body of scientific literature, though, it is becoming more common to make a discovery
based on the vast number of already-published journal articles.
Researchers synthesize the findings from previous studies to develop a more complete understanding of a phenomenon. Making
sense of this explosion of studies is critical for scientists not only
to build on previous work but also to push research fields forward.
My colleagues Hazhir Rahmandad and Kamran Paynabar and
I have developed a new, more robust way to pull together all the
prior research on a particular topic. In a five-year joint project
between MIT and Georgia Tech, we worked to create a new
technique for research aggregation. Our recently published paper
in PLOS ONE introduces a flexible method that helps synthesize
findings from prior studies, potentially even those with diverse
methods and diverging results. We call it generalized model aggregation, or GMA.
Consider an example from health literature. Obesity and nutrition researchers need reliable equations that estimate basal
metabolic rate (BMR), or the amount of energy the human body
spends at complete rest. Understanding BMR has implications for
real-world questions of weight management.
Researchers often estimate BMR as a function of different
attributes: age, height, weight, fat mass and fat-free mass. The
challenge is that current publications in research journals provide
over 200 such equations estimated for different samples and age
groups. These equations also include different subsets of those attributes.
So, which equation should you choose to estimate BMR accurately? How do you ensure that your selected equation is
more reliable than the rest?
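To make the problem concrete, here is a small Python sketch that applies three widely cited BMR equations to the same person. The equations (Mifflin-St Jeor, the original Harris-Benedict and Katch-McArdle) are standard textbook formulas used only for illustration; they are not necessarily among the equations aggregated in our study, and the example person is hypothetical.

```python
# Three well-known BMR equations (kcal/day). Used for illustration only;
# they are not claimed to be among the equations aggregated in the study.

def mifflin_st_jeor_male(weight_kg, height_cm, age_yr):
    """Mifflin-St Jeor equation for men: uses weight, height and age."""
    return 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_yr + 5.0

def harris_benedict_male(weight_kg, height_cm, age_yr):
    """Original Harris-Benedict equation for men: same attributes, different coefficients."""
    return 66.4730 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.7550 * age_yr

def katch_mcardle(fat_free_mass_kg):
    """Katch-McArdle equation: uses only fat-free mass."""
    return 370.0 + 21.6 * fat_free_mass_kg

# Hypothetical person (assumed values, not data from any study).
weight_kg, height_cm, age_yr, fat_free_mass_kg = 80.0, 180.0, 40.0, 64.0

estimates = {
    "Mifflin-St Jeor": mifflin_st_jeor_male(weight_kg, height_cm, age_yr),
    "Harris-Benedict": harris_benedict_male(weight_kg, height_cm, age_yr),
    "Katch-McArdle": katch_mcardle(fat_free_mass_kg),
}
spread = max(estimates.values()) - min(estimates.values())

for name, kcal in estimates.items():
    print(f"{name}: {kcal:.1f} kcal/day")
print(f"spread between equations: {spread:.1f} kcal/day")
```

The three equations draw on different subsets of attributes and disagree by several dozen kilocalories per day for this one person, which is the kind of ambiguity an aggregation method has to resolve.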
To address these questions, we identified 27 BMR equations
for white males from published studies. Then we
used GMA to aggregate them into a single equation, which we
called a meta-model.
Through validation tests, we showed that our meta-model is
more precise than any of the prior equations for estimating BMR.
It also can deal with a logarithmic relationship between two variables, something not captured by any of the original 27 linear equations.
How does it work?
There is no magic here. In fact, the intuition behind GMA is
simple—it lets researchers with no extensive statistical background
Broadly, each previous empirical study is an attempt to estimate an underlying reality. Let’s call this the “true model.” And
it is unknown to us; whatever is actually driving the phenomenon
under investigation is nature’s secret. The empirical studies report
relevant information about the true model, even if they are biased
Generalized model aggregation uses computer simulations to
replicate prior studies. This time, though, the simulated studies
attempt to estimate a meta-model instead of the true model (that
We feed the empirical studies’ reported estimates into the simulation. GMA is flexible enough to also use any
additional information about the underlying true model, such
as the relationships among the variables or the quality of the empirical
studies’ estimates. This extra information helps increase the reliability of GMA estimates.
The GMA algorithm gives each simulated study the same sample characteristics as the previous study it mirrors and replicates that study’s method.
Then it compares the outcomes of the simulated studies with the
actual results of the empirical studies, trying to find the closest
match. Through this matching process, GMA estimates the meta-model.
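The matching idea can be sketched in a few lines of Python. This is a toy illustration under loudly stated assumptions, not our published implementation: the "prior studies" are synthetic, each is assumed to have fitted a straight line of BMR against weight over its own weight range, the candidate meta-model is assumed to have the form a + b * log(weight), the simulation is noise-free, and a plain grid search stands in for a proper estimation procedure. All function and variable names are hypothetical.

```python
import math

def ols_line(xs, ys):
    """Ordinary least squares fit of y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1

def simulate_study(params, weight_range, n=50):
    """Replicate one study: same weight range, same linear method,
    but with data generated from the candidate meta-model."""
    a, b = params
    lo, hi = weight_range
    weights = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    bmr = [a + b * math.log(w) for w in weights]
    return ols_line(weights, bmr)

def discrepancy(params, studies):
    """Sum of squared relative gaps between simulated and reported estimates."""
    total = 0.0
    for weight_range, (c0, c1) in studies:
        s0, s1 = simulate_study(params, weight_range)
        total += ((s0 - c0) / c0) ** 2 + ((s1 - c1) / c1) ** 2
    return total

# Synthetic stand-in for published results (assumption: invented for this
# sketch, generated from a hypothetical "true model", not real study data).
HYPOTHETICAL_TRUTH = (-2500.0, 1000.0)
weight_ranges = [(50.0, 70.0), (70.0, 100.0), (90.0, 130.0)]
studies = [(r, simulate_study(HYPOTHETICAL_TRUTH, r)) for r in weight_ranges]

# Grid search over candidate meta-model parameters (a, b).
candidates = [(a, b) for a in range(-4000, 1, 100) for b in range(0, 2001, 50)]
best = min(candidates, key=lambda p: discrepancy(p, studies))

for weight_range, (c0, c1) in studies:
    print(f"study on {weight_range}: reported slope {c1:.2f}")
print("estimated meta-model (a, b):", best)
```

In this sketch the three linear studies report different slopes because each covers a different weight range, yet matching all of them at once points to a single logarithmic meta-model. A real application would add sampling noise, more attributes and a statistical estimator in place of the grid search.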
If the simulated and actual outputs match, the meta-model may
be a good representation of the true model. That is, by running a
bunch of studies through the GMA algorithm, we are able to tease
out a closer approximation of how the phenomenon in question
In our paper, we discussed a wide range of examples, from
health to climate change and environmental sciences, that can benefit from generalized model aggregation.
In the current replicability crisis, GMA can help not only identify studies that are reproducible, but also distinguish reliable findings from less robust ones.
A recipe for using GMA, along with its code and instructions,
is publicly available.
Editor’s Note: This is a shortened version of an article originally
published by The Conversation. Please visit their website to read
the article in its entirety. We invite you to submit your personal
commentaries to Last Word on topics that impact your work and
affect the overall industry. Please send your commentaries and
suggested topics to Michelle Taylor, Editor-in-Chief, at michelle.