When a Bunch of Economists Look at the Same Data, Do They All See It the Same Way?
Economics Finance & Accounting Jan 7, 2022

Not at all, according to a recent study, which showed just how much noise can be introduced by researchers’ unique analytical approaches.

Illustration by Lisa Röper

Based on the research of

Robert Korajczyk

Dermot Murphy

and coauthors

If you handed the same data to 164 teams of economists and asked them to answer the same questions, would they reach a consensus? Or would they offer 164 different answers?

A new study put this exact proposition to the test. One hundred sixty-four teams of researchers analyzed the same financial-market dataset separately and wrote up their conclusions in 164 short papers. Teams were then given several rounds of feedback, mimicking the kind of informal peer-review process that economists engage in before they submit to an academic journal. All the researchers involved wanted to know how much variation would exist among their different papers.

It turns out, a lot.

Data can be messy, notoriously so. And so scientists and researchers have developed reams of strategies for cleaning and analyzing and ultimately harnessing data to draw conclusions. But this unusual study—an analysis of 164 separate analyses—suggests that the decisions that go into choosing how to clean the datasets, analyze them, and come to a conclusion can in fact add just as much noise as the data themselves.

In an increasingly data-driven world, this is important to keep in mind, according to Robert Korajczyk, a professor of finance at Kellogg. Korajczyk and a former Kellogg PhD student, Dermot Murphy, now a professor at University of Illinois Chicago, served as one of the 164 research teams involved in the project.

Kellogg Insight recently spoke with Korajczyk about the experience, and what researchers and the general public can take away from the study’s surprising conclusion.

This conversation has been edited for length and clarity.

Kellogg Insight: Can you start by explaining the data that you and the other 163 research teams were asked to analyze?

Korajczyk: Yes. Each research team was given a dataset that covers 17 years of trading activity in the most liquid futures contract in Europe, the Euro Stoxx 50. That was essentially 720 million trades. And there were six research questions that teams were asked to look at. For example, did pricing get more or less efficient? Did the markets get more or less liquid? And did the fraction of agency trades change over time?

KI: These are pretty fundamental trends that you would want to understand if you were trying to gauge the health of this market.

Korajczyk: Yes, absolutely. But the broader goal of the research was what really interested me.

KI: Namely, how different research teams would approach the same set of questions?

Korajczyk: Yes. These types of “crowdsource” projects have happened in other fields, but this is the first that I’m aware of in finance. And few projects are at the scale of this particular project. It’s more typical to have 15 or 20 teams. A hundred and sixty-four is really large. So my coauthor Dermot Murphy and I decided to team up and get involved.

KI: Talk to me about the 164 different papers that were submitted. What should we understand?

Korajczyk: There’s a statistical concept called “standard error,” which tells you about the uncertainty in a parameter estimate such as a mean. The standard error of a mean is going to be larger when data are noisy and it’s going to be smaller when there are more observations.

But then there is another kind of “error” or noise to take into consideration. And that’s all the decisions that go into getting to that point. There are a lot of different ways to measure market efficiency, for instance, so that’s one of the decisions that a research team would have to make. When you clean the data, how do you handle those outliers? Do you throw them out or do you change them to another value that is large but not as large? What will be the form of your statistical model? What software are you using? Are you a good coder or a bad coder?

All those choices that are made by the research team, as well as their inherent ability, go into creating new variation in the output. We call this the “nonstandard error.”
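As a hypothetical illustration (the dataset, the 300 cutoff, and all numbers below are invented for this sketch, not taken from the study), a few lines of Python show the distinction: the standard error captures sampling noise in the data, while two equally defensible outlier-handling choices applied to the same data produce different estimates — that gap is nonstandard variation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample with a few extreme outliers appended.
data = np.concatenate([rng.normal(100, 10, 1000), [500, 650, 800]])

# Standard error of the mean: noise from sampling, shrinks with sqrt(n).
standard_error = data.std(ddof=1) / np.sqrt(len(data))

# Two reasonable cleaning choices the same team could have made:
trimmed = data[data < 300]             # throw the outliers out entirely
winsorized = np.clip(data, None, 300)  # cap them at a large-but-smaller value

# The difference between the two resulting estimates comes from the
# analysts' choices, not from the data — a "nonstandard" source of noise.
print(standard_error, trimmed.mean(), winsorized.mean())
```

Under these made-up numbers, both estimates are legitimate answers to the same question, yet they differ — and neither discrepancy is reflected in the standard error that either team would report.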

KI: And when the teams originally submitted their papers, these nonstandard errors were about as large as the standard errors.

Korajczyk: Right, so I guess one way to think about it is if you’re going to read a paper and say, “Okay, how much credence do I place on these results?” the standard errors tell you something about the noise in the data. But the researchers made a lot of choices that I may or may not have made. So maybe that noisiness in the results is actually double what it looks like from just looking at the standard errors.

KI: Did that surprise you?

Korajczyk: It doesn’t surprise me that there was variation. The size was larger than I thought it would be. There were also some clear outliers that seemed totally outlandish to me.

Another surprise was that some of these outlandish results were there in every round. At each stage you learn something about what reviewers think or what other teams have done, and you’re allowed to revise your paper with that knowledge. But even after peer review and the opportunity to see other teams’ papers, a lot of outlandish results stuck around.

In each stage, though, the dispersion across teams did go down somewhat.

KI: It seems there were some true philosophical differences in how the questions should be approached and how the analyses should be conducted.

Korajczyk: Absolutely. And in a sense this project actually constrained these differences. We were told, “Here are the data and you’re only allowed to use these data.” You weren’t allowed to grab other data that might be relevant for answering that question and add them to the database. That would have likely increased the dispersion across teams.

KI: There’s certainly a “researchers beware!” message to this work, as you determine just how much you can trust the conclusions in the literature. This only adds to growing concern among scientists about a “replication crisis.”

Are there certain changes that you think should be made to account for these ubiquitous nonstandard errors? For instance, should academic articles allot more space to methods sections so that researchers can communicate more transparently about their choices?

Korajczyk: The standard has always been that someone who’s read your paper and decides to replicate it should be able to do that from what you’ve written in the paper. If you have truncated some outliers, they should know exactly how you truncated them. Now I can’t guarantee you that every paper is written that way. But that’s the standard of good writing, and that standard has been there for a long time.

But what a paper doesn’t normally tell people is, “I tried this specification and decided not to use it, and I tried that specification and decided not to use it. And, oh yeah, I should have controlled for this other variable.”

But there are some changes for the better. These days it is much more common to have lengthy appendices available on the journal’s website. These can go into much more detail about the robustness of the results. That can give the reader some confidence that you can look at the data in a lot of different ways and get the same results. Does everyone go through and read the 120-page appendix? No, but people who are very interested in that topic might. Another thing that’s getting more common is requiring researchers to post their code. That makes it easier to replicate results and determine whether they are robust.

KI: What should the general public make of this research? If I’m reading an article in Bloomberg or The Wall Street Journal that cites a new finance study, how seriously should I take those conclusions?

Korajczyk: Well, whether it’s finance research or medical research or psychology or sociology, it’s always helpful to be skeptical. If I’m listening to the news, for instance, one thing that news reports rarely tell you is the sample size of the study. Now, with Covid-19, this is changing somewhat, but knowing the sample size tells me a lot about whether I want to take this result seriously.

I also think it’s helpful to ask, “What are the incentives?” If it is someone trying to get tenure, there is a bias toward finding statistically significant results. If it is someone who works for a money-management firm, their financial incentives could be aligned with economically significant results going in a particular direction.

Finally, be cognizant of the fact that there are many different choices that researchers have to make. If you read, “we did X” in one line in a paper or footnote, it may not be as innocuous as it seems.

Featured Faculty

Harry G. Guthmann Professor of Finance; Co-Director, Financial Institutions and Markets Research Center

About the Writer

Jessica Love is editor in chief of Kellogg Insight.

About the Research

Menkveld, Albert J., Anna Dreber, Felix Holzmeister, Juergen Huber, et al. 2021. “Non-Standard Errors.” SSRN. November 23.
