How AI Can Help Weed Out Faulty Scientific Research
Aug 4, 2020

Solid science is more important than ever, yet experts often struggle to predict which studies will replicate. Artificial intelligence could do the job better.

Riley Mann

Based on the research of Yang Yang, Wu Youyou, and Brian Uzzi

Whether it’s a breakthrough vaccine or a discovery about the effectiveness of online learning, all good science relies on one thing: consistency.

Reputable scientific journals assume that the research they publish will replicate—that is, deliver the same results even when the experiment is repeated by someone else. But when a group of researchers put this assumption to the test in 2015, they found that 60 percent of randomly selected psychology papers from the highest-quality journals failed to replicate. Similar patterns were found in economics, biology, and medicine, kicking off what has come to be known as a “replication crisis” in science.


How can scientists restore confidence in their findings? Manually repeating all published experiments would be a straightforward solution, but “it’s completely unaffordable,” says Kellogg professor Brian Uzzi. Instead, since 2015 scientists have turned to a technique called “prediction markets,” which can forecast replicability with high accuracy. But the process only works on small batches of studies and can take nearly a year to complete.

Uzzi wondered if artificial intelligence could provide a better shortcut.

Recent advances in natural language processing—the ability of computers to analyze the meaning of text—had convinced Uzzi that AI systems had “some superhuman capabilities” that could be applied to the replication crisis. By training one of these systems to read scientific papers, Uzzi, along with Northwestern University collaborators Yang Yang and Wu Youyou, was able to predict replicability as accurately as prediction markets—but much, much faster.

This boost in efficiency could potentially give journal editors—and even the researchers themselves—an early warning system for gauging whether a scientific study will replicate.

“We wanted a system for self-assessment,” Uzzi says. “We begin with the belief that no scientist is trying to publish bad work. A scientist could write a paper and then put it through the algorithm to see what it thinks. And if it gives you a bad answer, maybe you need to go back and retrace your steps, because it’s a clue that something’s not right.”

Hidden Signals

In order to predict whether a scientific study will replicate without literally rerunning the experiment, a reviewer has to assess the study and look for clues. Traditionally, reviewers didn’t pay much attention to the wording of the paper; instead, they inspected the quantitative methods of the experiment itself: the data, models, and sample sizes that the experimenter used. If the methods appeared sound, it seemed reasonable to assume that the study would replicate. But in practice, “it turned out to be not very diagnostic” for weeding out faulty research, Uzzi says.


Prediction markets preserve this basic strategy of assessing methodology, but improve its effectiveness by asking groups of scientists to review batches of studies at once in a way that mimics the stock market. For instance, 100 reviewers might each be asked to make replication predictions for 100 papers by “investing” in each of them from an imaginary budget. Some reviewers will “invest” more in certain papers than others, representing a higher confidence in those papers’ replicability. With an accuracy rate between 71 and 85 percent, these prediction markets represent the current state of the art for predicting replication.
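To make the mechanics concrete, here is a minimal sketch of how such a market’s pooled “investments” might be aggregated into a per-paper forecast. The reviewer names, budgets, and the 0.5 decision threshold below are hypothetical illustrations, not details from the research.

```python
# Minimal sketch of how a prediction market's "investments" might be pooled
# into per-paper replication forecasts. Reviewer names, budgets, and the
# 0.5 decision threshold are illustrative, not details from the research.

def market_confidence(investments):
    """Average each reviewer's share-of-budget bet on every paper."""
    scores = {}
    for reviewer, bets in investments.items():
        budget = sum(bets.values()) or 1.0
        for paper, amount in bets.items():
            # Staking a larger share of one's budget on a paper expresses
            # higher confidence that it will replicate.
            scores[paper] = scores.get(paper, 0.0) + amount / budget
    return {paper: total / len(investments) for paper, total in scores.items()}

# Two hypothetical reviewers split imaginary budgets across two papers.
investments = {
    "reviewer_1": {"paper_A": 60.0, "paper_B": 40.0},
    "reviewer_2": {"paper_A": 80.0, "paper_B": 20.0},
}
for paper, score in market_confidence(investments).items():
    verdict = "likely to replicate" if score > 0.5 else "flagged for scrutiny"
    print(f"{paper}: {score:.2f} ({verdict})")
```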

Uzzi and his team, meanwhile, started from a very different intuition for predicting replication.

Instead of examining scientific studies’ methods and measurements, they discarded that information and looked at what Uzzi calls the “narrative” of a research paper—the description of it in prose.

The idea was inspired by a branch of psychology called discourse analysis, which shows that people unknowingly phrase their sentences differently depending on how confident they are in what they’re saying. Uzzi thought that a similar hidden signal might be present in the wording of scientific papers—and that modern machine-learning techniques could detect it.

“Imagine that a researcher is writing up how an experiment works, and maybe there’s something that they have a concern about that doesn’t reach their consciousness, but nonetheless leaks out in their writing,” Uzzi says. “We thought the machine might be able to pick some of this up.”

Confidence Maps

Computers can’t actually read scientific papers, but they can be trained to spot sophisticated statistical patterns among the words.

So Uzzi and his coauthors had their AI system convert two million scientific abstracts into a massive matrix showing how many times each word appeared next to every other word. The result was a kind of general, numerical “map” of the scientific writing style. Then, in order to train the system to spot potentially problematic studies, the researchers fed it the full text of 96 psychology papers. Sixty percent of those had failed to replicate.
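As a rough illustration of that pipeline (and only an illustration: the published model is more sophisticated, and the toy corpus and labels below are invented), one could build a word co-occurrence matrix from abstracts, represent each paper through that map, and fit a classifier on papers whose replication outcomes are known.

```python
# A simplified sketch of the pipeline described above, not the authors' actual
# model: build a word co-occurrence "map" from a corpus of abstracts, represent
# each paper through that map, and train a classifier on papers with known
# replication outcomes. The toy corpus and labels are invented.
from collections import Counter

import numpy as np
from sklearn.linear_model import LogisticRegression

def cooccurrence_map(abstracts, window=5):
    """Count how often each pair of words appears within a small window."""
    counts = Counter()
    vocab = set()
    for text in abstracts:
        words = text.lower().split()
        vocab.update(words)
        for i, word in enumerate(words):
            for other in words[i + 1 : i + 1 + window]:
                counts[(word, other)] += 1
                counts[(other, word)] += 1
    index = {w: i for i, w in enumerate(sorted(vocab))}
    matrix = np.zeros((len(index), len(index)))
    for (w1, w2), c in counts.items():
        matrix[index[w1], index[w2]] = c
    return matrix, index

def paper_vector(text, matrix, index):
    """Represent a paper as the average co-occurrence profile of its words."""
    rows = [matrix[index[w]] for w in text.lower().split() if w in index]
    return np.mean(rows, axis=0) if rows else np.zeros(matrix.shape[1])

# Toy stand-ins for the 2 million abstracts and the 96 labeled papers.
abstracts = ["we find strong evidence of the effect",
             "results were mixed and the effect was uncertain"]
matrix, index = cooccurrence_map(abstracts)
papers = ["we find strong evidence", "results were mixed and uncertain"]
replicated = [1, 0]  # hypothetical replication outcomes
X = np.array([paper_vector(p, matrix, index) for p in papers])
model = LogisticRegression().fit(X, replicated)
```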

These machine-learned word-association maps capture differences in how scientists write when their research replicates, and when it doesn’t—with a subtlety and precision that human reviewers can’t match. “When humans read text, by the time you’ve read to the seventh word in a sentence, you’ve already potentially forgotten the first and second word,” Uzzi says. “The machine, on the other hand, has essentially an unlimited consciousness when it comes to absorbing text.”

The team then tested the system on hundreds of scientific papers that it hadn’t encountered before. These papers had all been put to the test of manual replication: some succeeded, some failed. When the AI analyzed the word associations within these papers, it correctly predicted the replication outcome 65 to 78 percent of the time.

That’s roughly equivalent to the accuracy of prediction markets—but with one major advantage. For every batch of 100 papers, prediction markets take months to deliver results.

“Our AI model makes predictions in hours,” says Uzzi. For a single paper, it takes mere minutes.
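For readers who want a concrete picture of that kind of held-out test, here is a minimal sketch: hold back a portion of labeled papers, train on the rest, and score accuracy on the unseen portion. The features and labels are synthetic stand-ins, not data from the study.

```python
# A minimal sketch of a held-out evaluation: train on some labeled papers,
# then measure accuracy on papers the model has never seen. The features and
# labels here are synthetic stand-ins, not data from the study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))  # stand-in for text-derived paper features
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)  # replication outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.0%}")
```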

An Improvement, Not a Replacement

Don’t expect computers to replace human peer review anytime soon, though.

Uzzi stresses that his research is very preliminary and needs further validation. Plus, the AI system has a major downside: there’s no way to tell exactly what pattern the machine is using to make its predictions.

“This is still one of the shortcomings of all artificial intelligence: we’re really not sure why it works the way it does,” he says.

Yet while Uzzi and his collaborators couldn’t establish exactly what their system was paying attention to, they were able to show that it wasn’t falling prey to biases that often afflict human reviewers, such as an author’s gender or the names of prestigious institutions with which they’re affiliated.

To rule out these potential biases, the researchers added this extra information to the AI system’s training data and reran their experiments to see if it skewed the results. Including these extra details did not sway the system’s predictions—in practical terms, it ignored them. Additionally, they found no evidence that differences in scientific discipline (social psychology versus cognitive psychology, for example) affected the predicted outcome.
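A simple way to picture that check (again with hypothetical data and feature names, not the study’s actual protocol) is to retrain the model with the extra metadata appended to the features and compare its predictions with the text-only version.

```python
# Sketch of a bias check like the one described above: add author metadata
# (e.g., gender, institutional prestige) to the features, retrain, and see
# whether predictions change. Data and feature choices are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_papers, n_text_features = 200, 20

X_text = rng.normal(size=(n_papers, n_text_features))  # text-derived features
y = (X_text[:, 0] > 0).astype(int)                      # replication outcome
metadata = rng.integers(0, 2, size=(n_papers, 2))       # author gender, elite affiliation

text_only = LogisticRegression().fit(X_text, y)
with_meta = LogisticRegression().fit(np.hstack([X_text, metadata]), y)

agree = (text_only.predict(X_text) ==
         with_meta.predict(np.hstack([X_text, metadata]))).mean()
print(f"Predictions unchanged for {agree:.0%} of papers")
# Near-total agreement suggests the metadata is not driving the predictions.
```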

To Uzzi, that sends a good signal about its reliability.

“Okay, so we don’t know what the machine is doing—that’s a limitation. At the same time, quite frankly, it’s easier to take bias out of a machine than it is a human being,” he says.

Additionally, Uzzi sees AI as a way of improving scientists’ ongoing response to the replication crisis. This is particularly important in the age of COVID-19, when some peer-review and replication standards have become more relaxed in an effort to speed the discovery of a vaccine. An AI-powered early warning system for flagging faulty research could help focus the scientific community’s attention on the findings that are important enough to warrant rigorous—and expensive—manual replication tests.

“We’re currently doing a study to see if our model could help review all these new COVID-19 papers coming out,” Uzzi says. “That’ll help us pinpoint those papers that create the strongest foundation for new scientific discoveries in this race to come up with a cure, a therapy, or both.”

Featured Faculty

Yang Yang, Research Assistant Professor

Brian Uzzi, Richard L. Thomas Professor of Leadership and Organizational Change; Co-Director, Northwestern Institute on Complex Systems (NICO); Faculty Director, Kellogg Architectures of Collaboration Initiative (KACI); Professor of Industrial Engineering and Management Sciences, McCormick School (Courtesy); Professor of Sociology, Weinberg College (Courtesy)

About the Writer
John Pavlus is a writer and filmmaker focusing on science, technology, and design topics. He lives in Portland, Oregon.
About the Research
Yang, Yang, Wu Youyou, and Brian Uzzi. 2020. “Estimating the Deep Replicability of Scientific Findings Using Human and Artificial Intelligence.” Proceedings of the National Academy of Sciences 117(20): 10762–10768.
