How AI Can Help Weed Out Faulty Scientific Research
Data Analytics Aug 4, 2020

Solid science is more important than ever, yet experts often struggle to predict which studies will replicate. Artificial intelligence could do the job better.

Riley Mann

Based on the research of

Yang Yang

Wu Youyou

Brian Uzzi

Whether it’s a breakthrough vaccine or a discovery about the effectiveness of online learning, all good science relies on one thing: consistency.

Reputable scientific journals assume that the research they publish will replicate—that is, deliver the same results even when the experiment is repeated by someone else. But when a group of researchers put this assumption to the test in 2015, they found that 60 percent of randomly selected psychology papers from the highest-quality journals failed to replicate. Similar patterns were found in economics, biology, and medicine, kicking off what has come to be known as a “replication crisis” in science.

How can scientists restore confidence in their findings? Manually repeating all published experiments would be a straightforward solution, but “it’s completely unaffordable,” says Kellogg professor Brian Uzzi. Instead, since 2015, scientists have identified a technique called “prediction markets,” which can forecast replicability with high accuracy. But the process only works on small batches of studies and can take nearly a year to complete.

Uzzi wondered if artificial intelligence could provide a better shortcut.

Recent advances in natural language processing—the ability of computers to analyze the meaning of text—had convinced Uzzi that AI systems had “some superhuman capabilities” that could be applied to the replication crisis. By training one of these systems to read scientific papers, Uzzi, along with Northwestern University collaborators Yang Yang and Wu Youyou, was able to predict replicability as accurately as prediction markets—but much, much faster.

This boost in efficiency could potentially give journal editors—and even the researchers themselves—an early warning system for gauging whether a scientific study will replicate.

“We wanted a system for self-assessment,” Uzzi says. “We begin with the belief that no scientist is trying to publish bad work. A scientist could write a paper and then put it through the algorithm to see what it thinks. And if it gives you a bad answer, maybe you need to go back and retrace your steps, because it’s a clue that something’s not right.”

Hidden Signals

In order to predict whether a scientific study will replicate without literally rerunning the experiment, a reviewer has to assess the study and look for clues. Traditionally, reviewers didn’t pay much attention to the wording of the paper; instead, they inspected the quantitative methods of the experiment itself: the data, models, and sample sizes that the experimenter used. If the methods appeared sound, it seemed reasonable to assume that the study would replicate. But in practice, “it turned out to be not very diagnostic” for weeding out faulty research, Uzzi says.

Prediction markets preserve this basic strategy of assessing methodology, but improve its effectiveness by asking groups of scientists to review batches of studies at once in a way that mimics the stock market. For instance, 100 reviewers might each be asked to make replication predictions for 100 papers by “investing” in each of them from an imaginary budget. Some reviewers will “invest” more in certain papers than others, representing a higher confidence in those papers’ replicability. With an accuracy rate between 71 and 85 percent, these prediction markets represent the current state of the art for predicting replication.
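
The "investing" idea can be sketched in a few lines of Python. This is a simplified illustration, not the mechanism used in real prediction markets, which rely on trading and prices. The function names, the toy budgets, and the decision rule (a paper is forecast to replicate if it attracts more than an even share of the money) are all assumptions made for this example.

```python
from statistics import mean


def market_scores(investments):
    """Average share of each reviewer's budget placed on each paper.

    investments[r][p] is the amount reviewer r put on paper p.
    Assumes every reviewer invested a nonzero total.
    """
    n_papers = len(investments[0])
    shares = []
    for row in investments:
        total = sum(row)
        shares.append([amount / total for amount in row])
    return [mean(s[p] for s in shares) for p in range(n_papers)]


def predict_replication(investments):
    """Hypothetical rule: forecast 'replicates' when a paper draws
    more than an even share (1 / number of papers) of the money."""
    scores = market_scores(investments)
    even_share = 1 / len(scores)
    return [score > even_share for score in scores]


# Toy data: three reviewers, three papers, made-up amounts.
investments = [
    [50, 30, 20],
    [60, 10, 30],
    [40, 40, 20],
]
scores = market_scores(investments)
forecast = predict_replication(investments)
```

Here the first paper attracts about half of the pooled confidence, so it is the only one forecast to replicate under this toy rule.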

Uzzi and his team, meanwhile, started from a very different intuition for predicting replication.

Instead of examining scientific studies’ methods and measurements, they discarded that information and looked at what Uzzi calls the “narrative” of a research paper: the prose that describes the work.

The idea was inspired by a branch of psychology called discourse analysis, which shows that people unknowingly phrase their sentences differently depending on how confident they are in what they’re saying. Uzzi thought that a similar hidden signal might be present in the wording of scientific papers—and that modern machine-learning techniques could detect it.

“Imagine that a researcher is writing up how an experiment works, and maybe there’s something that they have a concern about that doesn’t reach their consciousness, but nonetheless leaks out in their writing,” Uzzi says. “We thought the machine might be able to pick some of this up.”

Confidence Maps

Computers can’t actually read scientific papers, but they can be trained to spot sophisticated statistical patterns among the words.

So Uzzi and his coauthors had their AI system convert two million scientific abstracts into a massive matrix showing how many times each word appeared next to every other word. The result was a kind of general, numerical “map” of the scientific writing style. Then, in order to train the system to spot potentially problematic studies, the researchers fed it the full text of 96 psychology papers. Sixty percent of those had failed to replicate.
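
To make the "map" concrete, here is a minimal sketch of counting how often words appear next to each other. It is not the researchers' pipeline. The tokenizer, the one-word window, and the two sample sentences are assumptions chosen to keep the example small.

```python
from collections import Counter
import re


def cooccurrence_counts(texts, window=1):
    """Count how often two words appear within `window` words of each other.

    Pairs are stored in alphabetical order, so ("effect", "the")
    covers both "the effect" and "effect the".
    """
    counts = Counter()
    for text in texts:
        # Naive tokenizer (an assumption): lowercase alphabetic runs only.
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(tokens):
            for neighbor in tokens[i + 1 : i + 1 + window]:
                pair = tuple(sorted((word, neighbor)))
                counts[pair] += 1
    return counts


# Made-up stand-ins for abstracts.
abstracts = [
    "The effect was significant",
    "The effect was small",
]
counts = cooccurrence_counts(abstracts)
```

Each entry in `counts` corresponds to one cell of the word-by-word matrix; a real system would build this from millions of abstracts and then compress it into numerical representations a classifier can use.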

These machine-learned word-association maps capture differences in how scientists write when their research replicates, and when it doesn’t—with a subtlety and precision that human reviewers can’t match. “When humans read text, by the time you’ve read to the seventh word in a sentence, you’ve already potentially forgotten the first and second word,” Uzzi says. “The machine, on the other hand, has essentially an unlimited consciousness when it comes to absorbing text.”

The team then tested the system on hundreds of scientific papers that it hadn’t encountered before. These papers had all been put to the test of manual replication: some succeeded, some failed. When the AI analyzed the word associations within these papers, it correctly predicted the replication outcome 65 to 78 percent of the time.

That’s roughly equivalent to the accuracy of prediction markets—but with one major advantage. For every batch of 100 papers, prediction markets take months to deliver results.

“Our AI model makes predictions in hours,” says Uzzi. For a single paper, it takes mere minutes.

An Improvement, Not a Replacement

Don’t expect computers to replace human peer review anytime soon, though.

Uzzi stresses that his research is very preliminary and needs further validation. Plus, the AI system has a major downside: there’s no way to tell exactly what pattern the machine is using to make its predictions.

“This is still one of the shortcomings of all artificial intelligence: we’re really not sure why it works the way it does,” he says.

Yet while Uzzi and his collaborators couldn’t establish exactly what their system was paying attention to, they were able to show that it wasn’t falling prey to biases that often afflict human reviewers, such as those tied to an author’s gender or the prestige of the institutions with which they’re affiliated.

To rule out these potential biases, the researchers added this extra information to the AI system’s training data and reran their experiments to see if that information skewed the results. Including these extra details did not sway the system’s predictions—in practical terms, it ignored them. Additionally, they found no evidence that differences in the scientific discipline, for example social psychology vs. cognitive psychology, affected the predicted outcome.
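
One simple way to picture this kind of check is to compare predictions made with and without the extra information. The sketch below is an illustration under assumptions, not the authors' analysis: the outcome and prediction lists are invented, and the helper names are hypothetical.

```python
def accuracy(predictions, outcomes):
    """Fraction of predictions that match the known replication outcomes."""
    matches = sum(p == o for p, o in zip(predictions, outcomes))
    return matches / len(outcomes)


def changed_fraction(baseline, with_extras):
    """Fraction of papers whose prediction flips once extra details are added."""
    changed = sum(a != b for a, b in zip(baseline, with_extras))
    return changed / len(baseline)


# Invented data: True means "replicated" or "predicted to replicate".
outcomes = [True, False, False, True, False]
text_only = [True, False, True, True, False]
text_plus_author_info = [True, False, True, True, False]

baseline_accuracy = accuracy(text_only, outcomes)
flipped = changed_fraction(text_only, text_plus_author_info)
```

If adding author details left every prediction unchanged, as in this toy data, `flipped` would be zero, which is the pattern the article describes as the system ignoring that information.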

To Uzzi, that sends a good signal about its reliability.

“Okay, so we don’t know what the machine is doing—that’s a limitation. At the same time, quite frankly, it’s easier to take bias out of a machine than it is a human being,” he says.

Additionally, Uzzi sees AI as a way of improving scientists’ ongoing response to the replication crisis. This is particularly important in the age of COVID-19, when some peer-review and replication standards have become more relaxed in an effort to speed the discovery of a vaccine. An AI-powered early warning system for flagging faulty research could help focus the scientific community’s attention on the findings that are important enough to warrant rigorous—and expensive—manual replication tests.

“We’re currently doing a study to see if our model could help review all these new COVID-19 papers coming out,” Uzzi says. “That’ll help us pinpoint those papers that create the strongest foundation for new scientific discoveries in this race to come up with a cure, a therapy, or both.”

Featured Faculty

Previously a Research Assistant Professor at Kellogg

Richard L. Thomas Professor of Leadership and Organizational Change; Co-Director, Northwestern Institute on Complex Systems (NICO); Professor of Industrial Engineering and Management Sciences, McCormick School (Courtesy); Professor of Sociology, Weinberg College (Courtesy)

About the Writer
John Pavlus is a writer and filmmaker focusing on science, technology, and design topics. He lives in Portland, Oregon.
About the Research
Yang, Yang, Wu Youyou, and Brian Uzzi. 2020. “Estimating the Deep Replicability of Scientific Findings Using Human and Artificial Intelligence.” PNAS. 117 (20) 10762-10768.
