Is Wikipedia Biased?
Strategy Economics Dec 1, 2012


Verifying the “neutral point of view”

Based on the research of

Shane Greenstein

Feng Zhu

Listening: Interview with Shane Greenstein on Wikipedia

In the eleven years since its founding, the free crowdsourced encyclopedia Wikipedia has grown from a techno-utopian curiosity into an indispensable resource for millions of users. Its corpus now totals more than four million articles, each written and edited by unpaid volunteers, and a 2005 study by the journal Nature found that its science articles were, on average, about as factually accurate as those of the Encyclopædia Britannica.

Maintaining what Wikipedia calls “the neutral point of view” (or NPOV) is relatively easy when writing about science or other objectively verifiable subjects. But on topics such as politics and history, bias and controversy inevitably arise.

“The neutral point of view is much more of an article of faith in the way Wikipedia is organized than a tested proposition,” says Shane Greenstein, a professor of management and strategy at the Kellogg School of Management. “But you can’t test it without first generating a benchmark for bias and slant.” So Greenstein and his coauthor Feng Zhu, an assistant professor at the University of Southern California, adapted a method originally designed to measure political slant in newspapers, using it to establish a quantitative baseline for measuring slant on Wikipedia.

This technique, created by Matthew Gentzkow and Jesse M. Shapiro of the University of Chicago, mines the 2005 Congressional Record for 1,000 “code phrases” used disproportionately by either Democrats or Republicans. How often these phrases appear in a text can then serve as a signal of political slant in statistical analyses of large collections of newspaper articles. Greenstein and Zhu are the first to apply this method to Wikipedia.
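To make the idea of phrase frequency as a slant signal concrete, here is a minimal sketch. It is not Gentzkow and Shapiro's statistical model, which estimates phrase weights from congressional data; it simply counts hits from two tiny, hand-picked phrase lists (the lists, their party assignments, and the names `count_phrases` and `slant_score` are all illustrative assumptions) and turns the counts into a score between -1 and 1.

```python
import re
from typing import List, Optional

# Illustrative stand-ins only (assumption): the real method uses 1,000 phrases
# drawn from the 2005 Congressional Record, with weights estimated by
# Gentzkow and Shapiro, not this hand-picked four-phrase list.
DEM_PHRASES: List[str] = ["estate tax", "civil rights"]
REP_PHRASES: List[str] = ["death tax", "illegal immigration"]


def count_phrases(text: str, phrases: List[str]) -> int:
    """Count case-insensitive, whole-word occurrences of every phrase in text."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(p) + r"\b", lowered)) for p in phrases
    )


def slant_score(text: str) -> Optional[float]:
    """Score in [-1, 1]: negative leans Democratic, positive leans Republican.

    Returns None when no code phrases appear, i.e. the text is uninformative
    under this measure (which is not the same as proving it neutral).
    """
    d = count_phrases(text, DEM_PHRASES)
    r = count_phrases(text, REP_PHRASES)
    if d + r == 0:
        return None
    return (r - d) / (r + d)


print(slant_score("Repeal the death tax and stop illegal immigration."))  # 1.0
print(slant_score("Protect civil rights and keep the estate tax."))  # -1.0
print(slant_score("Photosynthesis converts light into chemical energy."))  # None
```

Returning None rather than 0 keeps “no signal” distinct from “balanced signal,” which matters for the ambiguity Greenstein raises later about articles that contain no code phrases.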

“At one level it’s a mysterious black box and on another level it’s totally obvious,” Greenstein says. “‘Obamacare,’ ‘death panels,’ ‘civil rights,’ ‘illegal immigration,’ ‘estate taxes’: these phrases are used by the parties deliberately to appeal to their respective constituents very specifically. That’s what makes them such a great signal for measuring bias, because they come laden with so much presumed slant.”

Sifting Through Wikipedia
To do their analysis, however, Greenstein and Zhu first had to winnow Wikipedia’s six-terabyte collection of articles down to a manageable number. They began by searching for articles containing the words “democrat” or “republican,” which produced a set of 111,216 articles; they then filtered out entries concerning non-U.S. politics, leaving just over 70,000 articles. Scanning these articles for Gentzkow and Shapiro’s code phrases “went pretty quick—about ten minutes for a computer program,” Greenstein says.
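The two-stage winnowing described above can be sketched as a simple filter. This is an illustration under assumptions, not the authors' code: the article dictionaries, the `country` field, and the caller-supplied `is_us_politics` predicate are hypothetical, because the article does not say how non-U.S. entries were identified.

```python
from typing import Callable, Dict, Iterable, List

Article = Dict[str, str]


def select_us_politics(
    articles: Iterable[Article],
    is_us_politics: Callable[[Article], bool],
) -> List[Article]:
    """Two-stage filter: keyword match on party names, then drop non-U.S. politics."""
    selected = []
    for article in articles:
        text = article["text"].lower()
        # Stage 1: substring match, so "democrat" also catches "Democratic" and "Democrats".
        if "democrat" not in text and "republican" not in text:
            continue
        # Stage 2: hypothetical classifier supplied by the caller; the study's
        # actual criterion for "non-U.S. politics" is not described here.
        if not is_us_politics(article):
            continue
        selected.append(article)
    return selected


corpus = [
    {"title": "Senate race", "text": "The Democratic candidate won.", "country": "US"},
    {"title": "Irish history", "text": "The republican movement in Ireland.", "country": "IE"},
    {"title": "Photosynthesis", "text": "Plants convert light.", "country": "US"},
]
kept = select_us_politics(corpus, lambda a: a["country"] == "US")
print([a["title"] for a in kept])  # ['Senate race']
```

The second stage is what removes false positives such as the Irish “republican” article, which matches the keyword but has nothing to do with U.S. party politics.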


Greenstein and Zhu’s results were limited to what they refer to as “vintages”—that is, the first version of articles that appear on Wikipedia. In aggregate, this provides a static snapshot of the amount of bias present in Wikipedia’s “first draft.” (Forthcoming studies will examine how this bias is affected by Wikipedia’s ongoing revision process.) The authors found that vintages from early in Wikipedia’s history displayed a distinct Democratic slant. Later vintages were less slanted, meaning that the 70,000-article sample exhibited, on average, a “drift” toward NPOV over the course of a decade.

“An article ‘born’ in 2002 turns out, on average, to be very slanted—much more so than an article first entered in 2008,” Greenstein explains. “These ‘vintage effects’ pretty much disappear after 2005, so it’s really the early articles that are heavily slanted.”
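A “vintage effect” amounts to grouping articles by the year their first version appeared and comparing slant across those groups. The sketch below uses a plain per-year average rather than the paper's statistical model, and its input format and the name `slant_by_vintage` are assumptions; the sample scores are made up and are not the study's data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def slant_by_vintage(
    articles: List[Tuple[int, Optional[float]]],
) -> Dict[int, float]:
    """Average slant per vintage (year an article's first version appeared).

    Each input pair is (first-version year, slant score or None); articles
    with no code phrases (None) are skipped rather than counted as neutral.
    """
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, score in articles:
        if score is not None:
            by_year[year].append(score)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


# Made-up scores for illustration, not the study's data.
sample = [(2002, -0.5), (2002, -0.25), (2008, 0.25), (2008, -0.25), (2008, None)]
print(slant_by_vintage(sample))  # {2002: -0.375, 2008: 0.0}
```

Skipping articles with no code phrases is one design choice; treating them as 0.0 (neutral) is the alternative, and it would pull every vintage's average toward zero.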

Greenstein and Zhu’s paper does not speculate as to why this early-vintage bias exists, but Greenstein offers several possible explanations. “One has to do with who was online in 2002 and 2003, participating in Wikipedia,” he says. “There are obvious biases among college kids, who were online more intensely in that period.” Broadband Internet penetration may also explain some of the bias in early vintages: “Early broadband users tended to come from a specific education group, again mostly college kids with fast on-campus Internet connections.” Or, Greenstein notes, “perhaps it was just the luck of the draw that a group of highly opinionated Democrats were among the first to be contributing to Wikipedia—perhaps because they were more interested in open systems.”

A Benchmark
Whatever the cause of Wikipedia’s political bias, Greenstein and Zhu’s results establish a quantitative benchmark for examining it. But Greenstein cautions that applying Gentzkow and Shapiro’s statistical model to Wikipedia is not without its ambiguities. “Unlike a newspaper corpus, which is made up of new, unique articles every day that you can sample multiple times to determine bias, on Wikipedia you see the same articles over time,” he explains. “So if you don’t find these ‘code phrases’ in an article, is it because there’s really no bias, or is it because the Gentzkow and Shapiro method is uninformative in that instance?” Follow-up studies indicate that the first interpretation, that an absence of code phrases means the article is politically neutral, is likely the correct one, Greenstein says.

Greenstein and Zhu’s findings also suggest that while Wikipedia’s collection of 70,000 articles on U.S. politics is, on average, converging over time toward NPOV, Wikipedia’s “bottom-up” revision process contributes only slightly to this outcome. Instead, the overall drift toward NPOV has arisen from Wikipedia’s sheer growth: as newer, less biased vintages (or vintages with an opposite political slant) began to outnumber the older ones, the corpus’s formerly Democratic bias has been slowly averaging out. But while Wikipedia’s political content may be trending toward neutrality in the aggregate, individual articles may fall anywhere along the spectrum of political bias.
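The “averaging out” argument is simple arithmetic: if older articles are never revised, the corpus-wide mean still moves toward zero as less slanted vintages pile up. The sketch below assumes that no-revision case, and the cohort sizes and slant values are invented solely to show the mechanism; `corpus_average` is a hypothetical helper, not part of the study.

```python
from typing import List, Tuple


def corpus_average(cohorts: List[Tuple[int, int, float]]) -> List[Tuple[int, float]]:
    """Running mean slant of the whole corpus as new vintages arrive.

    cohorts: (year, number of new articles, mean slant of that vintage), in
    chronological order. Assumes older articles are never revised, so any drift
    toward zero comes purely from the changing mix of vintages.
    """
    total_articles = 0
    total_slant = 0.0
    history = []
    for year, n_new, vintage_mean in cohorts:
        total_articles += n_new
        total_slant += n_new * vintage_mean
        history.append((year, total_slant / total_articles))
    return history


# Made-up cohort sizes and slants, chosen only to show the averaging effect.
print(corpus_average([(2002, 100, -0.5), (2005, 300, 0.0), (2008, 600, 0.0)]))
# [(2002, -0.5), (2005, -0.125), (2008, -0.05)]
```

In this toy run the 100 early articles keep their full slant throughout, yet the corpus average shrinks tenfold, which is why an aggregate drift toward NPOV says little about any individual article.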

“If a Wikipedia user was looking to get a complete, unbiased view on a particular topic, it wouldn’t necessarily come from one article,” Greenstein explains. “Most users do read more than one article, but how often do articles with different slants link to one another? That’s an open question motivated by our research.”


Related reading on Kellogg Insight

Why Broadband Prices Haven’t Decreased: Creating the first broadband consumer price index

What Has the Internet Done for the Economy? The puzzling spread of the commercial Internet could explain wage inequalities

All Politics Is Cultural: Cultural not economic vocabularies separate liberals and conservatives

Related case

Greenstein, Shane and Michael Devereux. 2009. Wikipedia in the Spotlight. Case 5-306-507 (KEL253).


Featured Faculty

Member of the Strategy Department faculty until 2015

About the Writer
John Pavlus is a writer and filmmaker focusing on science, technology, and design topics. He lives in Brooklyn, New York.
About the Research

Greenstein, Shane, and Feng Zhu. 2012. “Is Wikipedia Biased?” American Economic Review 102(3): 343–348.

