In the eleven years since its founding, the free crowdsourced encyclopedia Wikipedia has grown from a techno-utopian curiosity into an indispensable resource for millions of users. A 2005 study by the journal Nature showed that Wikipedia’s corpus of articles—now totaling more than four million, each written and edited by unpaid volunteers—is about as factually accurate, on average, as the Encyclopædia Britannica.
Maintaining what Wikipedia calls “the neutral point of view” (or NPOV) is relatively easy when writing about science or other objectively verifiable subjects. But for topics such as politics and history, bias and controversy inevitably arise.
“The neutral point of view is much more of an article of faith in the way Wikipedia is organized than a tested proposition,” says Shane Greenstein, a professor of management and strategy at the Kellogg School of Management. “But you can’t test it without first generating a benchmark for bias and slant.” So Greenstein and his co-author Feng Zhu, an assistant professor at the University of Southern California, applied a method originally designed to define political bias in printed newspapers to set a quantitative baseline for defining favoritism on Wikipedia.
This technique, created by Matthew Gentzkow and Jesse M. Shapiro of the University of Chicago, samples the 2005 Congressional Record for a list of 1,000 “code phrases” used disproportionately by either Democrats or Republicans. The frequency of these phrases can then be used as a signal for political bias when performing statistical analysis on large sets of newspaper articles. Greenstein and Zhu are the first to apply this method to Wikipedia’s online repository.
“At one level it’s a mysterious black box and on another level it’s totally obvious,” Greenstein says. “‘Obamacare,’ ‘death panels,’ ‘civil rights,’ ‘illegal immigration,’ ‘estate taxes’: these phrases are used by the parties deliberately to appeal to their respective constituents very specifically. That’s what makes them such a great signal for measuring bias, because they come laden with so much presumed slant.”
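The phrase-counting idea can be sketched in a few lines of code. This is an illustrative simplification, not the authors’ actual implementation: the real Gentzkow–Shapiro method selects roughly 1,000 phrases statistically from the 2005 Congressional Record, and the tiny phrase lists and partisan labels below are assumptions drawn loosely from the examples quoted above.

```python
# Hypothetical mini-lists of partisan "code phrases" (illustrative only;
# the real method derives ~1,000 such phrases from the Congressional Record).
DEMOCRAT_PHRASES = ["civil rights", "estate tax loophole"]
REPUBLICAN_PHRASES = ["death panels", "illegal immigration", "death tax"]

def slant_score(text: str) -> float:
    """Return a score in [-1, 1]: negative leans Democratic,
    positive leans Republican, 0 means no code phrases were found."""
    t = text.lower()
    d = sum(t.count(p) for p in DEMOCRAT_PHRASES)
    r = sum(t.count(p) for p in REPUBLICAN_PHRASES)
    if d + r == 0:
        return 0.0  # uninformative: the article contains no code phrases
    return (r - d) / (r + d)

print(slant_score("The bill would close the estate tax loophole "
                  "and protect civil rights."))  # -1.0, leans Democratic
print(slant_score("Critics warned of death panels and a death tax."))  # 1.0
```

Note the zero case: as Greenstein points out later in the article, an article with no code phrases at all is ambiguous, since the absence of a signal is not the same as evidence of neutrality.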
Sifting Through Wikipedia
But in order to do their analysis, Greenstein and Zhu first had to winnow Wikipedia’s six-terabyte collection of articles down to a manageable number. They first searched for articles containing the words “democrat” or “republican,” which produced a set of 111,216 articles; they then filtered out entries concerning non-U.S. politics, resulting in a list of just over 70,000 articles. Analyzing these articles for Gentzkow and Shapiro’s biased phrases “went pretty quick—about ten minutes for a computer program,” Greenstein says.
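The two-pass winnowing described above might look something like the sketch below. The keyword heuristic for dropping non-U.S. politics is purely an assumption for illustration; the article does not say how the authors implemented that filter.

```python
# Hypothetical markers of non-U.S. politics, used as a stand-in for
# whatever filter the authors actually applied.
NON_US_MARKERS = ["parliament", "bundestag", "house of commons"]

def keep_article(title: str, text: str) -> bool:
    """Two-pass filter: keep only articles that mention either U.S.
    party and show no sign of being about non-U.S. politics."""
    t = (title + " " + text).lower()
    if "democrat" not in t and "republican" not in t:
        return False  # pass 1: must mention "democrat" or "republican"
    return not any(m in t for m in NON_US_MARKERS)  # pass 2: U.S. only

articles = [
    ("Estate tax in the United States", "Republican senators argued..."),
    ("Liberal Democrats (UK)", "...a party in the UK Parliament..."),
    ("Photosynthesis", "Plants convert light energy..."),
]
kept = [title for title, text in articles if keep_article(title, text)]
print(kept)  # only the U.S. politics entry survives
```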
Greenstein and Zhu’s results were limited to what they refer to as “vintages”—that is, the first version of each article as it appears on Wikipedia. In aggregate, this provides a static snapshot of the amount of bias present in Wikipedia’s “first draft.” (Forthcoming studies will examine how this bias is affected by Wikipedia’s ongoing revision process.) The authors found that vintages from early in Wikipedia’s history displayed a distinct Democratic slant. Later vintages were less slanted, meaning that the 70,000-article sample exhibited, on average, a “drift” toward NPOV over the course of a decade.
“An article ‘born’ in 2002 turns out, on average, to be very slanted—much more so than an article first entered in 2008,” Greenstein explains. “These ‘vintage effects’ pretty much disappear after 2005, so it’s really the early articles that are heavily slanted.”
Greenstein and Zhu’s paper does not speculate as to why this early-vintage bias exists, but Greenstein offers several possible explanations. “One has to do with who was online in 2002 and 2003, participating in Wikipedia,” he says. “There are obvious biases among college kids, who were online more intensely in that period.” Broadband Internet penetration may also explain some of the bias in early vintages: “Early broadband users tended to come from a specific education group, again mostly college kids with fast on-campus Internet connections.” Or, Greenstein notes, “perhaps it was just the luck of the draw that a group of highly opinionated Democrats were among the first to be contributing to Wikipedia—perhaps because they were more interested in open systems.”
Whatever the cause of Wikipedia’s political bias, Greenstein and Zhu’s results establish a quantitative benchmark for examining the presence of that bias. But Greenstein cautions that applying Gentzkow and Shapiro’s statistical model to Wikipedia is not without its ambiguities. “Unlike a newspaper corpus, which is made up of new, unique articles every day that you can sample multiple times to determine bias, on Wikipedia you see the same articles over time,” he explains. “So if you don’t find these ‘code phrases’ in an article, is it because there’s really no bias, or is it because the Gentzkow and Shapiro method is uninformative in that instance?” Follow-up studies have indicated that the former—that an absence of code phrases means that the article is politically neutral—is likely to be the case, Greenstein says.
Greenstein and Zhu’s findings also suggest that while Wikipedia’s collection of 70,000 articles on U.S. politics is, on average, converging over time toward NPOV, Wikipedia’s “bottom-up” revision process contributes only slightly to this outcome. Instead, the overall drift toward NPOV has arisen from Wikipedia’s sheer growth: as newer, less biased vintages (or vintages with an opposite political slant) began to outnumber the older ones, the corpus’s formerly Democratic bias has been slowly averaging out. But while Wikipedia’s political content may be trending toward neutrality in the aggregate, individual articles may fall anywhere along the spectrum of political bias.
“If a Wikipedia user was looking to get a complete, unbiased view on a particular topic, it wouldn’t necessarily come from one article,” Greenstein explains. “Most users do read more than one article, but how often do articles with different slants link to one another? That’s an open question motivated by our research.”
Greenstein, Shane, and Michael Devereux. 2009. “Wikipedia in the Spotlight.” Case 5-306-507 (KEL253).
Greenstein, Shane, and Feng Zhu. 2012. “Is Wikipedia Biased?” American Economic Review 102(3): 343–348.