Is Wikipedia Biased?
Strategy Economics Dec 1, 2012

Verifying the “neutral point of view”

Based on the research of

Shane Greenstein

Feng Zhu

In the eleven years since its founding, the free crowdsourced encyclopedia Wikipedia has grown from a techno-utopian curiosity into an indispensable resource for millions of users. A 2005 study by the journal Nature showed that Wikipedia’s corpus of articles—now totaling more than four million, each written and edited by unpaid volunteers—is about as factually accurate, on average, as the Encyclopædia Britannica.

Maintaining what Wikipedia calls the “neutral point of view” (or NPOV) is relatively easy when writing about science topics or otherwise objectively verifiable subjects. But in other topics, such as politics and history, bias and controversy inevitably arise.

“The neutral point of view is much more of an article of faith in the way Wikipedia is organized than a tested proposition,” says Shane Greenstein, a professor of management and strategy at the Kellogg School of Management. “But you can’t test it without first generating a benchmark for bias and slant.” So Greenstein and his co-author Feng Zhu, an assistant professor at the University of Southern California, applied a method originally designed to define political bias in printed newspapers to set a quantitative baseline for defining favoritism on Wikipedia.

This technique, created by Matthew Gentzkow and Jesse M. Shapiro of the University of Chicago, samples the 2005 Congressional Record for a list of 1,000 “code phrases” used disproportionately by either Democrats or Republicans. The frequency of these phrases can then be used as a signal for political bias when performing statistical analysis on large sets of newspaper articles. Greenstein and Zhu are the first to apply this method to Wikipedia’s online repository.

“At one level it’s a mysterious black box and on another level it’s totally obvious,” Greenstein says. “‘Obamacare,’ ‘death panels,’ ‘civil rights,’ ‘illegal immigration,’ ‘estate taxes’: these phrases are used by the parties deliberately to appeal to their respective constituents very specifically. That’s what makes them such a great signal for measuring bias, because they come laden with so much presumed slant.”
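The idea of using phrase frequency as a signal can be sketched in a few lines of Python. This is a simplified illustration, not the statistical model Gentzkow and Shapiro actually use: the two short phrase lists, the party each phrase is assigned to, the sign convention (positive means Republican-leaning) and the function names are all assumptions made for the example.

```python
import re

# Hypothetical phrase lists for illustration only. The real method derives
# 1,000 phrases statistically from the 2005 Congressional Record; the party
# assignments below are assumptions, not taken from the study.
DEM_PHRASES = ["civil rights"]
REP_PHRASES = ["death panels", "illegal immigration"]


def count_phrase(text, phrase):
    """Count whole-phrase, case-insensitive occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text.lower()))


def slant_score(text):
    """Return a score in [-1, 1]: negative leans Democratic, positive leans
    Republican (an assumed convention). Return None when no code phrase
    appears, since the text then carries no signal either way."""
    dem = sum(count_phrase(text, p) for p in DEM_PHRASES)
    rep = sum(count_phrase(text, p) for p in REP_PHRASES)
    total = dem + rep
    if total == 0:
        return None
    return (rep - dem) / total


article = "The debate over illegal immigration, civil rights and death panels."
print(slant_score(article))                            # two Republican phrases, one Democratic
print(slant_score("Rivers carry sediment downstream."))  # no code phrases at all
```

Returning None rather than zero for a text with no code phrases keeps "no signal" separate from "balanced use of both parties' phrases."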

Sifting Through Wikipedia
To do their analysis, Greenstein and Zhu first had to winnow Wikipedia’s six-terabyte collection of articles down to a manageable number. They searched for articles containing the words “democrat” or “republican,” which produced a set of 111,216 articles; they then filtered out entries concerning non-U.S. politics, resulting in a list of just over 70,000 articles. Analyzing these articles for Gentzkow and Shapiro’s biased phrases “went pretty quick, about ten minutes for a computer program,” Greenstein says.
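The two-stage winnowing described above might look roughly like the following Python sketch. The article does not say how non-U.S. entries were identified, so that step is left as a caller-supplied test; the `country` field, the sample records and the function names are hypothetical. The keyword pattern also matches longer forms such as "Democratic" or "Republicans," which is an assumption about how the search treated word variants.

```python
import re

# Matches "democrat" or "republican" at the start of a word, so variants
# such as "Democratic" and "Republicans" also count (an assumption).
PARTY_WORD = re.compile(r"\b(?:democrat|republican)", re.IGNORECASE)


def mentions_party(text):
    """Stage 1: does the article text contain either keyword?"""
    return PARTY_WORD.search(text) is not None


def select_articles(articles, is_us_politics):
    """Stage 2: keep keyword matches that also pass the caller's
    U.S.-politics test (a hypothetical predicate)."""
    return [a for a in articles
            if mentions_party(a["text"]) and is_us_politics(a)]


# Made-up sample records; the "country" field is purely illustrative.
sample = [
    {"title": "A", "text": "The Democratic nominee won the Senate seat.", "country": "US"},
    {"title": "B", "text": "The Christian Democrats formed a coalition.", "country": "DE"},
    {"title": "C", "text": "Rivers carry sediment downstream.", "country": "US"},
]

selected = select_articles(sample, lambda a: a["country"] == "US")
print([a["title"] for a in selected])
```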

Greenstein and Zhu’s results were limited to what they refer to as “vintages,” that is, the first version of articles that appear on Wikipedia. In aggregate, this provides a static snapshot of the amount of bias present in Wikipedia’s “first draft.” (Forthcoming studies will examine how this bias is affected by Wikipedia’s ongoing revision process.) The authors found that vintages from early in Wikipedia’s history displayed a distinct Democratic slant. Later vintages were less slanted, meaning that the 70,000-article sample exhibited, on average, a “drift” toward NPOV over the course of a decade.

“An article ‘born’ in 2002 turns out, on average, to be very slanted, much more so than an article first entered in 2008,” Greenstein explains. “These ‘vintage effects’ pretty much disappear after 2005, so it’s really the early articles that are heavily slanted.”

Greenstein and Zhu’s paper does not speculate as to why this early-vintage bias exists, but Greenstein offers several possible explanations. “One has to do with who was online in 2002 and 2003, participating in Wikipedia,” he says. “There are obvious biases among college kids, who were online more intensely in that period.” Broadband Internet penetration may also explain some of the bias in early vintages: “Early broadband users tended to come from a specific education group, again mostly college kids with fast on-campus Internet connections.” Or, Greenstein notes, perhaps it was just the luck of the draw that “a group of highly opinionated Democrats were among the first to be contributing to Wikipedia, perhaps because they were more interested in open systems.”

A Benchmark
Whatever the cause of Wikipedia’s political bias, Greenstein and Zhu’s results establish a quantitative benchmark for examining the presence of that bias. But Greenstein cautions that applying Gentzkow and Shapiro’s statistical model to Wikipedia is not without its ambiguities. “Unlike a newspaper corpus, which is made up of new, unique articles every day that you can sample multiple times to determine bias, on Wikipedia you see the same articles over time,” he explains. “So if you don’t find these ‘code phrases’ in an article, is it because there’s really no bias, or is it because the Gentzkow and Shapiro method is uninformative in that instance?” Follow-up studies have indicated that the former (that an absence of code phrases means that the article is politically neutral) is likely to be the case, Greenstein says.

Greenstein and Zhu’s findings also suggest that while Wikipedia’s collection of 70,000 articles on U.S. politics is, on average, converging over time toward NPOV, Wikipedia’s “bottom-up” revision process contributes only slightly to this outcome. Instead, the overall drift toward NPOV has arisen from Wikipedia’s sheer growth: as newer, less biased vintages (or vintages with an opposite political slant) began to outnumber the older ones, the corpus’s formerly Democratic bias has been slowly averaging out. But while Wikipedia’s political content may be trending toward neutrality in the aggregate, individual articles may fall anywhere along the spectrum of political bias.
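The arithmetic behind this “averaging out” is a weighted mean: the corpus-wide slant is each vintage’s average slant weighted by how many articles that vintage contains. The short Python example below works through it with invented numbers; the article counts, the slant values and the convention that negative means Democratic-leaning are assumptions for illustration, not figures from the study.

```python
def corpus_average(vintages):
    """Article-count-weighted mean slant over (article_count, mean_slant) pairs."""
    total_articles = sum(count for count, _ in vintages)
    return sum(count * slant for count, slant in vintages) / total_articles


# Hypothetical numbers: a small, strongly slanted early vintage and a much
# larger, neutral later vintage. Negative = Democratic-leaning (assumed).
early_only = [(1000, -0.5)]
after_growth = [(1000, -0.5), (9000, 0.0)]

print(corpus_average(early_only))    # slant of the early corpus
print(corpus_average(after_growth))  # same early articles, diluted by growth
```

In this toy case no early article is revised at all, yet the corpus average moves from -0.5 to about -0.05 simply because the early vintage now makes up a tenth of the total.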

“If a Wikipedia user was looking to get a complete, unbiased view on a particular topic, it wouldn’t necessarily come from one article,” Greenstein explains. “Most users do read more than one article, but how often do articles with different slants link to one another? That’s an open question motivated by our research.”


Related case

Greenstein, Shane and Michael Devereux. 2009. Wikipedia in the Spotlight. Case 5306507 (KEL253).


Featured Faculty

Shane Greenstein

Member of the Strategy Department faculty until 2015

About the Writer

John Pavlus is a writer and filmmaker focusing on science, technology, and design topics. He lives in Brooklyn, New York.

About the Research

Greenstein, Shane, and Feng Zhu. 2012. “Is Wikipedia Biased?” American Economic Review 102(3): 343–348.
