One, Two, Three Stats and More at the Old Ballgame
Marketing Jul 5, 2011

Identifying baseball’s best players and most reliable statistics

Based on the research of

James Piette

Shane T. Jensen

Blakeley B. McShane

Alexander Braunstein

Abraham J. Wyner

The crack of the bat and thwack of the ball in a catcher’s mitt are two sounds that warm the hearts of statisticians across the United States. Long regarded as America’s pastime, baseball counts among its fans a group that is both passionate and mathematically inclined. Its members dissect every pitch, swing, and hit of every game on record.

Known today as sabermetricians (after SABR, the acronym for the Society for American Baseball Research), they have produced an abundance of measures and models by which fans and managers can rate the performance of teams and individual players. Their enthusiasm is nothing new (the practice of recording the statistics of the game has been around since the 1800s), but the profusion of metrics in recent years is. Sabermetrics has become burdened by its own success. The number of ways that players’ skills are measured is staggering, and many of those measures have dubious predictive power. “It’s not like fifty years ago when pitching could be distilled down to something like earned run average, and hitting could be distilled down to batting average and number of home runs,” says Blake McShane, an assistant professor of marketing at the Kellogg School of Management and co-author of a series of papers on sabermetrics.

“Nowadays in the sabermetrics era, we have fifty metrics for everything,” McShane continues. “If your car’s dashboard had fifty things on it, you wouldn’t use it. The reason your dashboard is useful is because you only have three or four things on it. And not only that, these things are independent of one another.” For baseball managers, too many metrics can be too much of a good thing. Overburdened by information, they can have a difficult time spotting talent or fielding the right player at the right time.

A New Approach

To help separate the wheat from the chaff, McShane and his co-authors used a statistical approach known as Bayesian inference, which differs from more typical frequentist inference on a number of levels. In this case, Bayesian statistics has two significant advantages. First, it makes complex mathematical models far simpler to estimate, allowing McShane and his colleagues to use an iterative simulation technique to narrow in on the solutions. Second, it allows them to keep an eye on uncertainty as the data works its way through the model. When the results are spit out on the other end, McShane and his colleagues have an accurate picture of the error (or uncertainty) of their estimates.

To see the importance of tracking the error, imagine a typical political poll. Results are often reported with a margin of error. If your candidate is up in the polls 53 percent to 47 percent and the margin of error is plus or minus 2 percent, then you can feel comfortable that he or she is winning the race. However, if that margin of error is plus or minus 10 percent, you have little reason to be excited. The errors of the estimates McShane and his colleagues produced are similarly important. The narrower the margin of error, the more certain they would be of their results.
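The arithmetic behind that intuition can be sketched in a few lines of Python. The function name and the rule it applies (a lead counts as clear only when the two intervals do not overlap) are illustrative assumptions, a conservative rule of thumb rather than a formal statistical test.

```python
def lead_is_clear(leader_share, trailer_share, margin):
    """Return True when the leader's worst case still beats the trailer's best case.

    Shares and margin are in percentage points. This non-overlap rule is a
    simplifying assumption made for illustration only.
    """
    return (leader_share - margin) > (trailer_share + margin)


# The two polls described in the text: 53 to 47, with different margins of error.
narrow_margin = lead_is_clear(53, 47, 2)    # compares 51 with 49
wide_margin = lead_is_clear(53, 47, 10)     # compares 43 with 57

print(narrow_margin, wide_margin)
```

With a 2-point margin the leader's worst case (51) still beats the trailer's best case (49). With a 10-point margin the ranges overlap heavily, so the apparent lead tells you little.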

Being able to track uncertainty in predictions is an advantage when trying to forecast individual players’ performance levels. In that case, it is important to get a good idea of not only a player’s general performance, but also how variable that performance is over time. McShane, along with Shane T. Jensen and Abraham J. Wyner, both associate professors at the Wharton School, set out to build a model that not only was accurate in its predictions of hitting performance, but also offered a window into player consistency. A player’s variability, it turns out, can be nearly as important as overall performance when it comes to building a roster.

“Suppose you’re a small market team,” McShane says. “You would love to hire someone like all-star Albert Pujols who, in terms of home run rate, is high and consistent. But you’re a small market team so you can’t afford him.” Under those circumstances, many teams would sign an above-average player who may be consistent but seldom has a stand-out season. It is a safe bet, if only because most models say nothing about a player’s variability. However, if numbers relating to variability are available, the team may decide to pick up a player who may play below average, but may sporadically rival top performers like Pujols.

“If you’re a small market team facing budget constraints, this model is going to allow you to do some interesting things with your roster,” McShane says. “Players with variability have an inherent ‘option value’ that allows management to trade off maximizing the number of wins or runs scored per season (what you get from an above-average consistent player) for maximizing the probability of special events like a playoff berth or even a World Series ring (what you might just get from a highly variable player if you’re lucky).”
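The “option value” idea can be illustrated with a toy calculation. Everything here is an assumption made for the sake of the example: the two hypothetical players, their home run totals, the 35-homer “elite” threshold, and the choice of a normal distribution for season totals. None of it comes from the researchers’ model.

```python
from statistics import NormalDist

# Hypothetical season home run totals, modeled as normal distributions.
# The consistent player is better on average; the variable one swings widely.
consistent_player = NormalDist(mu=25, sigma=2)
variable_player = NormalDist(mu=22, sigma=8)

# Hypothetical bar for a stand-out season.
elite_threshold = 35


def chance_of_elite_season(player, threshold):
    """Probability that the player's season total exceeds the threshold."""
    return 1.0 - player.cdf(threshold)


p_consistent = chance_of_elite_season(consistent_player, elite_threshold)
p_variable = chance_of_elite_season(variable_player, elite_threshold)

print(f"consistent: {p_consistent:.6f}, variable: {p_variable:.3f}")
```

Under these made-up numbers the consistent player has the higher expected total but almost no chance of a stand-out season, while the variable player reaches the threshold about one season in twenty. A team chasing a rare outcome might reasonably prefer the second profile.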

Another key part of the new model is its ability to make predictions using publicly available data. There have been, of course, many attempts at using accessible data before, though most have met with middling success. The current industry standard, PECOTA, uses piles of private data and requires gobs of fine tuning by hand. Its predictions are perhaps the most accurate available, but accessing them requires going through the model’s gatekeeper, Baseball Prospectus.

McShane, Jensen, and Wyner’s model not only offers key information on player consistency; its main predictions are also completely automated and accurate enough to challenge PECOTA. By one measure of error, mean absolute error, their model is the hands-down winner. By another, root-mean-square error, PECOTA wins. Because root-mean-square error penalizes large errors more heavily than small ones, that result suggests McShane and his colleagues’ model makes a few big misses on a small number of players. For most players, their model outperforms PECOTA. Part of the key to that success is breaking the league’s players into two classes: elite and everyone else. “That simple division makes the model a lot more predictive,” McShane says, “by as much as 30 percent.”
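A small numerical sketch shows how two forecasting systems can split the two error measures. The error lists below are invented for illustration and are not results from either model; `model_a_errors` stands in for a system that is usually close but misses badly once, and `model_b_errors` for one that is uniformly mediocre.

```python
import math


def mean_absolute_error(errors):
    """Average size of the forecast errors, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_error(errors):
    """Square root of the average squared error; large misses count extra."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical forecast errors for five players.
model_a_errors = [1, 1, 1, 1, 10]   # mostly accurate, one big miss
model_b_errors = [3, 3, 3, 3, 3]    # consistently off by a moderate amount

mae_a = mean_absolute_error(model_a_errors)
mae_b = mean_absolute_error(model_b_errors)
rmse_a = root_mean_square_error(model_a_errors)
rmse_b = root_mean_square_error(model_b_errors)

print(f"MAE:  A={mae_a:.2f}  B={mae_b:.2f}")
print(f"RMSE: A={rmse_a:.2f}  B={rmse_b:.2f}")
```

Model A wins on mean absolute error, while the single large miss is enough to hand Model B the win on root-mean-square error, the same pattern the paragraph above describes.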

Their model also excels where PECOTA falls furthest short: in predicting the performance of young players. Players under the age of 26 generally do not have long histories within the league, causing most models to choke when generating predictions for them. McShane and his colleagues’ model addresses this problem by incorporating information from a player’s position as well. Since different positions often dictate different player characteristics (first basemen tend to be tall, for example), this new approach can give accurate forecasts for relatively untested players, a real boon for managers.
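One common way to borrow information from a group when an individual has little history is to shrink the individual’s observed rate toward the group average. The sketch below uses a simple beta-binomial style estimate to show the idea. It is a deliberately simplified stand-in, not the authors’ hierarchical model, and the position average, prior strength, and player totals are all hypothetical.

```python
def shrunken_rate(successes, trials, prior_mean, prior_strength):
    """Blend an observed rate with a group average.

    prior_strength acts like a number of "pseudo" plate appearances at the
    group's average rate; the more real data a player has, the less the
    group average matters.
    """
    prior_successes = prior_mean * prior_strength
    return (successes + prior_successes) / (trials + prior_strength)


# Hypothetical home run rate per plate appearance for a position group.
position_rate = 0.04
prior_strength = 200

# A young player with 5 home runs in 50 plate appearances (raw rate 0.10).
rookie_estimate = shrunken_rate(5, 50, position_rate, prior_strength)

# A veteran with 500 home runs in 5,000 plate appearances (raw rate 0.10).
veteran_estimate = shrunken_rate(500, 5000, position_rate, prior_strength)

print(f"rookie: {rookie_estimate:.3f}, veteran: {veteran_estimate:.3f}")
```

Both players have the same raw rate, but the rookie’s estimate is pulled most of the way back toward the position average while the veteran’s barely moves.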

Measuring the Metrics

McShane’s other two papers focus on selecting the most informative metrics, one for pitching and one for batting, two of the most important parts of the game. Both of these papers were co-authored with Jensen; Alexander Braunstein, a statistician formerly at Google who now works at a startup named Chomp; and James Piette, a doctoral candidate at the Wharton School.

Together, they took twenty pitching metrics and fifty hitting metrics and shuffled them through a series of Bayesian equations. The pitching analysis revealed that one set of metrics worked well for starters and another better suited relievers. Starters’ performance was best predicted by fielding independent pitching (a sophisticated way of removing non-pitching players’ defensive performance from a pitcher’s statistics), the number of home runs hit off them per nine innings, walks per nine innings, and earned run average (those metrics are abbreviated FIP, HR/9, BB/9, and ERA, respectively). Relievers, on the other hand, were best described by the percentage of ground balls hit, percentage of fly balls hit, and strikeouts per nine innings (GB%, FB%, and K/9, respectively).
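For readers unfamiliar with these abbreviations, the sketch below computes the per-nine-inning rates and FIP for a made-up pitcher. The stat line is hypothetical. The FIP formula shown is the commonly published form, and the article does not say which variant the researchers used; the constant of 3.10 is a placeholder, since the real value is recalculated for each season.

```python
def per_nine(count, innings_pitched):
    """Scale a counting stat to a nine-inning rate."""
    return 9.0 * count / innings_pitched


def fip(home_runs, walks, hit_by_pitch, strikeouts, innings_pitched, constant=3.10):
    """Fielding independent pitching in its commonly published form.

    Only outcomes the defense cannot influence are counted. The default
    constant is a placeholder, not an official value.
    """
    raw = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return raw / innings_pitched + constant


# Hypothetical starter's season, with innings expressed as a decimal number.
innings = 180.0
earned_runs, home_runs, walks, hit_by_pitch, strikeouts = 60, 20, 45, 5, 180

era = per_nine(earned_runs, innings)    # earned run average
hr9 = per_nine(home_runs, innings)      # HR/9
bb9 = per_nine(walks, innings)          # BB/9
k9 = per_nine(strikeouts, innings)      # K/9
pitcher_fip = fip(home_runs, walks, hit_by_pitch, strikeouts, innings)

print(f"ERA={era:.2f} HR/9={hr9:.2f} BB/9={bb9:.2f} K/9={k9:.2f} FIP={pitcher_fip:.2f}")
```

GB% and FB% are simpler still: each is the share of a pitcher’s batted balls that are ground balls or fly balls.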

The pitching results, McShane says, are “very, very related to how the game is played in practice.” That different measures would describe starting pitchers and relievers would not surprise avid baseball fans. Starters are often a team’s workhorse pitchers, playing through six, even seven innings. But as games come down to the wire and the starters begin to tire, managers bring in their relief pitchers. “A lot of times you bring them in for special situations,” McShane points out. “You bring in a reliever to face maybe one or two batters, so it’s very important to know how to evaluate a reliever in such micro-level situations. A reliever who faces one or two batters needs to specialize in finding a way to getting those one or two out, and the typical methods are by strikeout, infield groundballs, or pop flies.” Starters, on the other hand, need to focus on more big-picture outcomes like preventing home runs or minimizing the number of players they walk.

Hitting also requires a certain strategy, but since lineups and batting orders are set before the first pitch is thrown, managers must plan carefully based on available data. McShane and his colleagues’ paper recommends five metrics, which closely hew to the fundamentals of hitting. The first two, strikeout rate (K/PA) and walk rate (BB/PA), tell us something about how a player handles himself when at bat. Is he cool and collected? Is he apt to swing at a ball that is outside his strike zone? The next, isolated power (ISO), gives us a clue as to how often a player hits a double or better, while speed (SPD) tells us how quickly he is able to traverse the bases. The last one, ground ball rate (GB/BIP), is another important offensive trait, since ground balls are harder to cleanly field than fly balls and more often result in a man on base.
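Four of these five metrics are simple ratios, sketched below for a made-up batter. The stat line is hypothetical, and ISO is computed in its conventional form (extra bases per at-bat, equivalent to slugging percentage minus batting average). SPD is omitted because it is a composite score whose exact formula the article does not describe.

```python
def strikeout_rate(strikeouts, plate_appearances):
    """K/PA: share of plate appearances that end in a strikeout."""
    return strikeouts / plate_appearances


def walk_rate(walks, plate_appearances):
    """BB/PA: share of plate appearances that end in a walk."""
    return walks / plate_appearances


def isolated_power(doubles, triples, home_runs, at_bats):
    """ISO: extra bases per at-bat (slugging minus batting average)."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


def ground_ball_rate(ground_balls, balls_in_play):
    """GB/BIP: share of balls put in play that are ground balls."""
    return ground_balls / balls_in_play


# Hypothetical batter's season totals.
plate_appearances, at_bats = 600, 540
strikeouts, walks = 120, 60
doubles, triples, home_runs = 30, 3, 24
ground_balls, balls_in_play = 180, 400

k_pa = strikeout_rate(strikeouts, plate_appearances)
bb_pa = walk_rate(walks, plate_appearances)
iso = isolated_power(doubles, triples, home_runs, at_bats)
gb_bip = ground_ball_rate(ground_balls, balls_in_play)

print(f"K/PA={k_pa:.3f} BB/PA={bb_pa:.3f} ISO={iso:.3f} GB/BIP={gb_bip:.3f}")
```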

A New Look

Together, these three papers could change the way baseball managers, front offices, and fans think about their teams. Rather than juggle fifty metrics or more in a quest to balance the batting order or sift through dozens of stats to pick a relief pitcher, they need only look at a handful. And McShane and his colleagues’ model could help teams build a pennant-ready roster. “Behind our models lie a notion of how baseball as a game works,” McShane says. “They allow us to tell a richer and more realistic story about the players and the game.”

About the Writer

Tim De Chant was science writer and editor of Kellogg Insight between 2009 and 2012.

About the Research

Jensen, S. T., Blake McShane, and A. J. Wyner. 2009. “Hierarchical Bayesian Modeling of Hitting Performance in Baseball.” Bayesian Analysis. 4(4): 631-652.

Piette, J., A. Braunstein, Blake McShane, and S. T. Jensen. 2010. “A Point-Mass Mixture Random Effects Model for Pitching Metrics.” Journal of Quantitative Analysis in Sports. 6(3): Article 8.

McShane, Blake, A. Braunstein, J. Piette, and S. T. Jensen. 2011. “A Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics.” Journal of Quantitative Analysis in Sports. 7(4): Article 2.
