One, Two, Three Stats and More at the Old Ballgame
Marketing Jul 5, 2011

Identifying baseball’s best players and most reliable statistics

Based on the research of

James Piette

Shane T. Jensen

Blakeley B. McShane

Alexander Braunstein

Abraham J. Wyner

The crack of the bat and thwack of the ball in a catcher’s mitt are two sounds that warm the hearts of statisticians across the United States. Long regarded as America’s pastime, baseball counts among its fans a group that is both passionate and mathematically inclined. Its members dissect every pitch, swing, and hit of every game on record.

Known today as sabermetricians (after SABR, the acronym for the Society for American Baseball Research), they have produced an abundance of measures and models by which fans and managers can rate the performance of teams and individual players. Their enthusiasm is nothing new, since the practice of recording the game’s statistics has been around since the 1800s, but the profusion of metrics in recent years is. Sabermetrics has become a victim of its own success. The number of ways that players’ skills are measured is staggering, and many of those measures have dubious predictive power. “It’s not like fifty years ago when pitching could be distilled down to something like earned run average, and hitting could be distilled down to batting average and number of home runs,” says Blake McShane, an assistant professor of marketing at the Kellogg School of Management and co-author of a series of papers on sabermetrics.

“Nowadays in the sabermetrics era, we have fifty metrics for everything,” McShane continues. “If your car’s dashboard had fifty things on it, you wouldn’t use it. The reason your dashboard is useful is because you only have three or four things on it. And not only that, these things are independent of one another.” For baseball managers, too many metrics can be too much of a good thing. Overburdened by information, they can have a difficult time spotting talent or fielding the right player at the right time.

A New Approach

To help separate the wheat from the chaff, McShane and his co-authors used a statistical approach known as Bayesian inference, which differs from more typical frequentist inference on a number of levels. In this case, Bayesian statistics has two significant advantages. First, it makes complex mathematical models far simpler to estimate, allowing McShane and his colleagues to use an iterative simulation technique to narrow in on the solutions. Second, it allows them to keep an eye on uncertainty as the data works its way through the model. When the results are spit out on the other end, McShane and his colleagues have an accurate picture of the error (or uncertainty) of their estimates.
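To make the idea of iterative simulation concrete, here is a minimal sketch in Python of one such technique, a Metropolis sampler, applied to a single player's home run rate. It is not the authors' model: the flat prior, the function names, and the made-up record of 30 home runs in 500 at-bats are all assumptions chosen for illustration. The point is that the output is a whole distribution of plausible rates, so the uncertainty comes along with the estimate.

```python
import math
import random


def log_posterior(rate, successes, trials):
    """Log of the unnormalized posterior for a rate, assuming a flat prior on (0, 1)."""
    if not 0.0 < rate < 1.0:
        return float("-inf")
    return successes * math.log(rate) + (trials - successes) * math.log(1.0 - rate)


def metropolis(successes, trials, n_draws=20000, step=0.02, seed=1):
    """Draw from the posterior with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    current = 0.5  # arbitrary starting guess
    current_lp = log_posterior(current, successes, trials)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws[n_draws // 4:]  # drop the first quarter as burn-in


# Hypothetical record: 30 home runs in 500 at-bats.
draws = sorted(metropolis(successes=30, trials=500))
mean = sum(draws) / len(draws)
low = draws[int(0.025 * len(draws))]
high = draws[int(0.975 * len(draws))]
print(f"posterior mean {mean:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

The interval printed at the end plays the same role as the margin of error in a poll: a wide interval signals that the estimate should be treated with caution.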

To see the importance of tracking the error, imagine a typical political poll. Results are often reported with a margin of error. If your candidate is up in the polls 53 percent to 47 percent and the margin of error is plus or minus 2 percent, then you can feel comfortable that he or she is winning the race. However, if that margin of error is plus or minus 10 percent, you have little reason to be excited. The errors of the estimates McShane and his colleagues produced are similarly important. The narrower the margin of error, the more certain they would be of their results.
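The polling arithmetic can be checked in a few lines of Python. This is a simplified sketch: the function name `lead_is_clear` is hypothetical, and it treats the margin of error as a plain plus-or-minus band around each candidate's share.

```python
def lead_is_clear(leader_pct, trailer_pct, margin_pct):
    """Return True when the two plus-or-minus bands do not overlap."""
    leader_low = leader_pct - margin_pct
    trailer_high = trailer_pct + margin_pct
    return leader_low > trailer_high


# Margin of 2 points: bands are 51 to 55 and 45 to 49, so they do not overlap.
print(lead_is_clear(53, 47, 2))
# Margin of 10 points: bands are 43 to 63 and 37 to 57, so they overlap.
print(lead_is_clear(53, 47, 10))
```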

Being able to track uncertainty in predictions is an advantage when trying to forecast individual players’ performance levels. In that case, it is important to get a good idea of not only a player’s general performance, but also how variable that performance is over time. McShane, along with Shane T. Jensen and Abraham J. Wyner, both associate professors at the Wharton School, set out to build a model that not only was accurate in its predictions of hitting performance, but also offered a window into player consistency. A player’s variability, it turns out, can be nearly as important as overall performance when it comes to building a roster.

“Suppose you’re a small market team,” McShane says. “You would love to hire someone like all-star Albert Pujols who, in terms of home run rate, is high and consistent. But you’re a small market team so you can’t afford him.” Under those circumstances, many teams would sign an above-average player who may be consistent but seldom has a stand-out season. It is a safe bet, if only because most models say nothing about a player’s variability. However, if numbers relating to variability are available, the team may decide to pick up a player who may play below average, but may sporadically rival top performers like Pujols.

“If you’re a small market team facing budget constraints, this model is going to allow you to do some interesting things with your roster,” McShane says. “Players with variability have an inherent ‘option value’ that allows management to trade off maximizing the number of wins or runs scored per season—what you get from an above-average consistent player—for maximizing the probability of special events like a playoff berth or even a World Series ring—what you might just get from a highly variable player if you’re lucky.”
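The trade-off McShane describes can be illustrated with a small simulation. The sketch below is in Python and uses two invented players: a steady one who averages 25 home runs a season with little variation, and a streaky one who averages 22 with a lot. The numbers, the names, and the use of a normal distribution for season totals are all assumptions made for illustration, not figures from the research.

```python
import random


def simulate_seasons(mean, spread, n_seasons, rng):
    """Draw hypothetical season home run totals from a normal distribution."""
    return [rng.gauss(mean, spread) for _ in range(n_seasons)]


def share_above(values, threshold):
    """Fraction of simulated seasons that beat the threshold."""
    return sum(v > threshold for v in values) / len(values)


rng = random.Random(7)
steady = simulate_seasons(mean=25, spread=3, n_seasons=10000, rng=rng)
streaky = simulate_seasons(mean=22, spread=10, n_seasons=10000, rng=rng)

print("average season, steady: ", round(sum(steady) / len(steady), 1))
print("average season, streaky:", round(sum(streaky) / len(streaky), 1))
print("share of 40+ seasons, steady: ", share_above(steady, 40))
print("share of 40+ seasons, streaky:", share_above(streaky, 40))
```

The steady player wins on the average season, but only the streaky player has a realistic chance of a 40-home-run year. That chance is the "option value" a budget-constrained team might be willing to pay for.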

Another key part of the new model is its ability to make predictions using publicly available data. There have been, of course, many earlier attempts at using accessible data, though most have met with middling success. The current industry standard, PECOTA, uses piles of private data and requires gobs of fine-tuning by hand. Its predictions are perhaps the most accurate available, but accessing them requires going through the model’s gatekeeper, Baseball Prospectus.

McShane, Jensen, and Wyner’s model not only offers key information on player consistency; its main predictions are also completely automated and accurate enough to challenge PECOTA. By one measure of error, mean absolute error, their model is the hands-down winner. By another, root-mean-square error, PECOTA wins. Because root-mean-square error penalizes large errors far more heavily than small ones, that split suggests McShane and his colleagues’ model makes big misses on a small number of players while outperforming PECOTA on most. Part of the key to that success is breaking the league’s players into two classes: elite and everyone else. That simple division “makes the model a lot more predictive,” McShane says, by as much as 30 percent.
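A short Python sketch shows how two error measures can crown different winners. The error lists are invented for illustration and have nothing to do with the actual forecasts: one set is small errors plus two large misses, the other is uniformly mediocre.

```python
import math


def mean_absolute_error(errors):
    """Average size of the errors, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_error(errors):
    """Square root of the average squared error; large misses dominate."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical forecast errors for 20 players under two models.
few_big_misses = [1.0] * 18 + [20.0, 20.0]
steady_misses = [4.0] * 20

for name, errors in [("few big misses", few_big_misses), ("steady misses", steady_misses)]:
    mae = mean_absolute_error(errors)
    rmse = root_mean_square_error(errors)
    print(f"{name}: MAE {mae:.2f}, RMSE {rmse:.2f}")
```

The first model has the lower mean absolute error and the second has the lower root-mean-square error, because squaring the errors lets the two large misses outweigh the eighteen small ones.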

Their model also excels where PECOTA falls furthest short—in predicting the performance of young players. Players under the age of 26 generally do not have long histories within the league, causing most models to choke when generating predictions for them. McShane and his colleagues’ model addresses this problem by incorporating information from a player’s position as well. Since different positions often dictate different player characteristics—first basemen tend to be tall, for example—this new approach can give accurate forecasts for relatively untested players—a real boon for managers.
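One common way to borrow information from a group is to pull a short track record toward the group's average, and less so as the record grows. The Python sketch below shows that idea in its simplest form. It is a stand-in, not the authors' hierarchical model: the function name, the positional home run rate of 0.04, and the prior strength of 300 at-bats are all assumptions for illustration.

```python
def shrunk_rate(successes, trials, group_rate, prior_strength):
    """Blend a player's record with a group average.

    Equivalent to the posterior mean under a beta prior centered on
    group_rate and worth prior_strength pseudo-trials.
    """
    prior_successes = group_rate * prior_strength
    return (successes + prior_successes) / (trials + prior_strength)


# Both hypothetical players have a raw rate of 0.06 home runs per at-bat.
rookie = shrunk_rate(successes=3, trials=50, group_rate=0.04, prior_strength=300)
veteran = shrunk_rate(successes=180, trials=3000, group_rate=0.04, prior_strength=300)

print(f"rookie estimate:  {rookie:.4f}")
print(f"veteran estimate: {veteran:.4f}")
```

The rookie's estimate lands close to the positional average because 50 at-bats say little, while the veteran's stays near his own record.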

Measuring the Metrics

McShane’s other two papers focus on selecting the most informative metrics, one for pitching and one for batting, two of the most important parts of the game. Both of these papers were co-authored with Jensen; Alexander Braunstein, a statistician formerly at Google who now works at a startup named Chomp; and James Piette, a doctoral candidate at the Wharton School.

Together, they took twenty pitching metrics and fifty hitting metrics and shuffled them through a series of Bayesian equations. The pitching analysis revealed that one set of metrics worked well for starters and another was better suited to relievers. Starters’ performance was best predicted by fielding independent pitching (a sophisticated way of removing non-pitching players’ defensive performance from a pitcher’s statistics), the number of home runs hit off them per nine innings, walks per nine innings, and earned run average (the shorthand for those metrics is FIP, HR/9, BB/9, and ERA, respectively). Relievers, on the other hand, were best described by the percentage of ground balls hit, percentage of fly balls hit, and strikeouts per nine innings (GB%, FB%, and K/9, respectively).
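For readers unfamiliar with the per-nine-inning statistics, the Python sketch below computes them from a made-up season line. The FIP formula is the commonly published form and may differ in detail from the version used in the paper; its additive constant is adjusted by league and season, so the 3.10 here is only a placeholder. Innings are assumed to be true decimals rather than box-score notation.

```python
def per_nine(count, innings_pitched):
    """Scale a counting stat to a rate per nine innings."""
    return 9.0 * count / innings_pitched


def fip(home_runs, walks, hit_by_pitch, strikeouts, innings_pitched, constant=3.10):
    """Fielding independent pitching in its commonly published form.

    The constant is a placeholder; it is normally set per league and season.
    """
    weighted = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return weighted / innings_pitched + constant


# Hypothetical season line for a starter.
innings, earned_runs, home_runs = 200.0, 80, 20
walks, hit_by_pitch, strikeouts = 50, 5, 180

print("ERA: ", round(per_nine(earned_runs, innings), 2))
print("HR/9:", round(per_nine(home_runs, innings), 2))
print("BB/9:", round(per_nine(walks, innings), 2))
print("K/9: ", round(per_nine(strikeouts, innings), 2))
print("FIP: ", round(fip(home_runs, walks, hit_by_pitch, strikeouts, innings), 2))
```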

The pitching results, McShane says, are “very, very related to how the game is played in practice.” That different measures would describe starting pitchers and relievers would not surprise avid baseball fans. Starters are often a team’s workhorse pitchers, playing through six, even seven innings. But as games come down to the wire and the starters begin to tire, managers bring in their relief pitchers. “A lot of times you bring them in for special situations,” McShane points out. “You bring in a reliever to face maybe one or two batters, so it’s very important to know how to evaluate a reliever in such micro-level situations. A reliever who faces one or two batters needs to specialize in finding a way to getting those one or two out, and the typical methods are by strikeout, infield groundballs, or pop flies.” Starters, on the other hand, need to focus on more big-picture outcomes like preventing home runs or minimizing the number of players they walk.

Hitting also requires a certain strategy, but since lineups and batting orders are set before the first pitch is thrown, managers must plan carefully based on available data. McShane and his colleagues’ hitting paper recommends five metrics, which hew closely to the fundamentals of hitting. The first two, strikeout rate (K/PA) and walk rate (BB/PA), tell us something about how a player handles himself when at bat. Is he cool and collected? Is he apt to swing at a ball that is outside his strike zone? The next, isolated power (ISO), gives us a clue as to how often a player hits a double or better, while speed (SPD) tells us how quickly he is able to traverse the bases. The last one, ground ball rate (GB/BIP), is another important offensive trait, since ground balls are harder to field cleanly than fly balls and more often result in a man on base.
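Four of the five hitting metrics are simple ratios, sketched below in Python with a made-up season line. Isolated power is written in its standard form, extra bases per at-bat, which equals slugging percentage minus batting average. The speed score is omitted because it combines several components that the article does not spell out. The function names and sample numbers are assumptions for illustration.

```python
def rate(count, opportunities):
    """Plain ratio of events to opportunities."""
    return count / opportunities


def isolated_power(doubles, triples, home_runs, at_bats):
    """Extra bases per at-bat (slugging percentage minus batting average)."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


# Hypothetical season line for a hitter.
plate_appearances, at_bats = 600, 530
strikeouts, walks = 100, 60
doubles, triples, home_runs = 30, 3, 25
ground_balls, balls_in_play = 180, 420

print("K/PA:  ", round(rate(strikeouts, plate_appearances), 3))
print("BB/PA: ", round(rate(walks, plate_appearances), 3))
print("ISO:   ", round(isolated_power(doubles, triples, home_runs, at_bats), 3))
print("GB/BIP:", round(rate(ground_balls, balls_in_play), 3))
```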

A New Look

Together, these three papers could change the way baseball managers, front offices, and fans think about their teams. Rather than juggle fifty metrics or more in a quest to balance the batting order or sift through dozens of stats to pick a relief pitcher, they need only look at a handful. And McShane and his colleagues’ model could help teams build a pennant-ready roster. “Behind our models lie a notion of how baseball as a game works,” McShane says. “They allow us to tell a richer and more realistic story about the players and the game.”

About the Writer
Tim De Chant was science writer and editor of Kellogg Insight between 2009 and 2012.
About the Research

Jensen, S. T., Blake McShane, and A. J. Wyner. 2009. “Hierarchical Bayesian Modeling of Hitting Performance in Baseball.” Bayesian Analysis. 4(4): 631-652.

Piette, J., A. Braunstein, Blake McShane, and S. T. Jensen. 2010. “A Point-Mass Mixture Random Effects Model for Pitching Metrics.” Journal of Quantitative Analysis in Sports. 6(3): Article 8.

McShane, Blake, A. Braunstein, J. Piette, and S. T. Jensen. 2011. “A Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics.” Journal of Quantitative Analysis in Sports. 7(4): Article 2.
