One, Two, Three Stats and More at the Old Ballgame
Marketing Jul 5, 2011

Identifying baseball’s best players and most reliable statistics

Based on the research of

James Piette

Shane T. Jensen

Blakeley B. McShane

Alexander Braunstein

Abraham J. Wyner


The crack of the bat and thwack of the ball in a catcher’s mitt are two sounds that warm the hearts of statisticians across the United States. Long regarded as America’s pastime, baseball counts among its fans a group that is both passionate and mathematically inclined. Its members dissect every pitch, swing, and hit of every game on record.

Known today as sabermetricians (after SABR, the acronym for the Society for American Baseball Research), they have produced an abundance of measures and models by which fans and managers can rate the performance of teams and individual players. Their enthusiasm is nothing new, since the practice of recording the statistics of the game has been around since the 1800s, but the profusion of metrics in recent years is. Sabermetrics has become a victim of its own success. The number of ways that players’ skills are measured is staggering, and many of those measures have dubious predictive power. “It’s not like fifty years ago when pitching could be distilled down to something like earned run average, and hitting could be distilled down to batting average and number of home runs,” says Blake McShane, an assistant professor of marketing at the Kellogg School of Management and co-author of a series of papers on sabermetrics.

“Nowadays in the sabermetrics era, we have fifty metrics for everything,” McShane continues. “If your car’s dashboard had fifty things on it, you wouldn’t use it. The reason your dashboard is useful is because you only have three or four things on it. And not only that, these things are independent of one another.” For baseball managers, too many metrics can be too much of a good thing. Overburdened by information, they can have a difficult time spotting talent or fielding the right player at the right time.

A New Approach

To help separate the wheat from the chaff, McShane and his co-authors used a statistical approach known as Bayesian inference, which differs from more typical frequentist inference on a number of levels. In this case, Bayesian statistics has two significant advantages. First, it makes complex mathematical models far simpler to estimate, allowing McShane and his colleagues to use an iterative simulation technique to narrow in on the solutions. Second, it allows them to keep an eye on uncertainty as the data works its way through the model. When the results are spit out on the other end, McShane and his colleagues have an accurate picture of the error (or uncertainty) of their estimates.
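To make the idea of carrying uncertainty through a model concrete, here is a minimal sketch. It is not the authors' model: it swaps in the simplest textbook Bayesian setup, a Beta prior on a home run rate updated with binomial data, and the function name and the two stat lines are invented for illustration.

```python
import math

def beta_posterior(successes: int, trials: int,
                   prior_a: float = 1.0, prior_b: float = 1.0):
    """Return (mean, standard deviation) of the Beta posterior for a rate.

    Assumes a Beta(prior_a, prior_b) prior and binomial data; this is a
    stand-in illustration, not the hierarchical model from the papers.
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

# Hypothetical players with the same raw home run rate (5 percent)
# but very different amounts of evidence behind it.
rookie_mean, rookie_sd = beta_posterior(successes=5, trials=100)
veteran_mean, veteran_sd = beta_posterior(successes=50, trials=1000)

print(f"rookie:  {rookie_mean:.3f} +/- {rookie_sd:.3f}")
print(f"veteran: {veteran_mean:.3f} +/- {veteran_sd:.3f}")
```

Both hypothetical players have similar estimated rates, but the posterior standard deviation is much smaller for the one with more at-bats, which is the kind of uncertainty information a point estimate alone would hide.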


To see the importance of tracking the error, imagine a typical political poll. Results are often reported with a margin of error. If your candidate is up in the polls 53 percent to 47 percent and the margin of error is plus or minus 2 percent, then you can feel comfortable that he or she is winning the race. However, if that margin of error is plus or minus 10 percent, you have little reason to be excited. The errors of the estimates McShane and his colleagues produced are similarly important. The narrower the margin of error, the more certain they would be of their results.
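The poll arithmetic can be checked with a short sketch. The function name `lead_is_clear` is hypothetical, and the rule it applies is a deliberate simplification: a lead counts as clear only when the leader's lowest plausible share still beats the trailer's highest plausible share.

```python
def lead_is_clear(leader_pct: float, trailer_pct: float, margin_pct: float) -> bool:
    """Return True when the two candidates' intervals do not overlap.

    Simplifying assumption: each share is known to within +/- margin_pct,
    and we ask whether the worst case for the leader still exceeds the
    best case for the trailer.
    """
    return leader_pct - margin_pct > trailer_pct + margin_pct

narrow = lead_is_clear(53, 47, 2)    # intervals 51-55 and 45-49
wide = lead_is_clear(53, 47, 10)     # intervals 43-63 and 37-57

print(narrow, wide)
```

With a margin of 2 points the intervals do not overlap, so the lead holds up; with a margin of 10 points they overlap heavily and the same 53-to-47 result says little.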

Being able to track uncertainty in predictions is an advantage when trying to forecast individual players’ performance levels. In that case, it is important to get a good idea of not only a player’s general performance, but also how variable that performance is over time. McShane, along with Shane T. Jensen and Abraham J. Wyner, both associate professors at the Wharton School, set out to build a model that not only was accurate in its predictions of hitting performance, but also offered a window into player consistency. A player’s variability, it turns out, can be nearly as important as overall performance when it comes to building a roster.

“Suppose you’re a small market team,” McShane says. “You would love to hire someone like all-star Albert Pujols who, in terms of home run rate, is high and consistent. But you’re a small market team so you can’t afford him.” Under those circumstances, many teams would sign an above-average player who may be consistent but seldom has a stand-out season. It is a safe bet, if only because most models say nothing about a player’s variability. However, if numbers relating to variability are available, the team may decide to pick up a player who may play below average, but may sporadically rival top performers like Pujols.

“If you’re a small market team facing budget constraints, this model is going to allow you to do some interesting things with your roster,” McShane says. “Players with variability have an inherent ‘option value’ that allows management to trade off maximizing the number of wins or runs scored per season—what you get from an above-average consistent player—for maximizing the probability of special events like a playoff berth or even a World Series ring—what you might just get from a highly variable player if you’re lucky.”
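The "option value" argument can be sketched numerically. Everything below is an assumption made for illustration: the two players, their home run averages and spreads, the 40-homer threshold, and the use of a normal distribution for season totals. None of it comes from the papers.

```python
from statistics import NormalDist

# Hypothetical season home run totals, modeled as normal distributions.
consistent = NormalDist(mu=25, sigma=2)   # above average, rarely surprises
variable = NormalDist(mu=20, sigma=10)    # below average, occasionally huge

ELITE_SEASON = 40  # hypothetical threshold for a star-level season

def chance_of_exceeding(player: NormalDist, threshold: float) -> float:
    """Probability that the player's season total exceeds the threshold."""
    return 1.0 - player.cdf(threshold)

p_consistent = chance_of_exceeding(consistent, ELITE_SEASON)
p_variable = chance_of_exceeding(variable, ELITE_SEASON)

print(f"consistent player: {p_consistent:.4%}")
print(f"variable player:   {p_variable:.4%}")
```

Under these made-up numbers the consistent player has the higher expected output, yet only the variable player has any realistic chance of a star-level season. That is the trade-off between maximizing average production and maximizing the probability of a rare outcome.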

Another key part of the new model is its ability to make predictions using publicly available data. There have been, of course, many attempts at using accessible data before, though most have met with middling success. The current industry standard, PECOTA, uses large amounts of private data and requires extensive fine-tuning by hand. Its predictions are perhaps the most accurate available, but accessing them requires going through the model’s gatekeeper, Baseball Prospectus.

McShane, Jensen, and Wyner’s model not only offers key information on player consistency; its main predictions are also completely automated and accurate enough to challenge PECOTA. By one measure of error, mean absolute error, their model is the hands-down winner. By another, root-mean-square error, PECOTA wins. Because root-mean-square error penalizes large errors disproportionately, that result suggests McShane and his colleagues’ model makes a few big misses on a small number of players. For most players, their model outperforms PECOTA. Part of the key to that success is breaking the league’s players into two classes: elite and everyone else. That simple division “makes the model a lot more predictive,” McShane says, by as much as 30 percent.
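A small sketch shows how one model can win on mean absolute error and lose on root-mean-square error. The two error lists are invented for illustration and do not represent either model's actual results.

```python
import math

def mean_absolute_error(errors):
    """Average size of the errors, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_square_error(errors):
    """Square root of the average squared error; big misses dominate."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical forecast errors for ten players.
mostly_close = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]  # nine near hits, one big miss
uniformly_off = [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]  # never close, never far

for name, errors in [("mostly close", mostly_close),
                     ("uniformly off", uniformly_off)]:
    mae = mean_absolute_error(errors)
    rmse = root_mean_square_error(errors)
    print(f"{name}: MAE={mae:.2f} RMSE={rmse:.2f}")
```

The "mostly close" forecasts have the lower mean absolute error (2.9 against 4.0) but the higher root-mean-square error, because squaring lets the single 20-unit miss outweigh the nine accurate forecasts.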

Their model also excels where PECOTA falls furthest short: in predicting the performance of young players. Players under the age of 26 generally do not have long histories within the league, causing most models to choke when generating predictions for them. McShane and his colleagues’ model addresses this problem by also incorporating a player’s position. Since different positions often dictate different player characteristics (first basemen tend to be tall, for example), this approach can give accurate forecasts for relatively untested players, a real boon for managers.

Measuring the Metrics

McShane’s other two papers focus on selecting the most informative metrics, one for pitching and one for batting, two of the most important parts of the game. Both of these papers were co-authored with Jensen; Alexander Braunstein, a statistician formerly at Google who now works at a startup named Chomp; and James Piette, a doctoral candidate at the Wharton School.

Together, they took twenty pitching metrics and fifty hitting metrics and ran them through a series of Bayesian equations. The pitching analysis revealed that one set of metrics worked well for starters and another was better suited to relievers. Starters’ performance was best predicted by fielding independent pitching (a sophisticated way of removing non-pitching players’ defensive performance from a pitcher’s statistics), the number of home runs hit off them per nine innings, walks per nine innings, and earned run average (abbreviated FIP, HR/9, BB/9, and ERA, respectively). Relievers, on the other hand, were best described by the percentage of ground balls hit, percentage of fly balls hit, and strikeouts per nine innings (GB%, FB%, and K/9, respectively).
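Several of these pitching metrics are simple per-nine-inning rates, which a short sketch can make concrete. The stat line below is invented, the helper name `per_nine` is hypothetical, and innings pitched are assumed to be given as a plain decimal number. FIP is left out because its formula includes a league-wide constant that changes from season to season.

```python
def per_nine(count: float, innings_pitched: float) -> float:
    """Scale a raw count to a rate per nine innings pitched."""
    return 9.0 * count / innings_pitched

# Hypothetical season line for a starting pitcher.
line = {"IP": 200.0, "ER": 70, "HR": 20, "BB": 50, "K": 180}

era = per_nine(line["ER"], line["IP"])    # earned run average
hr9 = per_nine(line["HR"], line["IP"])    # home runs allowed per nine innings
bb9 = per_nine(line["BB"], line["IP"])    # walks per nine innings
k9 = per_nine(line["K"], line["IP"])      # strikeouts per nine innings

print(f"ERA={era:.2f} HR/9={hr9:.2f} BB/9={bb9:.2f} K/9={k9:.2f}")
```

Scaling to nine innings puts pitchers with very different workloads on the same footing, which is what lets the same rates be compared between a 200-inning starter and a 60-inning reliever.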

The pitching results, McShane says, are “very, very related to how the game is played in practice.” That different measures would describe starting pitchers and relievers would not surprise avid baseball fans. Starters are often a team’s workhorse pitchers, playing through six, even seven innings. But as games come down to the wire and the starters begin to tire, managers bring in their relief pitchers. “A lot of times you bring them in for special situations,” McShane points out. “You bring in a reliever to face maybe one or two batters, so it’s very important to know how to evaluate a reliever in such micro-level situations. A reliever who faces one or two batters needs to specialize in finding a way to getting those one or two out, and the typical methods are by strikeout, infield groundballs, or pop flies.” Starters, on the other hand, need to focus on more big-picture outcomes like preventing home runs or minimizing the number of players they walk.

Hitting also requires a certain strategy, but since lineups and batting orders are set before the first pitch is thrown, managers must plan carefully based on available data. For hitters, McShane and his colleagues’ paper recommends five metrics that hew closely to the fundamentals of hitting. The first two, strikeout rate (K/PA) and walk rate (BB/PA), tell us something about how a player handles himself when at bat. Is he cool and collected? Is he apt to swing at a ball that is outside his strike zone? The next, isolated power (ISO), gives us a clue as to how often a player hits a double or better, while speed (SPD) tells us how quickly he is able to traverse the bases. The last one, ground ball rate (GB/BIP), is another important offensive trait, since ground balls are harder to field cleanly than fly balls and more often result in a man on base.
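Four of the five hitting metrics are ratios of counting stats, sketched below. The batting line is invented, and the class and field names are hypothetical. ISO uses its standard definition (extra bases per at-bat, equivalent to slugging percentage minus batting average). GB/BIP is read here as ground balls divided by balls in play, which is an assumption about the abbreviation. SPD is omitted because it is a composite score rather than a single ratio.

```python
from dataclasses import dataclass

@dataclass
class BattingLine:
    """Hypothetical season counting stats for one hitter."""
    plate_appearances: int
    at_bats: int
    strikeouts: int
    walks: int
    doubles: int
    triples: int
    home_runs: int
    ground_balls: int
    balls_in_play: int

    def strikeout_rate(self) -> float:      # K/PA
        return self.strikeouts / self.plate_appearances

    def walk_rate(self) -> float:           # BB/PA
        return self.walks / self.plate_appearances

    def isolated_power(self) -> float:      # ISO: extra bases per at-bat
        extra_bases = self.doubles + 2 * self.triples + 3 * self.home_runs
        return extra_bases / self.at_bats

    def ground_ball_rate(self) -> float:    # GB/BIP (assumed definition)
        return self.ground_balls / self.balls_in_play

hitter = BattingLine(plate_appearances=600, at_bats=540, strikeouts=120,
                     walks=50, doubles=30, triples=5, home_runs=20,
                     ground_balls=180, balls_in_play=400)

print(f"K/PA={hitter.strikeout_rate():.3f} BB/PA={hitter.walk_rate():.3f} "
      f"ISO={hitter.isolated_power():.3f} GB/BIP={hitter.ground_ball_rate():.3f}")
```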

A New Look

Together, these three papers could change the way baseball managers, front offices, and fans think about their teams. Rather than juggle fifty metrics or more in a quest to balance the batting order or sift through dozens of stats to pick a relief pitcher, they need only look at a handful. And McShane and his colleagues’ model could help teams build a pennant-ready roster. “Behind our models lie a notion of how baseball as a game works,” McShane says. “They allow us to tell a richer and more realistic story about the players and the game.”


About the Writer
Tim De Chant was science writer and editor of Kellogg Insight between 2009 and 2012.
About the Research

Jensen, S. T., Blake McShane, and A. J. Wyner. 2009. “Hierarchical Bayesian Modeling of Hitting Performance in Baseball.” Bayesian Analysis. 4(4): 631-652.

Piette, J., A. Braunstein, Blake McShane, and S. T. Jensen. 2010. “A Point-Mass Mixture Random Effects Model for Pitching Metrics.” Journal of Quantitative Analysis in Sports. 6(3): Article 8.

McShane, Blake, A. Braunstein, J. Piette, and S. T. Jensen. 2011. “A Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics.” Journal of Quantitative Analysis in Sports. 7(4): Article 2.
