Should I Bring an Umbrella?
Economics Strategy Nov 1, 2011

Rating forecasters is difficult, but not impossible

Based on the research of

Dean Foster

Rakesh Vohra

Everyone has second-guessed the weatherman at least once. Questioning the wisdom of the daily weather forecast is a time-honored tradition in some parts, but our liberal interpretations of accuracy are not fair measures of meteorologists’ true predictive abilities. For that, we turn to game theorists like Rakesh Vohra.

Vohra is a professor of managerial economics and decision sciences at the Kellogg School of Management, and one of his specialties is assessing such prognostications, whether they be in meteorology or the bond markets. He and Dean Foster, a professor at the Wharton School, sought a robust way to tell whether someone is making accurate forecasts based on knowledge, or whether their apparent prescience is just plain luck. It turns out there is more to judging forecasts than sticking your head out the window to see if it is raining.

“If I ask you to predict yes or no whether it’s going to rain, it’s obvious to check whether you made a mistake,” Vohra says. “But if I ask you to predict what’s the probability of rain, it’s not obvious how to check whether you’ve made a mistake, right, because if you say there’s a 30 percent chance of rain and it rains, that’s not wrong, right?”

While the weather may be the most straightforward and accessible example, there are myriad other situations where assessing the accuracy of a forecast is important. “There are a variety of people who provide probability forecasts,” Vohra continues. “Meteorologists are the ones that everyone is familiar with, but if you go to prediction markets, which are very popular these days, the securities that they trade, the price at which these securities trade are to be interpreted as probabilities.”

A Flawed Measure

The most popular method for judging the accuracy of a probability forecast is based on calibration. This is one of the ways the National Weather Service judges its meteorologists. They look at all of the days on which a meteorologist forecasts a 30 percent chance of rain. If on those days it rained about 30 percent of the time, then the forecaster is said to be well calibrated on those days. A meteorologist is well calibrated overall if the same holds for every probability he or she announces: on the days assigned a given chance of rain, it rained about that share of the time.

Such calibration is intuitively appealing, but flawed. To demonstrate this, Vohra and Foster came up with a forecasting algorithm that was guaranteed to be well calibrated in all environments. In other words, they showed that anyone can generate probability forecasts of rain that are well calibrated without knowing anything about weather. To see where the problem lies, imagine a meteorologist who says there will be a 30 percent chance of rain for each of the next ten days. Let’s say it rained three days of the ten. Under the calibration criterion, the meteorologist would then be judged 100 percent accurate.

Now say another meteorologist forecast a 90 percent chance of rain for the three days it rained and a 10 percent chance for the seven it did not. It rained on every one of the days given a 90 percent chance and on none of the days given a 10 percent chance, so neither forecast matches its observed frequency. Under the calibration criterion, this person’s forecasting ability would therefore be considered worse than that of the person who predicted a 30 percent chance of rain every day, despite being more accurate. “That says maybe the problem is calibration,” Vohra notes.
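The ten-day comparison can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the "calibration gap" summary (a day-weighted average of the distance between each announced probability and the observed frequency of rain on those days) are assumptions made here, not the scoring method used by Foster and Vohra or by the National Weather Service.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group days by announced probability.

    Returns {probability: (number of days, observed frequency of rain)}.
    """
    buckets = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        buckets[p].append(rained)
    return {p: (len(days), sum(days) / len(days))
            for p, days in sorted(buckets.items())}


def calibration_gap(forecasts, outcomes):
    """Hypothetical summary: day-weighted average of
    |announced probability - observed frequency|. Zero means perfectly calibrated."""
    table = calibration_table(forecasts, outcomes)
    return sum(n * abs(p - freq) for p, (n, freq) in table.items()) / len(forecasts)


# Ten days, rain on three of them (1 = rain, 0 = no rain).
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Forecaster A: 30 percent chance every day.
constant = [0.3] * 10
# Forecaster B: 90 percent on the rainy days, 10 percent on the dry days.
sharp = [0.9 if rained else 0.1 for rained in outcomes]

print(calibration_table(constant, outcomes))  # one bucket: 0.3 -> 10 days, frequency 0.3
print(calibration_gap(constant, outcomes))    # 0.0, perfectly calibrated
print(calibration_table(sharp, outcomes))     # 0.1 -> frequency 0.0, 0.9 -> frequency 1.0
print(calibration_gap(sharp, outcomes))       # roughly 0.1, so B looks worse
```

Under this sketch the constant forecaster scores a gap of zero while the forecaster who effectively called every day correctly scores about 0.1, which is the mismatch between calibration and accuracy that the example describes.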

While weather forecasts are important, and while a certain amount of economic activity relies on them, the stakes are higher when forecasting turns to other matters. Hedge funds, for example, lure investors with promises of returns. Some may turn out to be wildly successful, others not. Past returns are no guarantee of future performance, reads the disclaimer. Investors need a way of knowing which fund to invest with, so people have tried to devise tests to rate which funds rise to the top.

The best test is experience, Vohra says. Invest $1 with a number of hedge funds for an extended period and see which provides an average growth rate that beats the market. Then, pool your money with that fund. Unfortunately, you will not find a manager willing to accept that contract, and the contracts managers will accept can be gamed. Like the meteorologist who forecasts a 30 percent chance of rain every day, a badly run hedge fund could still be 100 percent calibrated but far from accurate. To mask their inaccuracy, managers may use overly complex formulas with variables that serve to boost apparent accuracy but add little explanatory power.

Tilting at Windmills?

At this point, it may seem like rating forecasters is a quixotic battle. And it is, if you do not have any knowledge of the underlying process that is being forecasted. Calibration relies entirely on analyzing data and has no information about the underlying environment. All it cares about is whether a forecast statistically matches an outcome.

Properly assessing forecasts requires knowledge of the forecasting scheme and some insight about the process being forecasted. Data alone is not enough. For all the reams of data we have in this world, it is still a finite amount. However, the number of ways to predict the future is infinite, since everyone can have their own theories. The collision of finite data and infinite theories is where calibration falls apart. “A finite amount of data cannot possibly sort through an infinite number of theories,” Vohra points out.

The key, then, is to know something about the underlying process being forecasted. “I think the practical lesson is that if you don’t know anything, there is no test that you can construct that would tell whether someone else knows anything,” Vohra says. “There’s no substitute for knowledge.”

“When evaluating a forecaster, you don’t just look at the outcomes,” he concludes. “You also ask how complex is their forecasting scheme. If it’s a simple forecasting scheme, and it has a good calibration score, then this person probably knows something. But if it’s an enormously complicated forecasting scheme and it has a good calibration score, they’re probably trying to pull a fast one.”

Related reading on Kellogg Insight

Expert or Charlatan? A test to tell the difference between authentic experts and flimflam artists

Firming Up the Foundations of Game Theory: Elucidating the role of information in strategic interactions

Featured Faculty

Faculty member in the Department of Managerial Economics & Decision Sciences until 2013

About the Writer

Tim De Chant was science writer and editor of Kellogg Insight between 2009 and 2012.

About the Research

Foster, Dean and Rakesh Vohra. 2013. “Calibration: Respice, Adspice, Prospice.” In Advances in Economics and Econometrics. Tenth World Congress. Volume 1, Economic Theory, edited by Daron Acemoglu, Manuel Arellano and Eddie Dekel. Cambridge University Press.
