Should I Bring an Umbrella?
Economics · Strategy · Nov 1, 2011


Rating forecasters is difficult, but not impossible

Based on the research of Dean Foster and Rakesh Vohra

Listening: Interview with Rakesh Vohra on Forecasting (17:38)

Everyone has second-guessed the weatherman at least once. Questioning the wisdom of the daily weather forecast is a time-honored tradition in some parts, but our loose interpretations of accuracy are not fair measures of meteorologists’ true predictive abilities. For that, we turn to game theorists like Rakesh Vohra.

Vohra is a professor of managerial economics and decision sciences at the Kellogg School of Management, and one of his specialties is assessing such prognostications, whether they be in meteorology or the bond markets. He and Dean Foster, a professor at the Wharton School, sought a robust way to tell whether someone is making accurate forecasts based on knowledge, or whether their apparent prescience is just plain luck. It turns out there is more to judging forecasts than sticking your head out the window to see if it is raining.

“If I ask you to predict yes or no whether it’s going to rain, it’s obvious to check whether you made a mistake,” Vohra says. “But if I ask you to predict what’s the probability of rain, it’s not obvious how to check whether you’ve made a mistake, right, because if you say there’s a 30 percent chance of rain and it rains, that’s not wrong, right?”

While the weather may be the most straightforward and accessible example, there are myriad other examples of situations where assessing the accuracy of a forecast is important. “There are a variety of people who provide probability forecasts,” Vohra continues. “Meteorologists are the ones that everyone is familiar with, but if you go to prediction markets, which are very popular these days, the securities that they trade, the price at which these securities trade are to be interpreted as probabilities.”

A Flawed Measure

The most popular method for judging the accuracy of a probability forecast is based on calibration, which is one of the ways the National Weather Service judges its meteorologists. Take all of the days on which a meteorologist forecast a 30 percent chance of rain: if it rained on roughly 30 percent of those days, the forecaster is said to be well calibrated on them. Saying that a meteorologist is well calibrated overall means that, for every probability he or she issues, the observed frequency of rain matches that stated probability.
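To make the calibration check concrete, here is a minimal sketch in Python of how such a report might be computed. It is an illustration of the description above, not the National Weather Service’s actual procedure; the function name and data layout are assumptions made for this example. Days are grouped by the probability that was forecast, and each group’s stated probability is compared with the fraction of those days on which it actually rained.

```python
from collections import defaultdict

def calibration_report(forecasts, outcomes):
    """Illustrative calibration check (not an official scoring procedure).

    forecasts: list of forecast probabilities, e.g. [0.3, 0.3, 0.9, ...]
    outcomes:  list of 0/1 flags, 1 meaning it rained that day
    """
    days_by_prob = defaultdict(list)
    for prob, rained in zip(forecasts, outcomes):
        days_by_prob[prob].append(rained)

    report = {}
    for prob, results in sorted(days_by_prob.items()):
        observed = sum(results) / len(results)
        report[prob] = {
            "days": len(results),        # how often this probability was issued
            "observed": observed,        # fraction of those days it actually rained
            "gap": abs(observed - prob)  # small gaps everywhere = well calibrated
        }
    return report
```

A forecaster counts as well calibrated when every gap in such a report is small. Notice that the report looks only at forecasts and outcomes; it says nothing about whether the forecasts were informative, which is exactly the weakness the next example exposes.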

Such calibration is intuitively appealing, but flawed. To demonstrate this, Vohra and Foster came up with a forecasting algorithm that is guaranteed to be well calibrated in any environment. In other words, they showed that anyone can generate probability forecasts of rain that are well calibrated without knowing anything about weather. To see where the problem lies, imagine a meteorologist who says there will be a 30 percent chance of rain for each of the next ten days, and suppose it rained on three of those ten days. Under the calibration criterion, the meteorologist would come out perfectly calibrated.

Now say another meteorologist forecast a 90 percent chance of rain for the three days it rained and a 10 percent chance for the seven it did not. Under the calibration criterion, this person’s forecasting ability would be considered worse despite being more accurate than that of the person who predicted a 30 percent chance of rain every day. “That says maybe the problem is calibration,” Vohra notes.
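Running the two hypothetical meteorologists from this example through the calibration_report sketch above makes the flaw explicit: the forecaster who says 30 percent every day comes out with a gap of zero, while the sharper forecaster, who essentially told you which days to carry an umbrella, shows nonzero gaps. The data below are the made-up numbers from the example, not real observations.

```python
# Ten hypothetical days; it rained on three of them (listed first for simplicity).
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Forecaster A: a 30 percent chance of rain every single day.
forecaster_a = [0.3] * 10

# Forecaster B: 90 percent on the three rainy days, 10 percent on the seven dry ones.
forecaster_b = [0.9] * 3 + [0.1] * 7

print(calibration_report(forecaster_a, outcomes))
# One group at 0.3: it rained on 3 of 10 days, observed 0.3, gap 0.0 -- "perfectly" calibrated.

print(calibration_report(forecaster_b, outcomes))
# Group at 0.9: observed 1.0, gap of about 0.1. Group at 0.1: observed 0.0, gap 0.1.
# B looks worse under calibration despite being far more informative than A.
```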

While weather forecasts are important, and while a certain amount of economic activity relies on them, the stakes are higher when forecasting turns to other matters. Hedge funds, for example, lure investors with promises of returns. Some may turn out to be wildly successful, others not. Past returns are no guarantee of future performance, reads the disclaimer. Investors need a way of knowing with which fund to invest, so people have tried to devise tests to rate which funds rise to the top.


The best test is experience, Vohra says. Invest $1 with a number of hedge funds for an extended period and see which provides an average growth rate that beats the market. Then pool your money with that fund. Unfortunately, you will not find a manager willing to accept that contract, and the contracts they will accept can be gamed. Like the meteorologist who forecasts a 30 percent chance of rain every day, a badly run hedge fund could still appear perfectly calibrated while being far from accurate. To mask their inaccuracy, managers may use overly complex formulas whose variables boost apparent accuracy but add little explanatory power.
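As a back-of-the-envelope sketch of that invest-and-wait comparison, one could compute each fund’s average (geometric) growth rate and check it against the market’s over the same period. The return series and names below are made up purely for illustration.

```python
def average_growth_rate(returns):
    """Geometric-average growth per period from a series of simple returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0

# Hypothetical annual returns over five years (illustrative numbers only).
market_returns = [0.08, -0.02, 0.11, 0.05, 0.07]
fund_returns = [0.15, -0.10, 0.20, 0.01, 0.09]

benchmark = average_growth_rate(market_returns)
fund_rate = average_growth_rate(fund_returns)
print(f"Fund beats the market: {fund_rate > benchmark}")
```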

Tilting at Windmills?

At this point, it may seem like rating forecasters is a quixotic battle. And it is, if you do not have any knowledge of the underlying process that is being forecasted. Calibration relies entirely on analyzing data and has no information about the underlying environment. All it cares about is whether a forecast statistically matches an outcome.

Properly assessing forecasts requires knowledge of the forecasting scheme and some insight into the process being forecasted. Data alone is not enough. For all the reams of data we have in this world, it is still a finite amount. The number of ways to predict the future, however, is infinite; everyone can have their own theories. The collision of finite data and infinite theories is where calibration falls apart. “A finite amount of data cannot possibly sort through an infinite number of theories,” Vohra points out.

The key, then, is to know something about the underlying process being forecasted. “I think the practical lesson is that if you don’t know anything, there is no test that you can construct that would tell whether someone else knows anything,” Vohra says. “There’s no substitute for knowledge.”

“When evaluating a forecaster, you don’t just look at the outcomes,” he concludes. “You also ask how complex is their forecasting scheme. If it’s a simple forecasting scheme, and it has a good calibration score, then this person probably knows something. But if it’s an enormously complicated forecasting scheme and it has a good calibration score, they’re probably trying to pull a fast one.”

Related reading on Kellogg Insight

Expert or Charlatan? A test to tell the difference between authentic experts and flimflam artists

Firming Up the Foundations of Game Theory: Elucidating the role of information in strategic interactions

Featured Faculty

Rakesh Vohra, faculty member in the Department of Managerial Economics & Decision Sciences until 2013

About the Writer
Tim De Chant was the science writer and editor of Kellogg Insight between 2009 and 2012.
About the Research

Foster, Dean and Rakesh Vohra. 2013. “Calibration: Respice, Adspice, Prospice.” In Advances in Economics and Econometrics. Tenth World Congress. Volume 1, Economic Theory, edited by Daron Acemoglu, Manuel Arellano and Eddie Dekel. Cambridge University Press.

