Should I Bring an Umbrella?
Economics Strategy Nov 1, 2011

Rating forecasters is difficult, but not impossible

Based on the research of Dean Foster and Rakesh Vohra

Everyone has second-guessed the weatherman at least once. Questioning the wisdom of the daily weather forecast is a time-honored tradition in some parts, but our liberal interpretations of accuracy are not fair measures of meteorologists’ true predictive abilities. For that, we turn to game theorists like Rakesh Vohra.

Vohra is a professor of managerial economics and decision sciences at the Kellogg School of Management, and one of his specialties is assessing such prognostications, whether they be in meteorology or the bond markets. He and Dean Foster, a professor at the Wharton School, sought a robust way to tell whether someone is making accurate forecasts based on knowledge, or whether their apparent prescience is just plain luck. It turns out there is more to judging forecasts than sticking your head out the window to see if it is raining.

“If I ask you to predict yes or no whether it’s going to rain, it’s obvious to check whether you made a mistake,” Vohra says. “But if I ask you to predict what’s the probability of rain, it’s not obvious how to check whether you’ve made a mistake, right, because if you say there’s a 30 percent chance of rain and it rains, that’s not wrong, right?”

While the weather may be the most straightforward and accessible example, assessing the accuracy of a forecast matters in myriad other situations. “There are a variety of people who provide probability forecasts,” Vohra continues. “Meteorologists are the ones that everyone is familiar with, but if you go to prediction markets, which are very popular these days, the securities that they trade, the price at which these securities trade are to be interpreted as probabilities.”

A Flawed Measure

The most popular method for judging the accuracy of a probability forecast is based on calibration. This is one of the ways the National Weather Service judges its meteorologists. They look at all of the days on which a meteorologist forecasts a 30 percent chance of rain. If on those days it rained about 30 percent of the time, then the forecaster is said to be well calibrated on those days. A meteorologist is well calibrated overall when the same holds for every probability he or she announces: among the days given any particular forecast, it rained about that fraction of the time.
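
The calibration check described above can be sketched in a few lines of Python. This is only an illustration, not the National Weather Service's procedure: the function name `calibration_table` and the 30-day record are hypothetical, invented for the example.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Group days by stated probability and report the observed rain frequency.

    forecasts: stated probabilities of rain, one per day (e.g. 0.3)
    outcomes:  1 if it rained that day, 0 if it did not
    Returns {forecast: (number_of_days, observed_frequency)}.
    """
    days = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        days[p].append(rained)
    return {p: (len(r), sum(r) / len(r)) for p, r in days.items()}

# Hypothetical record (not real data): 20 days at 30 percent with rain on 6,
# and 10 days at 70 percent with rain on 7.
forecasts = [0.3] * 20 + [0.7] * 10
outcomes = [1] * 6 + [0] * 14 + [1] * 7 + [0] * 3

table = calibration_table(forecasts, outcomes)
for p, (n, freq) in sorted(table.items()):
    print(f"forecast {p:.0%}: {n} days, rained {freq:.0%} of the time")
# forecast 30%: 20 days, rained 30% of the time
# forecast 70%: 10 days, rained 70% of the time
```

In this made-up record the observed frequency matches the stated probability in both groups, so the forecaster would count as well calibrated.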

Such calibration is intuitively appealing, but flawed. To demonstrate this, Vohra and Foster came up with a forecasting algorithm that was guaranteed to be well calibrated in all environments. In other words, they showed that anyone can generate probability forecasts of rain that are well calibrated without knowing anything about weather. To see where the problem lies, imagine a meteorologist who says there will be a 30 percent chance of rain for each of the next ten days. Let’s say it rained three days of the ten. Under the calibration criterion, the meteorologist would be judged 100 percent accurate.

Now say another meteorologist forecast a 90 percent chance of rain for the three days it rained and a 10 percent chance for the seven it did not. Under the calibration criterion, this person’s forecasts would be considered worse, because it rained on all of the 90 percent days and on none of the 10 percent days, even though they were more accurate than those of the person who predicted a 30 percent chance of rain every day. “That says maybe the problem is calibration,” Vohra notes.
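
The ten-day comparison can be worked through numerically. The sketch below makes two assumptions that are not in the article: `calibration_gap` is one simple way to summarize miscalibration (the day-weighted average distance between each stated probability and the observed rain frequency), and "more accurate" is measured with the Brier score, a common scoring rule that Vohra does not name here. Both function names are hypothetical.

```python
def calibration_gap(forecasts, outcomes):
    """Day-weighted average of |stated probability - observed frequency|.

    Zero means perfectly calibrated. This summary is an assumption of the
    sketch, not a measure taken from the article.
    """
    groups = {}
    for p, rained in zip(forecasts, outcomes):
        groups.setdefault(p, []).append(rained)
    n = len(forecasts)
    return sum(len(r) / n * abs(p - sum(r) / len(r)) for p, r in groups.items())

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# Ten days: rain on the first three, dry on the remaining seven.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
flat = [0.3] * 10               # 30 percent every day
sharp = [0.9] * 3 + [0.1] * 7   # 90 percent on rainy days, 10 percent on dry days

for name, forecasts in [("flat", flat), ("sharp", sharp)]:
    gap = calibration_gap(forecasts, outcomes)
    brier = brier_score(forecasts, outcomes)
    print(f"{name}: calibration gap {gap:.2f}, Brier score {brier:.2f}")
# flat: calibration gap 0.00, Brier score 0.21
# sharp: calibration gap 0.10, Brier score 0.01
```

Under these assumptions the flat forecaster is perfectly calibrated yet scores far worse on the Brier measure, while the sharper forecaster is penalized by calibration despite tracking the weather much more closely.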

While weather forecasts are important, and while a certain amount of economic activity relies on them, the stakes are higher when forecasting turns to other matters. Hedge funds, for example, lure investors with promises of returns. Some may turn out to be wildly successful, others not. Past returns are no guarantee of future performance, reads the disclaimer. Investors need a way of knowing which fund to invest with, so people have tried to devise tests to rate which funds rise to the top.

The best test is experience, Vohra says. Invest $1 with a number of hedge funds for an extended period and see which provides an average growth rate that beats the market. Then, pool your money with that fund. Unfortunately, you will not find a manager willing to accept that contract, and the contracts managers will accept can be gamed. Like the meteorologist who forecasts a 30 percent chance of rain every day, a badly run hedge fund could still be 100 percent calibrated but far from accurate. To mask their inaccuracy, managers may use overly complex formulas with variables that serve to boost apparent accuracy but add little explanatory power.

Tilting at Windmills?

At this point, it may seem like rating forecasters is a quixotic battle. And it is, if you do not have any knowledge of the underlying process that is being forecasted. Calibration relies entirely on analyzing data and has no information about the underlying environment. All it cares about is whether a forecast statistically matches an outcome.

Properly assessing forecasts requires knowledge of the forecasting scheme and some insight about the process being forecasted. Data alone is not enough. For all the reams of data we have in this world, it is still a finite amount. However, the number of ways to predict the future is infinite: everyone can have their own theories. The collision of finite data and infinite theories is where calibration falls apart. “A finite amount of data cannot possibly sort through an infinite number of theories,” Vohra points out.

The key, then, is to know something about the underlying process being forecasted. “I think the practical lesson is that if you don’t know anything, there is no test that you can construct that would tell whether someone else knows anything,” Vohra says. “There’s no substitute for knowledge.”

“When evaluating a forecaster, you don’t just look at the outcomes,” he concludes. “You also ask how complex is their forecasting scheme. If it’s a simple forecasting scheme, and it has a good calibration score, then this person probably knows something. But if it’s an enormously complicated forecasting scheme and it has a good calibration score, they’re probably trying to pull a fast one.”


About the Writer
Tim De Chant was science writer and editor of Kellogg Insight between 2009 and 2012.
About the Research

Foster, Dean and Rakesh Vohra. 2013. “Calibration: Respice, Adspice, Prospice.” In Advances in Economics and Econometrics. Tenth World Congress. Volume 1, Economic Theory, edited by Daron Acemoglu, Manuel Arellano and Eddie Dekel. Cambridge University Press.
