Numeric Performance Reviews Can Be Biased Against Women
Organizations Careers Aug 1, 2019

The ten-point scale favors men in some situations. But a simple change to the rating system can level the playing field.

Women are subject to greater gender bias when evaluated using certain ratings scales.

Lisa Röper

Based on the research of

Lauren Rivera

András Tilcsik

On a scale of one to ten, how surprised would you be to learn that a common professional evaluation tool is biased against women? Given the many inequalities in the modern workplace—from pay disparity to freezing-cold conference rooms—some people might rate their disbelief at yet another inequity at a zero.

The culprit this time? The ten-point rating scale itself.

Numeric performance ratings of employees are “staples of the modern workplace,” says Kellogg’s Lauren Rivera. In some industries, such as consulting, it’s common for employees to be rated after every project they complete. And the stakes are high. How a worker measures up can influence short-term compensation decisions and long-term career trajectories.

Previous studies show that, in general, men have a leg up in performance evaluations and are consistently deemed more able, likeable, and worthy than women, even when their work is identical. But no one had examined whether numeric rating systems themselves—what Rivera calls “the architecture of evaluation”—might be contributing to the problem. So Rivera and coauthor András Tilcsik of the University of Toronto decided to explore the question.

They found that the rating system can be biased against women. The ten-point scale places women, especially those in male-dominated fields, at a significant disadvantage. But, crucially, that disadvantage vanishes when men and women are evaluated on a six-point scale.

Although we think such systems are objective, Rivera says, “they’re not neutral instruments at all.” For any employer that uses numeric evaluations, “the rating system you choose matters, so choose it wisely.”

Understanding When Gender Bias Creeps into Performance Evaluations

Rivera and Tilcsik had an unexpected bit of luck when they began their research: they learned that a professional school at a North American university was planning to switch its instructor evaluation system from a ten-point scale to a six-point scale.

The school’s decision had nothing to do with gender. Administrators suspected students were mentally converting the ten-point scale into percentages and letter-grade scores, making them hesitant to give their instructors ratings below a seven. The school’s leaders theorized that a different scale might yield more varied and accurate results.

The switch created an ideal natural experiment. Rivera and Tilcsik could compare how the same instructors teaching the same courses fared under different numerical rating systems. They weren’t sure what they would find.

Because of stereotypes that associate men, but not women, with brilliance and excellence, it’s more difficult for women to get the top rating on any evaluation. For that reason, it seemed possible that scales with fewer points might ultimately disadvantage women, since a five on a six-point scale is a lower assessment than a nine on a ten-point scale. Yet they also wondered if the ubiquity of the ten-point scale, and the strong cultural association of the number ten with excellence, might make the ten-point scale an especially unbalanced instrument.
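
To make that comparison concrete, the arithmetic can be sketched in a few lines of Python. The helper name `fraction_of_max` is hypothetical, and expressing a rating as a share of the scale's top score is only one assumed way to compare scales, not a method taken from the study.

```python
def fraction_of_max(rating: float, scale_max: int) -> float:
    """Express a rating as a share of the scale's top score (illustrative helper)."""
    if not 1 <= rating <= scale_max:
        raise ValueError("rating must lie between 1 and scale_max")
    return rating / scale_max


# The two second-from-top ratings mentioned in the text.
five_of_six = fraction_of_max(5, 6)
nine_of_ten = fraction_of_max(9, 10)

print(f"5 on a six-point scale: {five_of_six:.0%} of the maximum")
print(f"9 on a ten-point scale: {nine_of_ten:.0%} of the maximum")
```

On this rough measure, one step below the top costs more on the shorter scale, which is why the researchers thought a six-point scale might hurt women rather than help them.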

The researchers collected course evaluations for 29 academic terms: 20 before the switch from a ten- to a six-point scale, and nine after that change. The sample included 105,304 ratings of 369 instructors. The researchers also identified four areas of study that were particularly male dominated, in which women made up less than 15 percent of instructors.

They were heartened to discover that women teaching in non-male-dominated fields were evaluated about the same as men under both rating systems. The average rating and distribution of ratings for men and women were nearly identical under the ten-point system, they found. And switching to a six-point scale did not affect the frequency or distribution of ratings in these fields, nor did it affect the likelihood of women receiving a perfect rating.

But it was an entirely different story for women in male-dominated fields.

In these areas, 31.4 percent of male instructors’ ratings were perfect tens, but only 19.5 percent of female instructors’ ratings were tens. In fact, for men in male-dominated fields, a ten was the most common rating. Women’s most common rating was an eight. Male instructors had an average rating of 8.2; for women, the average rating was half a point lower, at 7.7.

Yet these differences vanished with the six-point rating scale. Male and female instructors received a perfect score of six at almost the same frequency: 41.2 percent for men and 41.7 percent for women. The gap in men’s and women’s average ratings narrowed too: 4.91 for men and 5.01 for women.
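
The size of the shift is easier to see when the gaps are computed side by side. The following is a minimal sketch that uses only the figures quoted above for male-dominated fields; the dictionary layout and the `gender_gaps` helper are assumptions made for illustration, not the researchers' analysis.

```python
# Figures quoted in the article for instructors in male-dominated fields.
field_results = {
    "ten-point": {"men_top_pct": 31.4, "women_top_pct": 19.5,
                  "men_mean": 8.2, "women_mean": 7.7},
    "six-point": {"men_top_pct": 41.2, "women_top_pct": 41.7,
                  "men_mean": 4.91, "women_mean": 5.01},
}


def gender_gaps(stats):
    """Return (top-rating gap in percentage points, mean-rating gap), men minus women."""
    top_gap = round(stats["men_top_pct"] - stats["women_top_pct"], 2)
    mean_gap = round(stats["men_mean"] - stats["women_mean"], 2)
    return top_gap, mean_gap


gaps = {scale: gender_gaps(stats) for scale, stats in field_results.items()}
for scale, (top_gap, mean_gap) in gaps.items():
    print(f"{scale}: top-rating gap {top_gap:+.1f} pct. points, mean gap {mean_gap:+.2f}")
```

A positive number means men were ahead. On the ten-point scale men led by 11.9 percentage points in top ratings and by 0.5 in average rating; on the six-point scale both gaps are slightly negative, meaning women were marginally ahead.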

This came as a surprise to Rivera and Tilcsik. “I was expecting the gap between men and women to narrow, but it was pretty striking that it eliminated the gender gap in ratings,” Rivera says.

They were especially struck by how much of a difference it made for women being evaluated in male-dominated fields. “What that suggested to us is that potentially the real opportunity for intervention is in those stereotypically male-dominated arenas, as opposed to ones that might be more gender mixed,” Rivera says.

Why “John” Is Brilliant, but “Julie” Is Just Smart

Although Rivera and Tilcsik were intrigued by the result of the natural experiment, they wanted to make sure it wasn’t limited to the one school they studied. So they conducted a complementary online survey.

They recruited 400 professional-school students from across the United States, who all read the same lecture about the social and economic implications of technological change. Some participants were told that the lecture was delivered by Professor John Anderson, while the rest were told it was from Professor Julie Anderson. Then, participants were asked to rate the instructor on either a ten- or six-point scale, and to list the words that first came to mind when they thought of the instructor’s performance.

The results echoed what the researchers saw in the natural experiment. Under the ten-point scale, Professor John received an average rating of 7.8, while Professor Julie’s average was 7.1. Once again, the gap in average ratings shrank when the instructors were rated on a six-point scale: 4.9 for John and 4.8 for Julie.

And, just as the researchers saw in the natural experiment, participants were more willing to give female instructors the top rating on the six-point scale than on the ten-point scale. Julie received ten out of ten in only 13 percent of cases, while John got a perfect ten 22 percent of the time. But they got perfect sixes at nearly equal rates: 25 percent for John and 24 percent for Julie.

Still, there were noticeable differences in how participants described the professors. Superlatives like “brilliant,” “genius,” and “amazing” were applied much more frequently to John than to Julie, whose teaching was more often characterized by participants as simply good.

It was clear to Rivera and Tilcsik that participants had different expectations for a “perfect ten” and a “perfect six”: among participants who gave a ten-out-of-ten rating, 54.2 percent used superlative language to describe the professor. Among participants who gave a perfect six, only 28.6 percent used such language.

The Not-So-Perfect Ten

So, why does the ten-point scale disadvantage women? Rivera thinks the loaded cultural language of the “perfect ten” may be partly to blame.

“The number ten carries this cultural connotation of perfection,” she says. “Research shows that, due to gender stereotypes of competence, we just don’t think women are perfect. We are more likely to scrutinize women and their performance.”

This difference helps explain why the effect of the ten-point scale was so pronounced in male-dominated fields, where stereotypes of male brilliance are especially strong. When people imagine the standouts in a stereotypically male field—the perfect tens—the figures that come to mind most readily are men. It’s much easier, then, for raters to associate a man than a woman with this preconceived idea of excellence.

Of course, there’s nothing magical about six-point scales in and of themselves. But Rivera believes the change from ten to six could be useful in any field that uses numeric evaluations. It can act as a “bias interrupter”: by removing the familiar and culturally fraught concept of the “perfect ten,” and creating a more neutral mindset for raters, “we’re finding a way to turn down the volume on gender stereotypes,” she says.

Yet moving away from ten-point scales in performance evaluations isn’t a panacea, Rivera points out.

“We should work on changing the images that people get from a very young age, how we structure work, how we recognize others, the messages we see in the media,” she says. But while we undertake the long, slow work of cultural change, it’s important to put small, meaningful changes into place. “I think interventions such as this have a lot of power to reduce the effect of biased evaluations on people’s career opportunities.”

Featured Faculty

Lauren Rivera, Professor of Management & Organizations; Professor of Sociology, Weinberg College of Arts & Sciences (Courtesy)

About the Writer
Susie Allen is a freelance writer in Chicago.
About the Research
Rivera, Lauren A., and András Tilcsik. 2019. “Scaling Down Inequality: Rating Scales, Gender Bias, and the Architecture of Evaluation.” American Sociological Review. 84(2): 248–274.