Numeric Performance Reviews Can Be Biased Against Women
Aug 1, 2019

The ten-point scale favors men in some situations. But a simple change to the rating system can level the playing field.

Women are subject to greater gender bias when evaluated using certain ratings scales.

Lisa Röper

Based on the research of Lauren Rivera and András Tilcsik

On a scale of one to ten, how surprised would you be to learn that a common professional evaluation tool is biased against women? Given the many inequalities in the modern workplace—from pay disparity to freezing-cold conference rooms—some people might rate their disbelief at yet another inequity at a zero.

The culprit this time? The ten-point rating scale itself.

Numeric performance ratings of employees are “staples of the modern workplace,” says Kellogg’s Lauren Rivera. In some industries, such as consulting, it’s common for employees to be rated after every project they complete. And the stakes are high. How a worker measures up can influence short-term compensation decisions and long-term career trajectories.

Previous studies show that, in general, men have a leg up in performance evaluations and are consistently deemed more able, likeable, and worthy than women, even when their work is identical. But no one had examined whether numeric rating systems themselves—what Rivera calls “the architecture of evaluation”—might be contributing to the problem. So Rivera and coauthor András Tilcsik of the University of Toronto decided to explore the question.

They found that the rating system can be biased against women. The ten-point scale places women, especially those in male-dominated fields, at a significant disadvantage. But, crucially, that disadvantage vanishes when men and women are evaluated on a six-point scale.

Although we think such systems are objective, Rivera says, “they’re not neutral instruments at all.” For any employer that uses numeric evaluations, “the rating system you choose matters, so choose it wisely.”

Understanding When Gender Bias Creeps into Performance Evaluations

Rivera and Tilcsik had an unexpected bit of luck when they began their research: they learned that a professional school at a North American university was planning to switch its instructor evaluation system from a ten-point scale to a six-point scale.

The school’s decision had nothing to do with gender. Administrators suspected students were mentally converting the ten-point scale into percentages and letter-grade scores, making them hesitant to give their instructors ratings below a seven. The school’s leaders theorized that a different scale might yield more varied and accurate results.

The switch created an ideal natural experiment. Rivera and Tilcsik could compare how the same instructors teaching the same courses fared under different numerical rating systems. They weren’t sure what they would find.

Because of stereotypes that associate men, but not women, with brilliance and excellence, it’s more difficult for women to get the top rating on any evaluation. For that reason, it seemed possible that scales with fewer points might ultimately disadvantage women, since a five on a six-point scale is a lower assessment than a nine on a ten-point scale. Yet they also wondered if the ubiquity of the ten-point scale, and the strong cultural association of the number ten with excellence, might make the ten-point scale an especially unbalanced instrument.
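The arithmetic behind that first concern can be made concrete by expressing each rating as a fraction of its scale's top score. The following is a minimal sketch: the function name `fraction_of_max` is illustrative, and dividing by the maximum is only one simple way to compare scales, not a method taken from the study.

```python
from fractions import Fraction


def fraction_of_max(rating: int, scale_max: int) -> Fraction:
    """Express a rating as an exact fraction of the scale's top score."""
    if not 1 <= rating <= scale_max:
        raise ValueError("rating must lie between 1 and scale_max")
    return Fraction(rating, scale_max)


# The second-best rating on each scale, as a share of the maximum.
second_best_on_six = fraction_of_max(5, 6)    # 5/6, roughly 0.83
second_best_on_ten = fraction_of_max(9, 10)   # 9/10, exactly 0.90

print(float(second_best_on_six), float(second_best_on_ten))
print(second_best_on_six < second_best_on_ten)
```

By this measure, falling one step short of the top costs about 17 percent of the maximum on a six-point scale but only 10 percent on a ten-point scale, which is why a coarser scale could plausibly have hurt anyone who is rarely awarded the top score.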

The researchers collected course evaluations for 29 academic terms: 20 before the switch from a ten- to a six-point scale, and nine after that change. The sample included 105,304 ratings of 369 instructors. The researchers also identified four areas of study that were particularly male dominated, in which women made up less than 15 percent of instructors.

They were heartened to discover that women teaching in non-male-dominated fields were evaluated about the same as men under both rating systems. The average rating and distribution of ratings for men and women were nearly identical under the ten-point system, they found. And switching to a six-point scale did not affect the frequency or distribution of ratings in these fields, nor did it affect the likelihood of women receiving a perfect rating.

But it was an entirely different story for women in male-dominated fields.

In these areas, 31.4 percent of male instructors’ ratings were perfect tens, but only 19.5 percent of female instructors’ ratings were tens. In fact, for men in male-dominated fields, a ten was the most common rating. Women’s most common rating was an eight. Male instructors had an average rating of 8.2; for women, the average rating was half a point lower, at 7.7.

Yet these differences vanished with the six-point rating scale. Male and female instructors received a perfect score of six at almost the same frequency: 41.2 percent for men and 41.7 percent for women. The gap in men’s and women’s average ratings narrowed too: 4.91 for men and 5.01 for women.
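To see the size of the change, the reported figures for male-dominated fields can be turned into simple gaps (men's value minus women's value). This is an illustrative calculation on the published summary numbers only, not a reproduction of the researchers' analysis; the dictionary layout and the `gender_gap` helper are assumptions made for the sketch.

```python
# Summary figures reported in the article for male-dominated fields.
# "top_share" is the percentage of ratings at the top of the scale.
reported = {
    "ten-point": {
        "top_share": {"men": 31.4, "women": 19.5},
        "mean": {"men": 8.2, "women": 7.7},
    },
    "six-point": {
        "top_share": {"men": 41.2, "women": 41.7},
        "mean": {"men": 4.91, "women": 5.01},
    },
}


def gender_gap(values: dict) -> float:
    """Men's value minus women's value, rounded to suppress float noise."""
    return round(values["men"] - values["women"], 2)


gaps = {
    scale: {measure: gender_gap(values) for measure, values in measures.items()}
    for scale, measures in reported.items()
}

for scale, gap in gaps.items():
    print(
        f"{scale}: top-rating gap {gap['top_share']:+.1f} percentage points, "
        f"mean-rating gap {gap['mean']:+.2f}"
    )
```

On the ten-point scale, men lead by 11.9 percentage points in top ratings and by 0.5 in average rating. On the six-point scale, both gaps are a fraction of that size and tilt slightly toward women (0.5 percentage points and 0.10 rating points).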

This came as a surprise to Rivera and Tilcsik. “I was expecting the gap between men and women to narrow, but it was pretty striking that it eliminated the gender gap in ratings,” Rivera says.

They were especially struck by how much of a difference it made for women being evaluated in male-dominated fields. “What that suggested to us is that potentially the real opportunity for intervention is in those stereotypically male-dominated arenas, as opposed to ones that might be more gender mixed,” Rivera says.

Why “John” Is Brilliant, but “Julie” Is Just Smart

Although Rivera and Tilcsik were intrigued by the result of the natural experiment, they wanted to make sure it wasn’t limited to the one school they studied. So they conducted a complementary online survey.

They recruited 400 professional-school students from across the United States, who all read the same lecture about the social and economic implications of technological change. Some participants were told that the lecture was delivered by Professor John Anderson, while the rest were told it was from Professor Julie Anderson. Then, participants were asked to rate the instructor on either a ten- or six-point scale, and to list the words that first came to mind when they thought of the instructor’s performance.

The results echoed what the researchers saw in the field experiment. Under the ten-point scale, Professor John received an average rating of 7.8, while Professor Julie’s average was 7.1. Once again, the gap in average ratings shrank when the instructors were rated on a six-point scale: 4.9 for John and 4.8 for Julie.

And, just as the researchers saw in the field experiment, participants were more willing to give female instructors the top rating on the six-point scale than on the ten-point scale. Julie received ten out of ten in only 13 percent of cases, while John got a perfect ten 22 percent of the time. But they got perfect sixes at nearly equal rates: 25 percent for John and 24 percent for Julie.
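Another way to read these survey figures is as a ratio: how often Julie received the top rating relative to John. The sketch below uses only the percentages reported above; the `relative_rate` helper is a hypothetical name, and the ratio is an illustrative summary rather than a statistic from the paper.

```python
# Percentage of participants giving the top rating, as reported in the article.
top_rating_share = {
    "ten-point": {"John": 22, "Julie": 13},
    "six-point": {"John": 25, "Julie": 24},
}


def relative_rate(shares: dict) -> float:
    """Julie's top-rating share expressed as a fraction of John's."""
    return shares["Julie"] / shares["John"]


for scale, shares in top_rating_share.items():
    print(f"{scale}: Julie's top-rating rate is {relative_rate(shares):.0%} of John's")
```

Under the ten-point scale, Julie earned the top mark at roughly 59 percent of John's rate; under the six-point scale, that figure rises to 96 percent.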

Still, there were noticeable differences in how participants described the professors. Superlatives like “brilliant,” “genius,” and “amazing” were applied much more frequently to John than to Julie, whose teaching was more often characterized by participants as simply good.

Rivera and Tilcsik found that participants held different expectations for a “perfect ten” and a “perfect six”: among participants who gave a ten out of ten rating, 54.2 percent used superlative language to describe the professor. Among participants who gave a perfect six, only 28.6 percent used such language.

The Not-So-Perfect Ten

So, why does the ten-point scale disadvantage women? Rivera thinks the loaded cultural language of the “perfect ten” may be partly to blame.

“The number ten carries this cultural connotation of perfection,” she says. “Research shows that, due to gender stereotypes of competence, we just don’t think women are perfect. We are more likely to scrutinize women and their performance.”

This difference helps explain why the effect of the ten-point scale was so pronounced in male-dominated fields, where stereotypes of male brilliance are especially strong. When people imagine the standouts in a stereotypically male field—the perfect tens—the figures that come to mind most readily are men. It’s much easier, then, for raters to associate a man than a woman with this preconceived idea of excellence.

Of course, there’s nothing magical about six-point scales in and of themselves. But Rivera believes the change from ten to six could be useful in any field that uses numeric evaluations. It can act as a “bias interrupter”: by removing the familiar and culturally fraught concept of the “perfect ten,” and creating a more neutral mindset for raters, “we’re finding a way to turn down the volume on gender stereotypes,” she says.

Yet moving away from ten-point scales in performance evaluations isn’t a panacea, Rivera points out.

“We should work on changing the images that people get from a very young age, how we structure work, how we recognize others, the messages we see in the media,” she says. But while we undertake the long, slow work of cultural change, it’s important to put small, meaningful changes into place. “I think interventions such as this have a lot of power to reduce the effect of biased evaluations on people’s career opportunities.”

About the Writer
Susie Allen is a freelance writer in Chicago.
About the Research
Rivera, Lauren A., and András Tilcsik. 2019. “Scaling Down Inequality: Rating Scales, Gender Bias, and the Architecture of Evaluation.” American Sociological Review. 84(2): 248–274. https://journals.sagepub.com/doi/10.1177/0003122419833601