Strategy Aug 4, 2020
To Find the Best Incentives for Employees, Start with a Simple A/B Test
Keeping people motivated can be tough. New research shows that a simple experiment can lead to big productivity gains.
By now you’re familiar with the power of A/B testing: running simple experiments to see which of two variants is more effective at achieving a desired outcome. Marketers use A/B tests to pit different call-to-action messages against one another, or to find out which of two images will lead to more sales on an e-commerce site.
But could managers use A/B tests in service of something very different—like designing the best way to motivate their employees?
Two researchers at the Kellogg School of Management think so.
In a new study, George Georgiadis and Michael Powell, both associate professors of strategy, develop a model that shows how organizations can use A/B testing to find more effective ways of structuring performance incentives. They determine that even a single A/B test can provide a surprising amount of information about how employees will respond to a range of incentive strategies. And they offer a framework for using A/B testing data to maximum advantage.
“We want to understand: If you have this kind of data, how can you use it to improve your employees’ incentive plans? How far can you go with just a simple A/B test?” says Georgiadis. “In principle, to put in place the ‘optimal’ incentive plan, you would need infinitely many experiments. What we argue is that with one experiment, you can actually go pretty far.”
This is important, he explains, because employers are understandably very reluctant to experiment with incentive schemes, as they don’t want to risk upsetting employees.
“If we’re talking about changing the way we pay people, we don’t want to do a lot of these experiments,” says Powell. “If you’re working on a website and you’re trying to figure out what color to make your button, it’s a lot easier to do a lot more tests.”
The Right Incentives
Organizations rely on a wide range of incentive schemes to motivate their employees to work hard.
Some schemes are fairly basic: think of an employee who receives a base salary as well as a bonus if a certain sales target is hit, or a transcriber paid based on the number of documents completed. Other schemes are far more complex and might involve tools such as profit-sharing or restricted stock.
But all of them involve critical decisions—with critical trade-offs. For instance, should that bonus be easily obtainable but modest? Or hard to get but very lucrative? Some employees might find the latter option more motivating; others, merely frustrating. And what of the base salary? Too high relative to the bonus and it might encourage complacency; too low and employees who prefer stability might balk.
Moreover, depending on the nature of the work, as well as employees’ individual preferences, an incentive scheme that works well in one organization might fail dismally in another. This means that, practically speaking, one of the only ways for managers to know whether there is a better incentive scheme for their organization is by modifying their existing scheme for a limited period of time—perhaps just in one part of the organization—and then seeing what actually happens to performance.
So Georgiadis and Powell set out to determine just how much employers could learn from a single tweak.
The researchers built a mathematical model to analyze interactions between an employer and its employees. The employer has an existing incentive scheme in place and collects data on how productive its employees are under that incentive scheme. Then the employer tweaks the incentive scheme in some arbitrary way—perhaps lowering the threshold for receiving a bonus, or increasing the per-piece pay—for some or all of its employees and collects data on how productive the employees are under that contract.
Then, the researchers explored just how well the data generated from that A/B test could be used to create a new, more effective incentive contract.
“Suppose we figure out a way to get employees to work a little bit harder” under a new set of contract terms, says Powell. “We can see that, on average, this change in compensation increased output or productivity by a certain amount. But it turns out there’s a lot more information contained in that experiment. Namely, we know what happened not just to average output, but what happened to the probability of producing low output and high output. That’s very informative.”
Importantly, an employer can use the data from employees’ distribution of responses to predict how employee productivity—and by extension, the employer’s profits—will change given any change in the contract.
How so? For instance, by looking at the distribution of outputs in response to the two contracts, employers can learn whether an increase in the average output is driven by employees being less likely to slack versus more likely to work hard. The difference sounds subtle, but it is actually quite powerful.
If employees slacking less will increase productivity in a particular environment, “then what this tells us is that we would like to punish low output,” says Powell. So for instance, employers could pay employees less per task if they completed an unusually low number of tasks. Or employers could offer a very low base salary with the possibility of earning a bonus if employees are modestly, but acceptably, productive.
On the other hand, if employees working harder will increase productivity in a particular environment, this suggests that employers should “pay people more when high output is realized,” says Powell.
In practice, this could mean paying employees more per task if they complete an ambitiously high number of tasks, or offering an average base salary with the possibility of earning a bonus only if employees are extremely productive.
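The comparison Powell describes, looking at what happens to the chances of low and high output rather than only the average, can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' model: the output figures are invented, and the function name `tail_shift` and the low and high cutoffs are hypothetical choices made for the example.

```python
from typing import Dict, Sequence


def tail_shift(control: Sequence[float], treatment: Sequence[float],
               low_cutoff: float, high_cutoff: float) -> Dict[str, float]:
    """Compare two contracts by mean output and by the share of low and high output.

    The cutoffs are illustrative assumptions; a real analysis would pick them
    from the business context.
    """
    def share(data: Sequence[float], predicate) -> float:
        # Fraction of observations that satisfy the predicate.
        return sum(1 for x in data if predicate(x)) / len(data)

    mean_a = sum(control) / len(control)
    mean_b = sum(treatment) / len(treatment)
    return {
        "mean_change": mean_b - mean_a,
        # Negative value: low output became less common under the new contract.
        "low_share_change": share(treatment, lambda x: x <= low_cutoff)
                            - share(control, lambda x: x <= low_cutoff),
        # Positive value: high output became more common under the new contract.
        "high_share_change": share(treatment, lambda x: x >= high_cutoff)
                             - share(control, lambda x: x >= high_cutoff),
    }


# Invented tasks-completed data for ten employees under each contract.
contract_a = [2, 3, 3, 5, 5, 6, 6, 7, 9, 10]
contract_b = [4, 5, 5, 5, 6, 6, 6, 7, 9, 10]

result = tail_shift(contract_a, contract_b, low_cutoff=3, high_cutoff=9)
print(result)
```

In this made-up data, average output rises while the share of high performers stays the same and the share of low performers falls. That is the "less likely to slack" pattern, which by Powell's reasoning would point toward a contract that penalizes low output.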
Put to the Test
To test the accuracy of their model against real productivity data, the researchers turned to previously published data from participants who completed a simple online task under six different payment schemes.
They wanted to understand just how well their model could use real performance data from any two payment schemes to predict how participants would perform under another, completely different scheme.
The model was able to predict performance under other incentive contracts with a high degree of accuracy. “On average, the gap between predicted and actual productivity is just less than 2 percent,” says Georgiadis. “Mike and I were very surprised by how accurate the predictions are.”
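One plausible way to compute a gap like this is an average absolute percentage difference between predicted and observed output. The sketch below assumes that definition, which the article does not spell out, and uses invented numbers; `mean_percent_gap` is a hypothetical helper name.

```python
def mean_percent_gap(predicted, actual):
    """Average absolute gap between prediction and outcome, as a percent of the outcome.

    Assumed definition for illustration; the study may measure the gap differently.
    """
    gaps = [abs(p - a) / a * 100 for p, a in zip(predicted, actual)]
    return sum(gaps) / len(gaps)


# Invented productivity figures for three held-out payment schemes.
predicted_output = [102.0, 98.0, 151.5]
actual_output = [100.0, 100.0, 150.0]

gap = mean_percent_gap(predicted_output, actual_output)
print(f"Average gap: {gap:.2f} percent")
```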
The researchers also used the real productivity data to test their model’s ability to design a better contract. They wondered: How close would that contract get to optimal?
They found that, on average, using data from any two contracts would enable an employer to construct a third contract that obtained just over two-thirds of the gains that it would obtain if it could design a truly optimal contract.
“You’re not doing the ‘optimal’ contract because you don’t have all the information,” says Georgiadis. Still, “in the setting of this online experiment, a single A/B test can get you two-thirds of the way to optimality.”
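To make the "two-thirds of the way" idea concrete, the share of possible gains can be read as the improvement a new contract achieves over the existing one, divided by the improvement a truly optimal contract would achieve. The sketch below assumes that reading. The profit figures are invented, and `share_of_optimal_gain` is a hypothetical name.

```python
def share_of_optimal_gain(status_quo: float, new_contract: float, optimal: float) -> float:
    """Fraction of the achievable improvement that the new contract captures.

    Assumes all three values are profits measured in the same units, and that
    the optimal contract is strictly better than the status quo.
    """
    possible_gain = optimal - status_quo
    if possible_gain <= 0:
        raise ValueError("optimal profit must exceed status quo profit")
    return (new_contract - status_quo) / possible_gain


# Invented profits: existing contract, contract designed from one A/B test, optimal contract.
share = share_of_optimal_gain(status_quo=100.0, new_contract=108.0, optimal=112.0)
print(f"Share of optimal gain captured: {share:.0%}")
```

With these illustrative numbers, the redesigned contract captures 8 of a possible 12 units of extra profit, or about two-thirds.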
Benefits of A/B Testing
Powell and Georgiadis’s framework has a number of benefits that make it practical for organizations to use. For one, unlike a lot of previous economic research into incentives, it doesn’t require that the employer understand in advance anything about their employees’ preferences, such as how much they dislike working at a faster pace. It also doesn’t require them to fully understand how much effort goes into being more productive in a given work environment.
“What we’re arguing is, if you’re willing to do an A/B test, you don’t have to know that much,” says Powell. “You just observe how they react.”
Their approach can be applied to organizations of different sizes, though organizations that can run a larger experiment, which will generate more data points, will learn more from their test.
Another benefit is that the researchers’ article includes all of the steps that an organization would need to take to actually use the data from an A/B test to generate an incentive scheme that is close to optimal. This in itself is key, because the procedure is hardly obvious.
The researchers point out that this work on A/B tests was originally inspired by students in their organizational strategy class. “We used to teach basic principles of incentive theory, and we would always get the question of, ‘Well, literally, what should I do? My parents have a factory and their workers are on a piece rate. How should we change the piece rate?’ And the existing tools weren’t well suited to answer that question,” says Powell.
This tool is. And it even has a final benefit: familiarity.
“Firms nowadays use experimentation for various purposes,” says Georgiadis. “We find that it can be quite useful for designing incentives as well.”