Associate Professor of Strategy
By now you’re familiar with the power of A/B testing: running simple experiments to see which of two variants is more effective at achieving a desired outcome. Marketers use A/B tests to pit different call-to-action messages against one another, or to find out which of two images will lead to the most sales on an e-commerce site.
But could managers use A/B tests in service of something very different—like designing the best way to motivate their employees?
Two researchers at the Kellogg School of Management think so.
In a new study, George Georgiadis and Michael Powell, both associate professors of strategy, develop a model that shows how organizations can use A/B testing to find more effective ways of structuring performance incentives. They determine that even a single A/B test can provide a surprising amount of information about how employees will respond to a range of incentive strategies. And they offer a framework for using A/B testing data to maximum advantage.
“We want to understand: If you have this kind of data, how can you use it to improve your employees’ incentive plans? How far can you go with just a simple A/B test?” says Georgiadis. “In principle, to put in place the ‘optimal’ incentive plan, you would need infinitely many experiments. What we argue is that with one experiment, you can actually go pretty far.”
This is important, he explains, because employers are understandably very reluctant to experiment with incentive schemes, as they don’t want to risk upsetting employees.
“If we’re talking about changing the way we pay people, we don’t want to do a lot of these experiments,” says Powell. “If you’re working on a website and you’re trying to figure out what color to make your button, it’s a lot easier to do a lot more tests.”
Organizations rely on a wide range of incentive schemes to motivate their employees to work hard.
Some schemes are fairly basic: think of an employee who receives a base salary as well as a bonus if a certain sales target is hit, or a transcriber paid based on the number of documents completed. Other schemes are far more complex and might involve tools such as profit-sharing or restricted stock.
But all of them involve critical decisions—with critical trade-offs. For instance, should that bonus be easily obtainable but modest? Or hard to get but very lucrative? Some employees might find the latter option more motivating; others, merely frustrating. And what of the base salary? Too high relative to the bonus and it might encourage complacency; too low and employees who prefer stability might balk.
Moreover, depending on the nature of the work, as well as employees’ individual preferences, an incentive scheme that works well in one organization might fail dismally in another. This means that, practically speaking, one of the only ways for managers to know whether there is a better incentive scheme for their organization is by modifying their existing scheme for a limited period of time—perhaps just in one part of the organization—and then seeing what actually happens to performance.
So Georgiadis and Powell set out to determine just how much employers could learn from a single tweak.
The researchers built a mathematical model to analyze interactions between an employer and its employees. The employer has an existing incentive scheme in place and collects data on how productive its employees are under that incentive scheme. Then the employer tweaks the incentive scheme in some arbitrary way—perhaps lowering the threshold for receiving a bonus, or increasing the per-piece pay—for some or all of its employees and collects data on how productive the employees are under that contract.
Then, the researchers explored just how well the data generated from that A/B test could be used to create a new, more effective incentive contract.
“Suppose we figure out a way to get employees to work a little bit harder” under a new set of contract terms, says Powell. “We can see that, on average, this change in compensation increased output or productivity by a certain amount. But it turns out there’s a lot more information contained in that experiment. Namely, we know what happened not just to average output, but what happened to the probability of producing low output and high output. That’s very informative.”
Importantly, an employer can use the data from employees’ distribution of responses to predict how employee productivity—and by extension, the employer’s profits—will change given any change in the contract.
How so? For instance, by looking at the distribution of outputs in response to the two contracts, employers can learn whether an increase in the average output is driven by employees being less likely to slack versus more likely to work hard. The difference sounds subtle, but it is actually quite powerful.
If employees slacking less will increase productivity in a particular environment, “then what this tells us is that we would like to punish low output,” says Powell. So for instance, employers could pay employees less per task if they completed an unusually low number of tasks. Or employers could offer a very low base salary with the possibility of earning a bonus if employees are modestly, but acceptably, productive.
On the other hand, if employees working harder will increase productivity in a particular environment, this suggests that employers should “pay people more when high output is realized,” says Powell.
In practice, this could mean paying employees more per task if they complete an ambitiously high number of tasks, or offering an average base salary with the possibility of earning a bonus only if employees are extremely productive.
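To make the distinction concrete, here is a minimal Python sketch of how an employer might check whether a gain in average output came from less slacking or from more high performance. It is an illustration only, not the researchers' procedure: the function names, the cutoffs for "low" and "high" output, and the task counts are all hypothetical.

```python
from statistics import mean


def share_by_bucket(outputs, low_cutoff, high_cutoff):
    """Return the fraction of employees with low, middle and high output."""
    n = len(outputs)
    low = sum(1 for x in outputs if x < low_cutoff) / n
    high = sum(1 for x in outputs if x > high_cutoff) / n
    return {"low": low, "middle": 1 - low - high, "high": high}


def compare_contracts(outputs_a, outputs_b, low_cutoff, high_cutoff):
    """Compare the output distribution under contract B with contract A."""
    a = share_by_bucket(outputs_a, low_cutoff, high_cutoff)
    b = share_by_bucket(outputs_b, low_cutoff, high_cutoff)
    return {
        "mean_change": mean(outputs_b) - mean(outputs_a),
        "low_share_change": b["low"] - a["low"],
        "high_share_change": b["high"] - a["high"],
    }


# Hypothetical data: tasks completed per employee under each contract.
contract_a = [2, 3, 3, 5, 6, 6, 7, 8, 9, 11]
contract_b = [4, 5, 5, 5, 6, 6, 7, 8, 9, 11]

# The cutoffs (fewer than 4 tasks is "low", more than 8 is "high") are assumptions.
result = compare_contracts(contract_a, contract_b, low_cutoff=4, high_cutoff=8)
print(result)
```

In this made-up example, average output rises while the share of high performers stays the same and the share of low performers falls. That is the "less slacking" pattern, which by Powell's logic would point toward penalizing low output rather than rewarding very high output.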
To test the accuracy of their model against real productivity data, the researchers turned to previously published data from participants who completed a simple online task under six different payment schemes.
They wanted to understand just how well their model could use real performance data from any two payment schemes to predict how participants would perform under another, completely different scheme.
The model was able to predict performance under other incentive contracts with a high degree of accuracy. “On average, the gap between predicted and actual productivity is just less than 2 percent,” says Georgiadis. “Mike and I were very surprised by how accurate the predictions are.”
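For readers who want to see what a gap like this looks like as arithmetic, the short Python sketch below averages the percentage difference between predicted and actual output across several contracts. The article does not spell out how the researchers computed their figure, so the formula here is an assumed, common choice (mean absolute percentage gap), and the numbers are hypothetical.

```python
def mean_percent_gap(predicted, actual):
    """Average absolute gap between predicted and actual output, as a percent of actual."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and the same length")
    gaps = [abs(p - a) / a * 100 for p, a in zip(predicted, actual)]
    return sum(gaps) / len(gaps)


# Hypothetical average output under three held-out contracts.
predicted_output = [102.0, 49.0, 200.0]
actual_output = [100.0, 50.0, 200.0]

gap = mean_percent_gap(predicted_output, actual_output)
print(f"Average prediction gap: {gap:.2f} percent")
```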
The researchers also used the real productivity data to test their model’s ability to design a better contract. They wondered: How close would that contract get to optimal?
They found that, on average, using data from any two contracts would enable an employer to construct a third contract that obtained just over two-thirds of the gains that it would obtain if it could design a truly optimal contract.
“You’re not doing the ‘optimal’ contract because you don’t have all the information,” says Georgiadis. Still, “in the setting of this online experiment, a single A/B test can get you two-thirds of the way to optimality.”
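One way to read the "two-thirds" figure is as the share of the possible profit improvement that the new contract captures, measured against the contract already in place. The Python sketch below shows that calculation under this assumed reading. The function name and the profit figures are hypothetical and are not taken from the study.

```python
def share_of_gains_captured(profit_status_quo, profit_new, profit_optimal):
    """Fraction of the maximum possible profit gain that the new contract achieves."""
    possible_gain = profit_optimal - profit_status_quo
    if possible_gain <= 0:
        raise ValueError("the optimal contract must beat the status quo")
    return (profit_new - profit_status_quo) / possible_gain


# Hypothetical profits: existing contract, A/B-informed contract, truly optimal contract.
share = share_of_gains_captured(
    profit_status_quo=100.0,
    profit_new=120.0,
    profit_optimal=130.0,
)
print(f"Share of possible gains captured: {share:.0%}")
```

Here the new contract raises profit by 20 out of a possible 30, or two-thirds of the way to the optimum.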
Powell and Georgiadis’s framework has a number of benefits that make it practical for organizations to use. For one, unlike a lot of previous economic research into incentives, it doesn’t require that the employer understand in advance anything about their employees’ preferences, such as how much they dislike working at a faster pace. It also doesn’t require them to fully understand how much effort goes into being more productive in a given work environment.
“What we’re arguing is, if you’re willing to do an A/B test, you don’t have to know that much,” says Powell. “You just observe how they react.”
Their approach can be applied to organizations of different sizes, though organizations that can run a larger experiment, which will generate more data points, will learn more from their test.
Another benefit is that the researchers’ article includes all of the steps that an organization would need to take to actually use the data from an A/B test to generate an incentive scheme that is close to optimal. This in itself is key, because the procedure is hardly obvious.
The researchers point out that this work on A/B tests was originally inspired by students in their organizational strategy class. “We used to teach basic principles of incentive theory, and we would always get the question of, ‘Well, literally, what should I do? My parents have a factory and their workers are on a piece rate. How should we change the piece rate?’ And the existing tools weren’t well suited to answer that question,” says Powell.
This tool is. And it even has a final benefit: familiarity.
“Firms nowadays use experimentation for various purposes,” says Georgiadis. “We find that it can be quite useful for designing incentives as well.”