Jesús Escudero
Are you A/B testing right?
At first glance, the question hardly makes sense. After all, the whole point of A/B testing is to land on the “right” approach. Businesses expose consumers to Option A or Option B (say, of a landing page or digital ad), then measure which performs better on a key variable, such as engagement or purchase size. The results then inform which version to roll out.
But while this has become standard practice across industries, a key question remains: Is A/B testing carried out in the most efficient, cost-effective way?
That’s a question of interest to Achal Bassamboo, Kellogg professor of operations, and Vikas Deep, a Kellogg doctoral student advised by Bassamboo. “Companies like Google, Amazon, and many others run thousands of A/B experiments to optimize their product design,” Deep says. “And the testing is very costly.”
So Bassamboo, Deep, and Sandeep Juneja at Ashoka University developed a mathematical model to understand the most efficient way—that is, the way using the fewest consumer observations—to determine whether one option works better than another and by how much.
They found that by examining the variation in how consumers respond to each option, decision-makers can substantially reduce the number of consumers needed, and with it the cost of the experiment. “How beneficial could it be?” Bassamboo says. “It could bring the number of observations needed down by 50 percent.”
“We’ve solved the problem of minimizing the cost or length of experiment while still having confidence in the A/B test results,” Deep says.
When running A/B testing, the most common policy has been to implement a randomized controlled trial (RCT), where consumers are randomly assigned to option A or B—such as different landing pages, as mentioned earlier—with equal odds of assignment to either. “The simplicity of this policy is you don’t have to think about anything when assigning the consumer to one option versus another,” Bassamboo says. “It’s a static policy.”
But the researchers wanted to look at a different assignment rule, one that takes into account a key factor: variation in the measure of interest. In the landing-page example, that measure might be engagement, as captured by how much time the consumer spends on the website after seeing one of the landing pages. So, is the variation in engagement different between the two pages, and can that difference be harnessed to allocate users to each page more efficiently?
To answer that question, the researchers built a mathematical model of A/B allocation that takes into account that key variation factor. “It takes the arrivals coming in and tries to learn something about variation on each of the arms to understand how to allocate consumers to the options in the future,” Bassamboo says. “So it’s adaptive.”
According to their model, this “smarter,” adaptive approach can reduce the length or size of the experiment by as much as 50 percent, saving significant cost. “If you have the money, it’s fine to go with an RCT,” Bassamboo says. “But this is a way to use your budget for experimenting more carefully.”
More specifically, RCTs perform well when the standard deviation, or amount of variation surrounding the mean, in observations from the two options is similar. But if Landing Page Option A, to continue our example, results in a wide spread of engagement time (perhaps resonating with some users while frustrating others) while Option B yields a much narrower distribution, then the researchers’ adaptive policy is best. That’s because it will assign a higher proportion of site visitors to the option with more variability—Option A, in this case—to better estimate which landing page yields more time on the site and by how much.
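To see why unequal variation matters, here is a minimal Python sketch using the standard textbook variance of a difference in sample means. It is an illustration under simple assumptions (independent observations, known standard deviations), not the authors’ model; the function names and the numbers are hypothetical.

```python
def estimator_variance(sigma_a, sigma_b, n_total, share_a):
    """Variance of (mean of A - mean of B) when a fraction share_a of
    n_total visitors sees Option A and the rest see Option B."""
    n_a = n_total * share_a
    n_b = n_total * (1.0 - share_a)
    return sigma_a ** 2 / n_a + sigma_b ** 2 / n_b


def proportional_share(sigma_a, sigma_b):
    """Share of visitors sent to A when allocation is proportional
    to each option's standard deviation."""
    return sigma_a / (sigma_a + sigma_b)


# Hypothetical numbers: Option A is three times as variable as Option B.
sigma_a, sigma_b, n_total = 3.0, 1.0, 1000

equal_split = estimator_variance(sigma_a, sigma_b, n_total, 0.5)
proportional = estimator_variance(
    sigma_a, sigma_b, n_total, proportional_share(sigma_a, sigma_b)
)

# Ratio below 1 means the proportional split gives a more precise
# estimate from the same number of visitors.
print(round(proportional / equal_split, 3))  # 0.8
```

In this simplified setting, the two policies tie when the standard deviations are equal, and the ratio approaches 0.5 as one option’s variation shrinks toward zero, which is consistent with the “as much as 50 percent” figure quoted above.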
The researchers stress that, to use the proposed policy, the only thing that needs to change is the proportion of incoming consumers assigned to each option.
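A rough sketch of what such an adaptive assignment rule could look like in Python follows. It is an assumption-laden illustration rather than the researchers’ published algorithm: the warm-up length, the clipping bounds, and the simulated engagement distributions are all hypothetical choices made for the example.

```python
import random
import statistics


def adaptive_share_a(obs_a, obs_b, warmup=10):
    """Probability of showing the next visitor Option A.

    Uses a 50/50 split until both options have `warmup` observations
    (an assumed warm-up rule), then splits in proportion to each
    option's estimated standard deviation.
    """
    if len(obs_a) < warmup or len(obs_b) < warmup:
        return 0.5
    sd_a = statistics.stdev(obs_a)
    sd_b = statistics.stdev(obs_b)
    if sd_a + sd_b == 0:
        return 0.5
    share = sd_a / (sd_a + sd_b)
    # Clip so neither option is ever starved of traffic (assumed safeguard).
    return min(max(share, 0.05), 0.95)


def run_experiment(n_visitors, draw_a, draw_b, seed=0):
    """Assign visitors one at a time and return the estimated effect
    (mean of A minus mean of B) plus the number of visitors per option."""
    rng = random.Random(seed)
    obs_a, obs_b = [], []
    for _ in range(n_visitors):
        if rng.random() < adaptive_share_a(obs_a, obs_b):
            obs_a.append(draw_a(rng))
        else:
            obs_b.append(draw_b(rng))
    effect = statistics.mean(obs_a) - statistics.mean(obs_b)
    return effect, len(obs_a), len(obs_b)


# Hypothetical engagement times in minutes: A is more variable than B.
def draw_a(rng):
    return rng.gauss(5.0, 3.0)


def draw_b(rng):
    return rng.gauss(4.0, 1.0)


effect, n_a, n_b = run_experiment(5000, draw_a, draw_b, seed=1)
print(f"estimated effect: {effect:.2f}, visitors to A: {n_a}, to B: {n_b}")
```

With these simulated inputs, the rule should drift toward sending roughly three-quarters of visitors to the more variable Option A. Nothing else about the experiment changes: each visitor still sees exactly one page, and the effect is still estimated as a difference in averages.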
The researchers also point out that getting the most out of the adaptive model for A/B allocation requires knowing what you’re trying to learn from the testing. “It all depends on what question is at the heart of the puzzle you’re trying to solve,” Bassamboo says.
Specifically, are you simply trying to understand which option is better—Landing Page A or B? Or is your goal to get an accurate measure of how much better a given option is?
“Neither type of question is always harder to answer than the other,” Bassamboo notes, and pursuing one question will eventually get you to the other. But given your goal, the optimal approach “may vary quite a bit.”
In most settings, he explains, if the goal is an accurate measure of how much better Option A is than Option B, the number of observations allocated to Option A will be proportional to the standard deviation of Option A’s outcomes. However, “this might be suboptimal if the objective is to simply find the better option,” he says.
“You need to optimize the allocation policy for the objective you have in mind to use it to full effect,” Deep says.
Sachin Waikar is a freelance writer based in Evanston, Illinois.
Deep, Vikas, Achal Bassamboo, and Sandeep Kumar Juneja. 2024. “Asymptotically Optimal and Computationally Efficient Average Treatment Effect Estimation in A/B testing.” ICML.