Marketing Jan 1, 2024
Here’s a Cost-Effective Way to Tell If Your Digital Ads Are Working
Running even a small number of experiments can reveal a lot, a new study finds.
Summary Marketers often struggle to understand the return on investment from digital ads run on platforms like Google and Facebook. The best way to get at the causal impact of digital ads is through randomized controlled trials (RCTs), but they are costly to run at scale. New research finds a way around this problem by combining other industry-standard ad-measurement techniques with data from a very small number of RCTs. The approach enables advertisers to predict incremental ad impact for campaigns that are not implemented as RCTs.
A fiendishly difficult problem.
That’s how Florian Zettelmeyer describes a key challenge businesses face with their marketing efforts: understanding the return on investment from advertising, especially for digital ads run on platforms like Google and Facebook.
“Advertisers fundamentally want to know what happens to somebody who sees the ad compared to somebody who doesn’t—that’s the causal effect of the ad, which directly translates to return on investment for the money put in,” explains Zettelmeyer, a professor of marketing at the Kellogg School of Management. “But the problem is that because of algorithmic targeting, the people who see the ads are super different from those who don’t.”
He and fellow Kellogg professor of marketing Brett Gordon have long recognized that traditional advertising-measurement techniques won’t work because of the confounding effects of ad targeting online. For example, new car ads might be targeted to people who’ve recently searched online for specific car models or features, which suggests these consumers are already considering a purchase and makes the impact of the ad itself difficult to tease out.
The best way to get at the causal impact of digital ads is through randomized controlled trials (RCTs), in which a randomly chosen group of consumers is shown an ad and is compared with a randomly chosen control group that doesn’t see the ad. The difference in response between these groups reflects the ad’s impact. But RCTs are costly to run at scale because advertisers have to exclude large numbers of potential buyers from being exposed to their campaign by placing them in control groups for each ad. “You can lose a significant share of the addressable audience to control groups,” Zettelmeyer says.
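The RCT logic described above boils down to comparing conversion rates between the randomly assigned groups. Here is a minimal sketch with made-up numbers (none of these figures come from the study):

```python
# Hypothetical RCT results: conversion counts for two randomly assigned groups.
# All numbers are illustrative, not from the research described here.
treatment_users = 500_000        # users eligible to be shown the ad
treatment_conversions = 6_500
control_users = 500_000          # randomly held-out users who never see the ad
control_conversions = 5_000

treatment_rate = treatment_conversions / treatment_users
control_rate = control_conversions / control_users

# Because assignment is random, the rate difference is the ad's causal effect.
lift = treatment_rate - control_rate
incremental_conversions = lift * treatment_users

print(f"Incremental conversion rate: {lift:.4f}")
print(f"Conversions caused by the ad: {incremental_conversions:.0f}")
```

Random assignment is what makes this subtraction valid: without it, the treated group would be skewed toward people already likely to buy, which is exactly the targeting confound the researchers describe.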
In a recent study, Gordon, Zettelmeyer, and collaborator Robert Moakler of Meta devised a potential solution to this problem. They designed and tested a model that predicts the causal impact of an ad based on a small number of RCTs by combining those results with data on industry-standard measures like last-click conversion counts. That enables advertisers to predict incremental ad impact for campaigns that are not implemented as RCTs.
“Our model allows advertisers to use the data they do have for every campaign to predict how well the campaign would do in terms of its true causal effect,” Zettelmeyer says.
Predicting ad impact
It turns out that just a small number of RCTs can take advertisers pretty far when it comes to estimating the return on advertising dollars.
Gordon, Zettelmeyer, and Moakler reached that conclusion by studying data from nearly 1,000 RCTs run on a random subset of Facebook ads from November 2019 to March 2020 that targeted at least 1 million users. Products advertised included those from retail, financial services, consumer goods, and other sectors. The researchers focused on conversions across the sales funnel, measuring outcomes such as viewing an advertiser’s website, adding an item to a digital cart, or making a purchase.
The researchers then took that RCT data on ad effectiveness—the most reliable estimate of conversions caused by an ad—and compared it with the effectiveness predictions generated for those same ads via leading proxy measures, such as last-click. When considering ads on a single platform, such as Meta, last-click measures refer to counting purchases or other outcome events that occurred within a set timeframe, such as one day or week, from the last exposure to an ad on that platform.
Their goal was to calibrate the effectiveness estimates given by proxy measures, which are less reliable, against RCT results, which are much more reliable. If the RCT results and other measures correlated in predictable ways, it would allow advertisers to get highly accurate assessments of ad efficacy without having to run RCTs on every ad.
And that’s exactly what they discovered. The model showed that measures like last-click generally tended to over- or underestimate effectiveness—but in a reliable way. That means that, even though those proxy measures do not always have much predictive value on their own, they can become very useful once they are calibrated against RCT results using even simple statistical measures.
That finding lets advertisers and platforms gauge ad effectiveness using just a small number of RCTs. The RCTs are used to generate the calibration factor, which is then applied to the estimates generated from last-click and other measures.
Gordon points out that many advertisers have already been thinking along these lines: “They might run an RCT and estimate an incremental effect, but then their last-click metric shows that it’s double that. ... So they take all their last-click measures and divide them by two, even when they don’t run an experiment. But it’s not systematic or rigorous.” The approach the researchers formulated delivers that needed rigor.
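The "divide by two" heuristic Gordon describes can be made systematic with a simple calibration factor: use a handful of RCTs to learn how last-click counts relate to RCT-measured incremental conversions, then apply that factor to campaigns run without experiments. A minimal sketch, with hypothetical campaign numbers (the paper's actual model is more sophisticated than this averaging):

```python
# Each tuple pairs one RCT campaign's measured incremental conversions
# with the last-click conversions reported for the same campaign.
# All figures are invented for illustration.
rct_campaigns = [
    (1_500, 3_100),   # (RCT incremental conversions, last-click conversions)
    (900, 1_750),
    (2_200, 4_400),
]

# Simple calibration factor: average ratio of true impact to last-click count.
ratios = [rct / last_click for rct, last_click in rct_campaigns]
calibration = sum(ratios) / len(ratios)

def predict_incremental(last_click_conversions: float) -> float:
    """Predict incremental conversions for a campaign with no RCT,
    using only its last-click count and the learned calibration factor."""
    return calibration * last_click_conversions

# A new, non-experimental campaign reports 5,000 last-click conversions.
print(f"Calibration factor: {calibration:.2f}")
print(f"Predicted incremental conversions: {predict_incremental(5_000):.0f}")
```

Here last-click roughly doubles the true effect, so the factor lands near 0.5: the informal "divide by two" rule, but derived from data rather than intuition.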
A convenient, cost-effective approach
The good news for advertisers and platforms is that most already have usable data from proxy metrics like last-click on hand. This means that they may not have to spend much on RCTs to make impact predictions at scale.
“The last-click and other proxy metrics are simple and intuitive to most advertisers,” Gordon says. “They have very low data or analytic requirements, and even as privacy policies change, they are typically going to retain some access to these metrics.”
So how many RCTs do you need to deliver good ad-impact predictions?
It depends on whether you are an advertising platform or advertiser. “For the platforms, we’d suggest running thousands of RCTs and using a machine-learning model to make predictions. And if you are one advertiser, then we suggest you run at least a handful of RCTs, and then see whether you can establish a very simple calibration factor between the RCT result and the last-click result,” Zettelmeyer says.
Similarly, a consortium of advertisers could pool RCT and last-click results from multiple ad campaigns to create a shared predictive model yielding calibration factors to apply to proxy measures of ad effectiveness.
As Zettelmeyer concludes, “We want advertisers to know that to improve their marketing measurement, they don’t need to run randomized experiments all the time but should run them some of the time. It won’t provide the same precision as running an experiment for everything, but [it gets you much of the way there] with a heck of a lot less effort and cost.”
About the Writer
Sachin Waikar is a freelance writer based in Evanston, Illinois.
About the Research
Gordon, Brett, Robert Moakler, and Florian Zettelmeyer. 2023. “Predictive Incrementality by Experimentation (PIE) for Ad Measurement.” Working paper.