Is Your Digital-Advertising Campaign Working?
Data Analytics | Marketing | Mar 11, 2016

If you are not running a randomized controlled experiment, you probably don’t know.

Measuring the success of a digital advertising campaign.


Based on the research of

Brett Gordon

Florian Zettelmeyer

Neha Bhargava

Dan Chapsky

Today a digital advertising campaign can reach a potential customer as she skims the news on her morning train ride, takes a break from emailing at her desk, scours restaurant reviews over cocktails, and admires a friend’s vacation photos after dessert.

A proliferation of digital platforms and devices makes this kind of campaign possible. But it also makes measuring the campaign’s success challenging.

The gold standard for measuring an advertisement’s “lift”—that is, its direct effect on a user’s probability of converting—is a true experiment, or “randomized controlled trial” (RCT), as data scientists call it. But Florian Zettelmeyer, a professor of marketing at the Kellogg School, explains, “RCTs are hard to run. They can be expensive to run. They can take a lot of coordination to run.” So many marketers rely on a litany of alternative methods, easier to implement and often capable of drawing conclusions from data that have already been collected.

Just how well do these alternative methods work?

Zettelmeyer and Brett Gordon, an associate professor of marketing at Kellogg, recently coauthored a whitepaper with Facebook researchers Neha Bhargava and Dan Chapsky in an effort to find out. The upshot: even across a single platform, and using the exact same advertising studies, these alternative methods tend to be inaccurate—sometimes wildly so.

Benefits of a True Experiment

To accurately measure the lift of an ad placed on Facebook, Google, or another digital platform, it is not enough for marketers to calculate how likely it is that someone who sees the ad will “convert”—the industry’s lingo for completing an event the advertiser cares about, such as a purchase, a registration, or a page visit. They must also determine whether the conversion happened because of the ad—that is, whether the ad caused the conversion. But causality is surprisingly difficult to pin down. A perfect test would require the impossible: two parallel worlds, identical except that in the first world someone sees an ad and in the second that same person sees no ad.

Because parallel worlds remain stubbornly unavailable to researchers, the next best thing is an RCT, where individuals are randomly divided into a treatment group, which sees the ads, and a control group, which does not. Randomization ensures that the two groups do not differ in important ways, like demographics, lifestyle, or personality.
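The arithmetic behind an RCT lift estimate is simple. Here is a minimal simulation—an invented example, not data from the study, with made-up conversion rates—showing how random assignment yields a clean estimate of the ad’s effect:

```python
import random

random.seed(0)

def run_rct(users, base_rate=0.02, ad_effect=0.01):
    """Randomly assign users to treatment/control and simulate conversions."""
    conversions = {"treatment": 0, "control": 0}
    counts = {"treatment": 0, "control": 0}
    for _ in range(users):
        group = random.choice(["treatment", "control"])
        counts[group] += 1
        # Only the treatment group's conversion probability includes the ad effect.
        rate = base_rate + (ad_effect if group == "treatment" else 0)
        conversions[group] += random.random() < rate
    treated = conversions["treatment"] / counts["treatment"]
    control = conversions["control"] / counts["control"]
    # Lift: the relative increase in conversion probability caused by the ad.
    return (treated - control) / control

print(f"estimated lift: {run_rct(1_000_000):.0%}")
```

With these invented rates (2% baseline, 3% with the ad), the estimate lands near the true lift of 50%, because randomization guarantees the two groups are statistically identical apart from ad exposure.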

And yet, for a variety of reasons, many marketers do not use RCTs.

For one, until recently many platforms did not offer the capability (and some still do not). RCTs can be time-consuming to implement correctly, requiring hours of additional work from engineers and data scientists without necessarily generating any additional income. Nor is it obvious that improved accuracy would work in a given platform’s favor, as advertising is not always particularly effective. “A lot of people in the industry simply aren’t incentivized to make sure you get the right estimate,” says Gordon.

In addition, many businesses are already convinced of the effectiveness of their campaigns. “If you believe your ads work, then running an experiment looks like you are wasting money,” says Zettelmeyer.

Finally, there is a broad consensus among businesses that less costly observational methods work—if not perfectly, at least well enough. These methods offer workarounds for not having a properly randomized control group, like matching two groups of users across a variety of demographic characteristics, or comparing the same group before and after a campaign.

Do they work well enough? It is this assumption that the authors put to the test.

On behalf of clients, Facebook conducted RCTs to measure the effectiveness of twelve different advertising campaigns, each of which ran in the United States beginning in January 2015. The campaigns were large, involving over a million users each (for a total of 1.4 billion impressions), and spanned a variety of companies and industries. The Kellogg and Facebook researchers analyzed these campaigns.

Because Facebook requires users to log in across browsers and devices, the authors were able to reliably follow a person’s journey from ad to purchase, even when that person moved between phone and computer along the way. Additionally, the authors were able to use anonymized demographic information in their estimations.

“They can track the two key things we care about,” explains Gordon, “which is when you get exposed to an ad, and when you convert on any of the devices.”

Powerful Forces Working Against Observational Methods

With accurate measurements from the RCTs in hand, the authors then tested a variety of observational methods to see how they stacked up.

The most straightforward observational method consists of simply comparing the conversion rates of those who see an ad and those who do not. But unfortunately, these two groups tend to differ in ways that go beyond advertising. For instance, users who rarely log into Facebook are less likely to be shown an ad—and they are probably also less likely to make online purchases, perhaps because they are less likely to be online in the first place.

“Even though the ad did nothing, the person who saw the ad is going to look like they purchased more than the person who didn’t see the ad,” says Zettelmeyer.

Moreover, advertisers put an enormous amount of effort into ensuring that ads are targeted to the people most likely to respond to them. An advertiser might initially target women between the ages of 18 and 49—but if Facebook’s ad-targeting algorithm learns that conversion rates are higher for younger women, it will fine-tune the target audience to get the most bang for the client’s buck. This further muddles efforts to discern causality when not using an RCT: Did seeing the ad make people buy, or were the people who buy simply more likely to see the ad?

“There are really, really powerful forces that make these two [exposed and unexposed] groups not the same,” says Zettelmeyer.

Indeed, the authors found that this comparison tended to wildly inflate lift: measuring it at 416%, for instance, when an RCT suggested a lift closer to 77%.
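A small simulation—again an invented example with made-up numbers, not the study’s data—shows how selection effects can manufacture lift out of nothing. The ad below has zero true effect, yet the naive exposed-versus-unexposed comparison reports a large lift, because heavier users both see more ads and buy more anyway:

```python
import random

random.seed(1)

# Each user has an "activity" level that drives BOTH ad exposure and purchasing,
# even though the ad itself does nothing (true lift = 0).
exposed_conv = exposed_n = unexposed_conv = unexposed_n = 0
for _ in range(500_000):
    active = random.random() < 0.5                            # half are heavy users
    exposed = random.random() < (0.8 if active else 0.2)      # heavy users see more ads
    converts = random.random() < (0.04 if active else 0.01)   # ...and buy more regardless
    if exposed:
        exposed_n += 1
        exposed_conv += converts
    else:
        unexposed_n += 1
        unexposed_conv += converts

naive_lift = (exposed_conv / exposed_n) / (unexposed_conv / unexposed_n) - 1
print(f"naive lift for an ad with zero true effect: {naive_lift:.0%}")
```

With these invented parameters, the naive comparison reports a lift above 100% for an ad that did nothing at all—exactly the kind of distortion Zettelmeyer describes.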

Other observational methods attempt to counteract these powerful forces: comparing the conversion rates of exposed and unexposed users only after the groups have been matched for a variety of traits, using sophisticated “propensity scoring” to adjust for differences between the groups, conducting matched-market tests, or comparing conversion rates for the same group of users before and after a campaign.
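As one concrete illustration of how such adjustments try to work, here is a minimal inverse-propensity-weighting sketch—an invented example, not the authors’ implementation—applied to the same kind of confounded data. Each user is reweighted by their estimated probability of exposure, so that the exposed and unexposed groups mimic the full population:

```python
import random

random.seed(2)

# Confounded data: activity drives both exposure and purchasing; the ad has no effect.
users = []
for _ in range(200_000):
    active = random.random() < 0.5
    exposed = random.random() < (0.8 if active else 0.2)
    converts = random.random() < (0.04 if active else 0.01)
    users.append((active, exposed, converts))

# Propensity of exposure given the observed trait (known here by construction;
# in practice it must be estimated, e.g. with a logistic regression).
propensity = {True: 0.8, False: 0.2}

# Weight each exposed user by 1/p and each unexposed user by 1/(1 - p).
w_exp = sum(c / propensity[a] for a, e, c in users if e)
n_exp = sum(1 / propensity[a] for a, e, c in users if e)
w_unexp = sum(c / (1 - propensity[a]) for a, e, c in users if not e)
n_unexp = sum(1 / (1 - propensity[a]) for a, e, c in users if not e)

adjusted_lift = (w_exp / n_exp) / (w_unexp / n_unexp) - 1
print(f"propensity-adjusted lift: {adjusted_lift:.0%}")
```

When the propensity model captures the true exposure process, as it does here by construction, the adjusted estimate recovers the true lift of zero. The catch, as the findings below suggest, is that real observational data rarely guarantee that the model is right.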

But when put to the test, no clear winner emerged. Moreover, not a single observational method performed reliably well.

“Sometimes, in some studies, they do pretty well,” says Gordon. “In other studies they don’t just perform a little bit badly, they do horribly.”

In fact, just how poorly the observational methods fared was a surprise even to the authors. This was particularly true for the promising matched-market test, where large, demographically similar geographic markets are paired up, and one is randomly assigned to be targeted in a campaign, while the other is held as a control. In a sense, the matched-market test is a true experiment, but at the level of the market instead of the individual.

And yet it produced results that were overly dependent on which matched market ended up in which condition.

“The degree of variation was stunning,” says Zettelmeyer.

No Good Substitutes

The results suggest that, in the absence of an RCT, it is difficult to determine an ad’s lift with any degree of accuracy. It is also nearly impossible to predict in advance just how inaccurate a particular observational technique will be. This means that companies cannot get away with running a single RCT, determining how much their favorite way of measuring is “off”—by a factor of two, for instance—and then adjusting their future measurements by that amount.

“Dividing by two is almost certainly wrong over time and across studies,” says Zettelmeyer. “It’s not constant. It’s not a good rule of thumb.”

The results also highlight the challenge of finding an appropriate control group outside of the context of a true experiment—a takeaway that applies beyond marketing.

As academic researchers, “we’re used to making lots of assumptions when we create models,” says Gordon. “What we’re not always forced to do is to really think long and hard about whether the assumptions are actually correct or not.”

The authors acknowledge that observational methods are not going anywhere. Nor should they, as they work well in many settings outside of advertising. Their sheer convenience allows them to provide data scientists with larger and richer datasets than RCTs are likely to provide anytime soon.

“It just happens to be that, despite our best efforts at this point, we can’t say that for advertising measurement these methods actually work well as a substitute for running RCTs,” says Zettelmeyer.

His takeaway for marketers? “Look, we understand that in many cases you can’t run an RCT. But please, if you can run it, for heaven’s sake, do.”

Featured Faculty

Brett Gordon, Professor of Marketing

Florian Zettelmeyer, Nancy L. Ertle Professor of Marketing; Faculty Director, Program on Data Analytics at Kellogg

About the Writer
Jessica Love is editor in chief of Kellogg Insight.
About the Research

Gordon, Brett, Florian Zettelmeyer, Neha Bhargava, and Dan Chapsky. 2016. “A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook.” White paper, Kellogg School of Management, Northwestern University.
