Podcast: Did That Online Sneaker Ad Entice You to Buy? It’s Hard for Marketers to Tell.
Data Analytics Marketing Jul 18, 2016


Many measurement techniques are flawed. Kellogg and Facebook researchers share what can be done.

[Illustration by Yevgenia Nayberg: A blindfolded advertiser struggles to measure advertising effectiveness.]

Based on the research and insights of Brett Gordon, Florian Zettelmeyer, Neha Bhargava, and Dan Chapsky.


Advertisers pay good money to reach potential customers as they read the news on their phones, or shop on their tablets, or skim their Facebook feeds at work.

But determining whether those digital ads actually lead to new purchases? That can be surprisingly tricky.

In this Insight in Person podcast, we chat with Kellogg faculty members, as well as researchers from Facebook, to find out why many of the techniques that advertisers use to measure effectiveness are so inaccurate. We also learn what advertisers should be doing whenever possible.


Podcast transcript

Fred SCHMALZ: What ads do you see when you look at Facebook?

[music prelude]

Emily STONE: I have an ad about keeping my pets safe with an ID tag … even though I actually currently have no pets. So that's confusing.

Kendra BUSSE: Primary.com. Baby stuff! I don't have babies, but these are really cute onesies.

Devin RAPSON: I'm just getting a series of whiskies and deodorants, which just says to me they think I'm a very typical young man. How wrong they are.

BUSSE: I've got an ad for a new movie with Michael Fassbender. Facebook scored on that one!

[music interlude]

SCHMALZ: We know the ads that show up in our Instagram or Twitter or Facebook feeds didn't get there by accident. Maybe we are the right age and gender; maybe something about our previous online behavior says: You might be interested in this product.

But have you ever wondered how effective these ads really are? How likely they are to cause us to buy a product or visit a site that we otherwise wouldn't have?

It turns out that determining the effectiveness of online ads is really hard. Advertisers themselves don't always know how to do it. And the stakes are high.

Brett GORDON: Advertising powers the Internet. The money of advertising powers the Internet.

SCHMALZ: Welcome to Insight In Person, the monthly podcast of the Kellogg School of Management. I'm your host, Fred Schmalz.

On this month's episode, we explore the surprisingly complex and ever-evolving world of digital advertising.

It's a topic that two Kellogg School professors recently tackled by teaming up with scientists from Facebook. Together, they explored a very interesting question: What do advertisers actually know about the effectiveness of their campaigns — and what do they just think they know?

So, stay with us….

[music interlude]


GORDON: When Internet advertising came about first — and we think around 1994 was the first Internet display ad — initially people thought that this would herald a new age of advertising measurement.

SCHMALZ: That's Brett Gordon, an associate professor of marketing at the Kellogg School.

Historically, measuring the value of advertising has been exceptionally difficult.

For TV ads, for instance, marketers can know roughly how many people in a given market were exposed. They can also determine how many people made a purchase. But there is no good way to tell whether these two groups of people are the same.

Marketers hoped online ads would be different.

GORDON: If I see you click, I can typically follow you through to some kind of outcome, like putting something in your shopping cart and buying it.

SCHMALZ: That all sounds great. Digital advertising for the win! But, Gordon explains, the story is rapidly changing. It's no longer a given that advertisers will be able to track a customer along their journey.

GORDON: You have your laptop, you have your work desktop, you have your home desktop, you have your mobile phone for work, you have your mobile phone for home, you have your iPad, your Samsung device, et cetera, and the problem is that most marketers don't know if it's one person across all of these devices or if it's ten people across all of these devices.

It's true that we can see someone maybe convert on one device, but to link that to what they were exposed to on another device is very difficult.

SCHMALZ: In other words, online advertising has a lot of the same measurement problems as any other platform.

One of the thorniest of those problems is determining whether seeing an ad actually caused someone to purchase. Maybe your friend Bob was going to buy a new hot tub anyway, and seeing that ad had nothing to do with his decision.

Here's Florian Zettelmeyer, a professor of marketing at Kellogg.

Florian ZETTELMEYER: In order to determine the effect of the ad, you need to know what would have happened if somebody had actually never seen the ad.

SCHMALZ: So what you'd really want to do, he says, is take a situation — a specific set of customers in a specific setting at a specific moment in time — and then clone it, creating two identical worlds, so to speak.

ZETTELMEYER: What we would then like to do is, if these two worlds were completely the same, we would like to show an ad in one world, and in the other world we would like to not show an ad. If there is a difference in outcome, like a purchase or a web page view or something else that you care about as an advertiser, it has to be driven by the ad, since there are no other differences between these two worlds.

SCHMALZ: But obviously cloning worlds just isn't possible — even for marketers.

So researchers in a variety of scientific fields have come up with a next-best alternative: a randomized controlled trial.

The basic idea? Take a group of people and randomly assign them to either a test group — which in an advertising study would be people targeted for an ad — or a control group, which is not targeted.

Zettelmeyer explains.

ZETTELMEYER: You see, this doesn't exactly replicate this idea of having two cloned worlds, because obviously the two groups consist of different individuals and so they can't really be clones of each other. But by randomly making people show up in one group or the other group, we create what we refer to as probabilistically equivalent groups, and what that basically means in lay terms is you create as close as you can possibly get to a true apples-to-apples comparison.

SCHMALZ: If the groups are big enough, this randomization should make them comparable.
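
That intuition can be sketched in a few lines of Python. This is purely illustrative, not code from the researchers' study; the sample size and the "age" covariate are invented for the example.

```python
import random

random.seed(0)

# Simulate 100,000 users, each with an arbitrary covariate ("age")
# that we will NOT use when assigning groups.
users = [{"age": random.randint(18, 65)} for _ in range(100_000)]

# Random assignment: each user lands in test or control with equal probability.
test, control = [], []
for user in users:
    (test if random.random() < 0.5 else control).append(user)

# Probabilistic equivalence: with large groups, the covariate distributions
# match closely even though we never balanced on age explicitly.
avg = lambda group: sum(u["age"] for u in group) / len(group)
print(f"test mean age:    {avg(test):.2f}")
print(f"control mean age: {avg(control):.2f}")
```

The same logic holds for covariates the advertiser cannot observe, which is exactly what the matching-based alternatives discussed below this point cannot guarantee.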

Randomized controlled trials, or RCTs for short, are considered the "gold standard" for determining causality across the sciences.

Yet despite that, you don't always see a lot of advertisers using them.

In fact, there isn't really a "standard" way of measuring effectiveness at all.

Here’s Neha Bhar­ga­va, adver­tis­ing research man­ag­er at Face­book, whose role gives her unique insight into what adver­tis­ers actu­al­ly do.

Neha BHAR­GA­VA: There’s basi­cal­ly a bajil­lion dif­fer­ent types of ways that peo­ple try to mea­sure ad effec­tive­ness, so part of it depends on what data you actu­al­ly have at your fingertips.

SCHMALZ: Build­ing a tru­ly ran­dom­ized con­trol group requires fore­sight, addi­tion­al resources, and tech­ni­cal capa­bil­i­ties that not all adver­tis­ing plat­forms have.

Remem­ber that whole track­ing problem?

Here’s Bhargava’s col­league, Face­book data sci­en­tist Daniel Chap­sky, to explain.

Daniel CHAP­SKY: If I am serv­ing an ad on some dis­play net­work and then I see the same ad both on my desk­top and on my mobile device, there’s a good chance that who­ev­er is serv­ing that ad can­not link my mobile device to my desk­top device, which imme­di­ate­ly can ruin the result of your tests because the con­trolled group might get exposed and vice versa.

So you actu­al­ly need more than just mon­ey. You need an infra­struc­tur­al change in what you can do. Right now, it’s impos­si­ble to run a high-qual­i­ty RCT on every chan­nel that you mar­ket on.

SCHMALZ: So what do adver­tis­ers do instead? Put sim­ply, they have devel­oped tech­niques to get around hav­ing a prop­er con­trol group.

One tech­nique might be to com­pare the pur­chas­es of peo­ple who were exposed to the ads to the pur­chas­es of peo­ple who were not exposed. But for a vari­ety of rea­sons, the peo­ple who actu­al­ly see an ad can be quite dif­fer­ent from those who don’t.

A more sophis­ti­cat­ed tech­nique might com­pare those who see an add to those who don’t but are sim­i­lar along a vari­ety of dimen­sions, like demo­graph­ics, or zip code, or pre­vi­ous online behavior.

Here’s Zettelmey­er again.

ZETTELMEY­ER: The hope is that you have enough infor­ma­tion about them that you’ve moved from apples-to-oranges to apples-to-apples. The chal­lenge in doing this, of course, is that you can only cre­ate an apples-to-apples com­par­i­son on the stuff you know about peo­ple. And so what these meth­ods have trou­ble deal­ing with is if an impor­tant dif­fer­ence that deter­mines how like­ly you are to buy a prod­uct turns out to be a dif­fer­ence that we don’t know about.
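
The danger of an unobserved difference can be made concrete with a small simulation. Everything here is invented for illustration: a hidden "intent" trait drives both ad exposure and purchase, so a naive exposed-versus-unexposed comparison credits an ad that, by construction, has zero true effect.

```python
import random

random.seed(1)

n = 200_000
exposed = unexposed = exposed_buy = unexposed_buy = 0

for _ in range(n):
    # Hidden trait: high-intent shoppers browse more, so they see more ads
    # AND buy more often -- but the ad itself changes nothing.
    high_intent = random.random() < 0.3
    saw_ad = random.random() < (0.8 if high_intent else 0.2)
    bought = random.random() < (0.10 if high_intent else 0.01)
    if saw_ad:
        exposed += 1
        exposed_buy += bought
    else:
        unexposed += 1
        unexposed_buy += bought

lift = (exposed_buy / exposed) / (unexposed_buy / unexposed)
print(f"apparent lift from a do-nothing ad: {lift:.1f}x")
```

Matching on observables only helps if "intent" (or a good proxy for it) is among the variables you can match on; here it is not, and the comparison is badly inflated.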

SCHMALZ: There are other methods, too.

Some rely on before-and-after comparisons: run an ad for two weeks, then don't for the next two. Or try a matched-market test, which is kind of like an RCT, except the randomization is done on markets instead of individuals: San Francisco and Boston randomly participate in a campaign; New York and Chicago don't.
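
A matched-market assignment can be sketched as follows. The market list is hypothetical; the point is only that the unit of randomization is the city, not the individual.

```python
import random

random.seed(2)

markets = ["San Francisco", "Boston", "New York", "Chicago",
           "Austin", "Denver", "Seattle", "Miami"]

# In a matched-market test, whole markets are randomly flipped into the
# campaign or held out; individuals within a market all get the same treatment.
shuffled = random.sample(markets, k=len(markets))
campaign_markets = sorted(shuffled[: len(markets) // 2])
holdout_markets = sorted(shuffled[len(markets) // 2 :])

print("run the campaign in:", campaign_markets)
print("hold out:           ", holdout_markets)
```

With only a handful of markets, the groups are far less likely to be probabilistically equivalent than the large individual-level groups an RCT produces, which is one reason this design is only a rough substitute.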

But just how good are all of these workarounds? Can any of them accurately stand in for a randomized controlled trial?

[music interlude]

SCHMALZ: Enter Facebook. The company was the perfect research partner for Zettelmeyer and Gordon in tackling these questions.

Because Facebook requires its users to log in, it doesn't have the same tracking problems as other platforms — meaning Facebook knows it's you, whether you are on your phone on the train, your tablet in a coffee shop, or your desktop at work. This allows it to conduct true RCTs. And that means the researchers could compare how well ads performed in RCTs to how well advertisers thought they performed when they used different measurement methods.

What they found surprised even them. Here's Zettelmeyer.

ZETTELMEYER: There was a big discrepancy between these commonly used approaches, these observational methods, and the randomized controlled trials in our studies.

SCHMALZ: In some cases, "big discrepancy" is quite the understatement. These methods might inflate actual estimates by 100, 300, even 500 percent. Meaning companies might think their ads are 500 percent more effective than they actually are.

Even more worryingly, the same method might be relatively accurate in one study, but wildly inaccurate in the next. Here's Gordon.

GORDON: I think this makes it really challenging for an advertiser to then take these results and generate some kind of rule of thumb that says, well, if my observational methods consistently seem to be twice that of what the experimental result is, then I don't need to run any more experiments; I can always just adjust my results by a factor of two, or divide by two. But the problem is, sometimes it might be a factor of two, sometimes it might be a factor of ten, and if I were an advertiser, I would feel, I think, fairly uncomfortable having such great uncertainty over my eventual ROI.

SCHMALZ: This, of course, presents a dilemma for advertisers. The methods they are using are inaccurate, but they may not always be able to use the more accurate RCTs instead.

But that doesn't mean that all hope is lost. Bhargava points out that you might not need to run RCTs in every case. If you can run an RCT once, there may be results that you can extrapolate to inform other strategic marketing decisions.

Have a theory about the optimal frequency with which to regale people like Bob with hot-tub ads? Test it.

BHARGAVA: Should I show him an ad once a week, three times a week, five times a week? And so if you learn something like that on one platform, you can start trying to figure out how to use that same information across other platforms and in your marketing more generally.

SCHMALZ: But perhaps the biggest takeaway from this study is to reconsider whether you can afford not to do RCTs.

Here's Zettelmeyer again.

ZETTELMEYER: So our recommendation would be that if you have the option of running a randomized controlled trial, then you should really try to do it. If you can't do it, then you need to be as smart as you can about the methods and the data that you're going to use to substitute for it, but you have to understand that you could potentially be pretty far off from the true advertising effectiveness.

SCHMALZ: And if an RCT is off the table, at least make sure you are working with the best data possible. Because while none of the alternative methods that the researchers tested truly stacked up, some did perform better than others. And the winners tended to take into account rich user data.

Here's Gordon.

GORDON: If I were a firm thinking about whether to put more resources into figuring out or purchasing better methods, rather than just having better data to bring to the problem, I would put my bet on data any day.

SCHMALZ: Next for these researchers is getting the word out about their findings. Because advertisers may not know what they don't know.

BHARGAVA: One of the biggest problems in the industry is that, normally, regardless of how it's being measured, results are being presented back as if they were causal results from an RCT. And so when an advertiser is shown data from one department where it was from an RCT, and from a different department of the same advertiser where it wasn't, the two are compared in the exact same way and assumed to both be causal, true incremental value.

And so in this complicated scenario, it's really hard for them to understand how to look at those results differently.

SCHMALZ: But though advertisers would benefit from paying attention to how various measurements were derived, not everyone is excited to get the message.

GORDON: The reception has been really interesting. There's been one group of people who, when we tell them what the results are, it seems to coincide exactly with what they were expecting.

Then there's another camp, which I think was very much hoping that these methods could be reasonable substitutes. When we presented these results at a very large advertising conference, a person sitting next to me who I think worked at an ad agency just said, "Really great results, really great research; I hope you're completely wrong."

SCHMALZ: Because, of course, it takes real work to do RCTs correctly.

Plus, if the results are right, then some advertisers may not be getting the bang for the buck they have come to expect. And it may not just be advertisers who would prefer to keep their heads in the sand a little longer.

Improvements in the ability to measure ad effectiveness could have a pretty huge effect on the Internet itself. Gordon explains that total digital advertising spending for 2016 is estimated at $68 billion — and a lot of online services and websites depend on ad revenue to stay in business.

GORDON: A lot of news websites provide content for free; a lot of other sites present content for free, because they're ad-supported. So advertising really powers a lot of the Internet, and unfortunately, it's unclear how much of that money would persist if everyone could measure the effectiveness of those ads as accurately as possible.

SCHMALZ: This doesn't necessarily mean there will be less money spent on digital advertising overall, Gordon says. That amount might even increase as marketers gain a better understanding of what is and is not effective. But a reallocation of spending will likely affect what the web looks like for all of us.

GORDON: It'll be interesting to see going forward how some of that maybe reallocates across different channels, maybe back to TV or more into display instead of search, and that I think will start to have an effect on the types of sites and types of content that's available to just everyday users.

[music interlude]

SCHMALZ: This program was produced by Jessica Love, Kate Proto, Fred Schmalz, Emily Stone, and Michael Spikes.

Special thanks to Kellogg School professors Florian Zettelmeyer and Brett Gordon, as well as Facebook researchers Neha Bhargava and Dan Chapsky. Thanks, also, to Kendra Busse and Devin Rapson, who talked with us about the Facebook ads they see.

You can stream or download our monthly podcast from iTunes, Google Play, or from our website, where you can read more on digital marketing, advertising, and data analytics. Visit us at insight.kellogg.northwestern.edu. We'll be back next month with another Insight In Person podcast.

About the Research

Gordon, Brett, Florian Zettelmeyer, Neha Bhargava, and Dan Chapsky. 2016. “A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook.” White paper, Kellogg School of Management, Northwestern University.

