July 1, 2024
How to Spot Political Deepfakes
AI literacy—and a healthy dose of human intuition—can take us pretty far.
Illustration by Yevgenia Nayberg
In the United States, the presidential election season is upon us. Voters can expect more of the usual: mailboxes jammed with fliers, nasty television attack ads, chummy-sounding text messages from down-ballot candidates asking for money.
But this election season, we may also have to contend with something new. In January, an AI-generated robocall purporting to be President Joe Biden asked New Hampshire voters to stay home for the primary; if deepfakes become a bigger part of the election landscape, they could sow confusion in an already fractured information ecosystem.
The technology used to generate these deepfakes is improving every day. Matt Groh, an assistant professor of management and organizations at the Kellogg School, wants to help people distinguish between what is real and what is fake online and avoid being fooled by deepfakes. In his research, he has found that people are adept at identifying deepfakes when they are paying attention.
Identifying deepfakes
The president of the United States is arguably one of the most powerful people in the world. Given the weight his word carries, one would hope that American citizens could reliably recognize him when he speaks. But as the technology for faking videos gets more sophisticated, Matt Groh wanted to test how well people can identify political deepfakes.
In a peer-reviewed paper in Nature Communications, Groh and his coauthors Aruna Sankaranarayanan, Nikhil Singh, Dong Young Kim, Andrew Lippman, and Rosalind Picard from the MIT Media Lab describe a dataset they created for studying deepfakes, which consists of 32 short speeches by President Joe Biden and former President Donald Trump. Of the speeches, 16 are real and 16 fake, and each is about 21 seconds long. Over five experiments, 2,215 participants were randomly served the speeches in one of seven formats: a transcript, an audio clip, a silent video, audio with subtitles, a silent video with subtitles, video with audio, and video with audio and subtitles.
In one experiment, the researchers found that participants presented with just a transcript—the message alone—had the lowest level of accuracy, guessing correctly 58 percent of the time, or only a little better than a coin toss. Those trying to discern real from fake using silent videos with subtitles were 62 percent accurate. Those working with audio only clocked in at 65 percent accuracy. Those viewing video with audio and no subtitles scored the highest accuracy, at 74 percent. “How something is said matters a lot,” says Groh. “The medium in which you communicate is going to matter, in addition to the content.”
Another experiment was designed to test how much the baseline proportion of fakes—think of this as a media ecosystem either relatively free of, or flooded with, fakes—affects people’s judgment. The researchers found that whether they reduced the proportion of deepfakes in the dataset to just 20 percent, or ratcheted it up to 80 percent, participants still performed similarly, with their accuracy increasing as they had access to additional communication modalities such as audio and video.
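To make the shape of this analysis concrete, here is a minimal sketch, in Python with pandas, of how accuracy could be tabulated by modality and by base rate from a per-response table. The column names and the handful of rows are hypothetical and purely illustrative; they are not the paper’s actual data or schema.

```python
import pandas as pd

# Hypothetical per-response records; column names are illustrative,
# not the paper's actual schema.
responses = pd.DataFrame({
    "modality":     ["transcript", "audio", "video+audio", "transcript",
                     "video+audio", "audio", "video+audio", "transcript"],
    "base_rate":    [0.5, 0.5, 0.5, 0.2, 0.2, 0.8, 0.8, 0.8],  # share of fakes shown
    "is_fake":      [True, False, True, False, True, True, False, False],
    "guessed_fake": [False, False, True, False, True, True, False, True],
})

# A response is correct when the guess matches the ground truth.
responses["correct"] = responses["is_fake"] == responses["guessed_fake"]

# Accuracy by communication modality (the article's headline comparison:
# transcript ~58%, audio ~65%, video with audio ~74%).
print(responses.groupby("modality")["correct"].mean())

# Accuracy by the proportion of fakes shown (20% vs. 50% vs. 80%); the
# researchers found participants performed similarly across base rates.
print(responses.groupby("base_rate")["correct"].mean())
```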
At their best, when viewing video with audio, humans are around 74 percent accurate, making them considerably better at weeding out deepfakes than algorithms such as the winner of the Deepfake Detection Challenge.
One advantage humans have over models is the capacity for critical thinking and broad context, Groh notes. A detection algorithm, by contrast, is only as good as the data it was trained on, so no matter how well a deepfake-detection benchmark dataset is designed, it is always playing catch-up with the next iteration of deepfakes. That suggests that instead of only focusing on training algorithms to spot deepfakes—fighting fire with fire, you might say—we might be better off honing our critical-thinking faculties, in order to fight fire with water.
You can test your own deepfake-detection skills with the latest research from Groh and the Human–AI Collaboration lab.
In research he has conducted with deepfaked videos of Biden and Trump (see sidebar), Groh found that most people were fairly accurate in distinguishing the real from the fake.
Here, Groh demystifies deepfakes. He explains why we’re unlikely to see a deluge of them anytime soon and offers advice on how to spot them in the wild.
What’s a deepfake, anyway?
The term “deepfake” is a portmanteau of “deep learning” (a reference to a method of artificial intelligence that learns patterns through multilayered neural networks) and “fake” (the simulated product of such learning).
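For readers curious what “multilayered” looks like in practice, here is a toy sketch in Python: two stacked layers, each a linear map, with a nonlinearity in between. The weights here are random and purely illustrative; a real deepfake model is vastly larger and learns its weights from data.

```python
import numpy as np

# A toy two-layer ("deep") network. Real generators stack many such
# layers and train the weights; these weights are random placeholders.
rng = np.random.default_rng(0)
x = rng.normal(size=4)          # a tiny input signal
W1 = rng.normal(size=(8, 4))    # layer 1 weights
W2 = rng.normal(size=(2, 8))    # layer 2 weights

hidden = np.maximum(0, W1 @ x)  # layer 1: linear map + ReLU nonlinearity
output = W2 @ hidden            # layer 2: another linear map
print(output)                   # meaningless until the weights are trained
```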
So, what exactly counts as a “deepfake”? According to Groh, there’s no precise, agreed-upon definition. “It’s a bit of a vibe,” he says. “My working definition is ‘AI-generated media (often a video) that makes someone appear to do or say something they haven’t done.’” Many people, though, simply think of it as any manipulation of reality in media.
But Groh is quick to point out that, while the term is new, the “fake” part has been with us for a very long time.
Joseph Stalin infamously airbrushed his enemies out of photographs and had his own pockmarked skin smoothed over. National Geographic doctored an image of the pyramids of Giza on its cover. A creative engraver transposed Abraham Lincoln’s face onto the body of John C. Calhoun. And last year, Slovak voters heard deepfaked audio of a candidate apparently talking about rigging votes and raising the price of beer.
Understand the scope of political deepfakes
Still, while technology is undeniably making it easier to generate deepfakes, the idea that the internet is currently awash with difficult-to-detect fake imagery and video is simply not true, Groh says. That’s because convincing video deepfakes are still extremely challenging to make, requiring an immense amount of time, resources, and skill. Face swapping—using technology to swap the faces of people in photos or videos—is relatively straightforward, but creating realistic video deepfakes (beyond talking-head videos like those produced with HeyGen) means a lot of things have to align.
“It’s not just, ‘Oh, here’s a video. I’ll throw it in the algorithm; now it’s a deepfake, and it’s done,’” Groh says. “There are many human elements that go into the process.”
For example, even for a voice-only deepfake such as the Biden robocall, a scammer would have to begin with the right audio dataset—clips without background noise, spoken in just the right tone—to generate a convincing fake.
To show how complicated this process is, Groh points to the series of Tom Cruise deepfakes on the TikTok account deeptomcruise, which now has more than five million followers. On top of the deepfake algorithm itself, the videos required a Tom Cruise look-alike actor, months of training the model on a giant dataset of Tom Cruise’s acting and media appearances, visual effects, and frame-by-frame work to clean up any inconsistencies.
“If we understand how these things are created, we can also understand how difficult it is to do so and how much human effort is required for persuasive deception,” Groh says. “If perfect deepfakes take so much effort to create, it’s not going to be a deluge of synthetic media indistinguishable from reality like many people expect.”
Trust your intuition, slow down, and consider context
We may all be familiar with the quote attributed to the 17th-century clergyman Thomas Fuller, that “seeing is believing.” But the second half of that quote, which often gets left out, may be even more instructive in the quest to sniff out deepfakes: “Seeing is believing, but feeling is the truth.”
The very act of slowing down to watch or listen to online media more closely gives people a chance to tap into their intuition and reduce the chances of taking the bait on a deepfake.
In one clip generated by the AI video model Sora from the phrase “woman walking down the street of Tokyo,” the footage looks startlingly real until a moment about 15 seconds in, when the woman’s legs do an odd (and physiologically impossible) gliding swivel as she strides. That, Groh explains, is an example of how an AI tool that arranges pixels based on detected patterns will ignore the limits of reality. Knowing this makes these blips easier to spot.
“We know rules about how a human probably should behave, whether it’s socially or physiologically or whatever else, but we also know that the model doesn’t necessarily know those rules—it just knows the patterns of those rules,” Groh says. “When that funny business emerges, that’s where being a human with common sense actually comes in handy.”
There have always been liars in the world, Groh says, and we’ve always had to use our human faculties to spot them. One takeaway, then, is that simple verbal or textual content—what you say—may actually be less instructive for spotting deepfakes than the nonverbal and visual cues around that messaging.
“Humans interact in many different ways,” Groh says. “Experience is also how you smell things, how you taste things, how you hear things, how you think critically about things. It’s all those different things that are going to help us construct our reality and recognize reality from fabrication.”
Understand how deepfakes work, including by making your own
When Groh teaches his class on artificial intelligence, he starts by giving his students an easy-to-grasp definition of AI: “solving problems with computers.” “Well, isn’t that basically everything?” a student might counter. That’s the whole point, Groh says—since they’ve presumably solved problems with computers before, they’re less intimidated by AI technology and more critical of AI marketing.
Similarly, the more time one spends with deepfake technology, the more apparent its limits become. This is why Groh thinks older people may be less adept at spotting deepfakes than their younger peers, who are more likely to have grown up using face-swap tools or apps such as Facetune, which can edit photos and videos.
“Our age is related to how we interact online and what we consume,” Groh says.
Digital-literacy training, which teaches people how to use AI tools and lets them play around with them, can help them understand what the tools are capable of and where they fall short, making them better at spotting fakes when they arise. Hands, for example, are much harder to fake than faces, since there are so many images of faces online that can be used to train AI models, but far fewer good images of hands. So one way to flag deepfakes is to examine the hands of people in videos for anything unusual, from impossibly long palms to surplus fingers.
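As a concrete illustration of the hands heuristic, here is a rough screening sketch, assuming Python with the OpenCV (cv2) and MediaPipe libraries installed. The plausibility range below is invented for illustration; real hands vary widely, and a flagged hand is only a prompt to look closer, not proof of fakery.

```python
import math

import cv2
import mediapipe as mp

def dist(a, b):
    """Euclidean distance between two normalized landmarks."""
    return math.hypot(a.x - b.x, a.y - b.y)

def check_hands(image_path: str) -> None:
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # MediaPipe's hand detector returns 21 landmarks per detected hand
    # (0 = wrist, 5 = index MCP, 9/12 = middle MCP/tip, 17 = pinky MCP).
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=4) as hands:
        results = hands.process(rgb)

    if not results.multi_hand_landmarks:
        print("No hands detected; nothing to check.")
        return

    for i, hand in enumerate(results.multi_hand_landmarks):
        lm = hand.landmark
        palm_width = dist(lm[5], lm[17])   # across the knuckles
        middle_len = dist(lm[9], lm[12])   # middle finger, base to tip
        ratio = middle_len / palm_width if palm_width else float("inf")
        # Illustrative range only: flag proportions that look off.
        if 0.5 <= ratio <= 1.6:
            print(f"Hand {i}: proportions look plausible ({ratio:.2f}).")
        else:
            print(f"Hand {i}: unusual finger-to-palm ratio ({ratio:.2f}); look closer.")
```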
“The more that we add AI literacy to media literacy, the less that people are going to be duped,” he says. With coauthors Negar Kamali, Karyn Nakamura, Angelos Chatzimparmpas, and Jessica Hullman from Northwestern University, Groh released a training guide for distinguishing AI-generated images from authentic photographs.
And don’t forget to lean heavily on basic media literacy and critical-thinking skills, too. “What’s the source? What are they trying to convince me of? Why might this be real or fake?” Groh says, “are all things to consider when confronted with political content.”
Anna Louie Sussman is a writer based in New York City.