There is a new media type emerging at the intersection of computer generated images, video, and voice. At present, the results are holographic pop stars that hold concerts in the real world, and Instagram celebrities that don’t actually exist but post selfies with “real” friends. Today the applications are entertaining, but we are also starting to see the first examples of how this new technology can be used to create forged videos and audio clips — showing people saying things they didn’t actually say, and even doing things they didn’t actually do.
This new type of media, with its positive and negative uses, will proliferate as the technology to create it becomes more accessible. At betaworks, we’ve been calling this category Synthetic Media.
Synthetic media is not new. The first photograph was taken in 1826 by Joseph Nicéphore Niépce. In 1902, Levin C. Handy created General Grant at City Point, a composite of three photographs taken circa 1864, combined into a single image of a scene that never actually happened. The Library of Congress notes that “this photograph is a montage or composite of several images and does not actually show General Ulysses S. Grant at City Point.”
General Grant at City Point is a composite of three different images
Is this a manipulation? A forgery? A piece of art? Synthetic media isn’t only created for art and entertainment, although it often starts out that way. It was employed purely for its utility in 1982, when National Geographic digitally moved the Pyramids of Giza closer together so the photo would fit its vertical magazine cover. In retrospect it may seem innocuous, but at the time, altering the appearance of the pyramids stirred controversy over how photo manipulation technology should be used. The most nefarious use may be the pursuit of political power; perhaps the most famous example is Stalin having purged officials erased from official photographs in the Soviet Union.
Regardless of intent, during the time that the technology is being built and adopted, there is a gap between the technological ability to create believable forgeries and the public’s awareness that it’s possible. This means that people may not even know to ask whether what they’re looking at might be a forgery.
Photoshop made it dramatically easier for anyone to forge a photo, but a decade and a half later, the public was still consuming — and media outlets were still reporting — information that would later turn out to be forged. We are entering a similar gap today, where the tools exist to create synthetic media, but it’s unclear that individuals and media outlets are prepared to be sufficiently skeptical of the content they come across, share, and endorse as authentic.
This gap exists because at first, we don’t realize it’s possible to fake a piece of content. The more forgeries that get created and are subsequently exposed, the more people become aware that they should be asking whether a given piece of media is authentic.
Early on, we aren’t aware how easy it is to create a forgery
Information consumers often outsource authentication to the media outlets presenting the piece of content, which means that if the technology gets better at deceiving publishers, not only do the images get out, but they’re lent additional credibility by the publishers presenting them.
While we often turn to media outlets to verify the authenticity of a photo or other piece of content, publishers experience this gap as well. It takes longer for media outlets to catch up than one would think, and this amplifies the gap.
As an example, Photoshop was invented in 1990, but this authenticity awareness gap was still present 14 years later. In 2004, a photo surfaced of John Kerry and Jane Fonda together at a rally protesting the Vietnam War. It was later revealed that the picture was a forgery, but the New York Times article Conservatives Shine Spotlight on Kerry’s Antiwar Record featured the photo and talked about its relevance to the election:
“Even so, Jane Fonda still draws the ire of some veterans. She earned the nickname Hanoi Jane for her 1972 trip to North Vietnam, where she criticized the United States government over Hanoi Radio…The photograph with Mr. Kerry was taken two years earlier. But it brings up deep memories…”
This article is not about whether the photo was real, but about the cultural significance of the photo given that it was real. This marks what may be the peak of the authenticity awareness gap for forging photos.
A composite photo the NY Times wrote about assuming it was real.
Importantly, the Kerry photograph does not implicitly carry its own meaning. Media outlets, and ultimately the public, had to fill in the significance of a photo of John Kerry with Jane Fonda. In a Meme, the intended message is not transmitted so subtly.
I don’t think many would consider Memes to be forgeries on their face. A meme is an example of very simple synthetic media that, for the purpose of entertainment, attributes a quotation to someone that the person usually didn’t actually say.
What awards *hasn’t* Adele won, amirite?
A key characteristic of a Meme is that the image stays the same, and the implication (again, only for entertainment) is that the quotation is being said by the person — it’s designed to communicate information, not just the image in the background.
The lines between what we think of “photoshopping” and “meme-ing” have already started to blur. “Meme-ing” a tweet is now trivially simple with sites such as Twitterino, which lets users “create fake tweets with any user.” As more people use this to try to perpetuate hoaxes, sites pop up to fact check. This begins to narrow the authenticity awareness gap. There is now an entire section on the site Snopes to learn if a tweet was actually tweeted by the real account. Note that this is not about whether the information in the tweet is accurate, but about whether the person actually tweeted what he or she is purported to have shared.
Thanks Bill!
The evolution brings us to a new type of synthetic media. Lil Miquela is an Instagram star with over a million followers. She posts selfies and photos with “real” people and other influencers, but she’s no more real than a Walt Disney character. Her creators don’t pretend that she is real, and it’s not clear that her followers care that she doesn’t actually exist:
The skateboarder in the center is synthesized celebrity Lil Miquela
Hatsune Miku is a Japanese pop star who appears as a hologram in concert venues across the world. Thousands of people pay to see her “live” in concert (interestingly, her opening act is a human):
What It’s Like to Attend a Hatsune Miku Concert.
One more. Maybe just stop reading this and go watch this video of her concert entrance.
Synthetic celebrities are seemingly innocuous. Their existence sits somewhere between cartoon character and professional wrestler. They’re conjured from their creators’ imaginations, much as Walt Disney created Mickey Mouse. Like professional wrestlers, their characters transcend a single medium, extending from TV appearances to Instagram accounts. But wrestlers are “powered” by real people. Synthetic celebrities fall somewhere in the middle: not powered by actual people, but as active on social media as if they actually existed.
We are accustomed to forgeries of single channels — we expect that images may be altered, even audio may be cut, clipped, and stitched together to tell a different story. But we still trust video that we see with our own eyes.
When someone does an interview, it’s more difficult for a producer to rearrange a video clip to communicate a materially different message than it is to, say, take a quotation out of context. However, the technology to synthesize video and voice is moving out of academia and becoming accessible to anyone who can code. It won’t take long for the software used to create these forgeries to become as mainstream as Photoshop.
As that happens, the authenticity awareness gap increases until we learn to develop the commensurate skepticism.
Researchers at the University of Washington used a neural network to synthesize Obama’s mouth movements, and paired them with existing audio of the president speaking in other contexts. The combination was a brand new video of Obama giving a speech whose audio was real, but whose visual elements were forged:
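To make the underlying idea a bit more concrete, here is a minimal sketch, in the spirit of that research but not the team’s actual code or architecture, of the core mapping involved: a small network that learns to turn per-frame audio features into mouth landmark positions, which would then be composited back onto real footage. Every name and dimension below is an illustrative assumption.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not the published architecture):
AUDIO_DIM = 28       # audio features (e.g., MFCCs) for one frame of video
MOUTH_POINTS = 18    # (x, y) landmarks outlining the mouth

class AudioToMouth(nn.Module):
    """Maps a sequence of per-frame audio features to mouth landmark coordinates."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(AUDIO_DIM, 64, batch_first=True)
        self.head = nn.Linear(64, MOUTH_POINTS * 2)

    def forward(self, audio_feats):            # (batch, frames, AUDIO_DIM)
        hidden, _ = self.rnn(audio_feats)
        return self.head(hidden)                # (batch, frames, MOUTH_POINTS * 2)

model = AudioToMouth()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Random tensors stand in for aligned (audio features, traced mouth landmarks)
# pairs extracted from many hours of real footage of a single speaker.
audio = torch.randn(8, 100, AUDIO_DIM)
landmarks = torch.randn(8, 100, MOUTH_POINTS * 2)

optimizer.zero_grad()
loss = loss_fn(model(audio), landmarks)
loss.backward()
optimizer.step()
print(f"one training step done, loss = {loss.item():.3f}")
```

The hard parts in practice are the compositing and temporal smoothing rather than this mapping, but the sketch shows why plentiful aligned audio and video of one person is the main ingredient.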
Actor and producer Jordan Peele teamed up with Buzzfeed to create a “public service announcement” highlighting how anyone who can do an impression of someone’s voice can make a forged video:
While the University of Washington project synthesized the video component of the former President, technologists have also improved the ability to mimic specific voices. This is now commercialized to the point where Lyrebird says it “allows you to create a digital voice that sounds like you with only one minute of audio.” Here is a tweet from Lyrebird that includes a synthetic version of Donald Trump’s voice:
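Lyrebird hasn’t published how its system works, but voice cloning tools of this kind are commonly described as two pieces: a speaker encoder that compresses a short reference recording into a fixed-size “voiceprint” embedding, and a synthesizer that generates speech for new text conditioned on that embedding. The sketch below is a generic, untrained illustration of that structure with made-up dimensions, not Lyrebird’s method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerEncoder(nn.Module):
    """Compresses a reference recording (as mel frames) into a fixed-size voice embedding."""
    def __init__(self, n_mels=80, emb_dim=256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, emb_dim, batch_first=True)

    def forward(self, mel_frames):                 # (batch, time, n_mels)
        _, last_hidden = self.rnn(mel_frames)
        return F.normalize(last_hidden[-1], dim=-1)  # (batch, emb_dim)

class Synthesizer(nn.Module):
    """Generates speech features for new text, conditioned on a voice embedding."""
    def __init__(self, vocab_size=60, emb_dim=256, n_mels=80):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, 128)
        self.rnn = nn.GRU(128 + emb_dim, 256, batch_first=True)
        self.to_mel = nn.Linear(256, n_mels)

    def forward(self, text_ids, voice_emb):        # (batch, chars), (batch, emb_dim)
        chars = self.text_emb(text_ids)
        cond = voice_emb.unsqueeze(1).expand(-1, chars.size(1), -1)
        out, _ = self.rnn(torch.cat([chars, cond], dim=-1))
        return self.to_mel(out)                     # mel frames; a vocoder would turn these into audio

# Untrained demo on random data: roughly a minute of reference audio as mel
# frames becomes an embedding, which then conditions synthesis of new text.
reference_mels = torch.randn(1, 6000, 80)
voice = SpeakerEncoder()(reference_mels)
new_speech = Synthesizer()(torch.randint(0, 60, (1, 40)), voice)
print(new_speech.shape)   # torch.Size([1, 40, 80])
```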
Imagine combining video synthesis with voice synthesis to create an entirely new video showing something that never took place. Although not entirely convincing, Lyrebird has done exactly this with a computer generated advertisement that appears to show Obama endorsing their product:
During this new authenticity awareness gap, the technology will progress, videos like the ones above will become more convincing, and it will be easier than ever for anyone to combine video and audio to create a multi-channel forgery. So how do we combat it? Technology and legislation are two tools that have been employed in the past, with varying degrees of efficacy.
This weekend, The Washington Post reported that Facebook profile forgeries are being created. A user found that her entire profile had been copied, with “her full name, photos, home town and old workplace.” The Post says that Facebook has offered “new facial-recognition technology to spot when a phony profile tries to use someone else’s photos.”
Mind The Gap
The type of computer vision technology Facebook and others are building will also likely be able to help recognize the difference between a real video and a synthetic one. While this helps, it’s just a technological game of cat and mouse.
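As a rough illustration of what the detection side of that cat-and-mouse game can look like (not Facebook’s actual system), a common starting point is a small convolutional classifier trained on frames labeled real or synthetic:

```python
import torch
import torch.nn as nn

# Minimal frame-level detector: a tiny CNN that scores each video frame as
# real vs. synthetic. Purely illustrative; real systems add face detection,
# temporal models, and features targeting specific generation artifacts.
detector = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1),                 # one logit per frame: higher means "looks synthetic"
)

frames = torch.randn(4, 3, 224, 224)                  # a batch of RGB frames
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])   # 1 = synthetic, 0 = real
loss = nn.BCEWithLogitsLoss()(detector(frames), labels)
loss.backward()                                        # gradients for one training step
print(f"loss on this toy batch: {loss.item():.3f}")
```

Detectors like this tend to latch onto the artifacts of today’s generators, which is exactly why it’s a cat-and-mouse game: improve the generator and the detector has to catch up.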
Other technology-based approaches come from those in the crypto community, who are discussing how blockchain technology might help authenticate digital content, from copyrights to certificates of authenticity to proof of ownership. As the technology matures, there may be new approaches to verify that a file hasn’t been altered, similar to how some software is distributed and validated with a hash. I’m skeptical of the current proposed approaches, especially as applied to content from a pre-digital era such as the Kerry photo.
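The hash comparison mentioned above is already routine practice for software downloads. As a simple illustration (the file name and digest below are placeholders), verifying that a file matches a previously published SHA-256 digest looks like this:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Both the file name and the expected digest are placeholders; in practice the
# digest would be published by whoever vouches for the original, unaltered file.
expected = "<published sha-256 digest of the original file>"
if sha256_of("interview_clip.mp4") == expected:
    print("File matches the published digest; it has not been altered.")
else:
    print("Digest mismatch: this file differs from the original.")
```

Note that a matching digest only proves a file is identical to some original that someone chose to publish a digest for; it says nothing about media, like the Kerry photo, that predates any such scheme, which is part of the reason for the skepticism above.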
Legislation such as the Truth in Advertising Act of 2014 tries to address the issue from a legal perspective. Its goal was to rein in ads featuring “images that have been altered to materially change the appearance and physical characteristics of the faces and bodies.” This would target commercial photoshopping today and, by extension, the forgeries of the future. But legislation doesn’t move as quickly as technology.
Legislation and technology can be useful tools, but ultimately this is an awareness gap, so raising public awareness is probably the most effective way to lessen the inevitable impact of synthetic media.
So far, no media outlet has seriously reported on a synthetic video as if it were real. If that happens, we could see a repeat of the Kerry photo of 2004, where the forgery is discovered quickly but disseminated even faster. In the meantime, we’ll begin to experience the authenticity awareness gap as entertainment, from Jordan Peele to Lil Miquela.
I’m a partner at betaworks ventures. You can find me on twitter @matthartman or (and?!) subscribe to my newsletter on voice interfaces hearingvoices. More info at hrt.mn