Here's The AI Dubbing That's Raising Alarms The World Over

Recently, an AI-dubbed version of Argentine President Javier Milei’s speech at the 2024 World Economic Forum, produced by video startup HeyGen, went viral on social media.

The software not only accurately translated his Spanish words into English but also handled lip sync seamlessly, a task traditionally reserved for human professionals in the video industry.

Although some observers claim that HeyGen’s version of Milei sounds a bit like a “Bangladeshi guy after a decade in Swansea,” I would dismiss that comparison. What is hard to ignore is the mechanical voice and intonation of the AI-powered Argentine president.

Frankly, the real Milei sounds better and more emotionally expressive than that.

https://www.youtube.com/watch?v=YtegqgKYR-U&embedable=true

Emotion and intonation have long been the biggest challenges for AI startups aiming to disrupt the media industry. Humans are still better at expressing passion, sadness, or anger through their voices. But it looks like things are about to change.

Another recent showcase, which went largely unnoticed, came from an Amsterdam-based AI dubbing startup called Dubformer. The company claims it has developed technology that carries emotion and intonation over into translated songs.

You can judge for yourself:

https://www.youtube.com/watch?v=VMBLs2Zr9NY&embedable=true

The startup localized the most famous version of "House of the Rising Sun," recorded in 1964 by the British rock band The Animals. The demo appears to capture the essence and emotional depth of musical expression with a human-like quality.

According to Dubformer CEO Anton Dvorkovich, the company relies on its own proprietary technology, including Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Voice Biometrics.

Micah Berkley, an AI implementation specialist, solutions architect, and educator, said Dubformer’s technology is shaping a future where AI expands the global reach of artistic expression.

Personally, I just can't wrap my head around the idea that AI voices or translations could ever match the emotional expressiveness and engagement of humans. But it seems we are on the front lines of a major transformation.