The question is no longer, “does this sound human?”
It’s, “what do you want it to feel like?”
And that shift might be the most important thing happening in consumer-facing AI today.
Sixty days ago, I wrote about how I unknowingly listened to AI-generated music because of a new model that generated music in a way that was subtle, almost ambient.
What I’ve been listening to this week in the AI speech space is anything but subtle.
Recent releases have taken such a leap forward that the public now has access to voice models that don’t just say the words, they perform them. On cue. In real time. With emotional precision that feels… frankly, like something straight out of Mountainhead.
We’ve stopped trying to make machines sound human - that’s done - and we’re now training them to act.
Voice That Takes Direction
We’ve entered a phase where AI can actually perform.
It doesn’t just output speech, it delivers intention. Mood. Emotion.
The new OpenAudio model available at Fish Audio lets you write a script that includes emotional or tone markers like (sarcastically), (enthusiastically), or (excited). It can also include special markers like (laughing), (sobbing), or (panting). In less than a minute, I gave it my own voice and generated results that would be good enough to fool anybody. If I spent a little more time, I’m positive I could get this to a scary-good place. I wasn’t surprised to see OpenAudio at the top of Hugging Face’s TTS Arena rankings for expressiveness.
But the actual story is about the shift in creative control. You can direct the voice to whisper, sob, scream, emphasize, trail off.
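For the technically curious, here is a minimal sketch of what "directing" a script can look like as plain text. It only builds the marked-up strings and does not call Fish Audio or any real API. The helper names (`direct`, `variants`, `markers_in`) are my own illustration, the marker list is just the handful mentioned above, and I'm assuming a marker goes in parentheses in front of the line it applies to.

```python
import re

# Markers quoted in this post. The full set a given model accepts is an assumption.
TONE_MARKERS = {"sarcastically", "enthusiastically", "excited"}
EFFECT_MARKERS = {"laughing", "sobbing", "panting"}
KNOWN_MARKERS = TONE_MARKERS | EFFECT_MARKERS


def direct(line: str, marker: str) -> str:
    """Prefix one line of a script with a parenthesized marker."""
    if marker not in KNOWN_MARKERS:
        raise ValueError(f"unknown marker: {marker}")
    return f"({marker}) {line}"


def variants(line: str, markers: list[str]) -> list[str]:
    """Produce one directed take of the same line per marker."""
    return [direct(line, m) for m in markers]


def markers_in(script: str) -> list[str]:
    """List the known markers that appear in a script, in order."""
    found = re.findall(r"\(([a-z]+)\)", script)
    return [m for m in found if m in KNOWN_MARKERS]


if __name__ == "__main__":
    # Hypothetical line of copy, rendered as two different takes.
    for take in variants("Oh great, another Monday.", ["sarcastically", "excited"]):
        print(take)
```

Each resulting string is what you would paste into (or send to) whichever voice tool you use. The point is that the direction lives in the text itself, so trying a different delivery is a one-word change.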
This is playing out across the AI voice landscape but, interestingly, each competitor seems dialed in on its own segment. ElevenLabs is chasing the audiobook market. Hume is building an AI you can chat with, in a voice that you want to hear, rather than whatever Siri’s been doing to us for the last decade. Voicemod now offers a real-time voice changer, apparently popular with gamers.
What Can You Do With Voice Tools Like This?
If you’re a storyteller, podcaster, game dev, or brand builder, this changes your entire creative stack. You don’t need a studio, a casting session, or even a microphone. You can now:
- Build entire voice casts for your game, each with its own tone, emotional range, and delivery quirks.
- Prototype an audio drama or film script with full emotional delivery before recording a single real actor.
- Test dozens of versions of the same ad with different vibes: upbeat, thoughtful, sarcastic, whispered.
- Generate a branded voice for your startup that doesn’t just sound clear, but sounds the way you want the user to feel.
For the solo creator, this isn’t just productivity, it’s leverage.
For accessibility, it’s a breakthrough. You can make your content more inclusive with emotion-aware voiceovers in multiple languages. You can create voices for people who’ve lost theirs. You can give students material that’s performed, not just read.
You can build AI companions that don’t feel uncanny. You can generate synthetic therapy bots that respond not just with accuracy, but with warmth. You can write a children’s book and have it read aloud in the voice of a kind grandmother. Or a space captain.
Or a pirate who really needs a nap.
Whoops, sorry, that nap thing might’ve just been about me.
None of this is hard. It’s $0.80/hour. It runs in real time. And it works today.
Why This Moment Matters
We’ve hit a new threshold, and not just in fidelity. The leap is emotion: we’re shifting from what’s being said to how it’s being felt.
Every interface that involves language - from media to messaging to virtual assistants - is about to be rebuilt with this layer.
Because if you can guide emotion, you can guide behavior. Influence, persuasion, empathy, intent - they’re no longer soft skills. They’re programmable surfaces.
That’s a shift in storytelling. A shift in marketing. A shift in trust.
More than the technical achievement of OpenAudio, this is a signpost.
Emotion is the next frontier of user experience.
And voice is where that battle is already being won.