Inside Personaplex/realtime: Full-Duplex Speech-to-Speech AI by fal-ai

Model overview

personaplex/realtime is a real-time, full-duplex speech-to-speech conversational model from fal-ai. It listens and speaks at the same time, which lets it carry natural back-and-forth conversations. Two forms of conditioning control its behavior: text-based role prompts that define persona attributes such as background and scenario context, and audio-based voice prompts that establish vocal characteristics and speaking style. This dual control sets it apart from simpler speech-to-speech systems. Related models such as personaplex-7b-v1 and dia-tts also generate speech, but personaplex/realtime emphasizes full-duplex interaction: it is designed to cope with interruptions, barge-ins, and rapid turn-taking so that the exchange feels natural.

Model inputs and outputs

The model accepts both audio and text inputs that shape how it responds, then generates conversational speech in real time. Audio streams in continuously while the model produces spoken responses, which makes possible the overlaps and interruptions that characterize genuine dialogue.

Inputs

User speech audio - Incoming voice from the conversation participant
Voice prompt - Audio tokens that establish the target vocal characteristics and speaking style
Text prompt - Persona description specifying role, background, and scenario context

Outputs

Agent speech audio - Generated spoken response played back in real time
Agent text - The underlying text content of the response
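
The inputs and outputs listed above can be pictured as two small data structures. The sketch below is only an illustration: the class names, field names, and the validation helper are hypothetical and are not taken from the fal-ai API, which may name and encode these values differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical container for the two conditioning inputs."""
    text_prompt: str     # persona description: role, background, scenario
    voice_prompt: bytes  # audio that sets vocal characteristics and style


@dataclass(frozen=True)
class AgentChunk:
    """Hypothetical unit of model output: audio plus optional text."""
    audio: bytes                # agent speech audio for playback
    text: Optional[str] = None  # underlying text of the response, if any


def validate_config(config: SessionConfig) -> None:
    """Reject empty conditioning inputs before a session starts."""
    if not config.text_prompt.strip():
        raise ValueError("text_prompt must describe a persona")
    if not config.voice_prompt:
        raise ValueError("voice_prompt must contain audio data")
```

User speech audio is left out of the configuration on purpose, since it arrives as a continuous stream during the session rather than as a one-time setting.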

Capabilities

The model performs streaming speech understanding and generation in tandem, processing incoming audio while producing outgoing speech without waiting for the user to finish speaking. This enables natural conversational dynamics including interruptions where one speaker breaks in while another is still talking, backchannels like "uh-huh" and "I see," and smooth turn-taking with minimal gaps between speakers. The voice and text conditioning allow you to establish a consistent persona that maintains its characteristics throughout the conversation, whether that persona is a customer service representative, a knowledgeable assistant, or any other role you define.
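
To make the idea of listening and speaking in tandem concrete, here is a minimal sketch of a full-duplex client loop using Python's asyncio. It does not call the real model: fake_model is a stand-in that answers each incoming frame, and every function name here is an assumption made for illustration. The point is only the structure, in which sending and receiving run as independent tasks so neither waits for the other to finish.

```python
import asyncio


async def fake_model(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Stand-in for the model: emits one reply frame per user frame."""
    while True:
        frame = await uplink.get()
        if frame is None:  # end-of-stream sentinel
            await downlink.put(None)
            return
        await downlink.put(b"agent:" + frame)


async def send_user_audio(frames, uplink: asyncio.Queue) -> None:
    """Stream user frames without waiting for any reply."""
    for frame in frames:
        await uplink.put(frame)
        await asyncio.sleep(0)  # yield so other tasks can run in between
    await uplink.put(None)


async def receive_agent_audio(downlink: asyncio.Queue) -> list:
    """Collect agent frames as they arrive, independent of sending."""
    received = []
    while True:
        frame = await downlink.get()
        if frame is None:
            return received
        received.append(frame)


async def run_session(frames) -> list:
    """Run sender, model stand-in, and receiver concurrently."""
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    model = asyncio.create_task(fake_model(uplink, downlink))
    sender = asyncio.create_task(send_user_audio(frames, uplink))
    received = await receive_agent_audio(downlink)
    await asyncio.gather(model, sender)
    return received
```

Calling asyncio.run(run_session([b"a", b"b"])) drives the whole loop. A real model would not map user frames to agent frames one-to-one as this stand-in does; the separation of the send and receive paths is the part that carries over.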

What can I use it for?

Build interactive voice applications for customer service, virtual assistants, and game characters that respond naturally when interrupted. Create educational tutoring systems where students can speak freely without artificial pauses, develop voice-driven storytelling experiences where characters maintain consistent personalities, or build accessibility features for real-time conversation support. The chatterbox/speech-to-speech and play-dialog models offer alternative speech synthesis approaches, but the real-time bidirectional nature of this model makes it particularly valuable for applications where natural conversation flow matters most.

Things to try

Experiment with text prompts that define distinct personas and observe how the model adapts its responses while keeping the voice characteristics you supply through the audio prompt. Interrupt the model mid-sentence to test how it responds; this exercises the full-duplex capability that distinguishes it from turn-based systems. Try pairing a formal, businesslike voice prompt with a casual persona text prompt, or vice versa, to see how the two conditioning mechanisms interact. Record real conversation samples as voice prompts to see whether the model picks up subtle speaking patterns such as pacing, emphasis, or breathiness.
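
One way to organize the persona and voice experiments suggested above is to enumerate every combination up front, so that mismatched pairings (formal voice with casual persona, and the reverse) are not skipped. The sketch below only builds the list of trials; the persona texts and voice file names are made-up placeholders, and running each trial against the model is left out.

```python
from itertools import product

# Hypothetical persona prompts and voice sample names, for illustration only.
personas = {
    "formal": "You are a senior account manager at a bank.",
    "casual": "You are a laid-back friend chatting about weekend plans.",
}
voices = ["formal_business_voice.wav", "casual_voice.wav"]


def build_trials(personas: dict, voices: list) -> list:
    """Pair every persona prompt with every voice prompt."""
    return [
        {"persona": name, "text_prompt": prompt, "voice_prompt": voice}
        for (name, prompt), voice in product(personas.items(), voices)
    ]


trials = build_trials(personas, voices)
```

With two personas and two voices this yields four trials, two matched and two deliberately mismatched.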

This is a simplified guide to an AI model called personaplex/realtime maintained by fal-ai.