Voice Configuration

The Voice section of the agent editor controls how your agent sounds during conversations. From the voice engine and speaker selection to fine-grained controls over speed, expressiveness, and stability, these settings let you craft a natural, on-brand vocal identity.

Voice Model

The Voice Model dropdown selects the underlying engine that synthesizes your agent’s speech. The default is ElevenLabs Turbo v2, which is optimized for English. For other languages, switch to a model that supports multilingual synthesis.

Voice

The Voice selector lets you pick from a library of pre-built voices, import a voice from a third-party provider, or clone a custom voice. Each voice comes with a preview so you can audition it before committing. The voice you choose here works together with the advanced settings below to shape the final output.

Fallback Voice Model

If the primary voice provider experiences an outage, Synthflow automatically switches to an alternative provider to keep the call running. In rare cases the caller may notice a slight change in the agent’s voice, but the conversation continues without dropping or being interrupted. This setting is available on Enterprise plans.
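The failover behavior described above can be pictured with a small sketch. This is not Synthflow's implementation: the `ProviderUnavailable` exception, the `synthesize_with_fallback` function, and the two stand-in providers are hypothetical names introduced only to illustrate the idea of trying providers in order.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error a provider raises when it cannot synthesize speech."""


def synthesize_with_fallback(
    text: str, providers: Sequence[Callable[[str], bytes]]
) -> bytes:
    """Try each provider in order and return the first successful audio result."""
    last_error = None
    for synthesize in providers:
        try:
            return synthesize(text)
        except ProviderUnavailable as err:
            # Remember the failure and move on to the next provider.
            last_error = err
    raise ProviderUnavailable("all voice providers failed") from last_error


# Stand-in providers for illustration only; they do not call any real service.
def primary(text: str) -> bytes:
    raise ProviderUnavailable("simulated primary outage")


def backup(text: str) -> bytes:
    return f"backup:{text}".encode()


audio = synthesize_with_fallback("Hello", [primary, backup])
```

Because the backup provider may render the voice slightly differently, this kind of ordering would explain the occasional change in voice mentioned above.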

Patience Level

Patience Level determines how long the agent waits after the caller finishes speaking before it begins its response. Low makes the agent respond almost immediately, Medium adds a natural pause, and High gives the caller extra time to continue, which is useful for conversations where callers tend to pause mid-thought.
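One way to reason about the three levels is as a silence threshold the agent must observe before replying. The sketch below assumes this model; the wait times in `ASSUMED_WAIT_SECONDS` are made-up placeholders, since the actual durations are not documented here, and `should_respond` is a hypothetical helper.

```python
from enum import Enum


class PatienceLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Placeholder values for illustration only; the real wait times are not published here.
ASSUMED_WAIT_SECONDS = {
    PatienceLevel.LOW: 0.3,
    PatienceLevel.MEDIUM: 0.8,
    PatienceLevel.HIGH: 1.5,
}


def should_respond(silence_seconds: float, level: PatienceLevel) -> bool:
    """Return True once the caller has been silent for at least the assumed wait time."""
    return silence_seconds >= ASSUMED_WAIT_SECONDS[level]
```

Under these assumed values, a one-second pause would trigger a reply at Medium but not at High, which is the trade-off to weigh for callers who pause mid-thought.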

Speed

The Speed slider adjusts how fast the agent speaks, from slow and deliberate to fast and energetic. The default of 100% mirrors natural conversational pacing. Lowering the value can improve clarity for complex information; raising it keeps the conversation moving.
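To get a rough feel for what a speed change means for call length, the sketch below estimates how long an utterance takes at a given slider value. It rests on two assumptions that are not stated in this guide: that 100% corresponds to roughly 150 words per minute, and that speaking rate scales linearly with the slider. The function name `estimated_duration_seconds` is hypothetical.

```python
def estimated_duration_seconds(
    word_count: int, speed_percent: float, baseline_wpm: float = 150.0
) -> float:
    """Estimate speaking time, assuming rate scales linearly with the speed setting.

    baseline_wpm is an assumed words-per-minute rate at 100% speed.
    """
    if speed_percent <= 0:
        raise ValueError("speed_percent must be positive")
    effective_wpm = baseline_wpm * speed_percent / 100.0
    return word_count * 60.0 / effective_wpm


normal = estimated_duration_seconds(75, 100)  # 75 words at the default speed
slower = estimated_duration_seconds(75, 80)   # same text, slowed for clarity
faster = estimated_duration_seconds(75, 125)  # same text, sped up
```

Under these assumptions a 75-word reply takes about 30 seconds at the default, longer at 80%, and shorter at 125%, so small slider changes add up over a multi-turn call.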

Volume

The Volume slider sets the output loudness of the agent’s voice. Keep it at 100% for most use cases and lower it if callers report the agent is too loud relative to their own audio level.

Speaker Boost

Speaker Boost amplifies the characteristics that make the selected voice sound like its original speaker. Enabling it increases vocal likeness but may add a small amount of latency.

Advanced Settings

The settings below are collapsed under Advanced Settings by default. They give you precise control over the voice’s tonal characteristics.

Stability

Stability balances expressiveness against consistency. Lower values produce a more dynamic, emotive delivery that varies between utterances. Higher values keep the tone steady and predictable, which works better for reading structured information such as addresses or confirmation numbers.

Style Exaggeration

Style Exaggeration amplifies the stylistic traits of the original voice. A value of 0% keeps the output neutral; increasing it makes the voice more animated. Use it sparingly, because high values can sound unnatural in some contexts.

Similarity

Similarity controls how closely the synthesized output matches the original voice sample. Higher values prioritize fidelity to the source recording, while lower values give the model more freedom to optimize for clarity and naturalness.
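If you keep voice settings in your own tooling or notes, it can help to record the three advanced values together and check them before applying them in the editor. The sketch below is a hypothetical container, not a Synthflow API: the `VoiceTuning` class, its field names, and the 0 to 100 percent range for Stability and Similarity are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceTuning:
    """Hypothetical record of the three advanced sliders, expressed as percentages."""

    stability: float
    style_exaggeration: float
    similarity: float

    def __post_init__(self) -> None:
        # Assumed range: each slider is treated as a percentage from 0 to 100.
        for name in ("stability", "style_exaggeration", "similarity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 100.0:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")


# Example profile for reading structured information such as confirmation numbers:
# steady tone (high stability) and neutral style (0% exaggeration).
steady = VoiceTuning(stability=80, style_exaggeration=0, similarity=75)
```

The specific numbers in `steady` are illustrative starting points rather than recommended values; audition the voice preview after each change.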

Voice Intonation / Prompting

The Voice Intonation field accepts free-text descriptors that influence how the agent delivers its lines, for example “She said fast” or “Speak in a calm, reassuring tone.” It lets you shape pacing, emotion, and emphasis without changing the prompt itself.