General Configuration
The General section of the agent editor contains the foundational settings that apply to every conversation your agent handles. This is where you choose the language your agent speaks, the AI model powering its responses, and the timezone it uses for date and time references.

Language
The Language dropdown determines which language your agent uses for both speech recognition and voice synthesis. Changing the language automatically adjusts the underlying transcription engine and voice options to match.
AI Model
The AI Model selector controls which large language model powers your agent’s responses. Available options include:
- Synthflow — optimized for real-time, natural phone conversations.
- GPT-4o — high quality, strong context understanding and fast responses.
- GPT-4.1 — well-suited for complex tasks where latency is less critical.
- GPT-4.1 Mini — reasoning power similar to GPT-4.1 at a lower cost.
- GPT-5 Chat — advanced model for cutting-edge conversational quality.
- GPT-5.1 — advanced model with low latency. Pair it with the GPT-5.1 Prompting Guide for best results.
- GPT-5.2 — OpenAI’s most advanced model. Pair it with the GPT-5.2 Prompting Guide for best results.
Pick the model that matches your latency, cost, and quality requirements.
Timezone
The Timezone dropdown sets the local time context your agent operates in. New agents inherit the default from Settings > Preferences, but you can override it here. Keeping this accurate is critical for call openings, follow-up promises, and calendar booking contexts. The full list of supported values is available on the Timezones reference page.
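The timezone matters because the same instant reads very differently depending on the zone the agent is configured for. A minimal sketch using Python's standard zoneinfo module (illustrative only; the agent's internal handling may differ):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# A single call start time in UTC...
call_started = datetime(2025, 6, 1, 17, 30, tzinfo=timezone.utc)

# ...renders as a different local time in each configured zone,
# which is what the agent uses for openings and booking contexts.
for tz in ("America/New_York", "Europe/Berlin"):
    local = call_started.astimezone(ZoneInfo(tz))
    print(f"{tz}: {local.strftime('%H:%M')}")
# America/New_York: 13:30
# Europe/Berlin: 19:30
```

An agent set to the wrong zone would confidently promise a "3 pm follow-up" that lands hours off for the caller, which is why keeping this setting accurate is critical.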
Additional Settings
The Additional Settings panel houses lower-level controls that fine-tune how your agent listens, responds, and handles speech.

STT Provider
The STT Provider dropdown selects the speech-to-text engine used to transcribe caller audio. The default is Deepgram, which offers strong English support. For non-English languages, switching to Synthflow STT can improve transcription accuracy.
Speech Recognition
Choose between Faster for lower-latency transcription or High Accuracy for more precise results. Faster is suitable for most real-time conversations; High Accuracy is better when the agent needs to capture exact wording, such as names, addresses, or reference numbers.
Optimize Latency
Optimize Latency balances voice quality against response speed. The scale ranges from None (highest quality, no optimization) through Low, Medium, High, to Max (fastest responses, some quality trade-off). For most use cases, leaving this at None or Low provides a good experience.
Pause Before Speaking
Pause Before Speaking adds a delay (in seconds) before the agent begins talking at the start of a call. A short pause can make the opening feel more natural, especially for inbound calls where the caller expects a brief ring-to-answer transition.
Interruption Sensitivity
Interruption Sensitivity controls how easily the caller can interrupt the agent mid-sentence. The value represents how many words the caller must speak before the agent stops and listens. Setting it to Off disables interruptions entirely; lower values make the agent yield quickly, while higher values let it finish longer phrases before pausing.
Fade Out at Interruption
Fade Out at Interruption sets how many frames the agent’s voice takes to fade out once an interruption is detected. A lower value cuts the agent off sharply; a higher value produces a smoother, more natural fade.
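The effect of the frame count can be sketched as a linear gain ramp applied to the tail of the audio (a simplified model; the actual fade curve is not documented here):

```python
def fade_out(samples: list[float], fade_frames: int) -> list[float]:
    """Apply a linear fade over the last `fade_frames` samples.

    A small fade_frames cuts the voice off sharply; a larger one
    tapers it smoothly toward silence.
    """
    if fade_frames <= 0:
        return samples
    out = list(samples)
    n = min(fade_frames, len(out))
    for i in range(n):
        # Gain ramps down, reaching 0.0 on the final frame.
        gain = (n - 1 - i) / n
        out[len(out) - n + i] *= gain
    return out

print(fade_out([1.0, 1.0, 1.0, 1.0], 2))  # → [1.0, 1.0, 0.5, 0.0]
```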
Custom Vocabulary
Custom Vocabulary lets you add domain-specific terms — brand names, product codes, technical jargon — so the speech recognition engine can identify and transcribe them correctly. Type each term and press Enter to add it.
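One way to picture the effect: near-miss transcriptions of your terms get snapped to the nearest vocabulary entry. A hedged sketch using the stdlib difflib module (the real engine biases recognition itself rather than post-correcting text, and the vocabulary entries below are made up):

```python
import difflib

# Hypothetical domain terms a caller might mention.
CUSTOM_VOCABULARY = ["Synthflow", "Deepgram", "SKU-4471"]

def correct_term(word: str, cutoff: float = 0.75) -> str:
    """Snap a transcribed word to the closest custom-vocabulary entry,
    or return it unchanged when nothing is similar enough."""
    match = difflib.get_close_matches(word, CUSTOM_VOCABULARY, n=1, cutoff=cutoff)
    return match[0] if match else word

print(correct_term("Sinthflow"))  # → "Synthflow"
print(correct_term("hello"))      # → "hello" (no close match)
```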
Filter Words
Filter Words strips unwanted tokens from the transcription output. This is useful for removing filler sounds, profanity, or placeholder characters that the STT engine may produce. Existing filter words appear as tags that you can remove individually.
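Conceptually, filtering is a token-level drop applied to the transcript. A minimal sketch (the filter list below is an example, including a placeholder token of the kind some STT engines emit):

```python
# Example filter list; configure your own in the Filter Words panel.
FILTER_WORDS = {"um", "uh", "%HESITATION"}

def filter_transcript(text: str) -> str:
    """Drop configured filter tokens from an STT transcript."""
    kept = [tok for tok in text.split() if tok not in FILTER_WORDS]
    return " ".join(kept)

print(filter_transcript("um I need uh two tickets"))  # → "I need two tickets"
```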
Use Realistic Filler Words
When enabled, the agent inserts natural filler words (like “um” or “uh”) into its responses, making the conversation feel more human and less robotic.