General Configuration
The General section of the agent editor contains the foundational settings that apply to every conversation your agent handles. This is where you choose the language your agent speaks, the AI model powering its responses, and the timezone it uses for date and time references.

Language
The Language dropdown determines which language your agent uses for both speech recognition and voice synthesis. Changing the language automatically adjusts the underlying transcription engine and voice options to match.
AI Model
The AI Model selector controls which large language model powers your agent’s responses. Available options include:
- Synthflow — optimized for real-time, natural phone conversations.
- GPT-4o — high quality, strong context understanding and fast responses.
- GPT-4.1 — well-suited for complex tasks where latency is less critical.
- GPT-4.1 Mini — reasoning power similar to GPT-4.1 at a lower cost.
- GPT-5 Chat — advanced model for cutting-edge conversational quality.
- GPT-5.1 — advanced model with low latency. Pair it with the GPT-5.1 Prompting Guide for best results.
- GPT-5.2 — OpenAI’s most advanced model. Pair it with the GPT-5.2 Prompting Guide for best results.
Pick the model that matches your latency, cost, and quality requirements.
Timezone
The Timezone dropdown sets the local time context your agent operates in. New agents inherit the default from Settings > Preferences, but you can override it here. Keeping this accurate is critical for call openings, follow-up promises, and calendar booking contexts. The full list of supported values is available on the Timezones reference page.
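The timezone matters because the same instant reads very differently depending on the zone the agent is configured for. A minimal sketch using Python's standard zoneinfo module (illustrative only; the agent's internal handling may differ):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# A single call start time in UTC...
call_started = datetime(2025, 6, 1, 17, 30, tzinfo=timezone.utc)

# ...renders as a different local time in each configured zone,
# which is what the agent uses for openings and booking contexts.
for tz in ("America/New_York", "Europe/Berlin"):
    local = call_started.astimezone(ZoneInfo(tz))
    print(f"{tz}: {local.strftime('%H:%M')}")
# America/New_York: 13:30
# Europe/Berlin: 19:30
```

An agent set to the wrong zone would confidently promise a "3 pm follow-up" that lands hours off for the caller, which is why keeping this setting accurate is critical.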
Additional Settings
The Additional Settings panel houses lower-level controls that fine-tune how your agent listens, responds, and handles speech.

STT Provider
The STT Provider dropdown selects the speech-to-text engine used to transcribe caller audio. The default is Deepgram, which offers strong English support. For non-English languages, switching to Synthflow STT can improve transcription accuracy.
Speech Recognition
Choose between Faster for lower-latency transcription or High Accuracy for more precise results. Faster is suitable for most real-time conversations; High Accuracy is better when the agent needs to capture exact wording, such as names, addresses, or reference numbers.
Optimize Latency
Optimize Latency balances voice quality against response speed. The scale ranges from None (highest quality, no optimization) through Low, Medium, High, to Max (fastest responses, some quality trade-off). For most use cases, leaving this at None or Low provides a good experience.
Pause Before Speaking
Pause Before Speaking adds a delay (in seconds) before the agent begins talking at the start of a call. A short pause can make the opening feel more natural, especially for inbound calls where the caller expects a brief ring-to-answer transition.
Interruption Sensitivity
Interruption Sensitivity controls how easily the caller can interrupt the agent mid-sentence. The value represents how many words the caller must speak before the agent stops and listens. Setting it to Off disables interruptions entirely; lower values make the agent yield quickly, while higher values let it finish longer phrases before pausing.
Fade Out at Interruption
Fade Out at Interruption sets how many frames the agent’s voice takes to fade out once an interruption is detected. A lower value cuts the agent off sharply; a higher value produces a smoother, more natural fade.
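The effect of the frame count can be sketched as a linear gain ramp applied to the tail of the audio (a simplified model; the actual fade curve is not documented here):

```python
def fade_out(samples: list[float], fade_frames: int) -> list[float]:
    """Apply a linear fade over the last `fade_frames` samples.

    A small fade_frames cuts the voice off sharply; a larger one
    tapers it smoothly toward silence.
    """
    if fade_frames <= 0:
        return samples
    out = list(samples)
    n = min(fade_frames, len(out))
    for i in range(n):
        # Gain ramps down, reaching 0.0 on the final frame.
        gain = (n - 1 - i) / n
        out[len(out) - n + i] *= gain
    return out

print(fade_out([1.0, 1.0, 1.0, 1.0], 2))  # → [1.0, 1.0, 0.5, 0.0]
```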
Custom Vocabulary
Custom Vocabulary lets you add domain-specific terms — brand names, product codes, technical jargon — so the speech recognition engine can identify and transcribe them correctly. Type each term and press Enter to add it.
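One way to picture the effect: near-miss transcriptions of your terms get snapped to the nearest vocabulary entry. A hedged sketch using the stdlib difflib module (the real engine biases recognition itself rather than post-correcting text, and the vocabulary entries below are made up):

```python
import difflib

# Hypothetical domain terms a caller might mention.
CUSTOM_VOCABULARY = ["Synthflow", "Deepgram", "SKU-4471"]

def correct_term(word: str, cutoff: float = 0.75) -> str:
    """Snap a transcribed word to the closest custom-vocabulary entry,
    or return it unchanged when nothing is similar enough."""
    match = difflib.get_close_matches(word, CUSTOM_VOCABULARY, n=1, cutoff=cutoff)
    return match[0] if match else word

print(correct_term("Sinthflow"))  # → "Synthflow"
print(correct_term("hello"))      # → "hello" (no close match)
```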
Filter Words
Filter Words strips unwanted tokens from the transcription output. This is useful for removing filler sounds, profanity, or placeholder characters that the STT engine may produce. Existing filter words appear as tags that you can remove individually.
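Conceptually, filtering is a token-level drop applied to the transcript. A minimal sketch (the filter list below is an example, including a placeholder token of the kind some STT engines emit):

```python
# Example filter list; configure your own in the Filter Words panel.
FILTER_WORDS = {"um", "uh", "%HESITATION"}

def filter_transcript(text: str) -> str:
    """Drop configured filter tokens from an STT transcript."""
    kept = [tok for tok in text.split() if tok not in FILTER_WORDS]
    return " ".join(kept)

print(filter_transcript("um I need uh two tickets"))  # → "I need two tickets"
```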
Use Realistic Filler Words
When enabled, the agent inserts natural filler words (like “um” or “uh”) into its responses, making the conversation feel more human and less robotic.