Create a custom evaluation

Custom evaluations allow you to define and track your own criteria for what makes a successful call or conversation. Since every business and use case is different, you can set unique success metrics that matter most to you—whether that’s booking an appointment, providing accurate information, or ensuring customer satisfaction. This flexibility means you get actionable insights based on your own definition of success, not a generic standard.

Custom evaluations are for you if:

  • You want to measure AI agent performance against custom goals.
  • Your business has unique workflows or compliance needs.
  • You need to track outcomes that can’t be inferred automatically.

Set up custom evaluations

Step 1. Create a custom evaluation

1. On the left-hand sidebar, click Actions.
2. Select Custom Evaluations.
3. Click New Evaluation.
4. Fill in the details (a data sketch of these fields follows the steps):
     • Name: Give your evaluation a unique, descriptive name.
     • Prompt: Describe the evaluation criteria (e.g., “Did the agent greet the caller by name and resolve the issue?”).
     • Category: Choose how you want to score the call (options: Numeric, Descriptive, Likert Scale, Pass/Fail).
     • Expected Result: Set the desired outcome based on the category.
5. Once you’re done, click Save.
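
For reference, an evaluation definition reduces to the four fields above. Here is a minimal sketch in Python of that shape, useful if you draft definitions as data before entering them in the UI; the class and field names are illustrative, not a platform API:

    from dataclasses import dataclass

    # Allowed scoring categories, as listed in the New Evaluation form.
    CATEGORIES = {"Numeric", "Descriptive", "Likert Scale", "Pass/Fail"}

    @dataclass
    class CustomEvaluation:
        """Mirrors the four fields of the New Evaluation form (illustrative)."""
        name: str             # unique, descriptive name
        prompt: str           # the evaluation criteria
        category: str         # one of CATEGORIES
        expected_result: str  # desired outcome, expressed in the category's terms

        def __post_init__(self):
            if self.category not in CATEGORIES:
                raise ValueError(f"Unknown category: {self.category!r}")

    # Example: the Pass/Fail evaluation from the usage examples below.
    appointment_booked = CustomEvaluation(
        name="Appointment_booked",
        prompt="Check if the bot successfully booked an appointment.",
        category="Pass/Fail",
        expected_result="Pass",
    )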

Step 2. Attach the evaluation to an agent

1. On the left-hand sidebar, click Agents.
2. Select the agent you want to attach the custom evaluation to.
3. Select Actions on the navigation sidebar.
4. Switch to the After the Call tab in the top panel.
5. Click Add Action and select Custom Evaluation. Check the box next to the custom evaluation you created and click Add action.
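
If you manage many agents, you may prefer to script this attachment step. The sketch below is illustrative only: the base URL, endpoint path, action type, and field names are hypothetical placeholders, not a documented API, so check your platform’s API reference before adapting it:

    import requests  # pip install requests

    API_BASE = "https://api.example.com/v1"  # hypothetical base URL
    API_KEY = "YOUR_API_KEY"                 # placeholder credential

    def attach_evaluation(agent_id: str, evaluation_id: str) -> None:
        """Hypothetical scripted equivalent of steps 1-5 above: attach a
        custom evaluation to an agent as an after-the-call action."""
        resp = requests.post(
            f"{API_BASE}/agents/{agent_id}/actions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "type": "custom_evaluation",  # hypothetical action type
                "trigger": "after_the_call",  # mirrors the After the Call tab
                "evaluation_id": evaluation_id,
            },
            timeout=10,
        )
        resp.raise_for_status()

    # attach_evaluation("agent_123", "eval_456")  # hypothetical IDs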


What pain does it solve?

  • Eliminates one-size-fits-all metrics.
  • Makes it easy to prove ROI for your specific use case.
  • Supports internal quality assurance and compliance.

Use cases

  • Healthcare: Confirm if the bot successfully booked a patient appointment.
  • Customer Support: Rate if the customer’s issue was fully resolved.
  • Sales: Track if a follow-up call was scheduled.
  • Compliance: Ensure required legal scripts were read during calls.

Usage examples

Appointment Booking (Pass/Fail)

  • Name: Appointment_booked
  • Prompt: Check if the bot successfully booked an appointment.
  • Category: Pass/Fail
  • Expected Result: Pass

User Satisfaction (Numeric)

  • Name: User_satisfaction
  • Prompt: Analyze the conversation and rate the user’s overall satisfaction with the call. Consider the tone, language, and resolution.
  • Category: Numeric
  • Expected Result: 8

Compliance Check (Descriptive)

  • Name: Privacy_notice
  • Prompt: Did the agent inform the caller about data privacy and consent requirements at the start of the call?
  • Category: Descriptive
  • Expected Result: Excellent

Customer Experience (Likert Scale)

  • Name: Customer_experience
  • Prompt: The agent was friendly and professional, and the user was happy with the interaction.
  • Category: Likert Scale
  • Expected Result: Strongly Agree

Tips and best practices

  • Be specific: Write clear and measurable prompts to avoid ambiguity.
  • Use categories wisely: Choose the simplest category that gets the job done. Use Pass/Fail for binary outcomes, Numeric for ratings, etc.
  • Iterate: Review evaluation results regularly and refine prompts for accuracy.

Troubleshooting

  • Evaluation not appearing: Make sure you’ve saved the evaluation and attached it to an agent.
  • Incorrect scoring: Double-check that the expected result matches the scoring category.
  • API errors: Ensure your JSON input follows the correct schema and matches the allowed category/result pairs (a pre-flight check like the sketch below can catch mismatches early).
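
Because a mismatched category/result pair is the usual cause of the last two issues, a small pre-flight check can catch it before you save or submit anything. A minimal sketch; the accepted values per category are inferred from the usage examples above and may differ from your platform’s exact schema:

    # Plausible result formats per category, inferred from the usage examples
    # above; your platform's exact schema may be stricter.
    LIKERT = {"Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"}

    def result_matches_category(category: str, expected_result: str) -> bool:
        if category == "Pass/Fail":
            return expected_result in {"Pass", "Fail"}
        if category == "Numeric":
            return expected_result.isdigit()  # e.g. a 1-10 rating such as "8"
        if category == "Likert Scale":
            return expected_result in LIKERT
        if category == "Descriptive":
            return bool(expected_result.strip())  # any non-empty description
        return False

    assert result_matches_category("Pass/Fail", "Pass")
    assert not result_matches_category("Pass/Fail", "8")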