Create a custom evaluation

Custom evaluations allow you to define and track your own criteria for what makes a successful call or conversation. Since every business and use case is different, you can set unique success metrics that matter most to you—whether that’s booking an appointment, providing accurate information, or ensuring customer satisfaction. This flexibility means you get actionable insights based on your own definition of success, not a generic standard.

Custom evaluations are for you if:

  • You want to measure AI agent performance against custom goals.
  • Your business has unique workflows or compliance needs.
  • You need to track outcomes that can’t be inferred automatically.

Set up custom evaluations

Step 1. Create a custom evaluation

1. On the left-hand sidebar, click Actions.
2. Select Custom Evaluations.
3. Click New Evaluation.
4. Fill in the details (a data sketch of these fields follows the steps):
     • Name: Give your evaluation a unique, descriptive name.
     • Prompt: Describe the evaluation criteria (e.g., “Did the agent greet the caller by name and resolve the issue?”).
     • Category: Choose how you want to score the call (options: Numeric, Descriptive, Likert Scale, Pass/Fail).
     • Expected Result: Set the desired outcome based on the category.
5. Once you’re done, click Save.
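
For reference, an evaluation definition reduces to the four fields above. Here is a minimal sketch in Python of that shape, useful if you draft definitions as data before entering them in the UI; the class and field names are illustrative, not a platform API:

    from dataclasses import dataclass

    # Allowed scoring categories, as listed in the New Evaluation form.
    CATEGORIES = {"Numeric", "Descriptive", "Likert Scale", "Pass/Fail"}

    @dataclass
    class CustomEvaluation:
        """Mirrors the four fields of the New Evaluation form (illustrative)."""
        name: str             # unique, descriptive name
        prompt: str           # the evaluation criteria
        category: str         # one of CATEGORIES
        expected_result: str  # desired outcome, expressed in the category's terms

        def __post_init__(self):
            if self.category not in CATEGORIES:
                raise ValueError(f"Unknown category: {self.category!r}")

    # Example: the Pass/Fail evaluation from the usage examples below.
    appointment_booked = CustomEvaluation(
        name="Appointment_booked",
        prompt="Check if the bot successfully booked an appointment.",
        category="Pass/Fail",
        expected_result="Pass",
    )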

Step 2. Attach the evaluation to an agent

1. On the left-hand sidebar, click Agents.
2. Select the agent you want to attach the custom evaluation to.
3. Select Actions on the navigation sidebar.
4. Switch to the After the Call tab in the top panel.
5. Click Add Action and select Custom Evaluation. Check the box next to the custom evaluation you created and click Add action.
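
If you manage many agents, you may prefer to script this attachment step. The sketch below is illustrative only: the base URL, endpoint path, action type, and field names are hypothetical placeholders, not a documented API, so check your platform’s API reference before adapting it:

    import requests  # pip install requests

    API_BASE = "https://api.example.com/v1"  # hypothetical base URL
    API_KEY = "YOUR_API_KEY"                 # placeholder credential

    def attach_evaluation(agent_id: str, evaluation_id: str) -> None:
        """Hypothetical scripted equivalent of steps 1-5 above: attach a
        custom evaluation to an agent as an after-the-call action."""
        resp = requests.post(
            f"{API_BASE}/agents/{agent_id}/actions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "type": "custom_evaluation",  # hypothetical action type
                "trigger": "after_the_call",  # mirrors the After the Call tab
                "evaluation_id": evaluation_id,
            },
            timeout=10,
        )
        resp.raise_for_status()

    # attach_evaluation("agent_123", "eval_456")  # hypothetical IDs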


What pain does it solve?

  • Eliminates one-size-fits-all metrics.
  • Makes it easy to prove ROI for your specific use case.
  • Supports internal quality assurance and compliance.

Use cases

  • Healthcare: Confirm if the bot successfully booked a patient appointment.
  • Customer Support: Rate if the customer’s issue was fully resolved.
  • Sales: Track if a follow-up call was scheduled.
  • Compliance: Ensure required legal scripts were read during calls.

Usage examples

Appointment Booking (Pass/Fail)

  • Name: Appointment_booked
  • Prompt: Check if the bot successfully booked an appointment.
  • Category: Pass/Fail
  • Expected Result: Pass

User Satisfaction (Numeric)

  • Name: User_satisfaction
  • Prompt: Analyze the conversation and rate the user’s overall satisfaction with the call. Consider the tone, language, and resolution.
  • Category: Numeric
  • Expected Result: 8

Compliance Check (Descriptive)

  • Name: Privacy_notice
  • Prompt: Did the agent inform the caller about data privacy and consent requirements at the start of the call?
  • Category: Descriptive
  • Expected Result: Excellent

Customer Experience (Likert Scale)

  • Name: Customer_experience
  • Prompt: The agent was friendly and professional, and the user was happy with the interaction.
  • Category: Likert Scale
  • Expected Result: Strongly Agree

Tips and best practices

  • Be specific: Write clear and measurable prompts to avoid ambiguity.
  • Use categories wisely: Choose the simplest category that gets the job done. Use Pass/Fail for binary outcomes, Numeric for ratings, etc.
  • Iterate: Review evaluation results regularly and refine prompts for accuracy.

Troubleshooting

  • Evaluation not appearing: Make sure you’ve saved the evaluation and attached it to an agent.
  • Incorrect scoring: Double-check that the expected result matches the scoring category.
  • API errors: Ensure your JSON input follows the correct schema and matches the allowed category/result pairs (a pre-flight check like the sketch below can catch mismatches early).
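
Because a mismatched category/result pair is the usual cause of the last two issues, a small pre-flight check can catch it before you save or submit anything. A minimal sketch; the accepted values per category are inferred from the usage examples above and may differ from your platform’s exact schema:

    # Plausible result formats per category, inferred from the usage examples
    # above; your platform's exact schema may be stricter.
    LIKERT = {"Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"}

    def result_matches_category(category: str, expected_result: str) -> bool:
        if category == "Pass/Fail":
            return expected_result in {"Pass", "Fail"}
        if category == "Numeric":
            return expected_result.isdigit()  # e.g. a 1-10 rating such as "8"
        if category == "Likert Scale":
            return expected_result in LIKERT
        if category == "Descriptive":
            return bool(expected_result.strip())  # any non-empty description
        return False

    assert result_matches_category("Pass/Fail", "Pass")
    assert not result_matches_category("Pass/Fail", "8")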