LLM as Judge

Providing clear and insightful feedback is crucial for improving conversation quality and maintaining high standards in AI-customer interactions. We use fine-tuned LLMs to generate high-quality feedback on the success of each voice-based interaction with the AI assistant, evaluated against a set of criteria (LLM as Judge).
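As a rough illustration of the idea, the sketch below scores a single transcript against one criterion. The `complete` function is a placeholder for whichever LLM client you use, and the prompt wording is an assumption for illustration, not the exact judge prompt used in production.

```python
# Minimal LLM-as-Judge sketch. `complete(prompt) -> str` stands in for any
# LLM client call; the prompt wording is illustrative only.

JUDGE_PROMPT = """You are evaluating a voice call between an AI assistant and a human.

Criterion: {criterion}
Question: {question}

Transcript:
{transcript}

Answer "yes" or "no", followed by a one-sentence justification."""

def judge_call(transcript: str, criterion: str, question: str, complete) -> str:
    """Ask the judge LLM a single criterion question about one transcript."""
    prompt = JUDGE_PROMPT.format(
        criterion=criterion, question=question, transcript=transcript
    )
    return complete(prompt)
```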

Why Useful

1. Feedback for Prompt Optimization:
Provides valuable insights for refining prompts effectively.

2. Call Quality Monitoring:
Helps ensure consistent, high-quality interactions during client calls.

Judge Criteria

Each feedback criterion answers the question shown (Call Summary instead provides content):

Persona: Does the agent talk about itself (its persona) as defined in its prompt?
Style: Is the style of the conversation consistent with the prompt (tone, wording like professional or casual, etc.)?
Steps: Does the agent follow the steps outlined in the prompt and in the correct order?
No Repetition: Does the agent avoid unnecessary repetition unless requested by the human?
Objections: Does the agent handle objections as defined in the prompt?
Objection Not Defined: How does the agent handle objections not mentioned in the prompt?
Knowledge: Does the agent use relevant knowledge beyond what's in its prompt if needed?
Goal Achievement: Did the agent achieve the call objective specified in its prompt?
Appointment Handling: If the user wants to book an appointment, are date, time, and timezone handled correctly?
User Sentiment: Was the human sentiment positive during the call?
Agent Sentiment: Was the agent sentiment positive during the call?
Call Completion: Did the call end with a proper goodbye instead of being cut off?
Answered by Human: Was the call answered by a human?
Call Summary: Concise summary of the call (max 500 characters).

LLM as Judge data is listed for each individual call, after its completion, in the AI assistant's call logs.
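Once per-call judge data is available in the logs, it can feed simple quality monitoring (the second benefit above). A hypothetical aggregation might look like this; the dictionary keys are assumed, not the actual log format.

```python
# Hypothetical monitoring sketch: `judge_logs` is a list of per-call judge
# results as dictionaries, e.g. loaded from the call logs (keys assumed).

def quality_report(judge_logs: list[dict]) -> dict:
    """Aggregate a few judge criteria across calls for quality monitoring."""
    answered = [log for log in judge_logs if log.get("answered_by_human")]
    if not answered:
        return {"calls": 0}
    n = len(answered)
    return {
        "calls": n,
        "goal_achievement_rate": sum(l["goal_achievement"] for l in answered) / n,
        "positive_user_sentiment_rate": sum(l["user_sentiment"] for l in answered) / n,
        "completion_rate": sum(l["call_completion"] for l in answered) / n,
    }
```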
