The Baseline Call
Start with the most common caller path. For a clinic, that might be appointment booking. For a restaurant, it might be a reservation. For a law firm, it might be initial intake and routing.
Document the exact expected outcome before calling. A passing call should end with a confirmed action, a clear summary, and a safe next step.
Test Setup
Before the first call, prepare:
- One phone number or test line per vendor
- A written caller persona
- The expected workflow outcome
- A stopwatch or call recording timestamp
- A scoring sheet
- Access to transcripts, summaries, recordings, and cost reporting
- A known integration failure case, such as a blocked calendar slot or unavailable CRM endpoint
Run the same caller persona across vendors. If a vendor changes the workflow during the demo, record the change instead of letting the test become a custom sales presentation.
Five Required Calls
- Normal happy path: the caller gives clean answers and accepts the recommended next step
- Caller correction: the caller changes a date, phone number, budget, or practice area mid-call
- Interruption path: the caller talks over the agent or asks an unrelated question
- Sensitive path: the caller describes urgency, distress, legal sensitivity, or healthcare complexity
- Integration failure: the calendar, CRM, or downstream system is unavailable
Scenario Matrix
Use one matrix across every vendor. It covers the five required calls plus an explicit request for a human:
| Scenario | What it tests | Pass signal |
|---|---|---|
| Happy path | Basic intent detection and workflow completion. | The task completes with accurate summary and next step. |
| Correction | Whether the agent updates state when the caller changes details. | The final record uses the corrected detail, not the first one. |
| Interruption | Barge-in, turn-taking, and conversation recovery. | The agent stops, listens, and resumes without repeating too much. |
| Sensitive topic | Escalation judgment and safe language. | The agent avoids advice and routes according to policy. |
| Failed system | Tool timeout, unavailable slot, or CRM error. | The caller gets a truthful fallback and staff receives a useful note. |
| Human request | Transfer and fallback behavior. | The agent transfers with context or captures a clear callback request. |
Run each scenario with the same caller persona and expected result. This keeps the comparison fair.
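One way to keep the comparison fair is to fix the scenario list in the scoring sheet itself, so no vendor can be scored on a different set of calls. The sketch below is only an illustration: the `ScoreSheet` class, the scenario identifiers, and the 0 to 2 scale are assumptions made for this example, not part of any vendor tool.

```python
from dataclasses import dataclass, field

# Scenario identifiers mirror the rows of the matrix above (names are hypothetical).
SCENARIOS = [
    "happy_path",
    "correction",
    "interruption",
    "sensitive_topic",
    "failed_system",
    "human_request",
]


@dataclass
class ScoreSheet:
    """One sheet per vendor, filled in with the same caller persona."""
    vendor: str
    persona: str
    scores: dict = field(default_factory=dict)  # scenario -> 0 fail, 1 partial, 2 pass (assumed scale)
    notes: dict = field(default_factory=dict)   # scenario -> free-text reviewer note

    def record(self, scenario: str, score: int, note: str = "") -> None:
        # Reject scenarios outside the shared matrix so every vendor is scored alike.
        if scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {scenario}")
        if score not in (0, 1, 2):
            raise ValueError("score must be 0, 1, or 2")
        self.scores[scenario] = score
        if note:
            self.notes[scenario] = note

    def missing(self) -> list:
        """Scenarios that have not been run yet for this vendor."""
        return [s for s in SCENARIOS if s not in self.scores]


sheet = ScoreSheet(vendor="Vendor A", persona="new dental patient")
sheet.record("happy_path", 2)
sheet.record("correction", 1, "final record kept the first date")
print(sheet.missing())
```

Checking `missing()` before the review meeting makes it obvious when a vendor skipped a scenario or swapped in a different workflow.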
Sample Caller Script
Use a realistic script, not a trick question:
“Hi, I need to book an appointment for next Thursday afternoon. Actually, Friday morning might be better. I am a new patient and I have some tooth pain. Do you take Delta Dental?”
This one call tests scheduling, correction handling, new-customer intake, sensitivity, insurance capture, and escalation judgment.
Industry Script Examples
| Industry | Example test call |
|---|---|
| Restaurant | “Can I book four people at 7 tonight? Actually make that six. Also, do you have gluten-free options?” |
| Law firm | “I think I may have a case, but I do not want to explain everything to a bot. Can someone call me today?” |
| Dental | “I am a new patient with tooth pain. I can come Friday morning, but I need to know if you take Delta Dental.” |
| Home service | “My AC stopped working, I need someone today, and I am not sure if I am in your service area.” |
| Support | “My order never arrived, but I changed addresses after ordering. Can you help me?” |
The best script is ordinary and slightly messy. It should feel like a real call from a tired, busy, or distracted person.
What To Record
Track call start time, agent first-response delay, number of clarifying questions, whether the workflow completed, transcript quality, escalation path, and actual cost if usage billing is active.
| Metric | Pass signal |
|---|---|
| First response | Greeting starts without an awkward delay. |
| Interruption recovery | Agent stops, listens, and updates the workflow. |
| Task completion | Appointment, lead, ticket, or message is created accurately. |
| Escalation | Sensitive or uncertain calls reach the right fallback path. |
| Transcript quality | Names, numbers, dates, and intent are captured correctly. |
| Cost trace | Usage cost can be tied back to the test call. |
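The metrics above can be logged as one record per call so that every vendor is measured the same way. This is a minimal sketch under stated assumptions: the `CallRecord` fields and the `failed_checks` helper are hypothetical names, and the 2-second delay threshold is a placeholder that each team should set for itself. Only a subset of the metrics is checked automatically; the rest still need a human reviewer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    vendor: str
    scenario: str
    call_start_s: float        # recording timestamp when the call connects
    first_response_s: float    # recording timestamp when the agent starts speaking
    clarifying_questions: int
    workflow_completed: bool
    transcript_errors: int     # wrong names, numbers, dates, or intent in the transcript
    escalated_correctly: bool
    cost_usd: Optional[float]  # None when usage billing is not active

    @property
    def first_response_delay_s(self) -> float:
        return self.first_response_s - self.call_start_s


def failed_checks(call: CallRecord, max_delay_s: float = 2.0) -> list:
    """Return the names of metrics that did not meet the pass signal.

    max_delay_s is an assumed threshold, not a standard.
    """
    failures = []
    if call.first_response_delay_s > max_delay_s:
        failures.append("first_response")
    if not call.workflow_completed:
        failures.append("task_completion")
    if not call.escalated_correctly:
        failures.append("escalation")
    if call.transcript_errors > 0:
        failures.append("transcript_quality")
    return failures


call = CallRecord(
    vendor="Vendor A",
    scenario="correction",
    call_start_s=10.0,
    first_response_s=13.5,
    clarifying_questions=2,
    workflow_completed=True,
    transcript_errors=1,
    escalated_correctly=True,
    cost_usd=None,
)
print(failed_checks(call))  # ['first_response', 'transcript_quality']
```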
Passing Standard
The agent does not need to sound perfect. It needs to recover safely, avoid hallucinating policies, ask for missing details, and escalate when the caller leaves the approved workflow.
Buyer Note
Ask the vendor to run the same script twice: once in a demo environment and once in the intended production configuration. If the two results differ materially, the buyer should treat the production test as the source of truth.
Review Meeting Format
After the calls, review each vendor in the same order:
- Listen to the worst call first.
- Check whether the final system update matches the expected result.
- Read the summary as if you were a busy staff member.
- Inspect transfer context and failure language.
- Record unresolved questions for the vendor.
- Update the scorecard only after every vendor has run the same script.
This reduces demo bias. A vendor that produces one magical call and four fragile calls should not outrank a vendor with slightly less sparkle but more predictable recovery.
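To make the "worst call first" idea concrete, vendors can be ranked by their weakest call before their average. The sketch below is one possible rule, not a prescribed formula; the `rank_vendors` function, the 1 to 5 scale, and the sample scores are invented for illustration.

```python
from statistics import mean


def rank_vendors(scores_by_vendor: dict) -> list:
    """Return vendor names, best first, ordered by worst call and then by average."""
    def key(vendor: str):
        scores = scores_by_vendor[vendor]
        return (min(scores), mean(scores))

    return sorted(scores_by_vendor, key=key, reverse=True)


# Illustrative scores on an assumed 1-5 scale, one entry per required call.
scores = {
    "Vendor A": [5, 2, 2, 2, 2],  # one magical call, four fragile ones
    "Vendor B": [4, 4, 3, 4, 4],  # less sparkle, more predictable recovery
}
print(rank_vendors(scores))  # ['Vendor B', 'Vendor A']
```

Sorting on the minimum first means a single standout demo call cannot lift a vendor above one that recovers reliably across all scenarios.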
Call-Test Pack Structure
A complete call-test pack should include:
- Five scenario scripts
- Scoring sheet
- Vendor comparison table
- Transcript review checklist
- Compliance note field
- Cost-normalization field
That format makes the pack complete enough for agencies, consultants, and operators to reuse and cite.
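For the cost-normalization field, one option is to reduce each vendor's usage charges to the same two figures: cost per minute and cost per completed call. This is a hedged sketch, since vendors bill in different units; the `normalize_cost` function and the sample figures are hypothetical.

```python
def normalize_cost(calls: list) -> dict:
    """Normalize usage cost for one vendor.

    calls: list of (cost_usd, duration_min, completed) tuples, one per test call.
    Returns None for a figure that cannot be computed (no minutes or no completed calls).
    """
    total_cost = sum(cost for cost, _, _ in calls)
    total_min = sum(minutes for _, minutes, _ in calls)
    completed = sum(1 for _, _, done in calls if done)
    return {
        "cost_per_minute": total_cost / total_min if total_min else None,
        "cost_per_completed_call": total_cost / completed if completed else None,
    }


# Illustrative figures only.
calls = [(0.50, 2.0, True), (0.25, 1.0, False), (0.25, 1.0, True)]
print(normalize_cost(calls))
# {'cost_per_minute': 0.25, 'cost_per_completed_call': 0.5}
```

Dividing by completed calls rather than total calls penalizes a vendor whose calls are cheap but often fail to finish the workflow.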
