Voice Agent Index
Latency should be measured across the whole call path, not only model response time.

What This Benchmark Measures

Latency is not one number. A caller feels the full chain: phone routing, speech detection, transcription, model response, voice generation, tool calls, transfer, and post-call processing.

This protocol measures six moments:

Moment | Start | Stop
--- | --- | ---
Answer to greeting | Call connects | First agent audio begins
Caller stop to response | Caller finishes a request | Agent begins relevant answer
Barge-in recovery | Caller interrupts while agent speaks | Agent stops and responds to correction
Tool-call wait | Agent says it will check something | Agent returns with a result or failure path
Transfer start | Escalation trigger occurs | Human ring, queue, or callback path begins
Post-call artifact | Call ends | Transcript, summary, and structured fields are available
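
Each of the six moments reduces to a subtraction once its start and stop events have timestamps. The sketch below is a minimal Python illustration, assuming you have already pulled timestamps (in seconds from call start) out of a recording, call log, or transcript; the event names are hypothetical labels, not fields from any vendor's platform.

```python
from typing import Dict, Optional

# Hypothetical event names: map each to whatever your recording,
# call log, or transcript timestamps actually provide.
MOMENTS = {
    "answer_to_greeting": ("call_connected", "first_agent_audio"),
    "caller_stop_to_response": ("caller_stopped", "agent_answer_began"),
    "barge_in_recovery": ("caller_interrupted", "agent_addressed_correction"),
    "tool_call_wait": ("agent_announced_check", "agent_returned_result"),
    "transfer_start": ("escalation_triggered", "transfer_path_began"),
    "post_call_artifact": ("call_ended", "artifacts_available"),
}


def measure_moments(events: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Return each moment's duration in seconds, or None when a timestamp is missing."""
    results: Dict[str, Optional[float]] = {}
    for moment, (start, stop) in MOMENTS.items():
        if start in events and stop in events:
            results[moment] = round(events[stop] - events[start], 2)
        else:
            results[moment] = None  # not observed in this run; note why in the record
    return results


# Example with two moments observed (timestamps are illustrative only).
run = {"call_connected": 0.0, "first_agent_audio": 1.4,
       "caller_stopped": 10.2, "agent_answer_began": 11.1}
print(measure_moments(run))
```

Keeping unobserved moments as None rather than zero makes a skipped step visible instead of quietly flattering the results.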

Run the benchmark for every vendor on the same network, phone path, and test prompt where practical. Where that is not possible, document the difference.

Test Script

Run this scenario three times:

  1. Call the test number.
  2. Ask a simple factual question.
  3. Interrupt the answer with a correction.
  4. Ask the agent to check a calendar, CRM, order, or equivalent tool.
  5. Ask for a human.
  6. End the call and wait for the transcript or summary.

Use a stopwatch, call recording, or transcript timestamps. The buyer does not need lab-grade instrumentation for early vendor screening, but the timing method should be the same across vendors.
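
With transcript timestamps, caller-stop-to-response can be computed for every turn. This is a sketch assuming the transcript exports segments with a speaker label and start/end times in seconds; the `Segment` shape and the "caller"/"agent" labels are assumptions, since export formats differ by vendor. It measures when the agent starts speaking, not when it starts a relevant answer, so filler audio still needs a listen-back check.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    speaker: str  # assumed labels: "caller" or "agent"
    start: float  # seconds from call start
    end: float


def response_gaps(segments: List[Segment]) -> List[float]:
    """Seconds from the end of each caller turn to the start of the next agent turn.

    A negative gap means the agent started talking before the caller finished.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    gaps: List[float] = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker == "caller" and nxt.speaker == "agent":
            gaps.append(round(nxt.start - prev.end, 2))
    return gaps


transcript = [
    Segment("agent", 0.0, 3.0),
    Segment("caller", 4.0, 6.0),
    Segment("agent", 6.8, 9.0),
    Segment("caller", 10.0, 12.0),
    Segment("agent", 13.5, 15.0),
]
print(response_gaps(transcript))  # [0.8, 1.5]
```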

Scoring Rubric

Score | Meaning
--- | ---
1 | Long pauses, frequent talk-over, or no usable timing evidence.
2 | Usable in a demo, but tool waits or interruptions feel awkward.
3 | Acceptable for low-risk calls with occasional pauses and recoverable delays.
4 | Natural timing in most turns, clean interruption recovery, and clear tool-call language.
5 | Consistently natural pacing, logged timing, fast recovery, and no unexplained silence.

Score the worst credible run. If two calls feel smooth and the third contains a long unexplained pause, the buyer should plan for that pause in production.
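
The scoring rule takes the minimum across runs rather than an average. A minimal sketch, assuming each run has already been scored by hand against the rubric; the `credible` flag is a hypothetical way to exclude a run spoiled on the tester's side (for example, the tester's own line dropping). The benchmark does not define that exclusion, so document any run you drop.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    score: int             # 1-5 rubric score for this run
    credible: bool = True  # assumption: False only for test-side faults


def benchmark_score(runs: List[Run]) -> int:
    """Return the vendor's score as its worst credible run."""
    credible = [r.score for r in runs if r.credible]
    if not credible:
        raise ValueError("no credible runs to score")
    if any(s not in range(1, 6) for s in credible):
        raise ValueError("rubric scores must be integers from 1 to 5")
    return min(credible)


# Two smooth calls and one with a long unexplained pause.
print(benchmark_score([Run(4), Run(4), Run(2)]))  # 2
```

Averaging the same three runs would give about 3.3 and hide the pause the buyer will eventually meet in production.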

What To Record

Capture:

  • Connection method and phone path.
  • Greeting delay.
  • Average caller-stop-to-agent-response time.
  • Worst caller-stop-to-agent-response time.
  • Barge-in recovery time.
  • Tool-call wait time.
  • Transfer start time.
  • Transcript availability time.
  • Any silence longer than 3 seconds.
  • Whether the agent explained waits honestly.
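
Keeping the checklist in a fixed structure makes runs and vendors easier to compare. A minimal sketch, assuming per-turn response times and silence lengths have already been measured in seconds; the field names and example values are hypothetical, and only the 3-second silence threshold comes from the list above.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List

LONG_SILENCE_S = 3.0  # the checklist's long-silence threshold


@dataclass
class CallRecord:
    connection: str                # connection method and phone path
    greeting_delay_s: float
    response_times_s: List[float]  # caller stop to agent response, one per turn
    barge_in_recovery_s: float
    tool_wait_s: float
    transfer_start_s: float
    transcript_ready_s: float
    silences_s: List[float] = field(default_factory=list)
    waits_explained: bool = True   # did the agent explain waits honestly?

    def summary(self) -> dict:
        """Derive the average, worst, and long-silence entries from raw measurements."""
        return {
            "avg_response_s": round(mean(self.response_times_s), 2),
            "worst_response_s": max(self.response_times_s),
            "long_silences_s": [s for s in self.silences_s if s > LONG_SILENCE_S],
        }


record = CallRecord(
    connection="PSTN from desk phone",
    greeting_delay_s=1.4,
    response_times_s=[0.8, 1.2, 2.5],
    barge_in_recovery_s=0.9,
    tool_wait_s=4.0,
    transfer_start_s=6.5,
    transcript_ready_s=45.0,
    silences_s=[1.0, 3.4, 5.0],
)
print(record.summary())
```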

Red Flags

Watch for:

  • The agent talks over corrections.
  • The agent repeats a full sentence after interruption.
  • The agent goes silent while a tool runs.
  • The agent claims an action succeeded before the tool confirms it.
  • The agent delays transfer after the caller asks for a person.
  • The transcript arrives too late for staff follow-up.
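
Some of these red flags can be checked mechanically once key events are timestamped. A minimal sketch covering three of them, assuming hypothetical event names and first-occurrence timestamps in seconds. The acknowledgment check reuses the 3-second silence threshold from the recording checklist; the transfer limit is a parameter the buyer sets, because the benchmark does not specify one. Talk-over and repeated sentences still need a listen-back.

```python
from typing import Dict, List

LONG_SILENCE_S = 3.0  # matches the long-silence threshold in the checklist


def red_flags(events: Dict[str, float], max_transfer_delay_s: float) -> List[str]:
    """Flag three red flags from first-occurrence event timestamps (seconds).

    Event names are hypothetical; max_transfer_delay_s is the buyer's own limit.
    """
    flags: List[str] = []

    # Silence while a tool runs: no acknowledgment soon after the tool starts.
    tool_start = events.get("tool_started")
    if tool_start is not None:
        ack = events.get("agent_acknowledged_wait")
        if ack is None or ack - tool_start > LONG_SILENCE_S:
            flags.append("silence while tool runs")

    # Success claimed before (or without) tool confirmation.
    claimed = events.get("agent_claimed_success")
    if claimed is not None:
        confirmed = events.get("tool_confirmed")
        if confirmed is None or claimed < confirmed:
            flags.append("success claimed before tool confirmed")

    # Transfer delayed after the caller asks for a person.
    asked = events.get("caller_asked_human")
    if asked is not None:
        started = events.get("transfer_started")
        if started is None or started - asked > max_transfer_delay_s:
            flags.append("transfer delayed after request for a person")

    return flags


call = {"tool_started": 20.0, "agent_acknowledged_wait": 20.8,
        "agent_claimed_success": 24.0, "tool_confirmed": 25.0,
        "caller_asked_human": 40.0, "transfer_started": 55.0}
print(red_flags(call, max_transfer_delay_s=10.0))
```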

Latency problems often become trust problems. A caller may forgive a small pause, but not a wrong booking or a stalled urgent transfer.

Better Than Raw Speed

The best voice agent is not always the one with the smallest response number. Buyers should prefer systems that:

  • Acknowledge tool waits briefly.
  • Stop speaking when interrupted.
  • Confirm only when needed.
  • Escalate without extra debate.
  • Show timing data after the call.

Use the latency and architecture guide for deeper stack review, then run this benchmark during vendor demos.

Benchmark FAQs

What latency number matters most?

The most useful number is perceived response delay after the caller stops speaking, but buyers should also measure greeting speed, barge-in recovery, tool-call waits, transfer start, and post-call artifact availability.

Should the fastest vendor always win?

No. Predictable timing, clean interruption handling, and honest tool-call language can matter more than the lowest raw response number.