Latency Protocol

AI Voice Agent Latency Benchmark

A buyer protocol for measuring AI voice agent greeting speed, response delay, interruption recovery, tool-call waits, transfer timing, and transcript availability.

Figure: Voice agent latency timing bench with desk phone, stopwatch, network cable, waveform display, and response timing cards. Latency should be measured across the whole call path, not only model response time.

What This Benchmark Measures

Latency is not one number. A caller feels the full chain: phone routing, speech detection, transcription, model response, voice generation, tool calls, transfer, and post-call processing.

This protocol measures six moments:

Moment	Start	Stop
Answer to greeting	Call connects	First agent audio begins
Caller stop to response	Caller finishes a request	Agent begins relevant answer
Barge-in recovery	Caller interrupts while agent speaks	Agent stops and responds to correction
Tool-call wait	Agent says it will check something	Agent returns with a result or failure path
Transfer start	Escalation trigger occurs	Human ring, queue, or callback path begins
Post-call artifact	Call ends	Transcript, summary, and structured fields are available
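For buyers who keep timestamps in a spreadsheet or script, the six moments can be sketched as start/stop event pairs. This is a minimal illustration, not a vendor API: the event names (such as call_connected and agent_audio_start) are hypothetical labels, and each timestamp is assumed to be seconds from the start of the recording.

```python
from typing import Dict, Optional

# Hypothetical event labels for the Start and Stop columns of the table above.
# Map them to whatever your stopwatch notes or recording timestamps provide.
MOMENTS = {
    "answer_to_greeting": ("call_connected", "agent_audio_start"),
    "caller_stop_to_response": ("caller_stopped", "agent_answer_start"),
    "barge_in_recovery": ("caller_interrupted", "agent_responded_to_correction"),
    "tool_call_wait": ("agent_announced_check", "agent_returned_result"),
    "transfer_start": ("escalation_triggered", "handoff_path_started"),
    "post_call_artifact": ("call_ended", "artifacts_available"),
}


def measure_moments(events: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Return elapsed seconds per moment, or None when an event was not captured."""
    durations: Dict[str, Optional[float]] = {}
    for name, (start, stop) in MOMENTS.items():
        if start in events and stop in events:
            durations[name] = round(events[stop] - events[start], 3)
        else:
            # Missing evidence stays visible instead of being guessed.
            durations[name] = None
    return durations
```

A moment that returns None is itself a finding: the rubric treats missing timing evidence as a weakness. A call with several question-and-answer turns would need one event pair per turn; this sketch records a single pair per moment.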

Run the benchmark on the same network, phone path, and test prompt for every vendor where practical. Where conditions differ, document the difference.

Test Script

Run this scenario three times:

Call the test number.
Ask a simple factual question.
Interrupt the answer with a correction.
Ask the agent to check a calendar, CRM, order, or equivalent tool.
Ask for a human.
End the call and wait for the transcript or summary.

Use a stopwatch, call recording, or transcript timestamps. The buyer does not need lab-grade instrumentation for early vendor screening, but the timing method should be the same across vendors.
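One way to keep the timing method identical across vendors is to convert every reading to seconds with the same helper. The sketch below assumes timestamps written as MM:SS.mmm or HH:MM:SS.mmm, which is an assumption about your recording tool or transcript export, not a format any vendor is known to use.

```python
def to_seconds(stamp: str) -> float:
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' (assumed formats) to seconds."""
    seconds = 0.0
    for part in stamp.strip().split(":"):
        # Each colon-separated field is worth 60 times the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


def elapsed(start: str, stop: str) -> float:
    """Seconds between two timestamps taken from the same recording."""
    delta = to_seconds(stop) - to_seconds(start)
    if delta < 0:
        raise ValueError("stop timestamp is earlier than start timestamp")
    return round(delta, 3)
```

For example, elapsed("00:12.400", "00:13.900") gives the delay between the caller finishing a request at 12.4 seconds and the agent starting its answer at 13.9 seconds.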

Scoring Rubric

Score	Meaning
1	Long pauses, frequent talk-over, or no usable timing evidence.
2	Usable in a demo, but tool waits or interruptions feel awkward.
3	Acceptable for low-risk calls with occasional pauses and recoverable delays.
4	Natural timing in most turns, clean interruption recovery, and clear tool-call language.
5	Consistently natural pacing, logged timing, fast recovery, and no unexplained silence.

Score the worst credible run. If two calls feel smooth and the third contains a long unexplained pause, the buyer should plan for that pause in production.
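The worst-credible-run rule can be written down so every vendor is scored the same way. This is a minimal sketch: the credible flag is an assumed convention for excluding runs the tester judges invalid (for example, a dropped test call), and that judgment stays with the buyer.

```python
from typing import List, Tuple


def benchmark_score(runs: List[Tuple[int, bool]]) -> int:
    """Return the lowest rubric score among credible runs.

    Each run is (score, credible), where score is 1 to 5 on the rubric above.
    """
    for score, _ in runs:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} is outside the 1-5 rubric")
    credible_scores = [score for score, credible in runs if credible]
    if not credible_scores:
        raise ValueError("no credible runs to score")
    # The minimum, not the average: one long unexplained pause sets the score.
    return min(credible_scores)
```

With three credible runs scored 4, 4, and 2, the result is 2, matching the rule that a single long pause should be planned for in production.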

What To Record

Capture:

Connection method and phone path.
Greeting delay.
Average caller-stop-to-agent-response time.
Worst caller-stop-to-agent-response time.
Barge-in recovery time.
Tool-call wait time.
Transfer start time.
Transcript availability time.
Any silence longer than 3 seconds.
Whether the agent explained waits honestly.
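The average, worst-case, and long-silence items in the list above reduce to a few lines of arithmetic. The sketch below assumes the delays and silences have already been measured in seconds; the function names are illustrative.

```python
from statistics import mean
from typing import Dict, List

SILENCE_THRESHOLD_S = 3.0  # the checklist flags silences above 3 seconds


def summarize_response_delays(delays_s: List[float]) -> Dict[str, float]:
    """Average and worst caller-stop-to-agent-response time, in seconds."""
    if not delays_s:
        raise ValueError("no response delays recorded")
    return {"average_s": round(mean(delays_s), 2), "worst_s": max(delays_s)}


def long_silences(silences_s: List[float],
                  threshold_s: float = SILENCE_THRESHOLD_S) -> List[float]:
    """Return every measured silence strictly longer than the threshold."""
    return [s for s in silences_s if s > threshold_s]
```

Recording both the average and the worst value matters because the two can diverge: delays of 0.8, 1.2, and 2.5 seconds average 1.5 seconds, but the caller still experienced a 2.5 second wait.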

Red Flags

Watch for:

The agent talks over corrections.
The agent repeats a full sentence after interruption.
The agent goes silent while a tool runs.
The agent claims an action succeeded before the tool confirms it.
The agent delays transfer after the caller asks for a person.
The transcript arrives too late for staff follow-up.

Latency problems often become trust problems. A caller may forgive a small pause, but not a wrong booking or a stalled urgent transfer.

Better Than Raw Speed

The best voice agent is not always the one with the smallest response number. Buyers should prefer systems that:

Acknowledge tool waits briefly.
Stop speaking when interrupted.
Confirm only when needed.
Escalate without extra debate.
Show timing data after the call.

Use the latency and architecture guide for deeper stack review, then run this benchmark during vendor demos.

Benchmark FAQs

What latency number matters most?

The most useful number is perceived response delay after the caller stops speaking, but buyers should also measure greeting speed, barge-in recovery, tool-call waits, transfer start, and post-call artifact availability.

Should the fastest vendor always win?

No. Predictable timing, clean interruption handling, and honest tool-call language can matter more than the lowest raw response number.