Voice Agent Index

Lab model

Test the caller journey, not only the voice

A fast, natural-sounding agent still needs to handle interruptions, tools, transfers, compliance language, and recovery when something breaks.

Repeatable calls

Run the same caller script, interruption, and failure path across vendors.

Evidence packets

Capture transcript, timestamps, tool logs, transfer events, and callback outcomes.
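An evidence packet could be represented as a small record with a completeness check. The sketch below is illustrative only: the class name, field names, and the "empty means missing" rule are assumptions, not a published Voice Agent Index schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvidencePacket:
    """Evidence captured for one test call (hypothetical field names)."""

    vendor: str
    scenario_id: str
    transcript: str | None = None
    timestamps: list[float] = field(default_factory=list)  # seconds from call start
    tool_logs: list[dict] = field(default_factory=list)
    transfer_events: list[dict] = field(default_factory=list)
    callback_outcome: str | None = None

    def missing_artifacts(self) -> list[str]:
        """Return the names of evidence fields that are empty or unset."""
        required = (
            "transcript",
            "timestamps",
            "tool_logs",
            "transfer_events",
            "callback_outcome",
        )
        # Assumption: an empty value counts as missing. A call that never
        # transfers would need an explicit "no transfer" marker instead.
        return [name for name in required if not getattr(self, name)]


packet = EvidencePacket(
    vendor="Vendor A",  # placeholder name, not a real vendor result
    scenario_id="booking-01",
    transcript="Caller: I need to move my appointment.",
    timestamps=[0.0, 1.4, 3.9],
)
print(packet.missing_artifacts())
# ['tool_logs', 'transfer_events', 'callback_outcome']
```

A check like this makes gaps visible before any scoring happens, so an incomplete packet is flagged rather than silently scored.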

Operational scoring

Score what a buyer can verify: completion, handoff, latency, recovery, and visibility.
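One way to combine the five dimensions into a single number is a weighted mean. This is a minimal sketch under stated assumptions: the 0 to 5 rating scale, the equal default weights, and the function name are hypothetical and are not the lab's actual rubric.

```python
from __future__ import annotations

# The five dimensions named in the text above.
DIMENSIONS = ("completion", "handoff", "latency", "recovery", "visibility")


def operational_score(
    ratings: dict[str, float],
    weights: dict[str, float] | None = None,
) -> float:
    """Weighted mean of 0-5 ratings, scaled to 0-100.

    Assumptions: a 0-5 scale per dimension and equal weights by default.
    """
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        # Refuse to score when a dimension has no verified rating.
        raise ValueError(f"unrated dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0 <= ratings[d] <= 5:
            raise ValueError(f"rating for {d} must be between 0 and 5")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted = sum(ratings[d] * weights[d] for d in DIMENSIONS)
    return round(100 * weighted / (5 * total_weight), 1)


example = {
    "completion": 4,
    "handoff": 3,
    "latency": 5,
    "recovery": 2,
    "visibility": 4,
}
print(operational_score(example))  # 72.0
```

Raising on an unrated dimension, rather than defaulting it to zero, keeps a missing measurement from being mistaken for a poor one.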

Vendor evidence

Priority vendor proof pages

These pages give buyers and vendors a clear target for missing recordings, transcripts, timing logs, transfer proof, workflow artifacts, and policy evidence.

Evidence page

Retell AI

Profiled. Public claims still need verification, and a standardized evidence packet is needed before scored benchmark results can be published.

Open benchmark proof checklist

Evidence page

Vapi

Profiled. Outcomes depend on implementation, and standardized proof is needed before scored benchmark results can be published.

Open benchmark proof checklist

Evidence page

Bland AI

Profiled. Public evidence still needs checking, and a standardized evidence packet is needed before scored benchmark results can be published.

Open benchmark proof checklist

Evidence page

Synthflow

Profiled. Public evidence still needs checking, and standardized workflow proof is needed before scored benchmark results can be published.

Open benchmark proof checklist

Evidence page

Goodcall

Profiled. A booking scenario is ready, but policy, handoff, latency, and noisy-caller proof is still needed before scored benchmark results can be published.

Open benchmark proof checklist

Protocols

Start with the benchmark pack

Each protocol gives buyers a scenario, evidence checklist, scorecard, and failure modes to verify before launch.

Workflow Test

AI Receptionist Appointment Booking Benchmark

A benchmark protocol for testing whether an AI receptionist can book or reschedule appointments, or route appointment calls, without losing caller details or making unsafe promises.

Reviewed 2026-06-17

Risk Test

AI Voice Agent Emergency Escalation Benchmark

A benchmark protocol for testing whether an AI voice agent detects urgent caller language and routes to the approved human or fallback path.

Reviewed 2026-06-17

Handoff Protocol

AI Voice Agent Human Handoff Benchmark

A benchmark protocol for testing whether an AI voice agent transfers, escalates, or creates callback tasks with enough context for a human to continue the conversation.

Reviewed 2026-06-17

Latency Protocol

AI Voice Agent Latency Benchmark

A buyer protocol for measuring AI voice agent greeting speed, response delay, interruption recovery, tool-call waits, transfer timing, and transcript availability.

Reviewed 2026-06-17

Methodology

AI Voice Agent Benchmark Methodology

A repeatable methodology for scoring AI voice agents by caller experience, workflow completion, latency, handoff, observability, and launch risk.

Reviewed 2026-06-17

Robustness Test

AI Voice Agent Noisy Caller Benchmark

A benchmark protocol for testing whether an AI voice agent handles background noise, accents, spelling, corrections, interruptions, and low-confidence caller details.

Reviewed 2026-06-17

Evidence before launch

Ask vendors to show the same proof.

Use the lab protocols inside demos, RFPs, and pilot reviews so every vendor is judged by the same call path, transfer criteria, and post-call artifacts.

Latency, Handoff, Failure recovery

Build RFP
Open scorecard

Timing log, Transcript, Tool events, Handoff result

Benchmark FAQs

Does Voice Agent Index publish live vendor benchmark scores?

The Benchmark Lab starts with repeatable test protocols and scoring rubrics. Public vendor scores should be published only when the same scenario, phone path, evidence fields, and review method are used across vendors.
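The publication rule above can be checked mechanically before any scores are compared. The sketch below assumes each vendor run is described by a plain dictionary; the key names, the example values, and the helper function are hypothetical.

```python
from __future__ import annotations

# The four conditions the answer above says must match across vendors.
CONDITIONS = ("scenario_id", "phone_path", "evidence_fields", "review_method")


def _normalize(value):
    """Make list-like values order-insensitive and hashable for comparison."""
    if isinstance(value, (list, tuple, set)):
        return frozenset(value)
    return value


def mismatched_conditions(runs: list[dict]) -> list[str]:
    """Return the conditions whose values differ across vendor runs.

    An empty list means the runs share the same test conditions.
    """
    mismatched = []
    for key in CONDITIONS:
        distinct = {_normalize(run.get(key)) for run in runs}
        if len(distinct) > 1:
            mismatched.append(key)
    return mismatched


runs = [
    {
        "vendor": "Vendor A",  # placeholder names, not real results
        "scenario_id": "booking-01",
        "phone_path": "pstn",
        "evidence_fields": ["transcript", "timing_log"],
        "review_method": "two-reviewer",
    },
    {
        "vendor": "Vendor B",
        "scenario_id": "booking-01",
        "phone_path": "web-widget",
        "evidence_fields": ["timing_log", "transcript"],
        "review_method": "two-reviewer",
    },
]
print(mismatched_conditions(runs))  # ['phone_path']
```

In this example the two runs used different phone paths, so their scores would not be comparable under the rule even though the scenario, evidence fields, and review method match.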

What should buyers test before trusting an AI voice agent demo?

Buyers should test the speed of the first greeting, interruption recovery, workflow completion, tool-call behavior, human handoff, failure language, and post-call evidence such as transcripts, logs, summaries, and cost reporting.

Why separate benchmark protocols from vendor profiles?

Vendor profiles describe fit and positioning. Benchmark protocols define the repeatable evidence a buyer can request or run so polished demos do not replace operational proof.