Benchmark results

The evidence matrix for AI voice agents.

Track which vendors have public evidence, which tests are ready to run, and where buyers should ask for proof before trusting a polished demo.

The first matrix tracks evidence status. Public scored results should come only after repeatable tests.

Current matrix

Evidence status by vendor

Use this as a buyer checklist and vendor submission target. It is intentionally conservative until repeatable benchmark packets are reviewed.

Vendor	Latency	Handoff	Booking	Escalation	Noisy caller	Evidence state
Retell AI (developer voice agent platform; benchmark evidence page)	Public claims to verify	Test pending	Scenario ready	Needs evidence packet	Test pending	Profiled
Vapi (developer voice agent API; benchmark evidence page)	Public claims to verify	Implementation dependent	Scenario ready	Needs evidence packet	Test pending	Profiled
Telnyx (voice infrastructure)	Infrastructure evidence	Implementation dependent	Implementation dependent	Needs workflow proof	Test pending	Architecture mapped
Bland AI (enterprise voice AI; benchmark evidence page)	Public evidence check	Test pending	Scenario ready	Needs evidence packet	Test pending	Profiled
Synthflow (no-code enterprise voice AI; benchmark evidence page)	Public evidence check	Test pending	Scenario ready	Needs evidence packet	Test pending	Profiled
Goodcall (AI receptionist; benchmark evidence page)	Test pending	Needs evidence packet	Scenario ready	Needs policy proof	Test pending	Profiled
Smith.ai (hybrid receptionist)	Service dependent	Hybrid model noted	Scenario ready	Needs evidence packet	Service dependent	Profiled
Slang AI (restaurant voice AI)	Test pending	Needs evidence packet	Restaurant scenario ready	Needs escalation proof	Restaurant audio test pending	Profiled

Status definitions

What the labels mean

The matrix separates public claims, test readiness, missing evidence, and implementation-dependent workflows.

Scenario ready

The test scenario exists and can be run during a demo or pilot.

Public evidence check

Public claims or docs are available, but standardized evidence is not complete yet.

Test pending

Voice Agent Index has not published a repeatable benchmark result for that vendor and scenario.

Needs evidence packet

The vendor should provide recordings, transcripts, logs, routing proof, or workflow artifacts.

Implementation dependent

The result depends heavily on how the buyer or implementation partner configures the workflow.
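Because the labels form a small, fixed vocabulary, a buyer can turn any matrix row into a demo-day checklist mechanically. The Python sketch below is a minimal illustration, not a Voice Agent Index tool: the `Status` enum, the `BUYER_ACTION` wording, and `buyer_checklist` are hypothetical names, and the enum values use the label spellings that appear in the matrix cells. The example row copies the Bland AI row from the matrix.

```python
from enum import Enum


class Status(Enum):
    """Matrix cell labels, spelled as they appear in the matrix."""
    SCENARIO_READY = "Scenario ready"
    PUBLIC_EVIDENCE_CHECK = "Public evidence check"
    TEST_PENDING = "Test pending"
    NEEDS_EVIDENCE_PACKET = "Needs evidence packet"
    IMPLEMENTATION_DEPENDENT = "Implementation dependent"


# Hypothetical buyer follow-up for each label (illustrative wording only).
BUYER_ACTION = {
    Status.SCENARIO_READY: "Run the scenario live during the demo or pilot.",
    Status.PUBLIC_EVIDENCE_CHECK: "Verify public claims against standardized evidence.",
    Status.TEST_PENDING: "Treat as unproven; no repeatable result is published yet.",
    Status.NEEDS_EVIDENCE_PACKET: "Request recordings, transcripts, logs, or routing proof.",
    Status.IMPLEMENTATION_DEPENDENT: "Test with your own workflow configuration.",
}


def buyer_checklist(vendor: str, row: dict[str, Status]) -> list[str]:
    """Turn one matrix row into one follow-up item per scenario column."""
    return [f"{vendor} / {scenario}: {BUYER_ACTION[status]}"
            for scenario, status in row.items()]


# The Bland AI row from the matrix, column by column.
bland_ai = {
    "Latency": Status.PUBLIC_EVIDENCE_CHECK,
    "Handoff": Status.TEST_PENDING,
    "Booking": Status.SCENARIO_READY,
    "Escalation": Status.NEEDS_EVIDENCE_PACKET,
    "Noisy caller": Status.TEST_PENDING,
}

for item in buyer_checklist("Bland AI", bland_ai):
    print(item)
```

Labels outside this set, such as "Public claims to verify" or "Service dependent", would need their own entries before rows that use them could be converted the same way.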

Vendor evidence

Move from pending to reviewed.

Vendors can submit benchmark evidence packets for the same latency, handoff, booking, escalation, and noisy-caller protocols buyers use during demos.

Results FAQs

Are these benchmark results final vendor rankings?

No. This matrix tracks evidence status and test readiness. Numeric vendor scores should only be added when the same benchmark scenario and evidence packet are applied across vendors.
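The condition for adding numeric scores can be read as a simple gate: a scenario becomes scoreable only when every vendor in the comparison has a reviewed result for it, produced with the same evidence packet. This minimal sketch assumes a hypothetical `ReviewedResult` record whose `packet_version` field stands in for "the same evidence packet"; the page does not say how packets are identified or versioned, and the vendor names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedResult:
    vendor: str
    scenario: str        # e.g. "booking" or "noisy caller"
    packet_version: str  # hypothetical ID of the evidence-packet template used


def can_publish_scores(results: list[ReviewedResult],
                       vendors: set[str], scenario: str) -> bool:
    """Allow numeric scores for `scenario` only if every vendor in `vendors`
    has a reviewed result for it and all of those results share one packet."""
    relevant = [r for r in results if r.scenario == scenario and r.vendor in vendors]
    covered = {r.vendor for r in relevant}
    packets = {r.packet_version for r in relevant}
    return bool(vendors) and covered == vendors and len(packets) == 1


results = [
    ReviewedResult("Vendor A", "booking", "packet-v1"),
    ReviewedResult("Vendor B", "booking", "packet-v1"),
]
print(can_publish_scores(results, {"Vendor A", "Vendor B"}, "booking"))              # True
print(can_publish_scores(results, {"Vendor A", "Vendor B", "Vendor C"}, "booking"))  # False: C has no result
```

Requiring a single packet version, not merely full coverage, is what keeps a score from comparing vendors that were held to different evidence standards.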

How can a vendor move from pending to reviewed?

A vendor can submit demo access, call recordings, transcripts, tool logs, transfer evidence, pricing details, and policy documentation for editorial review.
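The submission items listed above can be tracked as a per-vendor checklist. This is a minimal sketch that assumes, for illustration only, that a packet counts as ready for editorial review once all seven listed items are present; the page does not state that every item is mandatory, and `EvidencePacket`, `EXPECTED_ARTIFACTS`, and `ready_for_review` are hypothetical names.

```python
from dataclasses import dataclass, field

# Submission items named in this FAQ answer, in the order listed.
EXPECTED_ARTIFACTS = (
    "demo access",
    "call recordings",
    "transcripts",
    "tool logs",
    "transfer evidence",
    "pricing details",
    "policy documentation",
)


@dataclass
class EvidencePacket:
    vendor: str
    artifacts: set[str] = field(default_factory=set)

    def missing(self) -> list[str]:
        """Items not yet supplied, in the order of EXPECTED_ARTIFACTS."""
        return [a for a in EXPECTED_ARTIFACTS if a not in self.artifacts]

    def ready_for_review(self) -> bool:
        # Assumption: every listed item is required before review starts.
        return not self.missing()


packet = EvidencePacket("Example vendor", {"call recordings", "transcripts"})
print(packet.missing())
```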

Why include vendors before all tests are complete?

Buyers need to know which evidence is public, missing, implementation-dependent, or ready to test. The matrix makes gaps visible instead of pretending every profile is equally proven.