Benchmark Evidence Summary
Vapi should be benchmarked as a developer platform. The most important evidence is whether the buyer or implementation partner can debug the full path from phone call to assistant behavior, tool execution, transfer, and post-call analysis.
What Is Already Clear
- The local profile positions Vapi as a platform for developers, product teams, and custom agent builders.
- The profile highlights telephony, custom tools, webhooks, assistant configuration, and call analysis as evaluation surfaces.
- Results depend heavily on the buyer's architecture, model choices, tool design, and monitoring process.
Evidence Still Missing
- Reference assistant configuration for the tested workflow, including tools, model choices, and phone route.
- Call logs that show tool timeouts, retries, failures, and downstream webhooks.
- Transfer configuration and proof that the human agent who takes the transferred call receives useful context.
- Cost trace for representative calls across model, voice, telephony, and platform layers.
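The cost trace in the last item can be sketched as a per-layer sum for a single call. This is a minimal Python illustration, not Vapi's billing model: the `CallCost` class, the layer names, and every rate below are hypothetical placeholders chosen only to show the shape of the evidence a buyer might ask for.

```python
from dataclasses import dataclass

# Hypothetical layer names, mirroring the four layers named in the list above.
LAYERS = ("model", "voice", "telephony", "platform")


@dataclass
class CallCost:
    call_id: str
    duration_min: float
    # Per-minute rate in USD for each layer. Placeholder values, not real pricing.
    rates_per_min: dict

    def trace(self) -> dict:
        """Return the cost per layer plus a total for this one call."""
        missing = [layer for layer in LAYERS if layer not in self.rates_per_min]
        if missing:
            # A trace that skips a layer hides part of the real cost.
            raise ValueError(f"{self.call_id}: no rate for layers {missing}")
        per_layer = {
            layer: round(self.rates_per_min[layer] * self.duration_min, 4)
            for layer in LAYERS
        }
        per_layer["total"] = round(sum(per_layer.values()), 4)
        return per_layer


# Illustrative four-minute call with made-up rates.
example = CallCost(
    call_id="call-tool-heavy",
    duration_min=4.0,
    rates_per_min={"model": 0.05, "voice": 0.03, "telephony": 0.01, "platform": 0.05},
)
example_trace = example.trace()
print(example_trace)
```

Raising an error when a layer is missing is a deliberate choice in this sketch: a trace that silently omits a layer would understate the cost of the call.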
Recommended Proof Packet
- One simple call and one tool-heavy call using the same published benchmark script.
- Assistant, tool, and phone-number configuration screenshots or exports.
- Structured call analysis, webhook payloads, and failed-tool logs for the same call IDs.
- Human transfer recording and summary packet.
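A reviewer could check a packet like this mechanically by confirming that every artifact type is present for each call ID. The sketch below assumes the packet has already been indexed as a mapping from call ID to the artifact types received. The artifact names in `REQUIRED_ARTIFACTS` are hypothetical labels derived from the list above, not fields of any Vapi export.

```python
# Hypothetical artifact labels, derived from the proof packet list above.
REQUIRED_ARTIFACTS = {
    "config_export",
    "call_analysis",
    "webhook_payloads",
    "failed_tool_logs",
    "transfer_recording",
    "transfer_summary",
}


def missing_artifacts(packet: dict) -> dict:
    """Map each incomplete call ID to a sorted list of the artifacts it lacks."""
    gaps = {}
    for call_id, artifacts in packet.items():
        missing = REQUIRED_ARTIFACTS - set(artifacts)
        if missing:
            gaps[call_id] = sorted(missing)
    return gaps


# Illustrative packet: the simple call is complete, the tool-heavy call is not.
packet = {
    "call-simple": set(REQUIRED_ARTIFACTS),
    "call-tool-heavy": {"config_export", "call_analysis", "webhook_payloads"},
}
gaps = missing_artifacts(packet)
print(gaps)
```

Keying everything by call ID matters because evidence only counts when the recording, logs, and analysis all describe the same call.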
Buyer Questions
- Who will maintain assistant prompts, tool schemas, credentials, and fallback language?
- Can the team reproduce a failed call from logs without vendor support?
- What model, voice, telephony, and tool choices were used in the demo?
- How are assistants and tools versioned between test and production?
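One way a team might answer the versioning question is to fingerprint each assistant or tool configuration and record the fingerprint alongside every benchmark call. The sketch below is an assumption about how that could be done with the Python standard library; it is not a Vapi feature, and the configuration keys shown are invented for illustration.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a SHA-256 hex digest of a configuration in canonical JSON form."""
    # Sorting keys and fixing separators makes the digest independent of key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical exports of the same assistant from test and production.
test_config = {"model": "model-a", "voice": "voice-1", "tools": ["lookup_order"]}
prod_config = {"voice": "voice-1", "tools": ["lookup_order"], "model": "model-a"}
drifted_config = {"model": "model-b", "voice": "voice-1", "tools": ["lookup_order"]}

same = config_fingerprint(test_config) == config_fingerprint(prod_config)
drifted = config_fingerprint(test_config) != config_fingerprint(drifted_config)
print(same, drifted)
```

If the fingerprint recorded for a benchmark call differs from the one in production, the benchmark result no longer describes the deployed assistant.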
Protocols To Run
Vapi Benchmark FAQs
Why is Vapi marked implementation dependent?
Vapi can support many workflow designs, so benchmark performance depends on the actual assistant, model, phone route, tools, and operational monitoring used by the buyer or partner.
What proof should a Vapi benchmark packet include?
Include assistant configuration, call recordings, transcripts, tool logs, webhook evidence, transfer records, call analysis, and a cost trace for the same benchmark calls.