
Voicemail Detection That Actually Delivers

When Your Brand Speaks, Make Sure It Lands


Success Rate: +11.6 pts
Accuracy: 96.1%
CPU Latency: 27.6ms
Calls Hitting Voicemail: 34.4%
Compliance: SOC2, HIPAA
Successful Deliveries: 94.8% (+11.6 pts)

Beep Detection: 96.1% (vs 89.9% DSP)

CPU Latency: 27.6ms (~50x faster than multimodal LLMs)

Executive Summary

What we built

A combined semantic + acoustic voicemail detection system that handles both classification ("Is this voicemail?") AND timing ("When do I start speaking?").

Why it matters

34.4% of outbound calls hit voicemail. Detection failures waste ASR/LLM/TTS spend and damage brand perception. They also mean your message (that appointment reminder, payment alert, or urgent callback) never gets delivered.

Results

  • 96.1% beep detection accuracy (vs 89.9% DSP, 81.2% Gemini)
  • 27.6ms latency on CPU (~50x faster than multimodal LLMs)
  • Call success rates improved from 83.2% to 94.8% on live calls

Best for

  • High-volume outbound campaigns
  • Appointment reminders & payment alerts
  • Automated callback systems

Limitations

  • Performance varies on carriers with non-standard voicemail greetings
  • Model trained on English greetings; other languages require fine-tuning

The Problem

Voicemail detection fails in three distinct ways. Each has different causes and different costs.

Classification Failure

Failure Volume: 55.6%

Symptom

Bot doesn't recognize it's talking to a recording

Cause

Semantic detection fails in three situations. It cannot resolve ambiguous greetings ("Hello?"). It falls behind because of ASR lag (0.5-2s). It has no transcript to analyze when the voicemail is silent.

Business Cost

  • Pathological dialogue loops
  • Wasted ASR/LLM/TTS spend
  • Bot sounds broken, and customers perceive a robocall

Timing Failure

Failure Volume: 28.2%

Symptom

Message gets clipped or incomplete

Cause

The agent spoke before the beep. Classification was correct, but timing was wrong: transcripts lag 0.5-2s behind real-time audio, so they cannot tell when the greeting has ended.

Business Cost

  • Opening seconds of the message lost under the greeting
  • Customer receives a garbled message
  • Information delivered incompletely, or not at all

Silent Voicemail

Failure Volume: 16.2%

Symptom

No detection signal available

Cause

No recorded greeting—just silence followed by a beep. Semantic detection has nothing to work with. The beep is the only signal.

Business Cost

  • Full call duration wasted
  • No message delivered at all
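The three failure modes can be separated using a few facts recorded for each call. Below is a minimal sketch of how failed calls might be tagged and tallied. It assumes a hypothetical failed-call record with the fields `greeting_transcript`, `classified_as_voicemail`, and `spoke_before_beep`. None of these names come from Anyreach's pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailedVoicemailCall:
    """Hypothetical record of a voicemail call whose message was not delivered cleanly."""
    greeting_transcript: str       # ASR text of the greeting ("" if silent)
    classified_as_voicemail: bool  # did the detector ever decide "voicemail"?
    spoke_before_beep: bool        # did the agent start talking before the beep?


def failure_mode(call: FailedVoicemailCall) -> str:
    """Assign a failed call to one of the three failure modes."""
    if not call.greeting_transcript.strip():
        # Silence then a beep: semantic detection had nothing to work with.
        return "silent_voicemail"
    if not call.classified_as_voicemail:
        # A greeting existed, but the bot never recognized the recording.
        return "classification"
    if call.spoke_before_beep:
        # Classified correctly, but the message started too early and was clipped.
        return "timing"
    return "other"


def failure_shares(calls: list[FailedVoicemailCall]) -> dict[str, float]:
    """Percentage of failed calls falling into each mode."""
    counts = Counter(failure_mode(c) for c in calls)
    total = sum(counts.values())
    return {mode: 100.0 * n / total for mode, n in counts.items()}
```

The silent check runs first. A call with no greeting is a silent-voicemail failure whatever the classifier later decided.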

How It Works

A two-layer detection system where each covers the other's weaknesses.

Semantic Layer

Provides classification confidence: "This is voicemail"

  • Analyzes the transcript for voicemail cues such as "you've reached..." and "please leave a message..."
  • Detects long monologues with no turn-taking
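Below is a minimal heuristic sketch of these two kinds of cue. This page describes the semantic layer only at the level of the cues listed above. The phrase list, the 0.4/0.3 weights, and the `monologue_words` cutoff are therefore illustrative assumptions, not the production layer.

```python
import re

# Illustrative voicemail phrases (hypothetical list, not Anyreach's cue set).
VOICEMAIL_CUES = (
    r"you'?ve reached",
    r"please leave (?:a|your) message",
    r"after the (?:tone|beep)",
    r"not available",
)


def semantic_vm_score(transcript: str, turns_taken: int, monologue_words: int = 15) -> float:
    """Heuristic voicemail confidence in [0, 1] from transcript cues and turn-taking.

    transcript      -- callee-side ASR text so far
    turns_taken     -- back-and-forth exchanges observed between agent and callee
    monologue_words -- hypothetical length at which an uninterrupted greeting
                       starts to look like a recording
    """
    text = transcript.lower()
    cue_hits = sum(1 for pattern in VOICEMAIL_CUES if re.search(pattern, text))
    score = 0.4 * cue_hits
    # A long monologue with no turn-taking is itself a voicemail cue.
    if turns_taken == 0 and len(text.split()) >= monologue_words:
        score += 0.3
    return min(1.0, score)
```

A bare "Hello?" scores 0.0 here. That is exactly the ambiguous case where this layer alone cannot decide.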

Acoustic Layer (Beep Detector)

Provides precise timing: "Start speaking now"

  • Learns the acoustic signature of beep tones
  • Trained on diverse recordings rather than relying on fixed frequency thresholds
  • Handles carrier and region variation automatically
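The acoustic layer's output still has to become a single "start speaking now" moment. Below is a minimal sketch under stated assumptions. The detector is assumed to emit one beep probability per fixed-length audio frame. The 20 ms frame size, 0.5 threshold, and 3-frame confirmation are hypothetical values, not Anyreach's. The sketch requires a short run of high-probability frames to reject blips, then waits for the tone to end before speaking.

```python
from typing import Iterable, Optional


def speak_time_ms(
    beep_probs: Iterable[float],
    frame_ms: int = 20,
    threshold: float = 0.5,
    min_frames: int = 3,
) -> Optional[int]:
    """Time (ms from stream start) at which the agent should start speaking.

    A beep is confirmed once `min_frames` consecutive frames reach `threshold`.
    Speaking starts at the first frame after a confirmed beep drops below it.
    Returns None if no confirmed beep has ended yet.
    """
    run = 0            # consecutive above-threshold frames
    confirmed = False  # have we seen a beep long enough to trust?
    for i, p in enumerate(beep_probs):
        if p >= threshold:
            run += 1
            if run >= min_frames:
                confirmed = True
        else:
            if confirmed:
                # The beep just ended: this is the first quiet frame after it.
                return i * frame_ms
            run = 0  # a short blip that never confirmed; reset
    return None
```

On a live stream, the same state would be updated frame by frame rather than over a finished list.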

Combined Policy

Each covers the other's weaknesses

  • Semantic catches clear greeting patterns
  • Beep detection handles silent voicemails
  • Beep detection solves timing problem transcripts can't
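Below is a minimal sketch of how the combined policy could be expressed. Its inputs are a semantic voicemail confidence in [0, 1], a flag from the acoustic layer saying a beep was confirmed and has ended, and a hypothetical `live_turn_confirmed` signal for a real conversational turn. The 0.70 cutoff mirrors the low-confidence guardrail listed under Safety Guardrails. The names and enum are illustrative.

```python
from enum import Enum


class Action(Enum):
    LEAVE_MESSAGE = "leave_message"  # speak the voicemail message now
    WAIT_FOR_BEEP = "wait_for_beep"  # keep listening, say nothing yet
    CONVERSE = "converse"            # treat the callee as a live person


def decide(
    vm_confidence: float,
    beep_confirmed: bool,
    live_turn_confirmed: bool,
    vm_threshold: float = 0.70,
) -> Action:
    """Combine the semantic and acoustic layers into one action.

    vm_confidence       -- semantic layer's voicemail confidence in [0, 1]
    beep_confirmed      -- acoustic layer confirmed a beep that has now ended
    live_turn_confirmed -- hypothetical signal that the callee took a real turn
    """
    if beep_confirmed:
        # The beep is both the voicemail signal (even with no transcript)
        # and the timing signal: the recording is now listening.
        return Action.LEAVE_MESSAGE
    if vm_confidence >= vm_threshold:
        # Clearly a voicemail greeting, but speaking now would clip the message.
        return Action.WAIT_FOR_BEEP
    if live_turn_confirmed:
        return Action.CONVERSE
    # Ambiguous (e.g. a bare "Hello?"): stay conservative and keep listening.
    return Action.WAIT_FOR_BEEP
```

The ordering is the design choice. The beep outranks everything because it also carries timing. High voicemail confidence outranks a possible live cue, so a greeting that pauses cannot trick the agent into talking over it.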

Benchmark Results

Comparison of Anyreach with baseline beep-detection methods across accuracy, precision, recall, F1, and latency.

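Baseline Comparison (all regions)

| Method | Region | Accuracy | Precision | Recall | F1 | Latency | Samples | ± |
|---|---|---|---|---|---|---|---|---|
| Anyreach Beep Detector (CPU) | All | 96.1% | 96.5% | 95.4% | 95.9 | 27.6ms | 125,000 | ±0.3 |
| Anyreach (GPU) | All | 96.1% | 96.5% | 95.4% | 95.9 | 2.5ms | 10,000 | ±0.5 |
| DSP (Signal Processing) | All | 89.9% | 100% | 82.8% | 90.6 | 10ms | 125,000 | ±0.7 |
| Gemini (Multimodal LLM) | All | 81.2% | 76.3% | 88.2% | 77.8 | 1,320ms | 125,000 | ±0.8 |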
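For reference, F1 is conventionally the harmonic mean of precision P and recall R. Below is a worked check against the Anyreach and DSP rows, with all values in percentage points.

```latex
F_1 = \frac{2PR}{P + R}

F_1^{\text{Anyreach}} = \frac{2 \times 96.5 \times 95.4}{96.5 + 95.4}
                      = \frac{18{,}412.2}{191.9} \approx 95.9

F_1^{\text{DSP}} = \frac{2 \times 100 \times 82.8}{100 + 82.8}
                 = \frac{16{,}560}{182.8} \approx 90.6
```

DSP's perfect precision cannot rescue its F1. The harmonic mean is pulled toward the weaker of the two terms, here its 82.8% recall.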

Ablation Studies

We tested each approach in isolation to understand what works and why.

Key Takeaways

  1. Classification alone isn't enough: 28.2% of failures are timing problems.
  2. DSP misses too many beeps (82.8% recall); LLMs are too slow (1,320ms).
  3. Combining semantic and acoustic signals improves the success rate by 11.6 pts.

Semantic detection only

Hypothesis: Text-based classification is sufficient for voicemail detection.

Result: -8.2 F1, -5.4 recall

Fails on silent voicemails and suffers timing errors from ASR lag (0.5-2s).

DSP-only detection

Hypothesis: FFT-based beep detection at ~1kHz is sufficient.

Result: -5.3 F1, -12.6 recall

82.8% recall: misses too many beeps because of codec and carrier variation.
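For contrast, below is a minimal sketch of the kind of fixed-frequency detector this baseline represents. It is not the baseline's actual code. It swaps the full FFT for the Goertzel algorithm, a single-bin DFT that is cheaper when only one frequency matters. The 8 kHz sample rate, 1 kHz target, and 0.6 energy-ratio threshold are assumptions.

```python
import math


def goertzel_power(samples: list[float], sample_rate: int, target_hz: float) -> float:
    """Squared magnitude |X_k|^2 of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest integer DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def is_beep_frame(samples: list[float], sample_rate: int = 8000,
                  target_hz: float = 1000.0, min_ratio: float = 0.6) -> bool:
    """Fixed-frequency check: is most of this frame's energy near target_hz?"""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return False
    # By Parseval, a pure tone exactly on bin k gives 2 * |X_k|^2 == n * energy,
    # so the ratio is ~1.0 for a clean on-bin beep and falls off away from the bin.
    ratio = 2.0 * goertzel_power(samples, sample_rate, target_hz) / (len(samples) * energy)
    return ratio >= min_ratio
```

The rigidity is what limits recall. A clean 1 kHz tone passes. A 1.4 kHz carrier beep, or one smeared by a codec, lands outside the bin and is missed.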

Multimodal LLM (Gemini)

Hypothesis: Audio-capable LLMs can detect beeps.

Result: -18.1 F1, +1,292.4ms latency

81.2% accuracy at 1,320ms latency: impractical for real-time decisions.

Combined semantic + acoustic

Winner

Approach: A multi-signal design in which each layer covers the other's weaknesses.

Result: +5.3 F1, +12.6 recall vs DSP-only

Production model: success rate on live calls rose from 83.2% to 94.8%.

Audio Examples

See how the model handles different voicemail scenarios.

Standard Voicemail (Success)

Outcome: Delivered (tag: standard), 58.0s

Transcript

[Voicemail] Please leave your message. [Agent] Good morning, my name is Grace. I'm calling from Mary's Center Dental Department. This message is for Lula H. I was calling to confirm your Cleaning on Thursday, August seventh at 12 noon...

Model Decision

Voicemail detected by the semantic layer. The agent waited for the beep and delivered the full message.

Silent Voicemail (Success)

Outcome: Delivered (tags: silent, edge-case), 62.0s

Transcript

[No greeting - silent voicemail] [Agent] Good morning, my name is Grace. I'm calling from Mary's Center Dental Department to confirm an upcoming appointment for Kedste T...

Model Decision

No transcript available; beep detection alone triggered message delivery.

Timing Failure (Clipped)

Outcome: Clipped (tag: timing-failure), 55.0s

Transcript

[Voicemail] Please leave your message for— [Agent] Good morning, my name is Grace. I'm calling from Mary's Center Dental Department. This message is for Denis S...

Model Decision

The agent started speaking before the greeting finished, so its first words overlapped with the voicemail.

Ambiguous Greeting

Outcome: Delivered (tags: ambiguous, live-pickup), 4.1s; audio not available

Transcript

Hello?

Model Decision

Low voicemail confidence (0.32), so the agent waited for an additional signal; a live pickup was confirmed.

Reliability & Rollout

How we safely deployed to production with continuous monitoring.

Rollout Timeline

Shadow Mode (completed, 2 weeks)

Model runs in parallel, with no production impact.
Accuracy: 95.2%; P99 latency: 48ms

Canary Rollout (completed, 1 week)

5% of traffic, with automatic rollback triggers.
Success rate: 94.1%; rollbacks: 0

Full Rollout (active, ongoing)

100% of traffic, with enhanced monitoring.
Success rate: 94.8%; improvement: +11.6 pts
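A canary stage needs two mechanics: a stable way to route about 5% of calls to the new model, and a rule that rolls back automatically. Below is a minimal sketch under stated assumptions. Calls are assumed to carry a string ID. The rollback rule is hypothetical: it compares canary and control success rates, with a 2-point tolerance and a 500-call minimum. This page does not specify the actual rollback triggers.

```python
import hashlib


def in_canary(call_id: str, percent: float = 5.0) -> bool:
    """Deterministically assign a call to the canary bucket.

    Hashing the ID (rather than random sampling) keeps retries of the
    same call on the same model version.
    """
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def should_roll_back(canary_ok: int, canary_total: int,
                     control_ok: int, control_total: int,
                     max_drop_pts: float = 2.0, min_calls: int = 500) -> bool:
    """Hypothetical trigger: roll back if canary success trails control by too much."""
    if canary_total < min_calls or control_total == 0:
        return False  # not enough canary traffic to judge yet
    canary_rate = 100.0 * canary_ok / canary_total
    control_rate = 100.0 * control_ok / control_total
    return control_rate - canary_rate > max_drop_pts
```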

Live Monitoring

Call Success Rate: 94.8% (threshold: > 90%; status: healthy)
Voicemail calls with successful message delivery.

Classification Accuracy: 96.1% (threshold: > 95%; status: healthy)
Correct voicemail vs. live-pickup detection.

P99 Latency: 48ms (threshold: < 100ms; status: healthy)
99th-percentile detection latency on CPU.
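The three Live Monitoring checks reduce to threshold comparisons. Below is a minimal sketch that assumes raw per-call latency samples and uses a nearest-rank P99. Other percentile definitions interpolate and can give slightly different values. The function and key names are hypothetical.

```python
def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of per-call detection latency."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in exact integer math
    return ordered[rank - 1]


def health(success_rate_pct: float, accuracy_pct: float,
           latencies_ms: list[float]) -> dict[str, bool]:
    """True means healthy, using the Live Monitoring thresholds (>90%, >95%, <100ms)."""
    return {
        "call_success_rate": success_rate_pct > 90.0,
        "classification_accuracy": accuracy_pct > 95.0,
        "p99_latency": p99(latencies_ms) < 100.0,
    }
```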

Safety Guardrails

  • Low confidence (<70%) triggers conservative wait-for-beep mode
  • Beep detection fallback for silent voicemails with no transcript
  • Carrier-specific thresholds for known edge cases
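Carrier-specific thresholds fit naturally into a lookup with a global default. Below is a minimal sketch. The carrier keys and override values are invented placeholders, because this page does not list which carriers or edge cases receive overrides. The 0.70 default mirrors the low-confidence guardrail above, and `beep_probability` is a hypothetical per-frame cutoff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    vm_confidence: float = 0.70    # below this, use conservative wait-for-beep mode
    beep_probability: float = 0.5  # hypothetical per-frame beep cutoff


DEFAULT_THRESHOLDS = Thresholds()

# Hypothetical overrides; real carrier names and values are not published.
CARRIER_OVERRIDES = {
    "example-carrier-a": Thresholds(vm_confidence=0.80),     # e.g. unusual greeting wording
    "example-carrier-b": Thresholds(beep_probability=0.35),  # e.g. quiet or distorted beep
}


def thresholds_for(carrier: Optional[str]) -> Thresholds:
    """Carrier-specific thresholds, falling back to the global defaults."""
    if carrier is None:
        return DEFAULT_THRESHOLDS
    return CARRIER_OVERRIDES.get(carrier.strip().lower(), DEFAULT_THRESHOLDS)
```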

Product Features

Ready for production with enterprise-grade reliability.

~50x Faster Than LLMs

27.6ms on CPU vs 1,320ms for multimodal LLMs—fast enough for real-time decisions

Solves Both Problems

Classification AND timing in one system, so far fewer messages get clipped

Handles Silent Voicemails

Beep detection works when there's no transcript to analyze

Higher Recall Than DSP

95.4% recall vs 82.8%—catches beeps that signal processing misses

No GPU Required

Production-grade accuracy on CPU; optional GPU inference cuts latency to 2.5ms.

Carrier Variation Handled

Trained on diverse recordings across carriers, regions, and edge cases

Integration Details

Runs On

Edge (WebAssembly) or Server (Docker)

Latency Budget

<50ms P99 recommended

Providers

Twilio, Vonage, SIP, Custom WebSocket
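Telephony audio usually has to be decoded before detection. Below is a minimal sketch that assumes frames arrive as base64-encoded 8 kHz G.711 μ-law, the format Twilio Media Streams uses. Vonage, raw SIP/RTP, or a custom WebSocket may deliver other encodings. The function names are hypothetical. The decoder is written in pure Python because the standard library's `audioop` module was removed in Python 3.13.

```python
import base64


def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = ((mantissa << 3) + 0x84) << exponent
    magnitude -= 0x84  # remove the encoder's bias of 132
    return -magnitude if sign else magnitude


def decode_ulaw_payload(payload_b64: str) -> list[int]:
    """Turn a base64 mu-law payload into PCM samples for the beep detector."""
    return [ulaw_byte_to_pcm16(b) for b in base64.b64decode(payload_b64)]
```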

Implementation

1-2 days typical


Methodology

How we built, trained, and evaluated this model.

Dataset

Name: Anyreach Voicemail Corpus
Size: Production traffic over 30 days

34.4% of outbound calls hit voicemail. Diverse carrier, region, and edge case coverage.

Labeling

Automated via carrier metadata, plus human review for edge cases. Failure mode analysis: 55.6% classification failures, 28.2% timing failures, 16.2% silent voicemails.

Evaluation Protocol

Before/after comparison on live calls. Success rate: 83.2% → 94.8% (+11.6 pts).

Known Limitations

  • Model trained on English greetings; other languages require fine-tuning
  • Performance varies on carriers with non-standard voicemail greetings
  • Silent voicemail detection depends on beep presence

Evaluation Details

Last Evaluated: 2025-01-03
Model Version: vm-detect-v4.0
Eval Run: eval-20250103-prod
Commit: b7e4f1a

Ready to see this in action?

Book a technical walkthrough with our team to see how this research applies to your use case.