Robust Speech Detection in Noisy Environments
Advanced Voice Activity Detection and Noise Filtering
False VAD triggers cause random stops, double responses, and interruption issues.
50% fewer ASR errors (with noise filter)
<100ms response time (configurable)
Executive Summary
What we built
Advanced Voice Activity Detection (VAD) and noise filtering that ensure voice agents respond to actual speech, not to background noise, barking dogs, or TV audio.
Why it matters
False VAD triggers cause random stops, double responses, and interruption issues. These problems make voice agents unusable in real-world noisy environments.
Results
- 50% fewer ASR errors due to noise
- <100ms configurable response time
- Improved call center environment performance
- Better speakerphone scenario handling
Best for
- Call center environments
- Speakerphone scenarios
- Noisy home environments
- Multi-speaker settings
Limitations
- Domain-specific VAD models still in research
- Speech localization for target speaker in development
- Personalized VAD adaptation requires call data
The Problem
Voice activity detection fails in four distinct ways. Each has a different cause and a different cost.
Symptom
Bot stops speaking randomly
Cause
False VAD trigger on background noise
Business Cost
Symptom
Bot doesn't respond without prompting
Cause
VAD filtering too aggressive in a noisy environment, so real speech is missed
Business Cost
Symptom
Bot responds twice to the same turn
Cause
VAD sensitivity mismatch
Business Cost
Symptom
Hard to interrupt or self-interrupts
Cause
VAD threshold misconfiguration
Business Cost
How It Works
A layered detection pipeline in which each stage covers the others' weaknesses.
Noise Filter
Remove background noise before VAD
- Krisp-style SDK for real-time removal
- LiveKit Cloud Enhanced noise cancellation
- Filter non-speech components
Silero VAD
Detect speech activity with tuned thresholds
- min_silence_duration: 0.30s
- activation_threshold: 0.5
- min_speech_duration: 0.1s
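To make the interaction of these three thresholds concrete, here is a minimal sketch of threshold-and-duration gating over a stream of per-frame speech probabilities. It is a simplified illustration, not the Silero or LiveKit implementation: the `VadConfig` and `segment_speech` names are hypothetical, and the 10 ms frame length is an assumption chosen to keep the arithmetic easy to follow. Only the three threshold values come from the list above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class VadConfig:
    activation_threshold: float = 0.5  # from the settings above
    min_speech_ms: int = 100           # min_speech_duration: 0.1s
    min_silence_ms: int = 300          # min_silence_duration: 0.30s
    frame_ms: int = 10                 # assumption: one probability per 10 ms frame


def segment_speech(probs: Iterable[float], cfg: VadConfig) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame probabilities."""
    need_speech = -(-cfg.min_speech_ms // cfg.frame_ms)    # ceiling division
    need_silence = -(-cfg.min_silence_ms // cfg.frame_ms)
    segments: List[Tuple[int, int]] = []
    speaking = False
    run = 0        # consecutive frames that contradict the current state
    run_start = 0  # frame index where that run began
    seg_start = 0
    total = 0
    for i, p in enumerate(probs):
        total = i + 1
        is_speech = p >= cfg.activation_threshold
        if is_speech == speaking:
            run = 0  # the frame agrees with the current state
            continue
        if run == 0:
            run_start = i
        run += 1
        if not speaking and run >= need_speech:
            # Enough sustained speech: open a segment at the start of the run.
            speaking, seg_start, run = True, run_start, 0
        elif speaking and run >= need_silence:
            # Enough sustained silence: close the segment where silence began.
            speaking, run = False, 0
            segments.append((seg_start * cfg.frame_ms, run_start * cfg.frame_ms))
    if speaking:
        end = run_start if run > 0 else total
        segments.append((seg_start * cfg.frame_ms, end * cfg.frame_ms))
    return segments


cfg = VadConfig()
# A 50 ms noise burst is shorter than min_speech_ms, so it never opens a segment.
burst = [0.9] * 5 + [0.1] * 50
# A 100 ms pause inside an utterance is shorter than min_silence_ms, so it does not split it.
utterance = [0.1] * 10 + [0.9] * 30 + [0.1] * 10 + [0.9] * 20 + [0.1] * 40
print(segment_speech(burst, cfg))
print(segment_speech(utterance, cfg))
```

In this sketch, raising `min_speech_ms` suppresses more short noise bursts at the cost of slower activation, while raising `min_silence_ms` tolerates longer mid-sentence pauses at the cost of slower end-of-speech detection.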
Turn Detection
End-of-utterance detection
- LiveKit / Anyreach model
- Coordinate with VAD to avoid double-wait
- Handle false interruptions gracefully
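One way to read "double-wait" is that the VAD first waits out its minimum silence duration before reporting end of speech, and the turn detector then waits its own full endpointing delay on top. The sketch below shows the arithmetic of counting the silence the VAD has already observed toward the endpointing delay. The function names and the 500 ms endpointing delay are assumptions for illustration; only the 300 ms VAD silence figure comes from the settings listed earlier.

```python
def naive_total_wait_ms(vad_silence_ms: int, endpointing_delay_ms: int) -> int:
    """Uncoordinated case: the two waits are stacked back to back."""
    return vad_silence_ms + endpointing_delay_ms


def remaining_endpoint_wait_ms(vad_silence_ms: int, endpointing_delay_ms: int) -> int:
    """Coordinated case: silence already observed by the VAD counts toward the delay."""
    return max(0, endpointing_delay_ms - vad_silence_ms)


VAD_SILENCE_MS = 300        # min_silence_duration: 0.30s
ENDPOINTING_DELAY_MS = 500  # assumed value, not from the source

stacked = naive_total_wait_ms(VAD_SILENCE_MS, ENDPOINTING_DELAY_MS)
coordinated = VAD_SILENCE_MS + remaining_endpoint_wait_ms(VAD_SILENCE_MS, ENDPOINTING_DELAY_MS)
print(f"stacked: {stacked} ms, coordinated: {coordinated} ms")
```

Under these assumed numbers, the coordinated wait equals the endpointing delay itself rather than the sum of both timers.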
Product Features
Ready for production with enterprise-grade reliability.
Silero VAD Integration
Production-ready VAD with configurable thresholds for different environments.
50% Fewer ASR Errors
Noise filtering before VAD results in cleaner signal and better transcription.
Configurable Sensitivity
Tune activation threshold, silence duration, and speech duration for your use case.
Interruption Handling
Allow interruptions with configurable duration and false interruption timeout.
Turn Detection Coordination
Avoid double-wait by coordinating VAD silence with endpointing delay.
Research Frontiers
Speech localization, domain-specific models, and personalized VAD in development.
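The interruption handling feature listed above combines a minimum interruption duration with a false interruption timeout. A rough sketch of how such a policy might classify user audio that arrives while the agent is speaking follows. Every name and default value here is hypothetical and chosen for illustration; the sketch assumes a false interruption means that speech was detected but no transcript arrived within the timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterruptionPolicy:
    allow_interruptions: bool = True
    min_interruption_ms: int = 500             # assumed value
    false_interruption_timeout_ms: int = 2000  # assumed value


def classify(policy: InterruptionPolicy,
             speech_ms: int,
             transcript_delay_ms: Optional[int]) -> str:
    """Decide what the agent does with speech detected during its own turn.

    speech_ms: how long the VAD reported continuous user speech.
    transcript_delay_ms: time until ASR produced any words, or None if it never did.
    """
    if not policy.allow_interruptions or speech_ms < policy.min_interruption_ms:
        return "ignore"     # too short to count: keep talking
    if transcript_delay_ms is None or transcript_delay_ms > policy.false_interruption_timeout_ms:
        return "resume"     # paused for nothing: treat as a false interruption
    return "interrupt"      # real speech with words: yield the turn


policy = InterruptionPolicy()
print(classify(policy, speech_ms=200, transcript_delay_ms=100))   # short noise burst
print(classify(policy, speech_ms=800, transcript_delay_ms=300))   # user really spoke
print(classify(policy, speech_ms=800, transcript_delay_ms=None))  # sound, but no words
```

Separating "ignore" from "resume" lets short noise bursts pass without pausing the agent, while longer non-speech sounds cause at most a brief pause before playback continues.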
Integration Details
Runs On
Edge (real-time) with LiveKit orchestration
Latency Budget
<100ms response time
Providers
LiveKit, Silero VAD, Krisp SDK
Implementation
1-2 days for standard configuration
Frequently Asked Questions
Common questions about our voice activity detection and noise filtering system.
Ready to see this in action?
Book a technical walkthrough with our team to see how this research applies to your use case.
