
Robust Speech Detection in Noisy Environments

Advanced Voice Activity Detection and Noise Filtering

False VAD triggers cause random stops, double responses, and interruption issues.

  • Silero VAD integration (primary VAD, tuned thresholds)
  • 50% fewer ASR errors with noise filter
  • <100ms response time, configurable
  • Configurable sensitivity
  • Compliance: SOC2, HIPAA

Executive Summary

What we built

Advanced Voice Activity Detection and noise filtering that ensures voice agents respond to actual speech — not background noise, dogs barking, or TV audio.

Why it matters

False VAD triggers cause random stops, double responses, and interruption issues. These problems make voice agents unusable in real-world noisy environments.

Results

  • 50% fewer ASR errors due to noise
  • <100ms configurable response time
  • Improved call center environment performance
  • Better speakerphone scenario handling

Best for

  • Call center environments
  • Speakerphone scenarios
  • Noisy home environments
  • Multi-speaker settings

Limitations

  • Domain-specific VAD models still in research
  • Speech localization for target speaker in development
  • Personalized VAD adaptation requires call data

The Problem

Voice activity detection fails in four distinct ways. Each has a different cause and a different cost.

Symptom: Bot stops speaking randomly
Cause: False VAD trigger on background noise
Business Cost: Conversation breaks; user frustration; bot seems broken

Symptom: Bot doesn't respond without prompting
Cause: VAD too aggressive in noisy environment
Business Cost: Dead air; repeated user prompts; bot seems unresponsive

Symptom: Bot responds 2x for same turn
Cause: VAD sensitivity mismatch
Business Cost: Overlapping responses; confusion; bot interrupts itself

Symptom: Hard to interrupt or self-interrupts
Cause: VAD threshold misconfiguration
Business Cost: Conversation flow broken; user cannot get a word in; bot seems rude

How It Works

A three-stage pipeline in which each stage covers the others' weaknesses.

Noise Filter

Remove background noise before VAD

  • Krisp-style SDK for real-time removal
  • LiveKit Cloud Enhanced noise cancellation
  • Filter non-speech components

Silero VAD

Detect speech activity with tuned thresholds

  • min_silence_duration: 0.30s
  • activation_threshold: 0.5
  • min_speech_duration: 0.1s
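
To illustrate how the three thresholds above interact, here is a minimal sketch in plain Python. It is not the Silero VAD implementation or the production code. It assumes the model emits one speech probability per 32 ms frame (the `frame_duration` value, `VadConfig`, and `segment_speech` are hypothetical names chosen for this example) and shows how the thresholds turn those probabilities into speech segments.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VadConfig:
    activation_threshold: float = 0.5    # frame counts as speech at or above this probability
    min_speech_duration: float = 0.1     # seconds of continuous speech needed to open a segment
    min_silence_duration: float = 0.30   # seconds of continuous silence needed to close a segment
    frame_duration: float = 0.032        # assumption: 32 ms per probability frame


def segment_speech(probs, cfg=VadConfig()):
    """Return (start_frame, end_frame) pairs, end exclusive, for detected speech."""
    min_speech = max(1, math.ceil(cfg.min_speech_duration / cfg.frame_duration))
    min_silence = max(1, math.ceil(cfg.min_silence_duration / cfg.frame_duration))
    segments = []
    in_speech = False
    seg_start = 0
    speech_run = 0    # consecutive speech frames seen while outside a segment
    silence_run = 0   # consecutive silent frames seen while inside a segment
    for i, p in enumerate(probs):
        is_speech = p >= cfg.activation_threshold
        if not in_speech:
            speech_run = speech_run + 1 if is_speech else 0
            if speech_run >= min_speech:
                # Enough continuous speech: open a segment at the start of the run.
                in_speech = True
                seg_start = i - speech_run + 1
                silence_run = 0
        else:
            silence_run = 0 if is_speech else silence_run + 1
            if silence_run >= min_silence:
                # Enough continuous silence: close the segment where the silence began.
                segments.append((seg_start, i - silence_run + 1))
                in_speech = False
                speech_run = 0
    if in_speech:
        # Audio ended mid-segment: close it, excluding any trailing silence.
        segments.append((seg_start, len(probs) - silence_run))
    return segments
```

With these defaults, a two-frame noise spike never opens a segment (it is shorter than `min_speech_duration`), and a five-frame pause inside an utterance does not split it (it is shorter than `min_silence_duration`). Raising `activation_threshold` or `min_speech_duration` trades fewer false triggers for a higher risk of missing quiet or very short speech.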

Turn Detection

End-of-utterance detection

  • LiveKit / Anyreach model
  • Coordinate with VAD to avoid double-wait
  • Handle false interruptions gracefully
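
One plausible reading of "double-wait" is that the agent waits for the VAD silence window and then waits the full endpointing delay on top of it. The sketch below shows the arithmetic for avoiding that, under the assumption that the silence already observed by the VAD can be credited against the endpointing delay. The function name and the 500 ms delay are hypothetical; only the 300 ms silence window comes from the settings above.

```python
def remaining_endpoint_wait_ms(endpointing_delay_ms: int, vad_silence_ms: int) -> int:
    """Milliseconds still to wait once the VAD has reported end of speech.

    The VAD only reports end of speech after `vad_silence_ms` of silence,
    so that time is credited against the endpointing delay instead of
    being added to it.
    """
    return max(0, endpointing_delay_ms - vad_silence_ms)


VAD_SILENCE_MS = 300          # min_silence_duration from the VAD settings
ENDPOINTING_DELAY_MS = 500    # hypothetical endpointing delay

# Uncoordinated: the two waits stack.
naive_total_ms = VAD_SILENCE_MS + ENDPOINTING_DELAY_MS

# Coordinated: only the remainder is waited after the VAD fires.
coordinated_total_ms = VAD_SILENCE_MS + remaining_endpoint_wait_ms(
    ENDPOINTING_DELAY_MS, VAD_SILENCE_MS
)
```

In this example the uncoordinated total is 800 ms of silence before the agent replies, while the coordinated total is 500 ms. If the VAD silence window is already longer than the endpointing delay, the remainder is zero and the agent can respond as soon as the VAD reports end of speech.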

Product Features

Ready for production with enterprise-grade reliability.

Silero VAD Integration

Production-ready VAD with configurable thresholds for different environments.

50% Fewer ASR Errors

Noise filtering before VAD results in cleaner signal and better transcription.

Configurable Sensitivity

Tune activation threshold, silence duration, and speech duration for your use case.

Interruption Handling

Allow interruptions with configurable duration and false interruption timeout.
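
The decision logic behind these two settings can be sketched as follows. This is an illustrative model, not the product's implementation: the names `InterruptionPolicy`, `Decision`, and `decide`, and the default values of 500 ms and 2000 ms, are assumptions made for the example. The idea is that very short bursts of detected speech are ignored, and a pause that is never followed by transcribed words is treated as a false interruption so the agent resumes speaking.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    IGNORE = "ignore"        # keep speaking, detected audio was too short
    RESUME = "resume"        # paused, but no words followed: false interruption
    INTERRUPT = "interrupt"  # real interruption: stop and listen


@dataclass(frozen=True)
class InterruptionPolicy:
    allow_interruptions: bool = True
    min_interruption_ms: int = 500             # hypothetical default
    false_interruption_timeout_ms: int = 2000  # hypothetical default


def decide(policy: InterruptionPolicy, speech_ms: int,
           transcript_delay_ms: Optional[int]) -> Decision:
    """Classify detected user audio while the agent is speaking.

    speech_ms: duration of the detected speech.
    transcript_delay_ms: time until transcribed words arrived after the
        agent paused, or None if no words arrived at all.
    """
    if not policy.allow_interruptions or speech_ms < policy.min_interruption_ms:
        return Decision.IGNORE
    if (transcript_delay_ms is None
            or transcript_delay_ms > policy.false_interruption_timeout_ms):
        return Decision.RESUME
    return Decision.INTERRUPT
```

For example, a 200 ms door slam is ignored, a 900 ms burst of TV audio that produces no transcript leads to a resume, and 900 ms of speech with words arriving after 400 ms is treated as a genuine interruption.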

Turn Detection Coordination

Avoid double-wait by coordinating VAD silence with endpointing delay.

Research Frontiers

Speech localization, domain-specific models, and personalized VAD in development.

Integration Details

Runs On

Edge (real-time) with LiveKit orchestration

Latency Budget

<100ms response time

Providers

LiveKit, Silero VAD, Krisp SDK

Implementation

1-2 days for standard configuration

Frequently Asked Questions

Common questions about our voice activity detection system.

Ready to see this in action?

Book a technical walkthrough with our team to see how this research applies to your use case.