Wake Words & Privacy
A wake word engine listens passively to audio 24/7, triggering the full voice pipeline only when a specific phrase is detected — "Hey Jarvis", "Computer", "Alexa". Without a wake word, the voice pipeline would either run continuously (expensive and slow) or require a push-to-talk button (breaks natural interaction). The challenge: wake word detection must be always-on, low-power, and private — meaning it must run entirely on the device, with no audio ever sent to a server.
What Wake Word Engines Do
A wake word engine continuously analyses small audio frames (typically 10–30ms chunks) and classifies each frame as "wake word present" or "not wake word". When it detects the phrase, it signals the full STT pipeline to start recording and processing. The engine runs on the client — a phone, a Raspberry Pi, a microcontroller — consuming minimal CPU/memory so the device can remain responsive while always listening.
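The frame-by-frame loop can be sketched in a few lines. This is a toy illustration, not any engine's real API: the classifier below just normalises the frame's peak sample level, where a real engine would run a small neural model, and the 16 kHz / 20 ms figures are assumed typical values within the range above.

```python
FRAME_MS = 20          # a typical frame size within the 10-30ms range
SAMPLE_RATE = 16000    # 16 kHz mono PCM, the usual input format (assumed)
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000   # 320 samples per frame

def score_frame(frame):
    """Hypothetical per-frame classifier returning a wake-word
    confidence in [0, 1]. A real engine runs a small neural model
    here; this toy stand-in normalises the frame's peak level."""
    peak = max((abs(s) for s in frame), default=0)
    return min(peak / 32768, 1.0)

def first_detection(frames, threshold=0.5):
    """Return the index of the first frame classified as 'wake word
    present' (confidence >= threshold), or None if nothing triggers."""
    for i, frame in enumerate(frames):
        if score_frame(frame) >= threshold:
            return i
    return None
```

On detection, `first_detection` returning an index is the moment the engine would fire its trigger event to wake the rest of the pipeline.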
Two error types define the core trade-off:
- False positives (false alarms): engine triggers on speech that is not the wake word. Causes accidental activation — the AI starts listening when not intended. Measured as false alarms per hour. Industry target: <1 false alarm per 10 hours of background speech.
- False negatives (missed detections): engine fails to trigger when the wake word is genuinely spoken. Causes the user to repeat themselves. Measured as miss rate. Industry target: <3% miss rate in clean speech conditions.
This trade-off is controlled by a detection threshold: a lower threshold means fewer misses but more false alarms; a higher threshold means fewer false alarms but more misses.
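The effect of moving the threshold can be seen on a small labelled batch. The confidence scores below are invented for illustration; the point is only how the same scores yield different miss / false-alarm counts at different thresholds.

```python
# (confidence, is_wake_word) pairs from a hypothetical engine:
SCORES = [
    (0.92, True), (0.81, True), (0.55, True),    # genuine wake words
    (0.60, False), (0.35, False), (0.10, False), # background speech
]

def error_rates(threshold):
    """Count misses (wake word scored below threshold) and false
    alarms (background speech scored at or above threshold)."""
    misses = sum(1 for s, wake in SCORES if wake and s < threshold)
    false_alarms = sum(1 for s, wake in SCORES if not wake and s >= threshold)
    return misses, false_alarms

print(error_rates(0.5))  # -> (0, 1): no misses, one false alarm
print(error_rates(0.7))  # -> (1, 0): one miss, no false alarms
```

Raising the threshold from 0.5 to 0.7 trades the false alarm for a miss, which is exactly the dial a deployment tunes against the industry targets above.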
openWakeWord (Open-Source)
openWakeWord is an open-source wake word detection framework designed for real-world accuracy on commodity hardware. It was popularised by its adoption in Home Assistant for local voice control.
How it works
- Built on Google's open-source audio embedding model (pre-trained on AudioSet)
- Fine-tuned per wake word using Piper TTS — generates thousands of audio clips with diverse speaker voices, room acoustics, and noise augmentation
- Training a new wake word requires only the text phrase — no recorded audio samples needed
- Runs 15–20 models simultaneously on a single core of a Raspberry Pi 3
- Python-based; integrates with Wyoming protocol (Home Assistant)
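The multi-model behaviour above can be sketched as a per-chunk scoring loop. openWakeWord's actual API differs; this stub mimics only the shape of its output, a per-wake-word confidence dict per audio chunk, and the "models" are plain callables over string stand-ins for audio, not trained networks.

```python
def predict_all(chunk, models):
    """Score one audio chunk against several wake-word models at once,
    returning a {wake_word: confidence} dict -- the shape of result
    openWakeWord produces per chunk. The models here are stubs."""
    return {name: fn(chunk) for name, fn in models.items()}

def stream_detect(chunks, models, threshold=0.5):
    """Feed successive chunks through every model; return the first
    (chunk_index, wake_word) pair crossing the threshold, or None."""
    for i, chunk in enumerate(chunks):
        for name, score in predict_all(chunk, models).items():
            if score >= threshold:
                return i, name
    return None

# Stub models: each maps a chunk to a confidence score.
models = {
    "hey_jarvis": lambda c: 0.9 if "jarvis" in c else 0.0,
    "computer":   lambda c: 0.8 if "computer" in c else 0.0,
}
```

Because each model is scored independently on the same chunk, adding another wake word costs one more entry in the dict, which is why running 15-20 models side by side stays cheap.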
Best for
- Home automation and self-hosted voice assistants
- Projects where custom wake words are needed without recording real speakers
- Raspberry Pi and similar SBC deployments
- Open-source, privacy-first builds where commercial licensing is not acceptable
- Teams with ML knowledge who can tune the model
Limitation:
openWakeWord models are likely too large for microcontrollers and highly constrained embedded hardware (<1MB RAM). Porcupine is better suited for these environments.
Porcupine (Picovoice — Commercial)
Porcupine is a highly accurate, commercial wake word engine from Picovoice, optimised for constrained hardware from microcontrollers to smartphones. It is the most widely deployed on-device wake word engine in commercial products.
Key capabilities
- 97%+ accuracy with <1 false alarm per 10 hours in background speech conditions
- Custom wake words: type in the phrase, receive a trained model within seconds via transfer learning — no audio recording needed
- Runs on ARM Cortex-M4 (microcontroller), Raspberry Pi, Android, iOS, web
- SDKs for Python, iOS, Android, Web, React Native, .NET, Java, Go
- Consistent performance across accents and noise conditions
Best for
- Commercial products requiring reliability guarantees
- Microcontroller and highly constrained embedded deployments
- Mobile apps (iOS/Android) needing always-on wake word
- Enterprise voice products where support and SLAs matter
- Multi-language or multi-accent requirements
openWakeWord vs Porcupine
| Dimension | openWakeWord | Porcupine |
|---|---|---|
| Licence | Apache 2.0 (open-source) | Commercial (free tier available) |
| Custom wake words | Yes (TTS-generated training) | Yes (type phrase → model in seconds) |
| Accuracy | Good (can exceed Porcupine with tuning) | 97%+ with <1 false alarm / 10hrs |
| Minimum hardware | Raspberry Pi 3 (single core) | ARM Cortex-M4 (microcontroller) |
| Languages | English-focused (expanding) | Multi-language |
| Training data needed | None (TTS-generated) | None (transfer learning from text) |
| Home Assistant support | Native (Wyoming protocol) | Via custom integration |
Privacy: Why On-Device Matters
The privacy guarantee of wake word detection depends entirely on where the audio is processed. A cloud-based wake word engine must stream microphone audio to a server 24/7 — the server "decides" when the wake word was said. An on-device engine processes audio locally; no audio leaves the device until the wake word is confirmed.
On-device privacy guarantees
- No audio is transmitted until the wake word is detected locally
- Works offline — no network required for activation
- No vendor can collect ambient audio
- Compliant with data sovereignty laws (GDPR, HIPAA contexts)
- Cannot be affected by server outages or rate limits during wake detection
What to tell users
- Clearly document what audio is sent after wake word detection and where
- Provide a mute/disable button that is hardware-enforced, not software-only
- Log activations so users can audit when the device was triggered
- False positive audio (overheard in a false alarm) should not be stored or processed — discard if the STT returns low-confidence results
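The last two recommendations can be combined in a small audit log. This is a sketch under assumptions: the 0.6 STT-confidence cutoff is invented, and a real implementation would also need to delete the buffered audio, not merely flag it.

```python
import time

class ActivationLog:
    """Audit trail of wake-word activations, so users can review
    when the device was triggered and whether audio was kept."""

    def __init__(self, min_stt_confidence=0.6):  # cutoff is illustrative
        self.min_stt_confidence = min_stt_confidence
        self.entries = []

    def record(self, wake_word, stt_confidence):
        """Log one activation. Low STT confidence marks the activation
        as a likely false alarm, whose audio should be discarded."""
        kept = stt_confidence >= self.min_stt_confidence
        self.entries.append({
            "time": time.time(),
            "wake_word": wake_word,
            "kept": kept,  # False => discard the overheard audio
        })
        return kept
```

Surfacing `entries` in a settings screen gives users the audit trail the bullet above asks for, including the false alarms that were discarded.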
Integrating Wake Words into a Voice Pipeline
Wake word → pipeline trigger flow:
- Wake word engine runs continuously, analysing 10–30ms audio frames (CPU only)
- Detection confidence exceeds threshold → engine fires a trigger event
- Pipeline activates VAD to capture the following utterance
- Audio (post-wake-word) is buffered and sent to STT
- On low-confidence STT result or very short utterance: discard, return to listening
- On valid transcript: continue to LLM → TTS
- After response completes: return to wake word listening mode
The wake word itself is typically excluded from the STT transcript — the utterance that matters starts immediately after the wake phrase.
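The trigger flow above is naturally a small state machine. A minimal sketch, with plain-string events standing in for the real payloads (audio buffers, transcripts) a pipeline would carry:

```python
from enum import Enum, auto

class State(Enum):
    LISTENING = auto()   # wake word engine running, nothing recorded
    CAPTURING = auto()   # VAD capturing the post-wake-word utterance
    RESPONDING = auto()  # LLM -> TTS producing the reply

def next_state(state, event):
    """Transition table for the trigger flow. Unknown events leave
    the state unchanged."""
    table = {
        (State.LISTENING, "wake_detected"): State.CAPTURING,
        (State.CAPTURING, "low_confidence_stt"): State.LISTENING,  # discard
        (State.CAPTURING, "valid_transcript"): State.RESPONDING,
        (State.RESPONDING, "response_done"): State.LISTENING,
    }
    return table.get((state, event), state)
```

Note that both the low-confidence branch and the completed response return to `LISTENING`: the device always ends up back in wake word mode, never left open-mic.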
Checklist: Do You Understand This?
- What is the difference between a false positive and false negative in wake word detection, and what is the industry target for each?
- How does openWakeWord generate training data for a custom wake word without recording real speakers?
- Why can Porcupine run on a microcontroller but openWakeWord cannot?
- What is the privacy difference between on-device and cloud-based wake word detection?
- In the wake word → pipeline trigger flow, what happens when the STT returns a low-confidence result after an activation?
- What is the detection threshold, and what happens when you raise vs lower it?