Voice Assistants
Voice AI combines speech recognition (STT), language model (LLM) reasoning, and speech synthesis (TTS) into a real-time pipeline. Every component adds latency, and users are sensitive to even short delays in spoken conversation. This section covers the pipeline architecture, latency design, wake word systems, and the tradeoffs between on-device and cloud processing.
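To make the latency-accumulation point concrete, here is a minimal sketch of a serial STT → LLM → TTS pipeline. The stage functions, their names, and the sleep durations are all illustrative assumptions, not any real library's API; the point is that in a naive serial design, user-perceived latency is the sum of every stage.

```python
import time

# Hypothetical stage functions; names and latencies are illustrative
# placeholders, not calls into a real voice SDK.
def transcribe(audio: bytes) -> str:    # STT: audio in, text out
    time.sleep(0.3)                     # stand-in for ~300 ms to a final transcript
    return "what's the weather"

def generate(text: str) -> str:         # LLM: reasoning over the transcript
    time.sleep(0.5)                     # stand-in for ~500 ms to a full response
    return "It's sunny today."

def synthesize(text: str) -> bytes:     # TTS: text in, audio out
    time.sleep(0.2)                     # stand-in for ~200 ms to first audio
    return b"<audio bytes>"

def respond(audio: bytes) -> tuple[bytes, float]:
    """Serial pipeline: each stage blocks on the previous one, so the
    user hears nothing until all three latencies have accumulated."""
    start = time.monotonic()
    reply_audio = synthesize(generate(transcribe(audio)))
    elapsed = time.monotonic() - start
    return reply_audio, elapsed
```

Run serially, the sketch's total delay is roughly 0.3 + 0.5 + 0.2 ≈ 1.0 s of silence before any reply audio, which is the baseline the streaming techniques below are designed to beat.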
In This Section
Voice Pipeline Architecture
The STT → LLM → TTS stack end to end — components, connection patterns, and where latency accumulates.
Latency Design
Techniques for reducing perceived latency — streaming, early TTS start, response chunking, and latency budgets per component.
Wake Words & Privacy
How wake word detection works, always-on microphone privacy implications, and on-device wake word options.
On-Device vs Cloud
When to run voice AI on-device vs in the cloud — latency, privacy, cost, and capability tradeoffs for each approach.
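One of the latency techniques named above, starting TTS early by chunking the response, can be sketched briefly. The token stream below is a hypothetical stand-in for a streaming LLM API; the idea is to buffer streamed tokens and hand each complete sentence to TTS as soon as it appears, instead of waiting for the full response.

```python
import re
from typing import Iterable, Iterator

def stream_llm_tokens() -> Iterator[str]:
    # Hypothetical token stream; a real streaming LLM API would yield
    # partial text incrementally like this.
    yield from ["It's ", "sunny ", "today. ", "Highs ", "near ", "70."]

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and emit each complete sentence as soon
    as it is seen, so TTS can start on the first sentence while the
    LLM is still generating the rest."""
    buf = ""
    for tok in tokens:
        buf += tok
        while True:
            # Flush on sentence-ending punctuation followed by whitespace.
            m = re.search(r'[.!?]["\')\]]*\s+', buf)
            if not m:
                break
            yield buf[:m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()       # flush whatever remains at end of stream

chunks = list(sentence_chunks(stream_llm_tokens()))
# chunks[0] ("It's sunny today.") can go to TTS while later tokens stream.
```

With this pattern, perceived latency drops from "time to full response" to "time to first complete sentence", which is where most of the streaming gains in the latency budget come from.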