Voice Assistants
Voice AI combines speech recognition (STT), language model (LLM) reasoning, and speech synthesis (TTS) into a real-time pipeline. Every component adds latency, and users are sensitive to even short delays in spoken conversation. This section covers the pipeline architecture, latency design, wake word systems, and the tradeoffs between on-device and cloud processing.
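To make the latency-accumulation point concrete, here is a minimal sketch of a serial STT → LLM → TTS pipeline. The stage functions, their names, and the sleep durations are all illustrative assumptions, not any real library's API; the point is that in a naive serial design, user-perceived latency is the sum of every stage.

```python
import time

# Hypothetical stage functions; names and latencies are illustrative
# placeholders, not calls into a real voice SDK.
def transcribe(audio: bytes) -> str:    # STT: audio in, text out
    time.sleep(0.3)                     # stand-in for ~300 ms to a final transcript
    return "what's the weather"

def generate(text: str) -> str:         # LLM: reasoning over the transcript
    time.sleep(0.5)                     # stand-in for ~500 ms to a full response
    return "It's sunny today."

def synthesize(text: str) -> bytes:     # TTS: text in, audio out
    time.sleep(0.2)                     # stand-in for ~200 ms to first audio
    return b"<audio bytes>"

def respond(audio: bytes) -> tuple[bytes, float]:
    """Serial pipeline: each stage blocks on the previous one, so the
    user hears nothing until all three latencies have accumulated."""
    start = time.monotonic()
    reply_audio = synthesize(generate(transcribe(audio)))
    elapsed = time.monotonic() - start
    return reply_audio, elapsed
```

Run serially, the sketch's total delay is roughly 0.3 + 0.5 + 0.2 ≈ 1.0 s of silence before any reply audio, which is the baseline the streaming techniques below are designed to beat.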
In This Section
Voice Pipeline Architecture
The STT → LLM → TTS stack end to end — components, connection patterns, and where latency accumulates.
Latency Design
Techniques for reducing perceived latency — streaming, early TTS start, response chunking, and latency budgets per component.
Wake Words & Privacy
How wake word detection works, always-on microphone privacy implications, and on-device wake word options.
On-Device vs Cloud
When to run voice AI on-device vs in the cloud — latency, privacy, cost, and capability tradeoffs for each approach.
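One of the latency techniques named above, starting TTS early by chunking the response, can be sketched briefly. The token stream below is a hypothetical stand-in for a streaming LLM API; the idea is to buffer streamed tokens and hand each complete sentence to TTS as soon as it appears, instead of waiting for the full response.

```python
import re
from typing import Iterable, Iterator

def stream_llm_tokens() -> Iterator[str]:
    # Hypothetical token stream; a real streaming LLM API would yield
    # partial text incrementally like this.
    yield from ["It's ", "sunny ", "today. ", "Highs ", "near ", "70."]

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and emit each complete sentence as soon
    as it is seen, so TTS can start on the first sentence while the
    LLM is still generating the rest."""
    buf = ""
    for tok in tokens:
        buf += tok
        while True:
            # Flush on sentence-ending punctuation followed by whitespace.
            m = re.search(r'[.!?]["\')\]]*\s+', buf)
            if not m:
                break
            yield buf[:m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()       # flush whatever remains at end of stream

chunks = list(sentence_chunks(stream_llm_tokens()))
# chunks[0] ("It's sunny today.") can go to TTS while later tokens stream.
```

With this pattern, perceived latency drops from "time to full response" to "time to first complete sentence", which is where most of the streaming gains in the latency budget come from.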