Voice

Build a LiveKit-powered voice assistant for web and mobile, run it locally, and deploy the agent with the repository's existing build pipeline.

The Voice app is the most real-time part of TurboStarter AI. Instead of a request-response UI, it gives users a shared audio session with an agent that can listen, reason, speak back, and stream conversation state across web and mobile.

Capabilities

LiveKit gives this app more than a microphone button. It provides the realtime transport, room lifecycle, participant state, and media controls that make the experience feel like a proper call interface instead of a chat form with audio attached.

Why LiveKit?

LiveKit provides the realtime infrastructure behind the voice app: it transports audio and video streams between client and server, tracks session and participant state, and controls the media tracks.

It also powers millions of realtime voice and video sessions every day, including ChatGPT's voice mode.

Web

On web, the app leans into the full browser surface area. The desktop experience is especially good for demos, internal copilots, sales assistants, and any workflow where screen sharing matters.

Realtime conversation UI

Users can start a session, interrupt naturally, and follow the conversation through a live transcript while the agent is speaking or listening.

Camera and screen sharing

The web client exposes microphone, camera, and screen-share controls, which makes it useful for support, onboarding, and collaborative assistant flows.

Visualizer and layout controls

The visualizer is customizable, and the layout adapts when transcript, camera, or screen-share tiles are active.
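
As an illustration, the visualizer maps naturally onto LiveKit's React components. BarVisualizer and useVoiceAssistant are real exports from @livekit/components-react; the wrapper component below is a hypothetical sketch, assuming it renders inside an already-connected room.

import { BarVisualizer, useVoiceAssistant } from "@livekit/components-react";

// Hypothetical sketch: visualize the agent's audio track with
// state-aware bars (listening, thinking, speaking).
function AgentVisualizer() {
  const { state, audioTrack } = useVoiceAssistant();
  return <BarVisualizer state={state} trackRef={audioTrack} barCount={5} />;
}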

Browser-native testing surface

Web is the fastest place to test prompts, media permissions, interruptions, and room behavior while you are iterating on the agent.

Web voice session

Mobile

On mobile, the same LiveKit room and agent stack is presented through a native-first session layout. The experience is optimized for touch controls, safe areas, and audio-session handling on real devices.

Native audio session handling

The mobile app manages the underlying audio session so the voice experience behaves like a real call instead of a fragile media demo.

Transcript plus in-session chat

Users can keep the conversation voice-first while still opening transcript and chat surfaces when they want more control or visibility.

Camera and screen sharing

The mobile UI also supports microphone, camera, and screen-share controls, with a second media tile shown when visual tracks are active.

Shared backend, native UI

Both apps use the same voice backend and request lifecycle, while keeping the interaction model natural on phones and tablets.

Mobile voice session

Shared infrastructure

Under the UI differences, the architecture stays consistent across platforms. Both apps request a room token from the shared API layer, join a LiveKit room, and then hand the live session off to a LiveKit agent worker.

Shared request lifecycle

The web and mobile welcome screens trigger the same shared voice route module in packages/api/src/modules/ai/voice.ts, so auth, credits, and token creation stay centralized.

Shared voice package

The LiveKit token logic, environment handling, agent entrypoint, and deployment assets all live in packages/ai/voice/src.

Live session transport

Session state, media tracks, transcript messages, and agent events are streamed over LiveKit instead of the text-streaming path used by the chat apps.
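
Concretely, the join step on the web can be as small as the sketch below. LiveKitRoom and RoomAudioRenderer come from @livekit/components-react; the VoiceSession wrapper itself is hypothetical, and the repository's actual SessionProvider layers transcript state, settings, and controls on top of this.

import { LiveKitRoom, RoomAudioRenderer } from "@livekit/components-react";

// Hypothetical wrapper: join the room with the token minted by the
// shared voice route, then render the remote audio tracks.
export function VoiceSession({
  token,
  serverUrl,
}: {
  token: string;
  serverUrl: string;
}) {
  return (
    <LiveKitRoom token={token} serverUrl={serverUrl} connect audio>
      <RoomAudioRenderer />
    </LiveKitRoom>
  );
}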

Architecture

The voice app still uses the same monorepo boundaries as the rest of the AI product. The difference is that the backend work happens around room creation and agent sessions rather than around a single streamed HTTP response.

  1. The user starts a session from the platform-specific voice UI in src/modules/voice/**.
  2. The shared API route module in packages/api/src/modules/ai/voice.ts runs through the normal request lifecycle, including auth context and credit checks.
  3. The token-creation logic in packages/ai/voice/src/api.ts creates a short-lived LiveKit participant token and room configuration (sketched after this list).
  4. The web and mobile SessionProvider implementations use LiveKit's token source helpers to join the room.
  5. The LiveKit agent worker defined in packages/ai/voice/src/agent/main.ts joins that room and drives the conversation.

This keeps the app aligned with the rest of the starter while still giving voice its own realtime transport and deployment model.
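
To make step 3 concrete, token creation with the official livekit-server-sdk looks roughly like this. The AccessToken API is LiveKit's; the identity, room name, and TTL values are placeholders, and the real implementation in packages/ai/voice/src/api.ts adds auth and credit context around this core.

import { AccessToken } from "livekit-server-sdk";

declare const userId: string; // placeholder: supplied by the authenticated request
declare const roomName: string; // placeholder: generated per session

// Mint a short-lived participant token scoped to a single room.
const token = new AccessToken(
  process.env.LIVEKIT_API_KEY,
  process.env.LIVEKIT_API_SECRET,
  { identity: userId, ttl: "10m" },
);
token.addGrant({
  room: roomName,
  roomJoin: true,
  canPublish: true,
  canSubscribe: true,
});
const jwt = await token.toJwt();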

Voice architecture

The most useful mental model for this app is the classic voice pipeline: speech comes in, gets transcribed, passed to an LLM, and then rendered back to audio. LiveKit Agents supports that pattern directly, and it is the right baseline to understand before looking at realtime speech-to-speech models.

This is the recommended architecture to understand first because it gives you the most flexibility. You can mix providers for transcription, reasoning, and voice quality instead of accepting one provider's full stack.

import { AgentSession } from "@livekit/agents";

const session = new AgentSession({
  stt: "deepgram/nova-3:multi", // speech-to-text: transcribes the caller
  llm: "openai/gpt-4.1-mini", // reasoning model that drives the conversation
  tts: "cartesia/sonic-3:voice-id", // text-to-speech: renders replies as audio
});

In the repository, this full pipeline is already scaffolded in packages/ai/voice/src/agent/main.ts, along with turn detection, VAD, and noise-cancellation hooks.
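
For orientation, a LiveKit Agents worker entrypoint in Node generally follows the shape below. defineAgent, cli, and WorkerOptions are LiveKit's APIs; the prewarm and session wiring is a simplified sketch, not the repository's exact main.ts.

import { fileURLToPath } from "node:url";
import { cli, defineAgent, WorkerOptions } from "@livekit/agents";

export default defineAgent({
  // Runs once per worker process, before any room jobs arrive.
  prewarm: async (proc) => {
    // e.g. load VAD models here so sessions avoid a cold start
  },
  // Runs for each room job the worker accepts.
  entry: async (ctx) => {
    await ctx.connect();
    // construct the AgentSession and start it against ctx.room
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));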

LiveKit's own docs present both approaches side by side, which is a helpful way to reason about tradeoffs: pipeline mode gives you finer provider control, while realtime mode gives you a more tightly integrated speech experience. See the Voice AI quickstart, turn handling guide, and realtime models overview.
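
For comparison, switching to realtime mode typically means swapping the stt/llm/tts trio for a single speech-to-speech model. This sketch assumes the OpenAI plugin; the voice value is a placeholder.

import { AgentSession } from "@livekit/agents";
import * as openai from "@livekit/agents-plugin-openai";

const realtimeSession = new AgentSession({
  // One realtime model handles listening, reasoning, and speaking.
  llm: new openai.realtime.RealtimeModel({ voice: "alloy" }),
});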

STT and TTS provider options

One of the strengths of LiveKit Agents is that you are not locked into a single speech stack. You can mix and match providers based on latency, language coverage, cost, and voice quality.

| Capability | Common choices | Notes |
| --- | --- | --- |
| STT | Deepgram, AssemblyAI, OpenAI, Google Cloud, Speechmatics | Deepgram is a common starting point for low-latency multilingual transcription, but LiveKit supports a wider plugin ecosystem. |
| TTS | Cartesia, ElevenLabs, Deepgram, OpenAI, Google Cloud, Rime | Cartesia and ElevenLabs are common picks when voice quality is the main product differentiator. |
| Realtime speech-to-speech | OpenAI Realtime, Gemini Live, xAI Grok Voice, Amazon Nova Sonic, Phonic | Realtime models reduce app-layer composition but change how you think about control and provider mix. |

If you are exploring the space, start with STT models, TTS models, and realtime models. For product-specific guidance inside this docs set, see Speech, Transcription, OpenAI, and ElevenLabs.

Create a project and set environment variables

The repository already knows how to build and run the voice agent, but you still need a LiveKit project to point it at. You can use LiveKit Cloud for the smoothest path, or run the LiveKit server locally when you want full control during development.

LiveKit Cloud is the easiest way to get from code to a working agent. It gives you hosted transport, agent deployment, observability, and the cloud dashboard in one place.

Create a project in the LiveKit Cloud dashboard.

Install the LiveKit CLI and link it to your account:

brew install livekit-cli
lk cloud auth

Add the LiveKit credentials to apps/web/.env.local, because the @workspace/ai-voice package scripts load that file by default:

apps/web/.env.local
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret

# Optional: voice-model providers
OPENAI_API_KEY=your-openai-api-key
DEEPGRAM_API_KEY=your-deepgram-api-key
CARTESIA_API_KEY=your-cartesia-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key

You do not need every provider key on day one. Add only the providers your chosen voice pipeline uses. The most important local prerequisite is simply having the lk CLI installed and available on your PATH, because the repository deploy script shells out to it directly.
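
To confirm the CLI is ready, a quick check from any shell is enough; the --version flag here is assumed to behave like most CLIs and print the installed version.

which lk
lk --version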

All setup pre-built

The starter already includes Docker-based build assets for the agent itself, but it does not currently spin up a LiveKit server through Docker Compose for you. Treat LiveKit transport as a separate dependency from Postgres and the rest of the local services.

Run the agent locally

Once your environment variables are in place, local development is straightforward. The easiest path is to let Turbo orchestrate the voice package task graph for you, because packages/ai/voice/turbo.json already declares dev -> download-files -> build.
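
Illustratively, that task graph has the shape below. This is a simplified sketch, not the file verbatim, and depending on your Turbo version the top-level key may be tasks or pipeline.

{
  "tasks": {
    "build": {},
    "download-files": { "dependsOn": ["build"] },
    "dev": { "dependsOn": ["download-files"], "cache": false, "persistent": true }
  }
}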

Recommended approach

From the ai repository root, you can run the voice worker through Turbo and let it handle the build and pre-download steps automatically:

pnpm with-env turbo dev --filter=@workspace/ai-voice

This is the recommended command for day-to-day development. The step-by-step commands below are still useful when you want to understand what happens under the hood or run each piece manually.

If you are already working in the full monorepo, pnpm dev from the repository root is also a valid path. The root script runs pnpm with-env turbo dev, so the voice package can be started as part of the wider development graph alongside the web and mobile apps.

Install dependencies for the monorepo:

pnpm install

Build the package first if you want to run the pieces manually:

pnpm --filter @workspace/ai-voice build

This runs TypeScript compilation for @workspace/ai-voice and produces the dist output used by the worker's production-style scripts.

Optionally pre-download local files such as VAD-related assets:

pnpm --filter @workspace/ai-voice download-files

This boots the built agent in a special download mode so it can fetch any local runtime assets it needs ahead of time. In practice, this is where model helpers such as Silero VAD assets can be warmed up before you enter a live session.

Run the agent in development mode:

pnpm --filter @workspace/ai-voice dev

This loads apps/web/.env.local, starts the LiveKit agent entrypoint in development mode, prewarms the VAD, and waits for room jobs from your local or cloud LiveKit project.

In a separate terminal, run the app itself:

pnpm dev

This starts the rest of the product surface so you can actually join the room from the web app, mobile app, or any other connected client.

The package also includes pnpm --filter @workspace/ai-voice connect and pnpm --filter @workspace/ai-voice start for alternative LiveKit agent startup modes. If you prefer the Turbo task graph for those flows too, start is also wired to depend on download-files in packages/ai/voice/turbo.json. For the bigger picture on these modes, see LiveKit's voice quickstart.

Dashboard and playground

LiveKit Cloud gives you two especially useful surfaces while building. The dashboard is your operational control plane, and the playground is your fastest browser-based testing surface when you do not want to open the full app.

Dashboard

The LiveKit Cloud dashboard is where you create projects, manage API keys, inspect agent deployments, and review operational signals after deployment.

Project and key management

Create projects, generate credentials, and manage the values that end up in your local env files or cloud secrets.

Deployments and health

Review deployment status, session counts, errors, limits, and other agent health signals from one place.

Logs and debugging

Use the dashboard's runtime and build logs when a deployment starts failing, cold starts become visible, or a model provider is misconfigured.

LiveKit Cloud dashboard

Playground

The Agents Playground is useful when you want to verify the agent itself before you involve the full product UI. It is especially handy while tuning prompts, testing interruptions, or validating a new STT/TTS provider combination.

You can use the playground against a locally running agent in dev mode or a deployed agent in LiveKit Cloud. LiveKit covers this flow in the Voice AI quickstart.

LiveKit Agents Playground

Deployment

The repository already contains the agent build and staging flow, so deployment is less about writing Docker logic and more about understanding what the existing scripts are doing for you.

deploy command

The main entrypoint is the package-level deploy script in packages/ai/voice/package.json. It stages a deployment workspace and then hands that staged directory off to the LiveKit CLI.

pnpm --filter @workspace/ai-voice run deploy

What the staging script does

The file packages/ai/voice/src/deployment/stage-agent-deploy.ts prepares a clean .lk-stage-ai-voice workspace at the repo root. That staging step keeps the agent deployment isolated from the rest of the monorepo while still letting the Docker build reuse workspace packages.

It copies:

  • the root package.json
  • pnpm-lock.yaml and pnpm-workspace.yaml
  • the packages and tooling directories
  • the agent-specific deployment Dockerfile
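
In spirit, the staging step amounts to a handful of copy operations. The sketch below uses Node's fs API and is a simplified illustration, not the actual script; the exact file list and error handling in stage-agent-deploy.ts may differ.

import { cpSync, mkdirSync, rmSync } from "node:fs";
import path from "node:path";

// Rebuild the staging workspace from scratch on every deploy.
const stage = path.resolve(".lk-stage-ai-voice");
rmSync(stage, { recursive: true, force: true });
mkdirSync(stage, { recursive: true });

// Copy just enough of the monorepo for a workspace-aware install and build.
for (const entry of [
  "package.json",
  "pnpm-lock.yaml",
  "pnpm-workspace.yaml",
  "packages",
  "tooling",
]) {
  cpSync(entry, path.join(stage, entry), { recursive: true });
}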

How the container build works

The Dockerfile in packages/ai/voice/src/deployment/Dockerfile is already set up for the agent. It installs dependencies for @workspace/ai-voice, builds the package, pre-downloads model files, and then starts the built worker with the production start script.

That means the important work has already been encoded into the repository:

  • dependency installation is workspace-aware
  • the build only targets @workspace/ai-voice
  • download-files runs during image build
  • the final container boots directly into the LiveKit agent worker
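
Condensed to its essentials, the image build has roughly this shape. This is illustrative only; the base image, pnpm setup, and flags in the real Dockerfile may differ.

FROM node:22-slim
RUN corepack enable
WORKDIR /app
COPY . .
# workspace-aware install, scoped to the agent package and its dependencies
RUN pnpm install --frozen-lockfile --filter @workspace/ai-voice...
RUN pnpm --filter @workspace/ai-voice build
# bake model assets into the image so cold starts skip the download
RUN pnpm --filter @workspace/ai-voice download-files
CMD ["pnpm", "--filter", "@workspace/ai-voice", "start"]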

How LiveKit Cloud sees the deployment

After staging, the script runs lk agent deploy against .lk-stage-ai-voice. LiveKit Cloud then builds the image, stores deployment metadata, and exposes the resulting agent through the dashboard and playground surfaces.

If you are using a cloud deployment, keep provider credentials such as OPENAI_API_KEY, DEEPGRAM_API_KEY, CARTESIA_API_KEY, or ELEVENLABS_API_KEY in LiveKit Cloud secrets instead of baking them into the image. LiveKit documents this flow in its deployment overview.

LiveKit project metadata

The staged workspace also contains a livekit.toml file once the CLI has linked that deployment to a specific LiveKit project and agent. Think of it as deployment metadata owned by the LiveKit toolchain rather than application code you should hand-edit frequently.
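
For reference, the file is small; its shape is roughly the following. The values are placeholders, and the exact field names are owned by the CLI and may vary between versions.

livekit.toml
[project]
  subdomain = "your-project"

[agent]
  id = "your-agent-id"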

Structure

The voice feature spans the shared agent package, the API layer, and the two frontends. Keeping these boundaries clear makes it much easier to change models or deployment strategy without rebuilding the UI from scratch.

Core

The shared voice backend logic lives in packages/ai/voice/src. This is where token generation, environment handling, agent instructions, the agent entrypoint, and deployment assets live.

API

The voice route wiring lives in packages/api/src/modules/ai/voice.ts. It sits inside the same Hono request pipeline as the other AI apps, so voice still benefits from shared auth, validation, and credit logic before the request reaches the LiveKit-specific code.

Web

The web route entrypoints live in apps/web/src/app/[locale]/(apps)/voice/**, while the actual feature implementation lives in apps/web/src/modules/voice/**. That module tree contains the welcome screen, controls, transcript, session provider, settings, and visualizer components.

Mobile

The mobile route entrypoints live in apps/mobile/src/app/(apps)/voice/**, while the feature logic lives in apps/mobile/src/modules/voice/**. That is where the mobile session provider, controls, transcript, video tile, chat composer, and animations are defined.

Voice sits at the intersection of speech, transcription, provider choice, and realtime product design. The related guides mentioned above, such as Speech, Transcription, OpenAI, and ElevenLabs, are the best next stop if you want to go deeper into one part of the stack.

References

These are the best official LiveKit references to keep open while working on the voice app:

  • Voice AI quickstart
  • Turn detection and interruption handling
  • Realtime models overview
  • STT and TTS model catalogs
  • Agents Playground
  • Agent deployment overview
