Text to Speech
Convert text into natural-sounding speech using advanced AI voice synthesis models.
The Text to Speech (TTS) demo application transforms written text into high-quality spoken audio. It uses ElevenLabs models to stream generated speech and gives users fine-grained control over voice settings.
Features
Discover the powerful capabilities of this AI-powered voice synthesis solution:
Large voice library
Browse a large library of voices from ElevenLabs to find a style that fits your product.
Real-time audio streaming
Experience near-instantaneous audio generation with streaming delivery, providing immediate feedback as your content comes to life.
Integrated audio player
Enjoy a full-featured playback interface with precise controls for playback speed and convenient options to download generated audio files.
Voice customization
Fine-tune your audio output with settings like speed, stability, similarity, and speaker boost, depending on the selected voice and model.
Intuitive user experience
Benefit from a thoughtfully designed interface that makes transforming text to speech effortless and efficient, even for first-time users.
AI models
This application uses specialized text-to-speech models from ElevenLabs.
For comprehensive information about available voices and advanced customization techniques, consult the ElevenLabs SDK documentation.
Data flow
Unlike the chat, image, and RAG demos, the TTS demo does not persist generations in the database by default. The API streams back audio directly from ElevenLabs, and the UI handles playback and download on the client side.
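Because nothing is persisted server-side, the client buffers the streamed audio chunks itself before wiring them into playback or a download link. A minimal sketch of that buffering step follows; the `/api/ai/tts` endpoint path and the fetch usage shown in the comment are illustrative assumptions, not the app's actual routes:

```typescript
// Concatenate streamed audio chunks into one buffer that can back an
// <audio> element or a download link via a Blob URL.
export function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Browser-side usage sketch (endpoint path and payload shape are assumptions):
// const res = await fetch("/api/ai/tts", { method: "POST", body: JSON.stringify(payload) });
// const reader = res.body!.getReader();
// const chunks: Uint8Array[] = [];
// for (;;) {
//   const { done, value } = await reader.read();
//   if (done) break;
//   chunks.push(value);
// }
// const url = URL.createObjectURL(new Blob([concatChunks(chunks)], { type: "audio/mpeg" }));
```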
Structure
The Text-to-Speech feature is organized across the monorepo as follows:
Core
The shared TTS logic lives in @workspace/ai-tts, implemented in packages/ai/tts/src:
- Validation schemas and constants for TTS options
- The ElevenLabs client wrapper
- Voice mapping utilities and streamed text-to-speech generation
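The validation layer can be pictured as a schema with range constants for each voice setting. The sketch below is hand-rolled for self-containment; the actual package likely uses a schema library, and the field names, ranges, and error messages here are illustrative assumptions:

```typescript
// Illustrative range constants (assumed values, not the actual
// constants exported by @workspace/ai-tts).
const SPEED_RANGE = { min: 0.5, max: 2.0 } as const;
const UNIT_RANGE = { min: 0, max: 1 } as const; // stability, similarity

export interface TtsOptions {
  text: string;
  voiceId: string;
  speed?: number;        // playback speed multiplier
  stability?: number;    // 0..1, lower = more expressive
  similarity?: number;   // 0..1, higher = closer to the reference voice
  speakerBoost?: boolean;
}

// Returns a list of validation errors; an empty list means the payload is valid.
export function validateTtsOptions(input: TtsOptions): string[] {
  const errors: string[] = [];
  if (!input.text.trim()) errors.push("text must not be empty");
  if (!input.voiceId) errors.push("voiceId is required");
  if (
    input.speed !== undefined &&
    (input.speed < SPEED_RANGE.min || input.speed > SPEED_RANGE.max)
  ) {
    errors.push(`speed must be between ${SPEED_RANGE.min} and ${SPEED_RANGE.max}`);
  }
  for (const key of ["stability", "similarity"] as const) {
    const value = input[key];
    if (value !== undefined && (value < UNIT_RANGE.min || value > UNIT_RANGE.max)) {
      errors.push(`${key} must be between 0 and 1`);
    }
  }
  return errors;
}
```

Collecting all errors rather than failing on the first makes it easy for the API layer to return a complete validation response in one round trip.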
API
The packages/api package wires the TTS app through packages/api/src/modules/ai/tts.ts.
That module validates the text-to-speech payload, applies shared middleware like authentication, rate limiting, and credits, and then delegates to @workspace/ai-tts, which fetches voices and streams generated audio from ElevenLabs back to the client.
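The middleware chain described above can be sketched as plain function composition. The `authenticate`, `rateLimit`, and `deductCredits` functions below are hypothetical stand-ins for the shared middleware (the real module presumably uses Hono's middleware system), and the handlers are simplified to synchronous functions for clarity:

```typescript
type Req = { userId?: string };
type Handler = (req: Req) => Response;
type Middleware = (next: Handler) => Handler;

// Compose middleware so the first one listed runs first (outermost wrapper).
function compose(...middleware: Middleware[]): (handler: Handler) => Handler {
  return (handler) => middleware.reduceRight((next, mw) => mw(next), handler);
}

const trace: string[] = []; // records execution order, for illustration

const authenticate: Middleware = (next) => (req) => {
  trace.push("auth");
  return req.userId ? next(req) : new Response("Unauthorized", { status: 401 });
};

const rateLimit: Middleware = (next) => (req) => {
  trace.push("rate"); // a real limiter would consult a store keyed by user/IP
  return next(req);
};

const deductCredits: Middleware = (next) => (req) => {
  trace.push("credits"); // a real implementation would charge per generation
  return next(req);
};

// Terminal handler standing in for the streamed ElevenLabs response.
const speechHandler: Handler = () =>
  new Response("audio-bytes", { headers: { "Content-Type": "audio/mpeg" } });

export const route = compose(authenticate, rateLimit, deductCredits)(speechHandler);
```

Composing the chain this way means an unauthenticated request is rejected before rate limiting or credit deduction ever run.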
Web
The Next.js application (apps/web) provides the user interface:
- src/app/[locale]/(apps)/tts/**: route entry points for the TTS app
- src/modules/tts/**: feature modules for the composer, voice selector, settings controls, playback, and visualizer UI
Mobile
The Expo/React Native application (apps/mobile) provides the native mobile experience:
- src/app/(apps)/tts/**: route entry points for the mobile TTS app
- src/modules/tts/**: mobile-native modules for composing and playing speech
- API interaction: uses the same shared Hono client as the web app for consistent communication with the backend
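Sharing one client means both platforms construct the same request. The sketch below shows that idea with a plain request builder; the `/api/ai/tts` path and payload fields are illustrative assumptions, and the real apps go through a typed Hono client rather than building requests by hand:

```typescript
export interface SpeechRequest {
  text: string;
  voiceId: string;
}

// Build the HTTP request either platform would send. Returning a Request
// object instead of calling fetch keeps the sketch platform-neutral and
// easy to inspect.
export function buildSpeechRequest(baseUrl: string, payload: SpeechRequest): Request {
  return new Request(new URL("/api/ai/tts", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}
```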
This architecture keeps behavior consistent across platforms while allowing UI implementations tailored to each environment.