Text to Speech

Convert text into natural-sounding speech using advanced AI voice synthesis models.

The Text to Speech (TTS) demo application transforms written text into high-quality spoken audio. It uses ElevenLabs models to stream generated speech and gives users fine-grained control over voice settings.


Features

Discover the powerful capabilities of this AI-powered voice synthesis solution:

Large voice library

Browse an extensive library of voices from ElevenLabs to find a style that fits your product.

Real-time audio streaming

Experience near-instantaneous audio generation with streaming delivery, providing immediate feedback as your content comes to life.

Integrated audio player

Enjoy a full-featured playback interface with precise controls for playback speed and convenient options to download generated audio files.

Voice customization

Fine-tune your audio output with settings like speed, stability, similarity, and speaker boost, depending on the selected voice and model.

Intuitive user experience

Benefit from a thoughtfully designed interface that makes transforming text to speech effortless and efficient, even for first-time users.

AI models

This application primarily uses specialized text-to-speech models from ElevenLabs.

For comprehensive information about available voices and advanced customization techniques, consult the ElevenLabs SDK documentation.

Data flow

Unlike the chat, image, and RAG demos, the TTS demo does not persist generations in the database by default. The API streams back audio directly from ElevenLabs, and the UI handles playback and download on the client side.
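To make the streaming behavior concrete, here is a minimal, framework-agnostic sketch of how a client could collect the streamed audio into a single buffer for playback or download. The helper name and shape are illustrative, not taken from the demo's source:

```typescript
// Illustrative helper: drain a streamed audio response into one
// contiguous buffer so the client can play it or offer a download.
async function collectStream(
  stream: ReadableStream<Uint8Array>,
): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    total += value.length;
  }
  // Concatenate the received chunks into a single buffer.
  const audio = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    audio.set(chunk, offset);
    offset += chunk.length;
  }
  return audio;
}
```

In a browser, the resulting buffer could then be wrapped in a `Blob` and handed to an `<audio>` element or a download link.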

Structure

The Text-to-Speech feature is organized across the monorepo for maximum flexibility and maintainability:

Core

The shared TTS logic lives in @workspace/ai-tts, implemented in packages/ai/tts/src:

  • Validation schemas and constants for TTS options
  • The ElevenLabs client wrapper
  • Voice mapping utilities and streamed text-to-speech generation
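As a rough illustration of what the validation layer might look like, here is a hypothetical sketch in plain TypeScript. The option names, ranges, and defaults are assumptions for illustration, not the package's actual API:

```typescript
// Hypothetical shape of a TTS request payload; field names and
// ranges are assumptions, not the real @workspace/ai-tts schema.
interface TtsOptions {
  text: string;
  voiceId: string;
  speed: number; // playback speed multiplier
  stability: number; // 0..1, voice consistency
  similarity: number; // 0..1, closeness to the reference voice
  speakerBoost: boolean;
}

const DEFAULTS = {
  speed: 1,
  stability: 0.5,
  similarity: 0.75,
  speakerBoost: true,
};

// Validate required fields and clamp numeric settings into range.
function parseTtsOptions(input: Partial<TtsOptions>): TtsOptions {
  if (!input.text || input.text.trim().length === 0) {
    throw new Error("text is required");
  }
  if (!input.voiceId) {
    throw new Error("voiceId is required");
  }
  const clamp = (v: number, lo: number, hi: number) =>
    Math.min(hi, Math.max(lo, v));
  return {
    text: input.text,
    voiceId: input.voiceId,
    speed: clamp(input.speed ?? DEFAULTS.speed, 0.7, 1.2),
    stability: clamp(input.stability ?? DEFAULTS.stability, 0, 1),
    similarity: clamp(input.similarity ?? DEFAULTS.similarity, 0, 1),
    speakerBoost: input.speakerBoost ?? DEFAULTS.speakerBoost,
  };
}
```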

API

The packages/api package wires the TTS app through packages/api/src/modules/ai/tts.ts.

That module validates the text-to-speech payload, applies shared middleware like authentication, rate limiting, and credits, and then delegates to @workspace/ai-tts, which fetches voices and streams generated audio from ElevenLabs back to the client.
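The middleware ordering described above can be sketched in plain TypeScript. The real module is built on Hono, so everything below (names, signatures, the credit cost) is a simplified stand-in rather than the actual implementation:

```typescript
// Simplified stand-in for the route's middleware chain:
// auth -> rate limit -> credits -> handler.
type Ctx = { userId?: string; requests: number; credits: number };
type Handler = (ctx: Ctx) => string;
type Middleware = (next: Handler) => Handler;

const requireAuth: Middleware = (next) => (ctx) => {
  if (!ctx.userId) throw new Error("unauthorized");
  return next(ctx);
};

const rateLimit =
  (max: number): Middleware =>
  (next) =>
  (ctx) => {
    if (ctx.requests > max) throw new Error("rate limited");
    return next(ctx);
  };

const requireCredits =
  (cost: number): Middleware =>
  (next) =>
  (ctx) => {
    if (ctx.credits < cost) throw new Error("insufficient credits");
    return next(ctx);
  };

// Wrap the handler so middlewares run left-to-right.
function compose(middlewares: Middleware[], handler: Handler): Handler {
  return middlewares.reduceRight((next, mw) => mw(next), handler);
}

const handleTts = compose(
  [requireAuth, rateLimit(100), requireCredits(1)],
  () => "audio-stream",
);
```

Only after every middleware passes does the request reach the TTS logic that streams audio back to the client.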

Web

The Next.js application (apps/web) provides the user interface:

  • src/app/[locale]/(apps)/tts/**: route entry points for the TTS app
  • src/modules/tts/**: feature modules for the composer, voice selector, settings controls, playback, and visualizer UI

Mobile

The Expo/React Native application (apps/mobile) provides the native mobile experience:

  • src/app/(apps)/tts/**: route entry points for the mobile TTS app
  • src/modules/tts/**: mobile-native modules for composing and playing speech
  • API interaction: uses the same shared Hono client as the web app for consistent communication with the backend
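Both clients ultimately issue the same HTTP request to the backend. A minimal sketch of a shared request builder follows; the endpoint path and payload fields are assumptions for the example, not the starter's actual route:

```typescript
// Hypothetical shared request builder; the path "/api/ai/tts" and the
// payload shape are illustrative assumptions, not the real API contract.
function buildTtsRequest(baseUrl: string, text: string, voiceId: string) {
  return {
    url: `${baseUrl}/api/ai/tts`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, voiceId }),
    },
  };
}
```

Because the request shape lives in one place, web and mobile stay in sync whenever the payload changes.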

This architecture keeps the API contract and core TTS logic consistent across platforms while allowing UI implementations tailored to each environment.
