Chat with PDF
Engage in conversations with your PDF documents using AI to extract insights and answer questions.
The Chat with PDF demo application enables intelligent interaction with document content through a conversational AI interface. Upload PDF files and instantly engage in natural dialogue about their contents, asking questions, requesting summaries, and extracting key information with remarkable accuracy.
Features
Transform how you interact with document content through these powerful capabilities:
PDF upload
Easily upload PDF files directly into the application for analysis.
Contextual conversation
Chat with an AI that understands the content of your uploaded PDF, providing relevant answers based on the text.
Information extraction
Quickly find specific information, key points, or summaries within the document through natural language queries.
Source highlighting (coming soon)
Visualize exactly which document sections informed the AI's responses with precise source highlighting.
Multi-document intelligence (coming soon)
Conduct sophisticated conversations spanning multiple uploaded documents, enabling cross-document analysis and comparison.
Setup
To implement the "Chat with PDF" application in your project, configure these essential backend services:
Database
Set up PostgreSQL with the pgvector
extension to efficiently store
conversation history, document metadata, and vector embeddings for semantic
search.
Storage
Configure S3-compatible cloud storage for secure management of uploaded PDF documents.
You'll also need to obtain API keys for both the conversational AI models and the embedding models used for text processing.
AI models
This application leverages two complementary AI model types working together:
- Large Language Models (LLMs): Provide sophisticated natural language understanding to interpret your questions and generate contextually appropriate responses based on document content.
- Embedding Models: Convert document text segments into numerical vector representations that enable efficient semantic similarity search and Retrieval-Augmented Generation (RAG).
Configure the providers for the models you wish to use:
OpenAI
Utilize GPT models for conversational AI and advanced embedding models for vector representation.
Anthropic
Implement Claude models for sophisticated reasoning and nuanced document understanding.
Google AI
Leverage Gemini models for powerful conversational capabilities with document content.
Replicate
Access diverse open-source embedding models for flexible implementation options.
For comprehensive configuration details, consult the AI SDK documentation covering provider setup and model selection.
Data persistence
The application stores data related to chats, documents, and embeddings to provide a persistent experience.
Database
Learn more about database services in TurboStarter AI.
Application data is organized within a dedicated PostgreSQL schema named pdf
:
chats
: captures essential metadata for each document-specific conversation session.messages
: stores all user queries and AI responses within conversation threads.documents
: maintains comprehensive tracking of uploaded PDF files, including filenames and storage locations.embeddings
: contains text segments extracted from PDFs along with their vector representations (usingpgvector
'svector
data type). To optimize similarity searches critical for RAG processing, the system creates an index (embeddingIndex
using HNSW) on theembedding
column.
Storage
Learn more about cloud storage services in TurboStarter AI.
The PDF files uploaded by users are securely stored in your configured cloud storage bucket. The path
field in the documents
table maintains the precise reference to each file's location.
Structure
The "Chat with PDF" feature is architected across the monorepo for optimal organization and code reuse:
Core
The @turbostarter/ai
package (packages/ai
) contains the essential logic under modules/pdf
:
- Comprehensive types, validation schemas, and constants specific to PDF processing
- Advanced document parsing, text segmentation, and embedding generation utilities
- Core API logic for managing conversations, performing RAG-based lookups, and interacting with LLMs
- Database operations for storing and retrieving conversations, documents, and embeddings
- Shared utilities for managing PDF file uploads and downloads
API
The packages/api
package defines the backend API endpoints using Hono:
src/modules/ai/pdf/pdf.router.ts
: implements Hono RPC routes for document upload and conversation management, handles input validation, applies middleware (authentication, credit management), and invokes the core functionality from@turbostarter/ai
.
Web
The Next.js application (apps/web
) delivers an intuitive user interface:
src/app/[locale]/(apps)/pdf/**
: contains the Next.js App Router pages and layouts for the document conversation experiencesrc/components/pdf/**
: houses reusable React components specific to the PDF interaction UI (document upload, conversation interface, message display)
Mobile
The Expo/React Native application (apps/mobile
) provides a native mobile experience:
src/app/pdf/**
: defines the screens for the mobile document conversation interfacesrc/components/pdf/**
: contains React Native components optimized for mobile document interaction- API integration: utilizes the same Hono RPC client (
packages/api
) as the web app for consistent backend communication
This architecture ensures that core AI processing and data handling logic is shared across platforms, while enabling optimized UI implementations tailored to each environment.
How is this guide?
Last updated on