# Embeddings
Understand embeddings, vector search, and semantic retrieval for modern AI apps, with practical RAG patterns, code examples, and TurboStarter AI references.
Embeddings let machines represent text as vectors, which makes meaning searchable. Instead of matching exact keywords, you can compare semantic similarity: "pricing page" and "billing plan" may be close together even if they share few words.
That is why embeddings are a core building block for search, recommendations, clustering, deduplication, and especially RAG.
If text generation is how models answer, embeddings are often how they find the right context.
In many modern AI systems, embeddings are the bridge between raw content and useful retrieval.
**What embeddings are good for**
Semantic search, knowledge retrieval, document chat, duplicate detection, recommendations, and content grouping.
**Where they appear in TurboStarter AI**
The Knowledge RAG app uses embeddings to index uploaded PDFs and retrieve relevant chunks before generating an answer.
**Best fit**
Use embeddings when you need "similar meaning", not just "matching words".
## Mental model
Imagine every sentence in your system gets turned into a point in a very high-dimensional space. Sentences about similar ideas land near each other. Queries can be embedded too, and then compared against stored vectors.
That gives you a simple retrieval loop:

1. Embed your source content once and store the vectors.
2. Embed the user's query at request time.
3. Compare the query vector against the stored vectors and keep the closest chunks.
4. Pass those chunks to a generation model as context.

This pattern is the backbone of many retrieval-augmented systems.
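The loop above can be sketched with plain vectors. The tiny hard-coded `store` and the 3-dimensional vectors here are hypothetical stand-ins for a real embedding model and vector store; they just keep the loop visible:

```ts
// A minimal sketch of the retrieval loop with toy 3-dimensional vectors.
// In a real app the vectors come from an embedding model and live in a
// vector store; here they are hard-coded for illustration.

type Chunk = { text: string; vector: number[] };

// Cosine similarity: 1 means "same direction", 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0,
    normA = 0,
    normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against the query vector and keep the closest ones.
function retrieve(store: Chunk[], queryVector: number[], topK: number): Chunk[] {
  return [...store]
    .sort(
      (a, b) =>
        cosineSimilarity(b.vector, queryVector) -
        cosineSimilarity(a.vector, queryVector),
    )
    .slice(0, topK);
}

const store: Chunk[] = [
  { text: "Billing plans and pricing", vector: [0.9, 0.1, 0.0] },
  { text: "Deploying to production", vector: [0.0, 0.2, 0.9] },
];

// A query about "pricing page" embeds close to the billing chunk.
const results = retrieve(store, [0.8, 0.2, 0.1], 1);
console.log(results[0].text); // "Billing plans and pricing"
```

In production, `cosineSimilarity` and the ranking usually happen inside the vector store rather than in application code, but the shape of the loop is the same.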
## What embeddings are not
It is just as helpful to understand the boundaries of embeddings as it is to understand their strengths. That keeps teams from expecting retrieval systems to behave like answer engines on their own.
**Not generation**
Embeddings do not answer questions by themselves. They are for representation and retrieval, not final responses.
**Not magic memory**
Embeddings improve retrieval, but weak chunking, noisy source data, or poor ranking can still produce bad context.
**Not only for RAG**
RAG is the most popular use case, but embeddings are also useful for search, recommendations, classification pipelines, and analytics.
## Core concepts that matter
A few concepts account for most of the quality difference between a weak embeddings system and a strong one. These are the ideas worth learning first.
Long documents are typically split into smaller sections before embedding. Chunk size and overlap shape retrieval quality more than many teams expect.
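To make the size/overlap trade-off concrete, here is a minimal character-based splitter sketch. Real pipelines (for example LangChain's text splitters) split on sentence or paragraph boundaries, but the trade-off is the same: smaller chunks retrieve more precisely, and overlap preserves context that would otherwise be cut at a boundary.

```ts
// A minimal character-based chunker with overlap. Illustrative only:
// production splitters respect sentence and token boundaries.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// 250 characters with a 100-char window and 20-char overlap
// yields windows starting at 0, 80, and 160.
const chunks = chunkText("a".repeat(250), 100, 20);
console.log(chunks.length); // 3
```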
Once text is converted into vectors, you compare vectors with metrics like cosine similarity or cosine distance to find the closest matches.
You need somewhere to store embeddings and query them efficiently. That can be a vector database, or Postgres with `pgvector`, as used in TurboStarter AI.
Retrieving more chunks increases your chance of finding the right one, but also adds noise. Choosing the right threshold and top-k matters.
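One way to sketch that trade-off in code: rank candidates, drop anything below a minimum similarity, then take the top-k. The threshold of 0.5 below is an illustrative value, not a recommendation; the right numbers are empirical for each dataset.

```ts
type Scored = { text: string; similarity: number };

// Keep at most `topK` chunks, and only those above `minSimilarity`.
// Too low a threshold adds noise; too high a threshold starves the
// generation step of context.
function selectContext(
  candidates: Scored[],
  topK: number,
  minSimilarity: number,
): Scored[] {
  return candidates
    .filter((c) => c.similarity >= minSimilarity)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

const picked = selectContext(
  [
    { text: "Pricing tiers", similarity: 0.82 },
    { text: "Changelog", similarity: 0.31 }, // below threshold, dropped
    { text: "Billing FAQ", similarity: 0.74 },
  ],
  2,
  0.5,
);
console.log(picked.map((c) => c.text)); // ["Pricing tiers", "Billing FAQ"]
```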
Retrieved chunks should be passed into a text-generation model with clear instructions to answer from the supplied context.
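As a sketch of that hand-off, here is one way to assemble retrieved chunks into a grounded prompt. The wording is illustrative; the important part is instructing the model to answer only from the supplied context.

```ts
// Sketch of turning retrieved chunks into a grounded prompt.
// The exact wording is illustrative, not a canonical template.
function buildRagPrompt(question: string, chunks: string[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n");
  return [
    "Answer the question using only the context below.",
    "If the context is insufficient, say \"I don't know.\"",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildRagPrompt("What does the Pro plan cost?", [
  "The Pro plan costs $29/month.",
]);
// This prompt would then be passed to a text-generation call,
// e.g. `generateText({ model, prompt })` from the AI SDK.
```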
## Common stack
A common production-friendly embeddings stack looks like this:
- LangChain for PDF loading and text splitting
- the AI SDK for `embed` and `embedMany`
- Postgres with `pgvector`, or a vector database, for similarity search
That is the key production pattern: embed source chunks once, then embed each user query at request time.
If you want to see how this capability is used in the starter, Knowledge RAG is the best companion page.
## AI SDK example
The AI SDK gives you simple building blocks for both query-time embedding and batch indexing. Those two modes cover most real-world embeddings workflows.
```ts
import { openai } from "@ai-sdk/openai";
import { embed } from "ai";

const { embedding } = await embed({
  model: openai.embedding("text-embedding-3-small"),
  value: "How do I add AI chat to my SaaS app?",
});

console.log(embedding.length);
```

Use this when embedding a single query at request time.
```ts
import { openai } from "@ai-sdk/openai";
import { embedMany } from "ai";

const { embeddings } = await embedMany({
  model: openai.embedding("text-embedding-3-small"),
  values: [
    "TurboStarter supports AI chat.",
    "TurboStarter includes background jobs.",
    "TurboStarter ships with billing integrations.",
  ],
});

console.log(embeddings.length);
```

Use this when indexing documents, help center content, or product knowledge in bulk.
## Similarity search in plain language
Once you have vectors, you rank documents by "how close" they are to the query vector. In many systems, that means using cosine similarity or cosine distance and then selecting the top few chunks above some quality threshold.
This is one reason pgvector has become such a practical choice: many teams can add semantic retrieval to an existing Postgres-backed app without introducing a separate data system on day one.
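As a sketch of what that looks like in Postgres, here is a similarity query using pgvector's `<=>` cosine-distance operator (so similarity is `1 - distance`). The table and column names (`documents`, `content`, `embedding`) are hypothetical, not TurboStarter's actual schema:

```ts
// Hypothetical pgvector similarity query. `<=>` is pgvector's
// cosine-distance operator, so similarity = 1 - distance.
const topK = 4;
const similarityThreshold = 0.5;

const query = `
  SELECT content, 1 - (embedding <=> $1) AS similarity
  FROM documents
  WHERE 1 - (embedding <=> $1) > $2
  ORDER BY embedding <=> $1
  LIMIT $3
`;

// With a Postgres client this would run with the query embedding
// formatted as a pgvector literal (e.g. "[0.1,0.2,...]"), roughly:
// await client.query(query, [queryVector, similarityThreshold, topK]);
```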
## Where teams usually go wrong
Embeddings are conceptually simple, but retrieval quality often breaks down in the implementation details. These are some of the most common failure points.
**Chunks are too big**
Large chunks blur topics together and make retrieval less precise. Smaller overlapping chunks are often easier to retrieve well.
**Everything gets embedded blindly**
Navigation chrome, repeated headers, or noisy boilerplate can pollute retrieval quality.
**No retrieval threshold**
Returning low-similarity chunks can hurt answer quality more than returning fewer chunks.
**Retrieval is treated as the final answer**
Retrieved context still needs a generation step that explains, compares, or answers in a user-friendly way.
## When to use embeddings
This quick comparison helps separate problems that benefit from semantic retrieval from problems that are better solved with plain generation or deterministic logic.
| Problem | Use embeddings? | Why |
|---|---|---|
| Find docs related to a user question | Yes | Semantic similarity is usually better than keyword matching alone. |
| Answer questions from uploaded PDFs | Yes | Embeddings help retrieve relevant chunks before generation. |
| Write a product announcement | Probably not | That is primarily a text generation problem. |
| Compute an exact invoice total | No | This is deterministic logic, not semantic retrieval. |
## Useful references
These references are a good next step if you want to understand both the practical implementation side and the research ideas behind modern embeddings systems.