The Complete Picture
Here's my entire RAG implementation. No vector database. No API keys. Everything in memory:
```ts
import { pipeline } from "@huggingface/transformers";
import { laws } from "./dataset";

// A law from the dataset plus its embedding vector
// (only the fields used below; extend to match your dataset)
type LawEmbedding = {
  content: string;
  embedding: number[];
};

// Load the embedding model once and reuse it for every call
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

// Generate a 384-dimensional embedding for a piece of text
const generateEmbedding = async (text: string) => {
  const output = await extractor(text, {
    pooling: "mean", // average token vectors into one sentence vector
    normalize: true, // unit length, so dot product = cosine similarity
  });
  return Array.from(output.data) as number[];
};

// Measure similarity: score every law against the query, best first
const measureSimilarity = (
  queryEmbedding: number[],
  lawEmbeddings: LawEmbedding[],
) => {
  const results = lawEmbeddings.map((law) => {
    // Dot product of two unit vectors is their cosine similarity
    const dotProduct = law.embedding.reduce(
      (sum, val, index) => sum + val * queryEmbedding[index],
      0,
    );
    return {
      ...law,
      similarityCosine: dotProduct,
    };
  });
  return results.sort((a, b) => b.similarityCosine - a.similarityCosine);
};

// Run it
const question = "What is the legal drinking age in Uganda?";
const queryEmbedding = await generateEmbedding(question);
const lawEmbeddings = await Promise.all(
  laws.map(async (law) => ({
    ...law,
    embedding: await generateEmbedding(law.content),
  })),
);
const results = measureSimilarity(queryEmbedding, lawEmbeddings);
```
That's RAG. The "R" part, anyway. Retrieval through semantic similarity.
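For completeness, here's a minimal sketch of the "A" and "G" steps: take the top matches, build an augmented prompt, and hand it to whichever LLM you have. The `buildPrompt` helper, the prompt template, and the top-3 cutoff are my illustrative choices, not part of the implementation above:

```ts
// Augment: insert the best-matching laws into a prompt
// (a sketch - the template and top-3 cutoff are arbitrary choices)
const buildPrompt = (question: string, topResults: LawEmbedding[]) => {
  const context = topResults
    .slice(0, 3) // keep the 3 most similar laws
    .map((law, i) => `[${i + 1}] ${law.content}`)
    .join("\n");

  return `Answer the question using only the context below.
If the context doesn't contain the answer, say so.

Context:
${context}

Question: ${question}`;
};

// Generate: feed this prompt to any LLM - local or API - and
// the answer will be grounded in your retrieved documents
console.log(buildPrompt(question, results));
```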
What This Teaches You
Understanding RAG this way means you can now:
- Debug similarity scores: Why is my relevant document scoring low? Maybe it needs better chunking, or a different embedding model
- Choose the right tools: Do I need Pinecone for 100 documents? Probably not. For 10 million? Probably yes
- Understand parameters: When you see `normalize: true`, you know it's ensuring you compare direction, not magnitude (see the sketch after this list)
- Scale intelligently: You know what problems you're solving as you add complexity
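To make the `normalize: true` point concrete, here's a sketch of cosine similarity with the magnitudes written out. When both vectors are already unit length, the denominator is 1, which is why the plain dot product in `measureSimilarity` is enough:

```ts
// Full cosine similarity: dot(a, b) / (|a| * |b|)
const cosineSimilarity = (a: number[], b: number[]) => {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
};

// With normalize: true, |a| = |b| = 1, so this reduces to the dot
// product - direction is all that's left to compare
```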
Most importantly: you're not scared of the tech stack anymore. When you see "vector database" or "semantic search," you think: "Oh, it's doing what I did, just faster."
What We Built
Let's recap the entire journey:
1. Understanding Embeddings
   - Text → 384-dimensional vectors
   - Vectors = directions in semantic space
   - Similar meanings = similar directions
2. Understanding Similarity
   - Cosine similarity measures angles between vectors
   - High score (0.8+) = semantically similar
   - Low score (< 0.5) = unrelated
3. Understanding Search
   - Linear scan: compare query against all documents
   - O(n) complexity - fine for small datasets
   - Vector DBs optimize this for scale (see the first sketch after this list)
4. Understanding Storage
   - In-memory arrays for small datasets
   - JSON files for persistence (see the second sketch after this list)
   - Vector databases for millions of documents
5. Understanding RAG
   - Retrieve: find relevant documents via similarity
   - Augment: insert context into prompt
   - Generate: LLM answers based on your data
6. Understanding Prompts
   - Structure guides the model's behavior
   - Clear instructions = better outputs
   - Context + task = effective prompts
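Here's the first sketch, pulling step 3 together. Reusing `generateEmbedding`, `measureSimilarity`, and `lawEmbeddings` from the listing above, retrieval is one small function; the `search` name and the default `k = 3` are my own choices:

```ts
// Linear-scan retrieval: embed the query, score all n documents,
// keep the top k - O(n), which is fine at this scale
const search = async (query: string, k = 3) => {
  const queryEmbedding = await generateEmbedding(query);
  return measureSimilarity(queryEmbedding, lawEmbeddings).slice(0, k);
};

const top = await search("What is the legal drinking age in Uganda?");
```

A vector database replaces that linear scan with an approximate index, but the contract is the same: query in, nearest neighbors out.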
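And the second sketch, for step 4's persistence idea: cache the embeddings in a JSON file so you only pay the embedding cost once. The `embeddings.json` filename is an arbitrary choice; Node's built-in `fs` is all you need at this scale:

```ts
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Embed once, reload from disk on every run after the first
const loadOrCreateEmbeddings = async (): Promise<LawEmbedding[]> => {
  const cacheFile = "embeddings.json"; // arbitrary cache location
  if (existsSync(cacheFile)) {
    return JSON.parse(readFileSync(cacheFile, "utf8"));
  }
  const embedded = await Promise.all(
    laws.map(async (law) => ({
      ...law,
      embedding: await generateEmbedding(law.content),
    })),
  );
  writeFileSync(cacheFile, JSON.stringify(embedded));
  return embedded;
};
```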
All without:
- Vector databases
- API keys
- Credit cards
- Complex infrastructure
Just concepts, code, and understanding.
Next in this series: We'll explore fine-tuning, evals, and working with LLM APIs - all with the same first-principles approach. No buzzwords, just building to understand.