The Complete Picture

Here's my entire RAG implementation. No vector database. No API keys. Everything in memory:

import { pipeline } from "@huggingface/transformers";
import { laws } from "./dataset";

// A law from the dataset, plus the embedding we attach to it
type Law = (typeof laws)[number];
type LawEmbedding = Law & { embedding: number[] };

// Load the embedding model once and reuse it for every call
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

// Generate embeddings
const generateEmbedding = async (text: string) => {
  const output = await extractor(text, {
    pooling: "mean",
    normalize: true,
  });
  return Array.from(output.data) as number[];
};

// Measure similarity
const measureSimilarity = (
  queryEmbedding: number[],
  lawEmbeddings: LawEmbedding[],
) => {
  const results = lawEmbeddings.map((law) => {
    // The embeddings are normalized, so the dot product is the cosine similarity
    const dotProduct = law.embedding.reduce(
      (sum, val, index) => sum + val * queryEmbedding[index],
      0,
    );
    return {
      ...law,
      similarityCosine: dotProduct,
    };
  });
  // Most relevant laws first
  return results.sort((a, b) => b.similarityCosine - a.similarityCosine);
};

// Run it
const question = "What is the legal drinking age in Uganda?";
const queryEmbedding = await generateEmbedding(question);
const lawEmbeddings = await Promise.all(
  laws.map(async (law) => ({
    ...law,
    embedding: await generateEmbedding(law.content),
  })),
);
const results = measureSimilarity(queryEmbedding, lawEmbeddings);

That's RAG. The "R" part, anyway. Retrieval through semantic similarity.

What This Teaches You

Understanding RAG this way means you can now:

  • Debug similarity scores: Why is my relevant document scoring low? Maybe it needs better chunking, or a different embedding model
  • Choose the right tools: Do I need Pinecone for 100 documents? Probably not. For 10 million? Probably yes
  • Understand parameters: When you see normalize: true, you know it's ensuring you compare direction, not magnitude (see the sketch after this list)
  • Scale intelligently: You know what problems you're solving as you add complexity
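
To make the direction-versus-magnitude point concrete, here's a minimal sketch (norm and cosineSimilarity are helper names I'm introducing for illustration, not library functions):

const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));

// Full cosine similarity: dot product divided by both vectors' magnitudes
const cosineSimilarity = (a: number[], b: number[]) => {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  return dot / (norm(a) * norm(b));
};

// With normalize: true every embedding already has magnitude 1, so the
// division does nothing and a plain dot product gives the same number.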

Most importantly: you're not scared of the tech stack anymore. When you see "vector database" or "semantic search," you think: "Oh, it's doing what I did, just faster."

What We Built

Let's recap the entire journey:

1. Understanding Embeddings

  • Text → 384-dimensional vectors
  • Vectors = directions in semantic space
  • Similar meanings = similar directions
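
For instance, reusing generateEmbedding from the snippet above (the example words and the dot helper are mine, and the exact scores will vary):

const dot = (a: number[], b: number[]) =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

const beer = await generateEmbedding("beer");
const alcohol = await generateEmbedding("alcohol");
const spreadsheet = await generateEmbedding("spreadsheet");

console.log(beer.length); // 384 - one number per dimension
console.log(dot(beer, alcohol)); // related meanings: expect a higher score
console.log(dot(beer, spreadsheet)); // unrelated meanings: expect a lower score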

2. Understanding Similarity

  • Cosine similarity measures angles between vectors
  • High score (0.8+) = semantically similar
  • Low score (< 0.5) = unrelated
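
Because the embeddings are normalized, the score is literally the cosine of the angle between the two vectors, so you can translate a score back into an angle (a small illustrative helper):

const toAngleDegrees = (similarity: number) =>
  (Math.acos(similarity) * 180) / Math.PI;

console.log(toAngleDegrees(0.8)); // ~37 degrees - pointing nearly the same way
console.log(toAngleDegrees(0.5)); // 60 degrees - much less aligned
console.log(toAngleDegrees(0.0)); // 90 degrees - orthogonal, i.e. unrelated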

3. Understanding Search

  • Linear scan: compare query against all documents
  • O(n) complexity - fine for small datasets
  • Vector DBs optimize this for scale
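
The measureSimilarity call above is exactly that linear scan: one dot product per document. Because its output is already sorted, "search" is just taking the first few entries (the choice of three is arbitrary):

// O(n): every document gets compared against the query once
const topThree = results.slice(0, 3);
console.log(topThree.map((law) => law.similarityCosine));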

4. Understanding Storage

  • In-memory arrays for small datasets
  • JSON files for persistence
  • Vector databases for millions of documents
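
As a sketch of the JSON option, you could cache the embeddings with Node's built-in fs module so they aren't recomputed on every run (the file name is arbitrary):

import { readFile, writeFile } from "node:fs/promises";

// Save the computed embeddings to disk
await writeFile("embeddings.json", JSON.stringify(lawEmbeddings));

// Later, load them back instead of re-embedding every document
const cached: LawEmbedding[] = JSON.parse(
  await readFile("embeddings.json", "utf-8"),
);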

5. Understanding RAG

  • Retrieve: find relevant documents via similarity
  • Augment: insert context into prompt
  • Generate: LLM answers based on your data
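
The code in this post is the Retrieve step. Augment is just string assembly with the top matches, and Generate is where you'd hand the result to whichever LLM you use - sketched here with generation left as a placeholder:

// Augment: turn the best matches into context for the model
const context = results
  .slice(0, 3)
  .map((law) => law.content)
  .join("\n\n");

const prompt = `${context}\n\nQuestion: ${question}`;

// Generate: send `prompt` to an LLM of your choice, local or hosted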

6. Understanding Prompts

  • Structure guides the model's behavior
  • Clear instructions = better outputs
  • Context + task = effective prompts
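
One possible shape for that prompt, with instructions, context, and task as separate blocks (the wording is mine, not a prescribed template):

const buildPrompt = (context: string, question: string) => `
You are a legal assistant. Answer using ONLY the context below.
If the context does not contain the answer, say so.

Context:
${context}

Question: ${question}
Answer:`;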

All without:

  • Vector databases
  • API keys
  • Credit cards
  • Complex infrastructure

Just concepts, code, and understanding.


Next in this series: We'll explore fine-tuning, evals, and working with LLM APIs - all with the same first-principles approach. No buzzwords, just building to understand.