The most expensive failure mode in AI-assisted coding is an LLM that confidently calls a method that does not exist in your codebase or your version of a library. Retrieval-Augmented Generation (RAG) for code solves this by making the AI aware of your actual APIs, conventions, and patterns before it generates anything. This is what Cursor’s “codebase context” and GitHub Copilot’s workspace awareness are built on.
⚡ TL;DR: Build a vector index of your codebase (function signatures, docstrings, examples). When a developer asks a question, retrieve the 5-10 most relevant code snippets and inject them into the LLM context. The LLM sees your actual API and generates code that matches your real patterns, not generic StackOverflow patterns.
Building the code index
// 1. Extract semantic units from your codebase
import * as ts from 'typescript';
import * as fs from 'fs';
import * as path from 'path';
interface CodeChunk {
id: string;
type: 'function' | 'class' | 'interface' | 'example';
name: string;
signature: string;
docstring: string;
body: string; // first ~500 characters of the implementation
filePath: string;
embedding?: number[];
}
// Pull the JSDoc comment (if any) that precedes a node
function getJSDoc(node: ts.Node, source: string): string {
  const ranges = ts.getLeadingCommentRanges(source, node.getFullStart()) ?? [];
  return ranges.map(r => source.slice(r.pos, r.end)).join('\n');
}
function extractChunks(filePath: string): CodeChunk[] {
  const source = fs.readFileSync(filePath, 'utf8');
  const sf = ts.createSourceFile(filePath, source, ts.ScriptTarget.Latest, true);
  const chunks: CodeChunk[] = [];
  function visit(node: ts.Node) {
    if (ts.isFunctionDeclaration(node) && node.name) {
      const start = node.getStart(sf); // skips leading trivia (JSDoc, whitespace)
      chunks.push({
        id: filePath + '#' + node.name.text,
        type: 'function',
        name: node.name.text,
        // Everything up to the opening brace of the body: name, params, return type
        signature: source.slice(start, node.body?.pos ?? node.end).trim(),
        docstring: getJSDoc(node, source),
        body: source.slice(start, node.end).slice(0, 500),
        filePath
      });
    }
    ts.forEachChild(node, visit);
  }
  ts.forEachChild(sf, visit);
  return chunks;
}
// 2. Embed all chunks
async function buildIndex(codebaseRoot: string): Promise<CodeChunk[]> {
  const files = getAllTsFiles(codebaseRoot);
  const allChunks = files.flatMap(f => extractChunks(f));
  // Batch embed for efficiency (one API call per 100 chunks)
  const perBatch = await Promise.all(
    chunk(allChunks, 100).map(batch =>
      embedBatch(batch.map(c => c.signature + '\n' + c.docstring + '\n' + c.body))
    )
  );
  // Flatten the per-batch results back into one embedding per chunk
  const embeddings = perBatch.flat();
  return allChunks.map((c, i) => ({ ...c, embedding: embeddings[i] }));
}
Retrieval — finding the right code for each query
// Retrieve relevant code chunks for a given query
async function retrieveContext(query: string, index: CodeChunk[], topK = 8) {
  const queryEmbedding = await embed(query);
  // Cosine similarity search over the in-memory index
  const scored = index.map(chunk => ({
    ...chunk,
    score: cosineSimilarity(queryEmbedding, chunk.embedding!)
  }));
  // Drop low-relevance results first, then return the top K by score
  return scored
    .filter(c => c.score > 0.7)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
// Format for LLM context injection
function formatContext(chunks: CodeChunk[]): string {
  return chunks.map(c =>
    `// File: ${c.filePath}\n// Type: ${c.type}\n${c.signature}\n${c.docstring}\n${c.body.slice(0, 200)}`
  ).join('\n\n---\n\n');
}
// Generate code with codebase context
// (assumes an Anthropic client: import Anthropic from '@anthropic-ai/sdk';
//  const client = new Anthropic(); which reads ANTHROPIC_API_KEY from the environment)
async function generateWithRAG(query: string, index: CodeChunk[]) {
  const relevantCode = await retrieveContext(query, index);
  const context = formatContext(relevantCode);
  return await client.messages.create({
    model: 'claude-opus-4-5',
    max_tokens: 2000,
    system: 'You are a code assistant for this specific codebase.'
      + '\n\nUse ONLY the following functions and patterns from the actual codebase:'
      + '\n\n' + context
      + '\n\nNever invent functions not shown above.',
    messages: [{ role: 'user', content: query }]
  });
}
pgvector — production-grade vector storage
-- PostgreSQL with pgvector extension (production setup)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE code_chunks (
id TEXT PRIMARY KEY,
type TEXT NOT NULL, -- function, class, interface, example
name TEXT NOT NULL,
signature TEXT NOT NULL,
docstring TEXT,
body TEXT NOT NULL,
file_path TEXT NOT NULL,
embedding vector(1536), -- text-embedding-3-small dimension
updated_at TIMESTAMP DEFAULT NOW()
);
-- IVFFlat index for fast approximate nearest neighbor search
CREATE INDEX ON code_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Query: find top-5 most similar chunks to a query embedding
SELECT id, name, signature, docstring, body, file_path,
1 - (embedding <=> $1) as similarity
FROM code_chunks
ORDER BY embedding <=> $1 -- cosine distance
LIMIT 5;
-- Incremental update: only re-embed changed files
UPDATE code_chunks SET embedding = $1, updated_at = NOW()
WHERE id = $2;
-- Check if file has changed before re-indexing:
SELECT MAX(updated_at) FROM code_chunks WHERE file_path = $1;
- ✅ Extract semantic chunks at function/class level, not arbitrary line splits
- ✅ Include docstrings and signatures as the primary embedding text
- ✅ Use pgvector with IVFFlat index for production (not in-memory arrays)
- ✅ Filter low-similarity results (score < 0.7) — irrelevant context hurts generation
- ✅ Incremental indexing — only re-embed changed files on each CI run
- ❌ Do not embed entire files — chunk at semantic boundaries
- ❌ Do not skip the similarity threshold — passing every retrieved chunk regardless of score confuses the model
RAG for code pairs directly with the structured code generation guide — RAG provides the accurate API context and structured output ensures the generated code is parseable and valid. For storing vector embeddings at scale, the PostgreSQL query optimization guide covers how to tune pgvector index performance. External reference: pgvector documentation.
