The most expensive failure mode in AI-assisted coding is an LLM that confidently calls a method that does not exist in your codebase or your version of a library. Retrieval-Augmented Generation (RAG) for code solves this by making the AI aware of your actual APIs, conventions, and patterns before it generates anything. This is what Cursor’s “codebase context” and GitHub Copilot’s workspace awareness are built on.
⚡ TL;DR: Build a vector index of your codebase (function signatures, docstrings, examples). When a developer asks a question, retrieve the 5-10 most relevant code snippets and inject them into the LLM context. The LLM sees your actual API and generates code that matches your real patterns, not generic StackOverflow patterns.
Building the code index
// 1. Extract semantic units from your codebase
import * as ts from 'typescript';
import * as fs from 'fs';
import * as path from 'path';
interface CodeChunk {
id: string;
type: 'function' | 'class' | 'interface' | 'example';
name: string;
signature: string;
docstring: string;
body: string; // first ~500 characters of the implementation
filePath: string;
embedding?: number[];
}
// Pull the JSDoc comment (if any) that precedes a node
function getJSDoc(node: ts.Node, source: string): string {
  const ranges = ts.getLeadingCommentRanges(source, node.getFullStart()) ?? [];
  return ranges.map(r => source.slice(r.pos, r.end)).join('\n');
}
function extractChunks(filePath: string): CodeChunk[] {
  const source = fs.readFileSync(filePath, 'utf8');
  const sf = ts.createSourceFile(filePath, source, ts.ScriptTarget.Latest, true);
  const chunks: CodeChunk[] = [];
  function visit(node: ts.Node) {
    if (ts.isFunctionDeclaration(node) && node.name) {
      const start = node.getStart(sf); // skips leading trivia (JSDoc, whitespace)
      chunks.push({
        id: filePath + '#' + node.name.text,
        type: 'function',
        name: node.name.text,
        // Everything up to the opening brace of the body: name, params, return type
        signature: source.slice(start, node.body?.pos ?? node.end).trim(),
        docstring: getJSDoc(node, source),
        body: source.slice(start, node.end).slice(0, 500),
        filePath
      });
    }
    ts.forEachChild(node, visit);
  }
  ts.forEachChild(sf, visit);
  return chunks;
}
// 2. Embed all chunks
async function buildIndex(codebaseRoot: string): Promise<CodeChunk[]> {
  const files = getAllTsFiles(codebaseRoot);
  const allChunks = files.flatMap(f => extractChunks(f));
  // Batch embed for efficiency (one API call per 100 chunks)
  const perBatch = await Promise.all(
    chunk(allChunks, 100).map(batch =>
      embedBatch(batch.map(c => c.signature + '\n' + c.docstring + '\n' + c.body))
    )
  );
  // Flatten the per-batch results back into one embedding per chunk
  const embeddings = perBatch.flat();
  return allChunks.map((c, i) => ({ ...c, embedding: embeddings[i] }));
}
Retrieval — finding the right code for each query
// Retrieve relevant code chunks for a given query
async function retrieveContext(query: string, index: CodeChunk[], topK = 8) {
  const queryEmbedding = await embed(query);
  // Cosine similarity search over the in-memory index
  const scored = index.map(chunk => ({
    ...chunk,
    score: cosineSimilarity(queryEmbedding, chunk.embedding!)
  }));
  // Drop low-relevance results first, then return the top K by score
  return scored
    .filter(c => c.score > 0.7)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
// Format for LLM context injection
function formatContext(chunks: CodeChunk[]): string {
  return chunks.map(c =>
    `// File: ${c.filePath}\n// Type: ${c.type}\n${c.signature}\n${c.docstring}\n${c.body.slice(0, 200)}`
  ).join('\n\n---\n\n');
}
// Generate code with codebase context
// (assumes an Anthropic client: import Anthropic from '@anthropic-ai/sdk';
//  const client = new Anthropic(); which reads ANTHROPIC_API_KEY from the environment)
async function generateWithRAG(query: string, index: CodeChunk[]) {
  const relevantCode = await retrieveContext(query, index);
  const context = formatContext(relevantCode);
  return await client.messages.create({
    model: 'claude-opus-4-5',
    max_tokens: 2000,
    system: 'You are a code assistant for this specific codebase.'
      + '\n\nUse ONLY the following functions and patterns from the actual codebase:'
      + '\n\n' + context
      + '\n\nNever invent functions not shown above.',
    messages: [{ role: 'user', content: query }]
  });
}
pgvector — production-grade vector storage
-- PostgreSQL with pgvector extension (production setup)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE code_chunks (
id TEXT PRIMARY KEY,
type TEXT NOT NULL, -- function, class, interface, example
name TEXT NOT NULL,
signature TEXT NOT NULL,
docstring TEXT,
body TEXT NOT NULL,
file_path TEXT NOT NULL,
embedding vector(1536), -- text-embedding-3-small dimension
updated_at TIMESTAMP DEFAULT NOW()
);
-- IVFFlat index for fast approximate nearest neighbor search
CREATE INDEX ON code_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Query: find top-5 most similar chunks to a query embedding
SELECT id, name, signature, docstring, body, file_path,
1 - (embedding <=> $1) as similarity
FROM code_chunks
ORDER BY embedding <=> $1 -- cosine distance
LIMIT 5;
-- Incremental update: only re-embed changed files
UPDATE code_chunks SET embedding = $1, updated_at = NOW()
WHERE id = $2;
-- Check if file has changed before re-indexing:
SELECT MAX(updated_at) FROM code_chunks WHERE file_path = $1;
- ✅ Extract semantic chunks at function/class level, not arbitrary line splits
- ✅ Include docstrings and signatures as the primary embedding text
- ✅ Use pgvector with IVFFlat index for production (not in-memory arrays)
- ✅ Filter low-similarity results (score < 0.7) — irrelevant context hurts generation
- ✅ Incremental indexing — only re-embed changed files on each CI run
- ❌ Do not embed entire files — chunk at semantic boundaries
- ❌ Do not skip the similarity threshold — passing every retrieved chunk regardless of score confuses the model
RAG for code pairs directly with the structured code generation guide — RAG provides the accurate API context and structured output ensures the generated code is parseable and valid. For storing vector embeddings at scale, the PostgreSQL query optimization guide covers how to tune pgvector index performance. External reference: pgvector documentation.
