Every team building AI-powered tools debates RAG vs fine-tuning, and most of the debate misses the point. These are not competing approaches — they solve different problems. RAG retrieves dynamic context at inference time. Fine-tuning permanently changes model weights to improve format, style, or domain-specific tasks. Prompting shapes behavior with instructions. Understanding what each actually does — and what it cannot do — is the foundation of practical AI engineering.
⚡ TL;DR: Use prompting for behavior and format control. Use RAG when knowledge changes more often than weekly. Use fine-tuning when you need consistent output style, the task has thousands of labeled examples, or the base model consistently fails on your domain despite good prompting. All three are often combined in production.
What each technique actually changes
// Prompting: changes behavior, not knowledge
// - Shapes: output format, reasoning style, constraints, persona
// - Cannot: add knowledge the model doesn't have
// - Speed: immediate, zero cost
// - Best for: format control, behavior shaping, few-shot examples
const prompt = `You are a code reviewer. When reviewing code:
1. First identify the bug or issue
2. Explain why it is a problem
3. Provide the corrected code
4. Explain why the fix works
Format: use the ISSUE/WHY/FIX/EXPLANATION structure.`;
// RAG: retrieves dynamic knowledge at inference time
// - Shapes: what the model knows for this specific request
// - Cannot: change how the model reasons or writes
// - Speed: 100-500ms retrieval latency
// - Best for: documentation Q&A, codebase-specific knowledge, recent information
async function ragQuery(userQuestion) {
  // vectorStore and llm are assumed to be initialized elsewhere
  const relevantDocs = await vectorStore.similaritySearch(userQuestion, { k: 5 });
  const context = relevantDocs.map(d => d.content).join('\n');
  return llm.complete(`${SYSTEM_PROMPT}\n\nContext:\n${context}\n\nQuestion: ${userQuestion}`);
}
// Fine-tuning: permanently changes model weights
// - Shapes: writing style, output format, domain-specific patterns
// - Cannot: add knowledge to the model reliably (use RAG for that)
// - Speed: hours-days to train, inference same as base
// - Best for: consistent format, specialized writing style, classification tasks
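Fine-tuning runs on pairs of inputs and expected outputs. The exact file format depends on your provider, but the common shape is one chat-style JSON record per line (JSONL); a sketch of building one record, with all names illustrative:

```javascript
// Sketch: turn an (input, expectedOutput) pair into a chat-style training
// record. The exact JSONL schema is provider-specific; this mirrors the
// common messages-based format.
function toTrainingRecord(input, expectedOutput, systemPrompt) {
  return JSON.stringify({
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: input },
      { role: 'assistant', content: expectedOutput }
    ]
  });
}

// One JSONL line per example:
// const jsonl = examples.map(e => toTrainingRecord(e.input, e.output, SYSTEM)).join('\n');
```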
RAG implementation for a developer documentation bot
import Anthropic from '@anthropic-ai/sdk';
import fs from 'node:fs/promises';
import { glob } from 'glob';
import { VectorStore } from './vector-store';
// embed() and splitIntoChunks() are assumed project-local helpers
const client = new Anthropic();
const docs = new VectorStore();

// Index your codebase documentation
async function indexDocumentation() {
  const files = await glob('docs/**/*.md');
  for (const file of files) {
    const content = await fs.readFile(file, 'utf-8');
    const chunks = splitIntoChunks(content, { size: 512, overlap: 50 });
    for (const chunk of chunks) {
      await docs.upsert({
        id: `${file}-${chunk.index}`,
        vector: await embed(chunk.text),
        metadata: { source: file, text: chunk.text }
      });
    }
  }
}
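The indexing code above calls a splitIntoChunks helper that is never defined. A minimal character-based version matching the (size, overlap) call might look like this; production chunkers usually split on markdown headings or sentence boundaries instead:

```javascript
// Minimal character-based chunker with overlap, matching the
// splitIntoChunks(content, { size, overlap }) call used during indexing.
function splitIntoChunks(text, { size, overlap }) {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks = [];
  const step = size - overlap;
  for (let start = 0, index = 0; start < text.length; start += step, index++) {
    chunks.push({ index, text: text.slice(start, start + size) });
  }
  return chunks;
}
```

With size 512 and overlap 50, each chunk starts 462 characters after the previous one, so the last 50 characters of one chunk repeat at the start of the next. That redundancy keeps sentences that straddle a boundary retrievable from at least one chunk.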
// Answer a question with retrieved context
async function answerDocQuestion(question) {
  // 1. Retrieve relevant chunks
  const results = await docs.similaritySearch(question, {
    k: 5,
    threshold: 0.7 // Only use high-confidence matches
  });
  if (results.length === 0) {
    return "I don't have documentation for that topic.";
  }

  // 2. Build a context-augmented prompt
  const context = results.map(r =>
    `Source: ${r.metadata.source}\n${r.metadata.text}`
  ).join('\n---\n');

  // 3. Generate an answer grounded in the retrieved docs
  const response = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    system: `You are a documentation assistant. Answer based ONLY on the provided context.
If the answer is not in the context, say so explicitly.
Always cite which source document you used.`,
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${question}`
    }]
  });
  return response.content[0].text;
}
When fine-tuning actually helps (and when it does not)
// Fine-tuning WORKS for:
// 1. Consistent output format
// Base model: sometimes returns JSON, sometimes prose, sometimes both
// Fine-tuned: always returns exactly your JSON schema
// Training data: 500-2000 examples of (input, expected_format_output)
// 2. Domain-specific code style
// Base model: generates generic Python
// Fine-tuned: generates your team's exact conventions, naming patterns, error handling style
// Training data: 1000+ examples from your codebase
// 3. High-volume classification
// Base model: needs prompt explanation every time (expensive)
// Fine-tuned: classifies correctly with minimal prompt (cheap per inference)
// Fine-tuning DOES NOT work for:
// 1. Adding new factual knowledge (hallucination risk increases)
// Use RAG for knowledge, fine-tuning for style
// 2. Fixing reasoning errors on your domain
// If Claude gets math wrong in your domain, fine-tuning won't fix the reasoning
// Use chain-of-thought prompting instead
// 3. Teaching the model about recent events
// Model cannot learn new events from fine-tuning reliably
// Use RAG for anything time-sensitive
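The cost argument behind the high-volume classification case is plain token arithmetic. The prices and token counts below are made-up round numbers for illustration, not real Anthropic pricing:

```javascript
// Illustrative monthly input-token cost: a long few-shot classification
// prompt vs the minimal prompt a fine-tuned model needs.
// All figures are hypothetical round numbers.
function monthlyCost({ requests, promptTokens, pricePerMTok }) {
  return (requests * promptTokens / 1e6) * pricePerMTok;
}

const base = monthlyCost({ requests: 1e6, promptTokens: 800, pricePerMTok: 3 });  // few-shot prompt
const tuned = monthlyCost({ requests: 1e6, promptTokens: 50, pricePerMTok: 3 });  // minimal prompt
// base: $2400/month vs tuned: $150/month at the same volume
```

At a million requests a month, cutting the prompt from 800 to 50 tokens is a 16x reduction in input cost, which is where fine-tuning for classification pays for itself.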
The production combination pattern
// Most production AI developer tools use all three:
class AICodeReviewer {
  async review(prDiff) {
    // Layer 1: Prompting — establishes review behavior and format
    const systemPrompt = `You are a senior code reviewer.
Focus on: security vulnerabilities, performance issues, error handling.
Format: use JSON with { issues: [{line, severity, description, fix}] }`;

    // Layer 2: RAG — retrieves team-specific conventions
    const conventions = await this.codebaseIndex.search(
      "coding conventions and patterns for " + inferLanguage(prDiff)
    );

    // Layer 3: Fine-tuned model — trained on the team's historical review comments
    // Uses a company-fine-tuned Claude that writes in the team's review style
    const response = await this.fineTunedClient.messages.create({
      model: 'ft:claude-sonnet-4-5:company:code-reviewer:abc123',
      max_tokens: 2048,
      system: systemPrompt + "\n\nTeam conventions:\n" + conventions,
      messages: [{ role: 'user', content: prDiff }]
    });
    return JSON.parse(response.content[0].text);
  }
}
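Even with a fine-tuned model that reliably emits the requested JSON schema, production code should validate the shape before trusting JSON.parse output. A minimal hand-rolled check (a schema library like zod is the more common choice):

```javascript
// Minimal shape check for the { issues: [{line, severity, description, fix}] }
// schema the review prompt requests. Illustrative, not from any library.
function isValidReview(parsed) {
  if (!parsed || !Array.isArray(parsed.issues)) return false;
  return parsed.issues.every(issue =>
    typeof issue.line === 'number' &&
    typeof issue.severity === 'string' &&
    typeof issue.description === 'string' &&
    typeof issue.fix === 'string'
  );
}
```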
- ✅ Prompting: always first — shape behavior before reaching for heavier techniques
- ✅ RAG: when knowledge changes, when context is dynamic, when grounding is needed
- ✅ Fine-tuning: consistent format, domain style, high-volume classification with many examples
- ✅ Combine all three for production systems — they are complementary, not competing
- ❌ Never fine-tune to add factual knowledge — use RAG for knowledge, fine-tuning for style
- ❌ Never build RAG before optimizing your base prompt — prompting fixes 80% of problems
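The checklist above can be collapsed into a tiny routing sketch. Every name and threshold here is illustrative, not an API:

```javascript
// Illustrative routing helper mirroring the checklist: always start with
// prompting, layer RAG for changing knowledge, add fine-tuning when format
// consistency or example volume justifies it.
function chooseTechnique({ knowledgeChangesOften, labeledExamples, needsConsistentFormat }) {
  const stack = ['prompting']; // always first
  if (knowledgeChangesOften) stack.push('rag');
  if (needsConsistentFormat || labeledExamples >= 1000) stack.push('fine-tuning');
  return stack;
}
```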
RAG systems need efficient vector similarity search, and the DynamoDB single-table design works well for storing RAG document-chunk metadata. For serving fine-tuned models, Lambda cold start optimization matters, especially when model weights are loaded at startup. External reference: Anthropic fine-tuning documentation.