Single-model AI systems are giving way to multi-agent architectures where different models handle different parts of a task. Claude excels at long-context reasoning and code analysis. GPT-4o handles vision and multimodal inputs. Specialized fine-tuned models outperform general models on narrow domain tasks. Orchestrating these models together — with proper routing, handoffs, and cost control — is the architecture skill that separates AI engineers from prompt engineers.
⚡ TL;DR: Route tasks to models based on capability match, not default habit. Use Claude Opus for reasoning-heavy planning, Sonnet for implementation tasks, Haiku for high-volume classification. Route vision to GPT-4o-mini, long documents to Claude (200K context). Share context between agents via structured handoff objects, not raw text.
Model routing architecture
class ModelRouter {
  // Model capability registry.
  // Prices are USD per 1K tokens and illustrative; check each provider's current pricing.
  models = {
    "claude-opus-4-5": {
      strengths: ["long_context", "reasoning", "code_analysis", "safety"],
      contextWindow: 200_000,
      costPer1kTokens: { input: 0.015, output: 0.075 },
      bestFor: ["planning", "complex_reasoning", "code_review"]
    },
    "claude-sonnet-4-5": {
      strengths: ["balanced", "coding", "analysis"],
      contextWindow: 200_000,
      costPer1kTokens: { input: 0.003, output: 0.015 },
      bestFor: ["implementation", "summarization", "general_tasks"]
    },
    "claude-haiku-4-5": {
      strengths: ["speed", "cost", "simple_tasks"],
      contextWindow: 200_000,
      costPer1kTokens: { input: 0.00025, output: 0.00125 },
      bestFor: ["classification", "extraction", "simple_qa"]
    },
    "gpt-4o-mini": {
      strengths: ["vision", "multimodal", "json_mode"],
      contextWindow: 128_000,
      costPer1kTokens: { input: 0.00015, output: 0.0006 },
      bestFor: ["image_analysis", "document_ocr", "structured_extraction"]
    }
  };

  // Returns a model ID. Rules are checked in order; the first match wins.
  route(task) {
    if (task.hasImages || task.requiresVision) return "gpt-4o-mini";
    if (task.tokenCount > 100_000) return "claude-opus-4-5"; // Long context
    if (task.type === "planning" || task.complexity === "high") return "claude-opus-4-5";
    if (task.type === "implementation") return "claude-sonnet-4-5";
    if (task.type === "classification" || task.complexity === "low") return "claude-haiku-4-5";
    return "claude-sonnet-4-5"; // Default
  }
}
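Because `route` returns on the first matching rule, the order of the checks decides what happens when a task matches more than one of them. The sketch below restates the same rules as a standalone typed function so that precedence can be checked directly. The `Task` shape is an assumption inferred from the fields `route` reads; the article does not define it.

```typescript
// Hypothetical task shape, inferred from the fields route() reads.
interface Task {
  type?: "planning" | "implementation" | "classification";
  complexity?: "low" | "high";
  tokenCount?: number;
  hasImages?: boolean;
  requiresVision?: boolean;
}

// Same rules, same order as ModelRouter.route: the first match wins.
function routeTask(task: Task): string {
  if (task.hasImages || task.requiresVision) return "gpt-4o-mini";
  if ((task.tokenCount ?? 0) > 100_000) return "claude-opus-4-5";
  if (task.type === "planning" || task.complexity === "high") return "claude-opus-4-5";
  if (task.type === "implementation") return "claude-sonnet-4-5";
  if (task.type === "classification" || task.complexity === "low") return "claude-haiku-4-5";
  return "claude-sonnet-4-5";
}

// A classification task with a 150K-token input hits the long-context rule first.
const longClassification = routeTask({ type: "classification", tokenCount: 150_000 });
// An image task goes to the vision model regardless of its type.
const imagePlanning = routeTask({ type: "planning", hasImages: true });
// A task that matches no rule falls through to the default.
const unlabeled = routeTask({});

console.log(longClassification, imagePlanning, unlabeled);
// prints: claude-opus-4-5 gpt-4o-mini claude-sonnet-4-5
```

One consequence of this ordering: an image task larger than 128K tokens is still routed to gpt-4o-mini, whose registry entry lists a 128K window, so a production router would probably want a size check ahead of the vision rule.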
Structured handoff protocol between agents
// Never pass raw text between agents; use typed handoff objects
interface AgentHandoff {
  taskId: string;
  originalTask: string;
  completedSteps: Array<{
    agentId: string;
    model: string;
    action: string;
    result: unknown;
    confidence: number; // 0-1
    durationMs: number;
    tokensUsed: number;
  }>;
  currentContext: Record<string, unknown>; // Accumulated facts
  remainingSteps: string[];
  constraints: {
    maxSteps: number;
    maxTokenBudget: number;
    deadline: Date;
  };
}

// Orchestrator sends a typed handoff to the next agent.
// Written as a method because it relies on this.agents, this.router and this.contextManager.
class Orchestrator {
  async handoffToAgent(agentId, handoff) {
    const agent = this.agents.get(agentId);
    // route() returns a model ID; look up its registry entry to get the context window
    const modelId = this.router.route({ type: agent.taskType });
    const { contextWindow } = this.router.models[modelId];
    // Compress context to fit the next model's window
    const compressedHandoff = await this.contextManager.compress(
      handoff,
      Math.floor(contextWindow * 0.7) // Leave 30% for the response
    );
    return agent.execute(compressedHandoff, modelId);
  }
}
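The article does not show what `contextManager.compress` does. One simple possibility, sketched here purely as an assumption, is to keep the accumulated facts and drop the oldest completed steps until an estimated token count fits the budget. The names `compressHandoff` and `estimateTokens` are hypothetical, and the four-characters-per-token heuristic is a rough stand-in; a real implementation would use the target model's tokenizer and might summarize old steps rather than drop them.

```typescript
// Trimmed-down handoff shape: only the fields this sketch touches.
interface StepRecord {
  agentId: string;
  action: string;
  result: unknown;
}

interface Handoff {
  taskId: string;
  completedSteps: StepRecord[];
  currentContext: Record<string, unknown>;
}

// Rough heuristic (assumption): about 4 characters of JSON per token.
function estimateTokens(value: object): number {
  return Math.ceil(JSON.stringify(value).length / 4);
}

// Drop the oldest completed steps until the handoff fits the token budget.
// currentContext is never dropped, since it holds the accumulated facts.
// The input handoff is not mutated.
function compressHandoff(handoff: Handoff, maxTokens: number): Handoff {
  const steps = [...handoff.completedSteps];
  const fits = () => estimateTokens({ ...handoff, completedSteps: steps }) <= maxTokens;
  while (steps.length > 0 && !fits()) {
    steps.shift(); // oldest step goes first
  }
  return { ...handoff, completedSteps: steps };
}

const handoff: Handoff = {
  taskId: "t-1",
  completedSteps: [
    { agentId: "planner", action: "plan", result: "x".repeat(400) },
    { agentId: "coder", action: "implement", result: "y".repeat(400) },
  ],
  currentContext: { language: "TypeScript" },
};

// 70% of a 200K window: far more than this handoff needs, so nothing is dropped.
const untouched = compressHandoff(handoff, Math.floor(200_000 * 0.7));
// A tight budget forces the oldest step out.
const trimmed = compressHandoff(handoff, 150);

console.log(untouched.completedSteps.length, trimmed.completedSteps.length);
```

Dropping whole steps loses information, so this is only a reasonable default when the relevant facts have already been copied into `currentContext`.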
Cost optimization — use the right model tier
// Cost comparison for a typical pipeline (1M tasks/month),
// assuming 2K input + 500 output tokens per task on every tier:
// Naive approach: use Claude Opus for everything
//   1M × (2 × $0.015 + 0.5 × $0.075) = $67,500/month
// Optimized routing (same token profile, registry prices):
//   - Haiku for classification (40% of tasks): 400K × $0.001125 = $450/month
//   - Sonnet for implementation (50% of tasks): 500K × $0.0135 = $6,750/month
//   - Opus for complex reasoning (10% of tasks): 100K × $0.0675 = $6,750/month
//   Total: $13,950/month, about a 79% cost reduction
// Implementation:
class CostOptimizedPipeline {
  async process(task) {
    // Step 1: Haiku classifies task type and complexity (cheap)
    const classification = await this.callModel("claude-haiku-4-5", {
      prompt: CLASSIFY_PROMPT + task.description,
      maxTokens: 100, // Just need the classification JSON
    });
    // Step 2: Route to appropriate model based on classification
    const model = this.router.route(classification);
    // Step 3: Execute with routed model
    return this.callModel(model, { prompt: EXECUTE_PROMPT + task.description });
  }
}
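The monthly figures in the comparison above depend on three inputs: the per-1K-token prices in the registry, the per-task token profile (2K input, 500 output), and the share of tasks each tier handles. The helper below is a sketch for recomputing them whenever one of those inputs changes. `monthlyCost` and the `Tier` shape are illustrative names, and the code assumes every tier sees the same token profile, which the article does not state explicitly.

```typescript
interface Price { input: number; output: number } // USD per 1K tokens
interface Tier { share: number; price: Price }    // share of all tasks, 0 to 1

// Cost of running `tasks` tasks at a given price and per-task token profile.
function monthlyCost(tasks: number, price: Price, inputTokens: number, outputTokens: number): number {
  const perTask = (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
  return tasks * perTask;
}

// Prices copied from the model registry in this article.
const opus: Price = { input: 0.015, output: 0.075 };
const sonnet: Price = { input: 0.003, output: 0.015 };
const haiku: Price = { input: 0.00025, output: 0.00125 };

const TASKS = 1_000_000;

// Naive approach: every task goes to Opus.
const naive = monthlyCost(TASKS, opus, 2000, 500);

// Routed approach: 40% Haiku, 50% Sonnet, 10% Opus.
const tiers: Tier[] = [
  { share: 0.4, price: haiku },
  { share: 0.5, price: sonnet },
  { share: 0.1, price: opus },
];
const routed = tiers.reduce(
  (sum, tier) => sum + monthlyCost(TASKS * tier.share, tier.price, 2000, 500),
  0
);

// Fraction of the naive bill that routing saves.
const savings = 1 - routed / naive;

console.log({ naive, routed, savings });
```

The routed total is sensitive to the token profile assumed for each tier, so it is worth recomputing rather than hardcoding. This sketch also leaves out the extra Haiku classification call that `CostOptimizedPipeline` makes for every task.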
Parallel agent execution with dependency graph
// Execute independent subtasks in parallel, dependent tasks sequentially
class DependencyAwareOrchestrator {
  async execute(plan) {
    const results = new Map();
    const pending = new Set(plan.steps.map(s => s.id));
    while (pending.size > 0) {
      // Find steps whose dependencies are all complete
      const ready = plan.steps.filter(step =>
        pending.has(step.id) &&
        step.dependsOn.every(dep => results.has(dep))
      );
      // No runnable step left: either a cycle or a dependency on an unknown step ID
      if (ready.length === 0) throw new Error('Circular or unresolvable dependency detected');
      // Execute all ready steps in parallel
      const batchResults = await Promise.all(
        ready.map(async step => {
          const deps = step.dependsOn.map(d => results.get(d));
          const result = await this.executeStep(step, deps);
          return [step.id, result];
        })
      );
      batchResults.forEach(([id, result]) => {
        results.set(id, result);
        pending.delete(id);
      });
    }
    return results;
  }
}
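To see how that loop schedules work, it helps to trace which steps land in which parallel batch. The sketch below reproduces only the scheduling logic, without calling any model. The `Step` shape mirrors the `id` and `dependsOn` fields used above; the `planBatches` name and the step names are made up for illustration.

```typescript
interface Step { id: string; dependsOn: string[] }

// Group steps into batches: every step in a batch has all of its
// dependencies satisfied by earlier batches, so a batch can run in parallel.
function planBatches(steps: Step[]): string[][] {
  const done = new Set<string>();
  const pending = new Set<string>(steps.map(s => s.id));
  const batches: string[][] = [];
  while (pending.size > 0) {
    const ready = steps
      .filter(s => pending.has(s.id) && s.dependsOn.every(d => done.has(d)))
      .map(s => s.id);
    if (ready.length === 0) {
      // Nothing can run: a cycle, or a dependency on an ID that is not in the plan.
      throw new Error("Circular or missing dependency: " + Array.from(pending).join(", "));
    }
    ready.forEach(id => {
      done.add(id);
      pending.delete(id);
    });
    batches.push(ready);
  }
  return batches;
}

const batches = planBatches([
  { id: "research", dependsOn: [] },
  { id: "extract_images", dependsOn: [] },
  { id: "draft", dependsOn: ["research", "extract_images"] },
  { id: "review", dependsOn: ["draft"] },
]);

console.log(JSON.stringify(batches));
// prints: [["research","extract_images"],["draft"],["review"]]
```

Batch-by-batch scheduling waits for the slowest step in a batch before starting the next one, even when a later step's own dependencies finished earlier. That is usually acceptable for short plans, but it is a trade-off to keep in mind when step durations vary widely.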
- ✅ Route by capability: Claude for reasoning/code, GPT-4o for vision, fine-tuned for domain
- ✅ Use Haiku for classification and routing decisions — it is 60x cheaper than Opus
- ✅ Structured handoff objects between agents — never raw text
- ✅ Compress context before handoff — leave 30% headroom for the response
- ✅ Execute independent subtasks in parallel using a dependency graph
- ❌ Never use the most powerful model for every task — cost compounds fast
- ❌ Never pass entire conversation history between agents — extract only relevant facts
Multi-agent orchestration benefits from the production agent architecture patterns — especially the planner-executor separation that maps cleanly onto multi-model routing. For infrastructure, Step Functions Express Workflows handle the parallel agent execution graph efficiently. External reference: Anthropic multi-agent documentation.
