Multi-agent systems introduce a new class of failure: agents hallucinating each other into cascading errors. Agent A produces incorrect output. Agent B trusts it without verification. Agent C builds on B’s result. By the time you notice the failure, it is three reasoning steps removed from the original error and nearly impossible to trace. These are the patterns that prevent this.
⚡ TL;DR: Never pass raw LLM output between agents without structured validation. Use a dedicated orchestrator agent that maintains state and verifies inter-agent communication. Give each sub-agent a specific, bounded task with a verifiable output schema. Track total token budget across all agents in the system.
The inter-agent trust problem
// WRONG: agents pass raw text to each other
async function badMultiAgent(task) {
  const plannerOutput = await plannerAgent(task);            // Returns raw text
  const researchOutput = await researchAgent(plannerOutput); // Trusts planner blindly
  const writerOutput = await writerAgent(researchOutput);    // Trusts research blindly
  // One hallucination compounds through all three stages
  return writerOutput;
}
// RIGHT: structured output validation between agents
import { z } from 'zod';

// Example plan shape; adjust the fields to match your planner's contract
const PlanSchema = z.object({
  goal: z.string(),
  steps: z.array(z.string())
});

const ResearchResult = z.object({
  query: z.string(),
  findings: z.array(z.object({
    claim: z.string(),
    source: z.string().url(),
    confidence: z.enum(['high', 'medium', 'low'])
  })),
  limitations: z.array(z.string()) // Forces agent to acknowledge uncertainty
});
async function goodMultiAgent(task) {
  const plan = await plannerAgent(task);
  // Validate planner output before passing to research
  const validPlan = PlanSchema.parse(plan);
  const research = await researchAgent(validPlan);
  // Validate research output — rejects hallucinated sources
  const validResearch = ResearchResult.parse(research);
  // Only pass high-confidence findings to writer
  const highConfidence = validResearch.findings.filter(f => f.confidence === 'high');
  return await writerAgent({ ...validPlan, findings: highConfidence });
}
Orchestrator pattern — central state management
// Orchestrator maintains global state and routes between workers
class AgentOrchestrator {
  state = { task: '', results: {}, errors: [], totalTokens: 0 };
  MAX_TOKENS = 200000; // Budget for entire multi-agent system

  async run(task) {
    this.state.task = task;
    const plan = await this.runWorker('planner', task, 'plan');
    // Execute plan steps in parallel where possible
    const parallelGroups = this.groupParallel(plan.steps);
    for (const group of parallelGroups) {
      const groupResults = await Promise.all(
        group.map(step => this.runWorker(step.agent, step.input, step.id))
      );
      // Validate ALL results before proceeding
      for (const result of groupResults) {
        if (!this.validate(result)) {
          this.state.errors.push(result);
          return this.handlePartialFailure();
        }
      }
    }
    return this.compile();
  }

  async runWorker(agentName, input, stepId) {
    if (this.state.totalTokens > this.MAX_TOKENS) {
      throw new Error('System token budget exceeded');
    }
    // `workers` maps agent names to callable worker functions
    const result = await workers[agentName](input);
    this.state.totalTokens += result.usage?.total_tokens || 0;
    this.state.results[stepId] = result;
    return result;
  }

  // groupParallel, validate, handlePartialFailure, and compile omitted for brevity
}
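The orchestrator's `groupParallel` and `validate` helpers are left undefined above. One minimal, dependency-free sketch is shown below; the `dependsOn` field on plan steps is an assumption, so adapt it to whatever dependency information your planner emits:

```javascript
// Group steps into waves: every step whose dependencies are already
// satisfied runs in the same Promise.all() batch.
function groupParallel(steps) {
  const done = new Set();
  const groups = [];
  let remaining = [...steps];
  while (remaining.length > 0) {
    const ready = remaining.filter(s => (s.dependsOn || []).every(d => done.has(d)));
    if (ready.length === 0) throw new Error('Dependency cycle in plan');
    groups.push(ready);
    ready.forEach(s => done.add(s.id));
    remaining = remaining.filter(s => !done.has(s.id));
  }
  return groups;
}

// Cheap structural check before accepting a worker result;
// swap in a schema .parse() call for real validation.
function validate(result) {
  return result != null && typeof result === 'object' && !result.error;
}
```

This topological grouping is deliberately simple: it fails fast on cycles rather than silently serializing everything, which surfaces bad plans from the planner agent immediately.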
Preventing token explosion in multi-agent systems
// Token costs multiply in multi-agent systems:
// Single agent: 10K tokens/task
// 3 sequential agents: 10K + (10K + 10K context) + (10K + 20K context) = 60K tokens
// 5 agents with full history passing: growth is quadratic or worse
// Solutions:
// 1. Structured summaries between agents (not full transcripts)
function summarizeForHandoff(agentOutput) {
  return {
    summary: agentOutput.conclusion,    // Not full reasoning
    key_facts: agentOutput.facts,       // Structured, not prose
    confidence: agentOutput.confidence,
    // NOT: agentOutput.fullConversation (never pass this forward)
  };
}
// 2. Context compression at each agent boundary
const handoff = await compressForHandoff({
  model: 'claude-haiku-4-5-20251001', // Cheap model for compression
  max_tokens: 500,
  messages: [{
    role: 'user',
    content: 'Extract the 3 most important findings in JSON format: ' + JSON.stringify(agentOutput)
  }]
});
// 3. Track and cap system-wide token usage
// (assumes a blended rate of ~$9 per 1M tokens; substitute your models' pricing)
console.log('Multi-agent cost:', (totalTokens / 1_000_000 * 9).toFixed(4), 'USD');
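Capping usage is easier when every worker charges a single shared budget object rather than logging locally. A minimal sketch, assuming the same ~$9-per-1M blended rate used above (an assumption; check your models' actual pricing):

```javascript
// Shared token budget: pass one instance to every worker call
// so the cap applies system-wide, not per agent.
class TokenBudget {
  constructor(maxTokens, usdPerMillion = 9) { // rate is an assumption, not a published price
    this.maxTokens = maxTokens;
    this.usdPerMillion = usdPerMillion;
    this.used = 0;
  }
  charge(tokens) {
    this.used += tokens;
    if (this.used > this.maxTokens) {
      throw new Error(`Token budget exceeded: ${this.used}/${this.maxTokens}`);
    }
  }
  get costUsd() {
    return (this.used / 1_000_000) * this.usdPerMillion;
  }
}
```

Throwing on overage (rather than logging) is intentional: it converts a silent cost leak into a loud failure the orchestrator can catch and report.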
- ✅ Always validate inter-agent communication with a strict schema (Zod, Pydantic)
- ✅ Use a central orchestrator to maintain global state
- ✅ Pass structured summaries between agents, never full conversation histories
- ✅ Track total token budget across all agents in the system
- ✅ Give each sub-agent a single bounded task with a verifiable output
- ❌ Never pass raw LLM text output between agents without validation
- ❌ Never let agents spawn sub-agents recursively without a depth limit
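The last rule, bounding recursive spawning, can be enforced by threading a depth counter through every spawn call so no agent can forget it. A minimal sketch (the `spawnAgent` wrapper and `MAX_DEPTH` value are illustrative, not a library API):

```javascript
const MAX_DEPTH = 3; // hard cap on agent-spawns-agent chains

// Wraps an agent runner so every child it spawns inherits depth + 1.
async function spawnAgent(runAgent, input, depth = 0) {
  if (depth >= MAX_DEPTH) {
    throw new Error(`Agent spawn depth limit (${MAX_DEPTH}) reached`);
  }
  // The agent only receives a pre-bound spawner, so it cannot
  // reset or bypass the counter.
  const spawnChild = (childRunner, childInput) =>
    spawnAgent(childRunner, childInput, depth + 1);
  return runAgent(input, spawnChild);
}
```

Because the counter lives in the wrapper rather than in the agent's prompt, a hallucinating agent cannot talk its way past the limit.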
Multi-agent systems benefit from the orchestration patterns in Step Functions Express Workflows — Step Functions provides the same state machine guarantees for cloud infrastructure that LangGraph provides for AI agents. For token cost tracking, the CloudWatch Insights guide shows how to record custom metrics such as agent token usage. External reference: Anthropic agents documentation.