Multi-Agent Systems: Orchestrating Claude, GPT-4o, and Specialized Models Together

Single-model AI systems are giving way to multi-agent architectures where different models handle different parts of a task. Claude excels at long-context reasoning and code analysis. GPT-4o handles vision and multimodal inputs. Specialized fine-tuned models outperform general models on narrow domain tasks. Orchestrating these models together — with proper routing, handoffs, and cost control — is the architecture skill that separates AI engineers from prompt engineers.

TL;DR: Route tasks to models based on capability match, not default habit. Use Claude Opus for reasoning-heavy planning, Sonnet for implementation tasks, Haiku for high-volume classification. Route vision to GPT-4o-mini, long documents to Claude (200K context). Share context between agents via structured handoff objects, not raw text.

Model routing architecture

class ModelRouter {
  // Model capability registry
  models = {
    "claude-opus-4-5": {
      strengths: ["long_context", "reasoning", "code_analysis", "safety"],
      contextWindow: 200_000,
      costPer1kTokens: { input: 0.015, output: 0.075 },
      bestFor: ["planning", "complex_reasoning", "code_review"]
    },
    "claude-sonnet-4-5": {
      strengths: ["balanced", "coding", "analysis"],
      contextWindow: 200_000,
      costPer1kTokens: { input: 0.003, output: 0.015 },
      bestFor: ["implementation", "summarization", "general_tasks"]
    },
    "claude-haiku-4-5": {
      strengths: ["speed", "cost", "simple_tasks"],
      contextWindow: 200_000,
      costPer1kTokens: { input: 0.00025, output: 0.00125 },
      bestFor: ["classification", "extraction", "simple_qa"]
    },
    "gpt-4o-mini": {
      strengths: ["vision", "multimodal", "json_mode"],
      contextWindow: 128_000,
      costPer1kTokens: { input: 0.00015, output: 0.0006 },
      bestFor: ["image_analysis", "document_ocr", "structured_extraction"]
    }
  };

  route(task) {
    if (task.hasImages || task.requiresVision) return "gpt-4o-mini";
    if (task.tokenCount > 100_000) return "claude-opus-4-5"; // Long context
    if (task.type === "planning" || task.complexity === "high") return "claude-opus-4-5";
    if (task.type === "implementation") return "claude-sonnet-4-5";
    if (task.type === "classification" || task.complexity === "low") return "claude-haiku-4-5";
    return "claude-sonnet-4-5"; // Default
  }
}
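To make the rule ordering concrete — vision first, then context size, then task type — here is a standalone sketch of the same routing logic with a few illustrative calls. The `Task` shape is an assumption inferred from the fields the router reads:

```typescript
// Standalone sketch of the routing rules above; Task fields are assumptions.
type Task = {
  hasImages?: boolean;
  requiresVision?: boolean;
  tokenCount?: number;
  type?: string;
  complexity?: "low" | "high";
};

function route(task: Task): string {
  if (task.hasImages || task.requiresVision) return "gpt-4o-mini";
  if ((task.tokenCount ?? 0) > 100_000) return "claude-opus-4-5"; // Long context
  if (task.type === "planning" || task.complexity === "high") return "claude-opus-4-5";
  if (task.type === "implementation") return "claude-sonnet-4-5";
  if (task.type === "classification" || task.complexity === "low") return "claude-haiku-4-5";
  return "claude-sonnet-4-5"; // Default
}

// Vision beats every other rule; context size beats task type.
route({ hasImages: true, type: "classification" }); // "gpt-4o-mini"
route({ tokenCount: 150_000, type: "classification" }); // "claude-opus-4-5"
route({ type: "implementation" }); // "claude-sonnet-4-5"
```

Note the rules are evaluated in priority order, so a long classification-type document still goes to the large-context model, not the cheap one.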

Structured handoff protocol between agents

// Never pass raw text between agents — use typed handoff objects
interface AgentHandoff {
  taskId: string;
  originalTask: string;
  completedSteps: Array<{
    agentId: string;
    model: string;
    action: string;
    result: unknown;
    confidence: number; // 0-1
    durationMs: number;
    tokensUsed: number;
  }>;
  currentContext: Record<string, unknown>; // Accumulated facts
  remainingSteps: string[];
  constraints: {
    maxSteps: number;
    maxTokenBudget: number;
    deadline: Date;
  };
}
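A concrete handoff from a planner agent to an implementer might look like this (all field values are illustrative, not from a real run):

```typescript
// Illustrative handoff object conforming to the AgentHandoff interface above.
// Every value here is made up for the example.
const handoff = {
  taskId: "task-042",
  originalTask: "Add rate limiting to the public API",
  completedSteps: [{
    agentId: "planner-1",
    model: "claude-opus-4-5",
    action: "produce_implementation_plan",
    result: { files: ["middleware/rateLimit.ts"], strategy: "token bucket" },
    confidence: 0.92,
    durationMs: 4800,
    tokensUsed: 3200,
  }],
  currentContext: { framework: "express", language: "typescript" },
  remainingSteps: ["implement middleware", "write tests", "review"],
  constraints: { maxSteps: 10, maxTokenBudget: 150_000, deadline: new Date("2025-07-01") },
};
```

Because the handoff is structured, the next agent can read `currentContext` and `remainingSteps` directly instead of re-parsing a wall of conversation text.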

// Orchestrator method: send a typed handoff to the next agent
async handoffToAgent(agentId, handoff) {
  const agent = this.agents.get(agentId);
  const modelName = this.router.route({ type: agent.taskType });
  const model = this.router.models[modelName];

  // Compress context to fit the next model's window
  const compressedHandoff = await this.contextManager.compress(
    handoff,
    model.contextWindow * 0.7 // Leave 30% headroom for the response
  );

  return agent.execute(compressedHandoff, modelName);
}
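The `contextManager.compress` call is doing the heavy lifting here but isn't shown. Here is one minimal way it could work, under stated assumptions: a rough 4-characters-per-token heuristic instead of a real tokenizer, and a two-pass strategy (summarize oldest step results first, then drop oldest steps entirely):

```typescript
// Sketch of a context compressor (assumed interface, not a library API).
type HandoffLite = {
  completedSteps: Array<{ action: string; result: unknown }>;
  currentContext: Record<string, unknown>;
};

// Rough heuristic: ~4 characters per token. Use a real tokenizer in production.
function approxTokens(value: unknown): number {
  return Math.ceil(JSON.stringify(value).length / 4);
}

function compress(handoff: HandoffLite, tokenBudget: number): HandoffLite {
  const steps = handoff.completedSteps.map(s => ({ ...s }));
  const fits = () =>
    approxTokens({ ...handoff, completedSteps: steps }) <= tokenBudget;

  // Pass 1: replace oldest step results with a marker until we fit.
  // A real system would call a cheap model (e.g. Haiku) to summarize instead.
  for (let i = 0; i < steps.length && !fits(); i++) {
    steps[i].result = "[summarized]";
  }
  // Pass 2: if still over budget, drop oldest steps entirely, keeping the latest.
  while (steps.length > 1 && !fits()) steps.shift();

  return { ...handoff, completedSteps: steps };
}
```

The design choice worth copying is the ordering: recent steps keep full fidelity because the next agent usually needs them verbatim, while old steps degrade gracefully to summaries before being dropped.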

Cost optimization — use the right model tier

// Cost comparison for a typical pipeline (1M tasks/month):

// Naive approach: use Claude Opus for everything
// 1M × 2K input + 500 output tokens
// Cost: 1M × (2 × $0.015 + 0.5 × $0.075) = $67,500/month

// Optimized routing (same 2K input / 500 output tokens per task):
// - Haiku for classification (40% of tasks): 400K × $0.001125 = $450/month
// - Sonnet for implementation (50% of tasks): 500K × $0.0135 = $6,750/month
// - Opus for complex reasoning (10% of tasks): 100K × $0.0675 = $6,750/month
// Total: $13,950/month — roughly an 80% cost reduction
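The arithmetic above is easy to sanity-check in a few lines, using the per-1K-token prices from the model registry:

```typescript
// Sanity check of the cost figures above. Prices are per 1K tokens,
// matching the ModelRouter registry; the 2K-in / 500-out task size is
// the same assumption used in the comments.
const price = {
  opus:   { input: 0.015,   output: 0.075 },
  sonnet: { input: 0.003,   output: 0.015 },
  haiku:  { input: 0.00025, output: 0.00125 },
};

// Per-task cost at 2K input + 500 output tokens.
const perTask = (p: { input: number; output: number }) =>
  2 * p.input + 0.5 * p.output;

const naive = 1_000_000 * perTask(price.opus);  // $67,500
const routed =
  400_000 * perTask(price.haiku) +              // $450
  500_000 * perTask(price.sonnet) +             // $6,750
  100_000 * perTask(price.opus);                // $6,750
const savings = 1 - routed / naive;             // ≈ 0.79
```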

// Implementation:
class CostOptimizedPipeline {
  async process(task) {
    // Step 1: Haiku classifies task type and complexity (cheap)
    const response = await this.callModel("claude-haiku-4-5", {
      prompt: CLASSIFY_PROMPT + task.description,
      maxTokens: 100, // Just need the classification JSON
    });
    const classification = JSON.parse(response); // { type, complexity, ... }

    // Step 2: Route to the appropriate model based on the classification
    const model = this.router.route(classification);

    // Step 3: Execute with routed model
    return this.callModel(model, { prompt: EXECUTE_PROMPT + task.description });
  }
}
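The pipeline assumes a `CLASSIFY_PROMPT` constant that is never shown. One possible shape (an assumption, not the article's actual prompt) is below; the important part is forcing a strict JSON response whose fields match what the router reads:

```typescript
// Hypothetical CLASSIFY_PROMPT — an illustration, not a defined constant
// from this article. The JSON keys mirror the router's task fields.
const CLASSIFY_PROMPT = `Classify the following task. Respond with ONLY a JSON object, no prose:
{"type": "planning" | "implementation" | "classification" | "other",
 "complexity": "low" | "high",
 "requiresVision": true | false}

Task: `;
```

Keeping the allowed values enumerated in the prompt makes the cheap classifier's output reliably parseable, which is what lets `JSON.parse` feed straight into the router.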

Parallel agent execution with dependency graph

// Execute independent subtasks in parallel, dependent tasks sequentially
class DependencyAwareOrchestrator {
  async execute(plan) {
    const results = new Map();
    const pending = new Set(plan.steps.map(s => s.id));

    while (pending.size > 0) {
      // Find steps whose dependencies are all complete
      const ready = plan.steps.filter(step =>
        pending.has(step.id) &&
        step.dependsOn.every(dep => results.has(dep))
      );

      if (ready.length === 0) throw new Error('Circular dependency detected');

      // Execute all ready steps in parallel
      const batchResults = await Promise.all(
        ready.map(async step => {
          const deps = step.dependsOn.map(d => results.get(d));
          const result = await this.executeStep(step, deps);
          return [step.id, result];
        })
      );

      batchResults.forEach(([id, result]) => {
        results.set(id, result);
        pending.delete(id);
      });
    }

    return results;
  }
}
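Running the same scheduling loop on a small four-step plan shows the batching behavior. This standalone version adds a `batches` log and takes the executor as a parameter so the scheduling is visible without real agents:

```typescript
// Standalone version of the scheduling loop above, with a mock executor,
// showing which steps land in the same parallel batch.
type PlanStep = { id: string; dependsOn: string[] };

async function execute(
  steps: PlanStep[],
  run: (step: PlanStep) => Promise<string>
): Promise<{ results: Map<string, string>; batches: string[][] }> {
  const results = new Map<string, string>();
  const pending = new Set(steps.map(s => s.id));
  const batches: string[][] = [];

  while (pending.size > 0) {
    // Steps whose dependencies are all complete
    const ready = steps.filter(
      s => pending.has(s.id) && s.dependsOn.every(d => results.has(d))
    );
    if (ready.length === 0) throw new Error("Circular dependency detected");

    batches.push(ready.map(s => s.id));
    const batch = await Promise.all(
      ready.map(async s => [s.id, await run(s)] as const)
    );
    for (const [id, result] of batch) {
      results.set(id, result);
      pending.delete(id);
    }
  }
  return { results, batches };
}

// fetchA and fetchB are independent, so they run in one batch;
// merge waits for both; report waits for merge.
const plan: PlanStep[] = [
  { id: "fetchA", dependsOn: [] },
  { id: "fetchB", dependsOn: [] },
  { id: "merge",  dependsOn: ["fetchA", "fetchB"] },
  { id: "report", dependsOn: ["merge"] },
];
```

With a mock executor like `async s => "done:" + s.id`, the plan resolves in three batches: `[fetchA, fetchB]`, `[merge]`, `[report]` — two wall-clock rounds saved versus sequential execution.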
  • ✅ Route by capability: Claude for reasoning/code, GPT-4o for vision, fine-tuned for domain
  • ✅ Use Haiku for classification and routing decisions — it is 60x cheaper than Opus
  • ✅ Structured handoff objects between agents — never raw text
  • ✅ Compress context before handoff — leave 30% headroom for the response
  • ✅ Execute independent subtasks in parallel using a dependency graph
  • ❌ Never use the most powerful model for every task — cost compounds fast
  • ❌ Never pass entire conversation history between agents — extract only relevant facts

Multi-agent orchestration benefits from the production agent architecture patterns — especially the planner-executor separation that maps cleanly onto multi-model routing. For infrastructure, Step Functions Express Workflows handle the parallel agent execution graph efficiently. External reference: Anthropic multi-agent documentation.
