Building AI Agents That Actually Work: The Architecture No Tutorial Shows You

Building an AI agent that demos well is easy. Building one that works reliably in production is a completely different engineering challenge. The gap between a LangChain tutorial and a production-grade agent spans planning architecture, memory systems, error recovery, tool reliability, and observability. This article covers the architecture decisions that determine whether your agent is a toy or a production tool.

TL;DR: Production AI agents need: separation of planning from execution, three memory tiers (working, episodic, semantic), tool call schema validation with retry, an explicit state machine not an ad-hoc loop, and full observability on every LLM call. Build the infrastructure first, then the intelligence.

The core pattern: planner-executor separation

// Anti-pattern: monolithic agent loop
async function badAgent(task) {
  let result = await llm.call(task);
  while (result.hasToolCall) {
    result = await llm.call(executeTool(result) + previousContext);
  }
  return result; // Brittle, no planning, no recovery
}

// Production pattern: planner + executor separated
class ProductionAgent {
  async run(task) {
    // Phase 1: Planner creates explicit plan
    const plan = await this.planner.createPlan({
      task,
      availableTools: this.tools.list(),
      constraints: this.constraints,
    });
    // plan = { steps: [{tool, args, dependsOn, retryPolicy}], estimatedCost }

    // Phase 2: Executor runs plan with oversight
    const executor = new PlanExecutor(plan, this.tools);
    return executor.run({
      onStepComplete: this.monitor.recordStep,
      onError: this.errorHandler.handle,
      maxRetries: 3,
    });
  }
}
// Plan is inspectable BEFORE execution
// Executor retries individual steps without re-planning
// Each phase has its own observability
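The PlanExecutor used above can be sketched as a per-step retry loop. This is an illustrative implementation under assumed shapes for the plan (a steps array) and tools (a Map of tool name to async function), not a specific library's API:

```javascript
// Illustrative PlanExecutor sketch: retries each step independently,
// so one flaky tool call never forces a full re-plan.
class PlanExecutor {
  constructor(plan, tools) {
    this.plan = plan;   // assumed shape: { steps: [{ tool, args, ... }] }
    this.tools = tools; // assumed: Map of tool name -> async function
  }

  async run({ onStepComplete, onError, maxRetries = 3 } = {}) {
    const results = [];
    for (const step of this.plan.steps) {
      let attempt = 0;
      for (;;) {
        try {
          const out = await this.tools.get(step.tool)(step.args);
          results.push(out);
          if (onStepComplete) onStepComplete(step, out);
          break; // step done; continue the plan without re-planning
        } catch (err) {
          attempt += 1;
          if (onError) onError(step, err, attempt);
          if (attempt >= maxRetries) throw err; // surface to the re-planner
        }
      }
    }
    return results;
  }
}
```

A step that fails maxRetries times propagates its error upward, which is the executor's signal that re-planning, not another retry, is needed.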

Three-tier memory architecture

class AgentMemory {
  // Tier 1: Working memory - current task context (in LLM context)
  working = new Map();

  // Tier 2: Episodic memory - past task results (vector DB)
  episodic = new VectorStore('episodes'); // "What happened when I did X?"

  // Tier 3: Semantic memory - facts and knowledge (vector DB)
  semantic = new VectorStore('knowledge'); // "What do I know about X?"

  async recall(query) {
    const [working, episodic, semantic] = await Promise.all([
      this.searchWorking(query),
      this.episodic.similaritySearch(query, { k: 3, threshold: 0.75 }),
      this.semantic.similaritySearch(query, { k: 5, threshold: 0.70 }),
    ]);
    return this.rankAndMerge([...working, ...episodic, ...semantic]);
  }

  async store(result) {
    await this.episodic.upsert({
      id: result.taskId,
      vector: await this.embed(result.summary),
      metadata: { task: result.task, outcome: result.outcome }
    });
  }
}
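The rankAndMerge step can be as simple as deduplicating hits by id and sorting by similarity score. A minimal sketch, assuming each hit has the shape { id, score, text }:

```javascript
// Minimal rankAndMerge sketch: keep the best-scoring hit per id,
// then return all hits sorted by descending similarity score.
function rankAndMerge(hits) {
  const best = new Map();
  for (const hit of hits) {
    const seen = best.get(hit.id);
    if (!seen || hit.score > seen.score) best.set(hit.id, hit);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```

Real systems often weight the tiers differently (working memory first, then episodic, then semantic), but score-based merging is the usual starting point.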

Tool calling with schema validation

class ToolRegistry {
  tools = new Map(); // name -> { fn, schema }

  register(name, fn, schema) {
    this.tools.set(name, { fn, schema: z.object(schema) });
  }

  async call(name, rawArgs, context) {
    const tool = this.tools.get(name);
    if (!tool) return { error: 'tool_not_found', name };

    // Validate BEFORE calling
    const parsed = tool.schema.safeParse(rawArgs);
    if (!parsed.success) {
      // Structured error lets LLM self-correct
      return { error: 'invalid_args', details: parsed.error.format() };
    }

    // Execute with timeout
    const result = await Promise.race([
      tool.fn(parsed.data, context),
      new Promise((_, rej) => setTimeout(() => rej(new Error('timeout')), 10000))
    ]);

    this.telemetry.record({ tool: name, args: parsed.data, result });
    return result;
  }
}

// Typed tool schema:
registry.register('search_code', searchCode, {
  query: z.string().min(3).max(200),
  extensions: z.array(z.enum(['.ts','.js','.py'])).optional(),
  maxResults: z.number().int().min(1).max(20).default(5),
});
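The structured invalid_args error is what makes self-correction possible: the validation details go back to the model, which proposes corrected arguments. A hedged sketch of that loop, where llm.fixToolArgs is a hypothetical helper (not a real API) and registry.call follows the contract above:

```javascript
// Self-correction loop sketch: on invalid_args, feed the validation
// failure back to the LLM so it can emit corrected arguments.
// `llm.fixToolArgs` is a hypothetical helper, not a library API.
async function callWithSelfCorrection(registry, llm, name, args, maxAttempts = 2) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await registry.call(name, args);
    if (!result || result.error !== 'invalid_args') return result;
    // The model sees exactly which fields failed and proposes a fix.
    args = await llm.fixToolArgs({ tool: name, badArgs: args, details: result.details });
  }
  return { error: 'max_attempts_exceeded', name };
}
```

Capping the correction attempts matters: a model that cannot satisfy the schema after two tries usually needs a re-plan, not more retries.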

State machine — replace ad-hoc loops

const agentFSM = {
  IDLE:         { next: ['PLANNING'] },
  PLANNING:     { next: ['EXECUTING', 'FAILED'] },
  EXECUTING:    { next: ['WAITING_TOOL', 'REVIEWING', 'FAILED'] },
  WAITING_TOOL: { next: ['EXECUTING', 'FAILED'] },
  REVIEWING:    { next: ['EXECUTING', 'COMPLETE', 'REPLANNING'] },
  REPLANNING:   { next: ['EXECUTING', 'FAILED'] },
  COMPLETE:     { next: ['IDLE'] },
  FAILED:       { next: ['IDLE'] },
};

class StatefulAgent {
  state = 'IDLE';
  history = [];

  transition(next) {
    const from = this.state; // capture before reassigning, so telemetry is accurate
    if (!agentFSM[from].next.includes(next)) {
      throw new Error(from + ' cannot transition to ' + next);
    }
    this.history.push({ from, to: next, at: Date.now() });
    this.state = next;
    this.telemetry.record('state_change', { from, to: next });
  }
}
// Every state change is logged, validated, and replayable
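Because the history records every transition, a run can be replayed against the same table after the fact. A minimal sketch (the transition table is repeated here so the snippet runs standalone):

```javascript
// Replay sketch: verify a recorded history against the transition table.
const agentFSM = {
  IDLE:         { next: ['PLANNING'] },
  PLANNING:     { next: ['EXECUTING', 'FAILED'] },
  EXECUTING:    { next: ['WAITING_TOOL', 'REVIEWING', 'FAILED'] },
  WAITING_TOOL: { next: ['EXECUTING', 'FAILED'] },
  REVIEWING:    { next: ['EXECUTING', 'COMPLETE', 'REPLANNING'] },
  REPLANNING:   { next: ['EXECUTING', 'FAILED'] },
  COMPLETE:     { next: ['IDLE'] },
  FAILED:       { next: ['IDLE'] },
};

// True only if every recorded transition was legal at the time.
function isValidHistory(history) {
  return history.every(({ from, to }) => agentFSM[from].next.includes(to));
}
```

This makes corrupted or hand-edited run logs detectable: any history that fails the check did not come from a correctly functioning agent.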

Observability — trace every LLM call

class InstrumentedLLM {
  async call(prompt, opts = {}) {
    const span = tracer.startSpan('llm.call');
    span.setAttributes({
      'llm.model': opts.model,
      'llm.prompt_tokens': estimateTokens(prompt),
      'agent.task_id': this.context.taskId,
    });
    try {
      const response = await this.client.complete(prompt, opts);
      span.setAttributes({
        'llm.completion_tokens': response.usage.completion_tokens,
        'llm.cost_usd': calculateCost(response.usage),
      });
      return response;
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  }
}
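calculateCost can be a simple per-token rate table keyed by model. A sketch with placeholder prices (the model name and rates here are illustrative, not real pricing):

```javascript
// Illustrative cost calculation: dollars per million tokens, split by
// input and output. Rates and model name are placeholders.
const PRICES = { 'example-model': { inPerM: 3.0, outPerM: 15.0 } };

function calculateCost(usage, model = 'example-model') {
  const p = PRICES[model];
  return (usage.prompt_tokens / 1e6) * p.inPerM +
         (usage.completion_tokens / 1e6) * p.outPerM;
}
```

Recording cost per span, rather than per run, is what lets you attribute spend to a specific plan step when a task blows its budget.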

Production agent checklist

  • ✅ Separate planner from executor — never mix planning and execution in one LLM call
  • ✅ All three memory tiers: working (in-context), episodic (past runs), semantic (knowledge base)
  • ✅ Validate every tool call against typed schema before execution
  • ✅ Use an explicit state machine — no ad-hoc while loops
  • ✅ Trace every LLM call with tokens, cost, and latency
  • ✅ Set hard limits on steps, cost, and time per task run
  • ❌ Never deploy agents without maximum step and cost budgets
  • ❌ Never run high-stakes agent actions without human approval checkpoint
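The hard limits in the checklist can live in a single budget object that every step and LLM call charges against. An illustrative sketch (class and field names are assumptions, not a library API):

```javascript
// Budget guard sketch: enforces hard limits on steps, cost, and wall
// time. Any charge that breaches a limit throws and halts the run.
class RunBudget {
  constructor({ maxSteps = 25, maxCostUsd = 1.0, maxMs = 120000 } = {}) {
    this.maxSteps = maxSteps;
    this.maxCostUsd = maxCostUsd;
    this.deadline = Date.now() + maxMs;
    this.steps = 0;
    this.costUsd = 0;
  }

  charge({ steps = 0, costUsd = 0 } = {}) {
    this.steps += steps;
    this.costUsd += costUsd;
    if (this.steps > this.maxSteps) throw new Error('step budget exceeded');
    if (this.costUsd > this.maxCostUsd) throw new Error('cost budget exceeded');
    if (Date.now() > this.deadline) throw new Error('time budget exceeded');
  }
}
```

Throwing rather than returning a flag is deliberate: a breached budget should abort the run through the normal error path, which the state machine maps to FAILED.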

Production agents need solid infrastructure: serverless execution with optimized Lambda cold starts is a good fit for agent executors, and DynamoDB single-table design handles episodic memory storage efficiently. External reference: Lilian Weng's AI agent architecture survey.


