Building AI Agents That Actually Work: The Architecture No Tutorial Shows You

Building an AI agent that demos well is easy. Building one that works reliably in production is a different engineering problem. The gap between a LangChain tutorial and a production-grade agent lies in five areas: planning architecture, memory systems, error recovery, tool reliability, and observability. This article covers the architecture decisions that determine whether your agent stays a toy or becomes a production tool.

TL;DR: Production AI agents need separation of planning from execution, three memory tiers (working, episodic, semantic), tool call schema validation with retry, an explicit state machine rather than an ad-hoc loop, and full observability on every LLM call. Build the infrastructure first, then the intelligence.

The core pattern: planner-executor separation

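// Anti-pattern: monolithic agent loop
async function badAgent(task) {
  let context = task;
  let result = await llm.call(context);
  while (result.hasToolCall) {
    // Context grows without bound, and one failed tool call aborts the whole run
    context += '\n' + (await executeTool(result));
    result = await llm.call(context);
  }
  return result; // Brittle: no planning, no recovery, no step limit
}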

// Production pattern: planner + executor separated
class ProductionAgent {
  async run(task) {
    // Phase 1: Planner creates explicit plan
    const plan = await this.planner.createPlan({
      task,
      availableTools: this.tools.list(),
      constraints: this.constraints,
    });
    // plan = { steps: [{tool, args, dependsOn, retryPolicy}], estimatedCost }

    // Phase 2: Executor runs plan with oversight
    const executor = new PlanExecutor(plan, this.tools);
    return executor.run({
      // Arrow functions keep `this` bound to monitor and errorHandler
      onStepComplete: (step) => this.monitor.recordStep(step),
      onError: (err) => this.errorHandler.handle(err),
      maxRetries: 3,
    });
  }
}
// Plan is inspectable BEFORE execution
// Executor retries individual steps without re-planning
// Each phase has its own observability
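
The article does not show what happens inside PlanExecutor. As a rough illustration only, the sketch below runs a plan of the shape described above: it picks steps whose dependencies are done and retries each step on its own. The function name `runPlan`, the per-step `id` field, `retryPolicy.maxRetries`, and the idea that tools are plain async functions are all assumptions made for this example.

```javascript
// Hypothetical sketch: not the article's PlanExecutor.
// Assumes each step has a unique `id` and `dependsOn` is an array of step ids.
async function runPlan(plan, tools, { maxRetries = 3, onStepComplete = () => {} } = {}) {
  const results = new Map(); // step id -> tool result
  const pending = [...plan.steps];

  while (pending.length > 0) {
    // Pick the first step whose dependencies have all completed.
    const idx = pending.findIndex((s) => (s.dependsOn ?? []).every((d) => results.has(d)));
    if (idx === -1) throw new Error('plan has unsatisfiable or cyclic dependencies');
    const [step] = pending.splice(idx, 1);

    const limit = step.retryPolicy?.maxRetries ?? maxRetries;
    let lastError;
    let done = false;
    for (let attempt = 0; attempt <= limit; attempt++) {
      try {
        // Each tool receives its args plus the results of earlier steps.
        const value = await tools[step.tool](step.args, results);
        results.set(step.id, value);
        onStepComplete({ id: step.id, attempts: attempt + 1 });
        done = true;
        break;
      } catch (err) {
        lastError = err; // retry this step only; the plan itself is untouched
      }
    }
    if (!done) {
      throw new Error(`step ${step.id} failed after ${limit + 1} attempts: ${lastError.message}`);
    }
  }
  return results;
}
```

Because a failed step is retried in place, a flaky tool costs extra tool calls rather than an extra planning call to the LLM.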

Three-tier memory architecture

class AgentMemory {
  // Tier 1: Working memory - current task context (in LLM context)
  working = new Map();

  // Tier 2: Episodic memory - past task results (vector DB)
  episodic = new VectorStore('episodes'); // "What happened when I did X?"

  // Tier 3: Semantic memory - facts and knowledge (vector DB)
  semantic = new VectorStore('knowledge'); // "What do I know about X?"

  async recall(query) {
    const [working, episodic, semantic] = await Promise.all([
      this.searchWorking(query),
      this.episodic.similaritySearch(query, { k: 3, threshold: 0.75 }),
      this.semantic.similaritySearch(query, { k: 5, threshold: 0.70 }),
    ]);
    return this.rankAndMerge([...working, ...episodic, ...semantic]);
  }

  async store(result) {
    await this.episodic.upsert({
      id: result.taskId,
      vector: await this.embed(result.summary),
      metadata: { task: result.task, outcome: result.outcome }
    });
  }
}
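
The recall method relies on rankAndMerge, which is not shown. One possible version, sketched here under the assumption that every hit is an object with an `id` and a similarity `score` between 0 and 1 (the article does not specify this shape), removes duplicates across tiers and sorts by score:

```javascript
// Hypothetical sketch of rankAndMerge; assumes hits look like { id, score, ... }.
function rankAndMerge(hits, limit = 5) {
  const best = new Map();
  for (const hit of hits) {
    const seen = best.get(hit.id);
    // Keep only the highest-scoring copy when the same item appears in several tiers.
    if (!seen || hit.score > seen.score) best.set(hit.id, hit);
  }
  // Highest similarity first, capped so the merged context stays small.
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
```

A real implementation might also weight tiers differently, for example preferring working memory over older episodes. That choice is left open here.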

Tool calling with schema validation

import { z } from 'zod';

class ToolRegistry {
  tools = new Map(); // name -> { fn, schema }

  register(name, fn, schema) {
    this.tools.set(name, { fn, schema: z.object(schema) });
  }

  async call(name, rawArgs, context) {
    const tool = this.tools.get(name);
    if (!tool) return { error: 'tool_not_found', name };

    // Validate BEFORE calling
    const parsed = tool.schema.safeParse(rawArgs);
    if (!parsed.success) {
      // Structured error lets LLM self-correct
      return { error: 'invalid_args', details: parsed.error.format() };
    }

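    // Execute with a 10 s timeout. The race only stops waiting; it does not cancel tool.fn.
    let timer;
    const timeout = new Promise((_, rej) => {
      timer = setTimeout(() => rej(new Error('timeout')), 10000);
    });
    let result;
    try {
      result = await Promise.race([tool.fn(parsed.data, context), timeout]);
    } finally {
      clearTimeout(timer); // avoid leaving a live timer after the tool returns
    }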

    this.telemetry.record({ tool: name, args: parsed.data, result });
    return result;
  }
}

// Typed tool schema:
registry.register('search_code', searchCode, {
  query: z.string().min(3).max(200),
  extensions: z.array(z.enum(['.ts','.js','.py'])).optional(),
  maxResults: z.number().int().min(1).max(20).default(5),
});
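
The registry returns a structured invalid_args error but does not retry by itself. A minimal sketch of the retry half could look like the following. The names `callWithCorrection` and `repairArgs` are hypothetical; `repairArgs` stands in for an LLM call that proposes corrected arguments from the validation details.

```javascript
// Hypothetical sketch: assumes `registry.call` returns { error: 'invalid_args', details }
// on bad arguments, as the ToolRegistry in this section does.
async function callWithCorrection(registry, name, args, repairArgs, maxAttempts = 3) {
  let current = args;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await registry.call(name, current);
    // Anything other than a validation failure is returned to the caller as-is.
    if (result?.error !== 'invalid_args') return { result, attempts: attempt };
    if (attempt < maxAttempts) {
      // Feed the structured error back so the model can fix its own call.
      current = await repairArgs({ tool: name, args: current, details: result.details });
    }
  }
  return { result: { error: 'invalid_args_exhausted', tool: name }, attempts: maxAttempts };
}
```

Capping the attempts matters: without a limit, a model that keeps producing the same bad arguments would loop forever.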

State machine — replace ad-hoc loops

const agentFSM = {
  IDLE:         { next: ['PLANNING'] },
  PLANNING:     { next: ['EXECUTING', 'FAILED'] },
  EXECUTING:    { next: ['WAITING_TOOL', 'REVIEWING', 'FAILED'] },
  WAITING_TOOL: { next: ['EXECUTING', 'FAILED'] },
  REVIEWING:    { next: ['EXECUTING', 'COMPLETE', 'REPLANNING'] },
  REPLANNING:   { next: ['EXECUTING', 'FAILED'] },
  COMPLETE:     { next: ['IDLE'] },
  FAILED:       { next: ['IDLE'] },
};

class StatefulAgent {
  state = 'IDLE';
  history = [];

  transition(next) {
    if (!agentFSM[this.state].next.includes(next)) {
      throw new Error(this.state + ' cannot transition to ' + next);
    }
    const from = this.state; // capture before overwriting, so telemetry reports the real source state
    this.history.push({ from, to: next, at: Date.now() });
    this.state = next;
    this.telemetry.record('state_change', { from, to: next });
  }
}
// Every state change is logged, validated, and replayable
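
To make the "replayable" claim concrete, here is a small sketch that walks a recorded history against a transition table shaped like agentFSM and reports the final state. The function name `replayHistory` and the trimmed demo table are illustrative assumptions, not part of the original agent.

```javascript
// Hypothetical sketch: replays history entries of the form { from, to }
// and throws on the first entry that the transition table does not allow.
function replayHistory(fsm, history, initial = 'IDLE') {
  let state = initial;
  history.forEach((entry, i) => {
    if (entry.from !== state) {
      throw new Error(`entry ${i}: expected from=${state}, got ${entry.from}`);
    }
    if (!fsm[state]?.next.includes(entry.to)) {
      throw new Error(`entry ${i}: ${state} cannot transition to ${entry.to}`);
    }
    state = entry.to;
  });
  return state;
}

// Usage with a trimmed-down table, for illustration only.
const demoFSM = {
  IDLE: { next: ['PLANNING'] },
  PLANNING: { next: ['EXECUTING', 'FAILED'] },
  EXECUTING: { next: ['FAILED'] },
  FAILED: { next: ['IDLE'] },
};
const finalState = replayHistory(demoFSM, [
  { from: 'IDLE', to: 'PLANNING' },
  { from: 'PLANNING', to: 'EXECUTING' },
]);
console.log(finalState); // EXECUTING
```

Running the same check over stored histories is one way to catch corrupted logs or transitions that bypassed the transition method.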

Observability — trace every LLM call

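// `tracer` is assumed to be an OpenTelemetry-style tracer created elsewhere;
// the span's start and end timestamps give the call latency.
class InstrumentedLLM {
  async call(prompt, opts = {}) {
    const span = tracer.startSpan('llm.call');
    span.setAttributes({
      'llm.model': opts.model,
      'llm.prompt_tokens': estimateTokens(prompt), // pre-call estimate, not billed usage
      'agent.task_id': this.context.taskId,
    });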
    try {
      const response = await this.client.complete(prompt, opts);
      span.setAttributes({
        'llm.completion_tokens': response.usage.completion_tokens,
        'llm.cost_usd': calculateCost(response.usage),
      });
      return response;
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  }
}
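
The instrumented call uses estimateTokens and calculateCost without defining them. The sketch below shows one plausible shape for both. The four-characters-per-token ratio is only a rough rule of thumb for English text, and the `pricing` argument is an assumption of this example (the original calls calculateCost with usage alone), so no real price list is implied.

```javascript
// Hypothetical helpers; a real system would use the provider's tokenizer and price list.
function estimateTokens(text) {
  // Rough heuristic: about four characters per token for English text.
  return Math.ceil(text.length / 4);
}

// `pricing` is supplied by the caller in USD per 1,000 tokens.
function calculateCost(usage, pricing) {
  const input = (usage.prompt_tokens / 1000) * pricing.inputPer1k;
  const output = (usage.completion_tokens / 1000) * pricing.outputPer1k;
  return input + output;
}
```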

Production agent checklist

  • ✅ Separate planner from executor — never mix planning and execution in one LLM call
  • ✅ All three memory tiers: working (in-context), episodic (past runs), semantic (knowledge base)
  • ✅ Validate every tool call against typed schema before execution
  • ✅ Use an explicit state machine — no ad-hoc while loops
  • ✅ Trace every LLM call with tokens, cost, and latency
  • ✅ Set hard limits on steps, cost, and time per task run
  • ❌ Never deploy agents without maximum step and cost budgets
  • ❌ Never run high-stakes agent actions without a human approval checkpoint
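
The hard limits in the checklist can be enforced with a small guard that the executor charges once per step. This is a sketch under assumptions: the class name `RunBudget`, its option names, and the injectable clock are invented for illustration, and the limit values in any usage are placeholders rather than recommendations.

```javascript
// Hypothetical sketch of a per-run budget covering steps, cost, and wall-clock time.
class RunBudget {
  constructor({ maxSteps, maxCostUsd, maxMs }, now = Date.now) {
    this.limits = { maxSteps, maxCostUsd, maxMs };
    this.now = now; // injectable clock, which makes the guard easy to test
    this.startedAt = now();
    this.steps = 0;
    this.costUsd = 0;
  }

  // Call once per step; throws as soon as any limit is exceeded.
  charge(costUsd = 0) {
    this.steps += 1;
    this.costUsd += costUsd;
    if (this.steps > this.limits.maxSteps) throw new Error('budget_exceeded: steps');
    if (this.costUsd > this.limits.maxCostUsd) throw new Error('budget_exceeded: cost');
    if (this.now() - this.startedAt > this.limits.maxMs) throw new Error('budget_exceeded: time');
  }
}
```

Throwing from the guard fits the state machine approach: the agent can catch the error and transition to FAILED instead of continuing to spend.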

Production agents also need solid infrastructure. Serverless execution with optimized Lambda cold starts can suit agent executors, and a DynamoDB single-table design is one option for persisting agent state such as episodic memory. External reference: Lilian Weng’s AI agent architecture survey.
