Building an AI agent that demos well is easy. Building one that works reliably in production is a completely different engineering challenge. The gap between a LangChain tutorial and a production-grade agent is enormous: planning architecture, memory systems, error recovery, tool reliability, and observability. This post covers the architecture decisions that determine whether your agent is a toy or a production tool.
⚡ TL;DR: Production AI agents need: separation of planning from execution, three memory tiers (working, episodic, semantic), tool call schema validation with retry, an explicit state machine not an ad-hoc loop, and full observability on every LLM call. Build the infrastructure first, then the intelligence.
The core pattern: planner-executor separation
// Anti-pattern: monolithic agent loop
async function badAgent(task) {
  let result = await llm.call(task);
  while (result.hasToolCall) {
    result = await llm.call(await executeTool(result) + previousContext);
  }
  return result; // Brittle: no planning, no recovery, no step limit
}
// Production pattern: planner + executor separated
class ProductionAgent {
  async run(task) {
    // Phase 1: Planner creates an explicit plan
    const plan = await this.planner.createPlan({
      task,
      availableTools: this.tools.list(),
      constraints: this.constraints,
    });
    // plan = { steps: [{ tool, args, dependsOn, retryPolicy }], estimatedCost }

    // Phase 2: Executor runs the plan with oversight
    const executor = new PlanExecutor(plan, this.tools);
    return executor.run({
      onStepComplete: this.monitor.recordStep,
      onError: this.errorHandler.handle,
      maxRetries: 3,
    });
  }
}
// The plan is inspectable BEFORE execution
// The executor retries individual steps without re-planning
// Each phase has its own observability
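The `PlanExecutor` above is where step-level retry lives. Here is a minimal sketch of one, with per-step retries and callbacks; the field names (`step.id`, `retryPolicy.maxRetries`) are assumptions, and `dependsOn` ordering is omitted for brevity:

```javascript
// Sketch of a plan executor: runs steps sequentially, retries each failing
// step up to its retry budget, and only surfaces an error once a step has
// exhausted its retries. Illustrative, not a library API.
class PlanExecutor {
  constructor(plan, tools) {
    this.plan = plan;
    this.tools = tools;
  }

  async run({ onStepComplete = () => {}, onError = () => {}, maxRetries = 3 } = {}) {
    const results = {};
    for (const step of this.plan.steps) {
      const retries = step.retryPolicy?.maxRetries ?? maxRetries;
      let lastErr = null;
      for (let attempt = 0; attempt <= retries; attempt++) {
        try {
          results[step.id] = await this.tools.call(step.tool, step.args);
          onStepComplete(step, results[step.id]);
          lastErr = null;
          break; // step succeeded, move on without re-planning
        } catch (err) {
          lastErr = err;
          onError(step, err, attempt);
        }
      }
      if (lastErr) throw lastErr; // retries exhausted: caller decides whether to re-plan
    }
    return results;
  }
}
```

The key property: a transient tool failure costs one retried step, not a whole new planning round.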
Three-tier memory architecture
class AgentMemory {
  // Tier 1: Working memory - current task context (lives in the LLM context window)
  working = new Map();
  // Tier 2: Episodic memory - past task results (vector DB): "What happened when I did X?"
  episodic = new VectorStore('episodes');
  // Tier 3: Semantic memory - facts and knowledge (vector DB): "What do I know about X?"
  semantic = new VectorStore('knowledge');

  async recall(query) {
    const [working, episodic, semantic] = await Promise.all([
      this.searchWorking(query),
      this.episodic.similaritySearch(query, { k: 3, threshold: 0.75 }),
      this.semantic.similaritySearch(query, { k: 5, threshold: 0.70 }),
    ]);
    return this.rankAndMerge([...working, ...episodic, ...semantic]);
  }

  async store(result) {
    await this.episodic.upsert({
      id: result.taskId,
      vector: await this.embed(result.summary),
      metadata: { task: result.task, outcome: result.outcome },
    });
  }
}
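The `rankAndMerge` step decides what actually reaches the context window. One plausible implementation: dedupe hits across tiers, rank by similarity, and trim to a token budget. The hit fields (`id`, `score`, `tokens`) are assumptions about your store's result shape:

```javascript
// Sketch of rankAndMerge: keep the best-scoring copy of each hit, sort by
// score, then pack greedily into a context-token budget. Field names are
// illustrative, not a fixed schema.
function rankAndMerge(hits, { maxTokens = 2000 } = {}) {
  // Dedupe: the same memory can surface from more than one tier
  const best = new Map();
  for (const hit of hits) {
    const prev = best.get(hit.id);
    if (!prev || hit.score > prev.score) best.set(hit.id, hit);
  }

  // Rank by similarity, then fill the token budget greedily
  const ranked = [...best.values()].sort((a, b) => b.score - a.score);
  const out = [];
  let used = 0;
  for (const hit of ranked) {
    if (used + hit.tokens > maxTokens) break;
    out.push(hit);
    used += hit.tokens;
  }
  return out;
}
```

A token budget here is what keeps recall from silently crowding the task prompt out of the context window.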
Tool calling with schema validation
class ToolRegistry {
  tools = new Map();

  register(name, fn, schema) {
    this.tools.set(name, { fn, schema: z.object(schema) });
  }

  async call(name, rawArgs, context) {
    const tool = this.tools.get(name);
    if (!tool) return { error: 'tool_not_found', name };

    // Validate BEFORE calling
    const parsed = tool.schema.safeParse(rawArgs);
    if (!parsed.success) {
      // Structured error lets the LLM self-correct
      return { error: 'invalid_args', details: parsed.error.format() };
    }

    // Execute with a hard timeout
    const result = await Promise.race([
      tool.fn(parsed.data, context),
      new Promise((_, rej) => setTimeout(() => rej(new Error('timeout')), 10000)),
    ]);
    this.telemetry.record({ tool: name, args: parsed.data, result });
    return result;
  }
}
// Typed tool schema:
registry.register('search_code', searchCode, {
  query: z.string().min(3).max(200),
  extensions: z.array(z.enum(['.ts', '.js', '.py'])).optional(),
  maxResults: z.number().int().min(1).max(20).default(5),
});
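The structured `invalid_args` error is what makes self-correction possible: instead of crashing, the validator output goes back to the model so it can repair its own arguments. A sketch of that loop, where `registry` and `llm.repairArgs` are placeholders for your own clients:

```javascript
// Sketch of the self-correction loop enabled by structured validation errors.
// On invalid_args, the validator's details are handed back to the model for
// one repair attempt per round. Names here are illustrative.
async function callWithSelfCorrection(registry, llm, name, args, maxAttempts = 2) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await registry.call(name, args);
    if (result?.error !== 'invalid_args') return result; // success, or a non-recoverable error

    // Ask the model to fix its own arguments using the validator output
    args = await llm.repairArgs({ tool: name, badArgs: args, details: result.details });
  }
  return { error: 'invalid_args_unrecoverable', name };
}
```

Capping the repair attempts matters: without `maxAttempts`, a model that keeps producing bad arguments loops forever.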
State machine — replace ad-hoc loops
const agentFSM = {
  IDLE:         { next: ['PLANNING'] },
  PLANNING:     { next: ['EXECUTING', 'FAILED'] },
  EXECUTING:    { next: ['WAITING_TOOL', 'REVIEWING', 'FAILED'] },
  WAITING_TOOL: { next: ['EXECUTING', 'FAILED'] },
  REVIEWING:    { next: ['EXECUTING', 'COMPLETE', 'REPLANNING'] },
  REPLANNING:   { next: ['EXECUTING', 'FAILED'] },
  COMPLETE:     { next: ['IDLE'] },
  FAILED:       { next: ['IDLE'] },
};
class StatefulAgent {
  state = 'IDLE';
  history = [];

  transition(next) {
    if (!agentFSM[this.state].next.includes(next)) {
      throw new Error(`${this.state} cannot transition to ${next}`);
    }
    const from = this.state; // capture before mutating, so telemetry logs the real source state
    this.history.push({ from, to: next, at: Date.now() });
    this.state = next;
    this.telemetry.record('state_change', { from, to: next });
  }
}
// Every state change is logged, validated, and replayable
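"Replayable" means the persisted history is enough to reconstruct the run: walk the transitions through the same table and you get back the final state, or an error pinpointing where the log diverged. A self-contained sketch (`FSM` mirrors the `agentFSM` table above; `replay` is illustrative, not a framework API):

```javascript
// Rebuild an agent's final state from its persisted transition history,
// validating every hop against the transition table. A corrupted or
// out-of-order log fails loudly instead of producing a bogus state.
const FSM = {
  IDLE: { next: ['PLANNING'] },
  PLANNING: { next: ['EXECUTING', 'FAILED'] },
  EXECUTING: { next: ['WAITING_TOOL', 'REVIEWING', 'FAILED'] },
  WAITING_TOOL: { next: ['EXECUTING', 'FAILED'] },
  REVIEWING: { next: ['EXECUTING', 'COMPLETE', 'REPLANNING'] },
  REPLANNING: { next: ['EXECUTING', 'FAILED'] },
  COMPLETE: { next: ['IDLE'] },
  FAILED: { next: ['IDLE'] },
};

function replay(history, initial = 'IDLE') {
  let state = initial;
  for (const { from, to } of history) {
    if (from !== state || !FSM[state].next.includes(to)) {
      throw new Error(`invalid transition ${from} -> ${to} while in ${state}`);
    }
    state = to;
  }
  return state;
}
```

This is also a cheap integrity check in postmortems: replay the log and compare against the state the agent claims it ended in.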
Observability — trace every LLM call
class InstrumentedLLM {
  async call(prompt, opts = {}) {
    const span = tracer.startSpan('llm.call');
    span.setAttributes({
      'llm.model': opts.model,
      'llm.prompt_tokens': estimateTokens(prompt),
      'agent.task_id': this.context.taskId,
    });
    try {
      const response = await this.client.complete(prompt, opts);
      span.setAttributes({
        'llm.completion_tokens': response.usage.completion_tokens,
        'llm.cost_usd': calculateCost(response.usage),
      });
      return response;
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  }
}
```
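The `calculateCost` helper used above is just a price-table lookup. A sketch, with a deliberately made-up model name and ILLUSTRATIVE rates (per 1M tokens); substitute your provider's current pricing:

```javascript
// Hypothetical cost calculation for the llm.cost_usd span attribute.
// The rates below are placeholders, not real provider prices.
const PRICE_PER_1M_TOKENS = {
  'example-model': { prompt: 2.0, completion: 6.0 }, // illustrative only
};

function calculateCost(usage, model = 'example-model') {
  const price = PRICE_PER_1M_TOKENS[model];
  if (!price) return 0; // unknown model: record nothing rather than guess
  return (
    (usage.prompt_tokens * price.prompt +
      usage.completion_tokens * price.completion) / 1e6
  );
}
```

Keeping cost as a first-class span attribute is what lets you aggregate spend per task, per tool, and per model in your tracing backend.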
Production agent checklist
- ✅ Separate planner from executor — never mix planning and execution in one LLM call
- ✅ All three memory tiers: working (in-context), episodic (past runs), semantic (knowledge base)
- ✅ Validate every tool call against typed schema before execution
- ✅ Use an explicit state machine — no ad-hoc while loops
- ✅ Trace every LLM call with tokens, cost, and latency
- ✅ Set hard limits on steps, cost, and time per task run
- ❌ Never deploy agents without maximum step and cost budgets
- ❌ Never run high-stakes agent actions without human approval checkpoint
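The step, cost, and time budgets in the checklist can be enforced with one small guard object charged on every step. A sketch with illustrative default limits:

```javascript
// Hard run budgets: the guard throws the moment any limit is exceeded,
// so a runaway agent loop stops deterministically. Defaults are illustrative.
class RunBudget {
  constructor({ maxSteps = 25, maxCostUsd = 1.0, maxMs = 120000 } = {}) {
    this.maxSteps = maxSteps;
    this.maxCostUsd = maxCostUsd;
    this.maxMs = maxMs;
    this.steps = 0;
    this.costUsd = 0;
    this.startedAt = Date.now();
  }

  // Call once per agent step, passing that step's LLM/tool cost
  charge({ costUsd = 0 } = {}) {
    this.steps += 1;
    this.costUsd += costUsd;
    if (this.steps > this.maxSteps) throw new Error('budget_exceeded: steps');
    if (this.costUsd > this.maxCostUsd) throw new Error('budget_exceeded: cost');
    if (Date.now() - this.startedAt > this.maxMs) throw new Error('budget_exceeded: time');
  }
}
```

Treat a budget error like any other terminal failure: transition the agent to FAILED, record the partial results, and surface the run for review.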
Production agents also need solid infrastructure: serverless execution with optimized Lambda cold starts works well for agent executors, and DynamoDB single-table design is an efficient fit for persisting episodic memory. For a broader architectural survey, see Lilian Weng's post on LLM-powered autonomous agents.