Code Generation with LLMs: Structured Output, AST Manipulation, and Avoiding Hallucinated APIs

The naive approach to LLM code generation — prompt, receive code, use it — breaks in production because LLMs hallucinate API signatures, invent non-existent library methods, and produce syntactically correct but semantically broken code. Reliable code generation is a pipeline, not a single call. Structured output, compilation validation, and test execution are what make generated code trustworthy enough to automate.

TL;DR: Use structured output (JSON schema + Zod/Pydantic validation) to constrain code generation. Pipe generated code through a compiler to catch syntax and type errors. Run a test harness against generated functions. Use retrieval-augmented generation (RAG) with your actual API docs to prevent hallucinated method calls.

Structured output for code generation

// Force LLM to return structured code with metadata
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';

const CodeGenerationOutput = z.object({
  code: z.string().describe('The generated TypeScript code'),
  imports: z.array(z.string()).describe('Required import statements'),
  exports: z.array(z.string()).describe('Exported function/class names'),
  dependencies: z.array(z.object({
    package: z.string(),
    version: z.string(),
    reason: z.string()
  })).describe('npm packages required'),
  tests: z.array(z.object({
    description: z.string(),
    input: z.unknown(),
    expectedOutput: z.unknown()
  })).describe('Test cases for the generated code'),
  confidence: z.enum(['high', 'medium', 'low']),
  caveats: z.array(z.string()).describe('Known limitations or assumptions')
});

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function generateCode(spec) {
  const response = await client.messages.create({
    model: 'claude-opus-4-5',
    max_tokens: 4000,
    system: 'You are a code generator. Return ONLY valid JSON matching the schema. No prose.',
    messages: [{ role: 'user', content:
      'Generate TypeScript code for: ' + spec + '\n' +
      'Return JSON with these fields: code, imports, exports, dependencies, tests, confidence, caveats'
    }]
  });

  const block = response.content[0];
  if (block.type !== 'text') throw new Error('Expected a text content block');
  const raw = JSON.parse(block.text);
  return CodeGenerationOutput.parse(raw); // Throws if schema violated
}
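Even with a "JSON only" system prompt, models occasionally wrap their output in markdown fences or surrounding prose, which makes a bare `JSON.parse` throw. A defensive extraction step is cheap insurance — the sketch below is a hypothetical helper, not part of the Anthropic SDK:

```typescript
// Hypothetical helper: tolerate markdown fences or stray prose around the JSON.
function extractJson(text: string): unknown {
  // Drop fence lines the model may add despite instructions,
  // then fall back to the outermost {...} span if prose remains.
  const body = text
    .split('\n')
    .filter(line => !line.trim().startsWith('`'))
    .join('\n');
  const start = body.indexOf('{');
  const end = body.lastIndexOf('}');
  if (start === -1 || end < start) throw new Error('No JSON object found in model output');
  return JSON.parse(body.slice(start, end + 1));
}
```

Running the model text through `extractJson` before the Zod `parse` keeps a fence-wrapped response from failing the whole generation.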

Compilation validation pipeline

import { execSync } from 'child_process';
import * as fs from 'fs';
import * as ts from 'typescript';

async function validateGeneratedCode(generated) {
  const tmpFile = `/tmp/generated-${Date.now()}.ts`;

  // Write to temp file
  const fullCode = generated.imports.join('\n') + '\n\n' + generated.code;
  fs.writeFileSync(tmpFile, fullCode);

  // TypeScript compilation check
  const program = ts.createProgram([tmpFile], {
    strict: true,
    target: ts.ScriptTarget.ES2022,
    module: ts.ModuleKind.ESNext,
    noEmit: true,
  });

  const diagnostics = ts.getPreEmitDiagnostics(program);
  const errors = diagnostics
    .filter(d => d.category === ts.DiagnosticCategory.Error)
    .map(d => ts.flattenDiagnosticMessageText(d.messageText, '\n'));

  if (errors.length > 0) {
    // Auto-fix: send errors back to LLM
    return await fixCode(generated.code, errors);
  }

  // Run generated tests
  fs.writeFileSync(tmpFile.replace('.ts', '.test.ts'), buildTestSuite(generated));
  try {
    execSync('npx jest ' + tmpFile.replace('.ts', '.test.ts') + ' --passWithNoTests',
      { timeout: 30000 });
  } catch (err) {
    return await fixCode(generated.code, [err.stdout?.toString() || err.message]);
  }

  return { ...generated, validated: true };
}
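The `buildTestSuite` helper above is left undefined; here is one minimal sketch. It assumes a single exported function and JSON-serializable test inputs, and the `./generated` module path is a placeholder:

```typescript
// Minimal sketch of buildTestSuite: turn the schema's tests array into a Jest file.
// Assumes one exported function and JSON-serializable inputs/outputs.
interface GeneratedTest { description: string; input: unknown; expectedOutput: unknown }

function buildTestSuite(
  generated: { exports: string[]; tests: GeneratedTest[] },
  modulePath = './generated'
): string {
  const fn = generated.exports[0];
  const cases = generated.tests.map(t =>
    `  test(${JSON.stringify(t.description)}, () => {\n` +
    `    expect(${fn}(${JSON.stringify(t.input)})).toEqual(${JSON.stringify(t.expectedOutput)});\n` +
    `  });`
  ).join('\n');
  return `import { ${fn} } from '${modulePath}';\n\ndescribe('generated code', () => {\n${cases}\n});\n`;
}
```

A real implementation would handle multiple exports, async functions, and non-serializable fixtures, but this shows the shape: the LLM proposes the test cases, your harness owns the execution.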

async function fixCode(code, errors) {
  // Re-prompt with specific errors
  const fixed = await client.messages.create({
    model: 'claude-opus-4-5',
    max_tokens: 2000,
    messages: [{
      role: 'user',
      content: 'Fix these TypeScript errors in the code.\nErrors:\n' +
        errors.join('\n') + '\n\nCode:\n' + code
    }]
  });
  return { code: fixed.content[0].text }; // Re-validate after fix
}

RAG for accurate API usage — prevent hallucinated methods

// LLMs hallucinate API methods because they interpolate from similar libraries
// Solution: inject actual API docs into context via RAG

// 1. Build a vector store of your actual API documentation
const apiDocs = [
  { path: 'prisma.findMany', signature: 'findMany(args?: FindManyArgs): Promise<User[]>',
    example: 'await db.user.findMany({ where: { active: true } })' },
  { path: 'prisma.findUnique', signature: 'findUnique(args: FindUniqueArgs): Promise<User | null>',
    example: 'await db.user.findUnique({ where: { id: userId } })' },
  // ...all 50+ Prisma methods
];

// 2. Retrieve relevant docs based on generation task
async function getRelevantAPIDocs(task) {
  // Use embeddings to find the most relevant API docs.
  // Assumes each doc was embedded at index time, e.g.
  // doc.embedding = await embed(doc.signature + ' ' + doc.example)
  const taskEmbedding = await embed(task);
  return apiDocs
    .map(doc => ({ ...doc, similarity: cosineSimilarity(taskEmbedding, doc.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 10);
}

// 3. Inject into system prompt
const relevantDocs = await getRelevantAPIDocs(spec);
const systemPrompt = 'Generate code using ONLY these exact API methods:\n'
  + relevantDocs.map(d => `${d.path}: ${d.signature}\nExample: ${d.example}`).join('\n\n')
  + '\n\nNever invent methods not listed above.';
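The `embed` and `cosineSimilarity` calls above are assumed helpers: `embed` would call an embeddings API, while cosine similarity is just the normalized dot product and needs no library:

```typescript
// Cosine similarity: dot product of the two vectors divided by the
// product of their magnitudes. Returns 0 for a zero vector to avoid NaN.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```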

AST-based code manipulation — precision editing

// For surgical code modifications, use AST manipulation instead of LLM for the edit
import * as ts from 'typescript';

function addReturnTypeAnnotation(sourceCode, functionName, returnType) {
  const sourceFile = ts.createSourceFile(
    'temp.ts', sourceCode, ts.ScriptTarget.Latest, true
  );

  const transformer = (context) => (rootNode) => {
    function visit(node) {
      if (ts.isFunctionDeclaration(node) &&
          node.name?.text === functionName &&
          !node.type) {  // No return type yet
        return ts.factory.updateFunctionDeclaration(
          node, node.modifiers, node.asteriskToken, node.name,
          node.typeParameters, node.parameters,
          ts.factory.createTypeReferenceNode(returnType),  // Add return type
          node.body
        );
      }
      return ts.visitEachChild(node, visit, context);
    }
    return ts.visitNode(rootNode, visit);
  };

  const result = ts.transform(sourceFile, [transformer]);
  const printer = ts.createPrinter();
  return printer.printFile(result.transformed[0]);
}

// Use LLM for "what to add", AST for "how to add it precisely"
// This eliminates the class of errors where LLM edits break surrounding code

Production checklist

  • ✅ Structured output schemas (Zod/Pydantic) to constrain generation format
  • ✅ TypeScript compilation check on every generated file before use
  • ✅ RAG with actual API docs to prevent hallucinated method calls
  • ✅ Auto-fix loop: send compilation errors back to LLM for self-correction
  • ✅ AST manipulation for precise edits, LLM for what to generate
  • ❌ Never use generated code without compilation + test validation
  • ❌ Never let LLM invent API usage — always provide actual docs in context

Code generation pipelines pair naturally with TypeScript generic patterns that make the generated type signatures correct by construction. For deploying a generation pipeline as an API, Lambda response streaming lets you send generated code tokens to the UI as they arrive. External reference: Anthropic's structured outputs documentation.
