AWS Lambda SnapStart: Sub-Second Cold Starts for Java Without Changing Your Code


AWS Lambda SnapStart cuts Java cold starts from as much as 8 seconds to a few hundred milliseconds, and enabling it requires no changes to your handler code. It works by snapshotting the initialized execution environment after your init phase completes. It then restores from that snapshot instead of re-initializing from scratch on every cold start. The catch: state that must be unique per environment, and open network connections, need small runtime hooks, covered below. This is one of the biggest Lambda performance improvements AWS has shipped in years, yet many Java teams still haven't adopted it.

TL;DR: Enable SnapStart on your Java 11/17/21 Lambda with one CloudFormation setting. Lambda snapshots your post-init JVM state, and cold starts restore from that snapshot in a few hundred milliseconds instead of re-running your full init. Watch out for uniqueness issues. Random seeds, timestamps, and network connections captured in the snapshot need special handling with CRaC runtime hooks (the org.crac Resource interface).

How Lambda SnapStart actually works

A typical cold start for a Spring-based Java Lambda runs in four steps: JVM boots (~500ms) → class loading (1–3s) → Spring context / framework init (2–5s) → first request handler runs. Total: roughly 4–8 seconds. SnapStart changes this completely.

# SnapStart lifecycle:
# 1. When you publish a version, Lambda runs your init code normally (once)
# 2. After init completes, Lambda takes a Firecracker microVM snapshot of the environment's memory and disk state (including the initialized JVM)
# 3. The snapshot is encrypted and cached for low-latency access
# 4. Cold start = restore the cached snapshot → resume the JVM → run handler
# 5. Restore takes ~100–200ms vs 4–8s for a full init

# Enable in SAM template:
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      SnapStart:
        ApplyOn: PublishedVersions  # Required — only works on versions, not $LATEST
      AutoPublishAlias: live        # SAM automatically publishes a version and creates alias
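To confirm at runtime that an environment really was restored from a snapshot, you can read the reserved environment variable `AWS_LAMBDA_INITIALIZATION_TYPE`. Lambda documents its values as `on-demand`, `provisioned-concurrency`, or `snap-start`. Below is a minimal sketch; the class name `InitTypeProbe` is hypothetical. The check is kept in a pure method so it can be tested outside Lambda.

```java
final class InitTypeProbe {
    // Reserved variable set by the Lambda execution environment (unset outside Lambda).
    static final String INIT_TYPE_VAR = "AWS_LAMBDA_INITIALIZATION_TYPE";

    /** True when the given initialization type indicates a SnapStart restore. */
    static boolean isSnapStart(String initType) {
        return "snap-start".equals(initType);
    }

    /** Reads the variable from the current process; returns null outside Lambda. */
    static String currentInitType() {
        return System.getenv(INIT_TYPE_VAR);
    }

    public static void main(String[] args) {
        String type = currentInitType();
        System.out.println("initialization type: " + type + ", SnapStart: " + isSnapStart(type));
    }
}
```

Log this from the handler or from an afterRestore hook rather than from a static initializer. Static init runs only once, before the snapshot is taken.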

The uniqueness problem — what breaks after restore

// Problems with naive SnapStart adoption:

// 1. Random seeds: a java.util.Random (or any PRNG) created during init is captured
//    in the snapshot, so every restored environment replays the same sequence
import java.util.Random;
// BAD: generator state fixed at init → identical "random" values after every restore
private static final Random rng = new Random(); // State frozen in the snapshot

// 2. Network connections: sockets opened during init are not guaranteed to survive restore
// DB connection pools, HTTP clients with keep-alive, and Redis connections can all break

// 3. Timestamps: a System.currentTimeMillis() value stored during init reflects snapshot time, not restore time

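// FIX: register CRaC (Coordinated Restore at Checkpoint) runtime hooks via the org.crac API
// (Maven: io.github.crac:org-crac); Lambda calls them around the snapshot
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Random;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class MyHandler
    implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {

  private Random rng = new Random();
  private Connection dbConn;

  public MyHandler() {
    // Register the instance Lambda keeps alive (not a throwaway new MyHandler())
    Core.getGlobalContext().register(this);
  }

  @Override
  public APIGatewayProxyResponseEvent handleRequest(
      APIGatewayProxyRequestEvent event, com.amazonaws.services.lambda.runtime.Context lambdaContext) {
    // ... business logic using rng and dbConn ...
    return new APIGatewayProxyResponseEvent().withStatusCode(200).withBody("ok");
  }

  @Override
  public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
    // Called once, BEFORE the snapshot is taken: close anything that must not be captured
    if (dbConn != null) {
      dbConn.close();
      dbConn = null;
    }
    System.out.println("Before checkpoint: closing connections");
  }

  @Override
  public void afterRestore(Context<? extends Resource> context) throws Exception {
    // Called AFTER each restore from the snapshot: re-create per-environment state
    rng = new Random();             // Fresh generator state per restored environment
    dbConn = createNewConnection(); // Fresh connection after restore
    System.out.println("After restore: re-initialized connections");
  }

  // Hypothetical helper: DB_URL is an assumed environment variable holding a JDBC URL
  private static Connection createNewConnection() throws SQLException {
    return DriverManager.getConnection(System.getenv("DB_URL"));
  }
}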
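Why seed capture matters: every environment restored from the same snapshot starts with identical memory. That includes the internal state of any pseudo-random generator created during init. The sketch below simulates two restores by building two `java.util.Random` instances from the same captured state. It shows that they emit the same sequence. The fixed seed `42` is only a stand-in for the frozen state, chosen to keep the demo deterministic.

```java
import java.util.Arrays;
import java.util.Random;

final class SnapshotReplayDemo {
    /** Draws the first n ints from a generator. */
    static int[] firstValues(Random rng, int n) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextInt();
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for generator state captured in the snapshot during init.
        long stateCapturedAtInit = 42L;

        // Two execution environments restored from the same snapshot.
        int[] environmentA = firstValues(new Random(stateCapturedAtInit), 5);
        int[] environmentB = firstValues(new Random(stateCapturedAtInit), 5);

        // Same state in, same "random" numbers out: IDs, tokens, or nonces would collide.
        System.out.println("identical sequences: " + Arrays.equals(environmentA, environmentB));
    }
}
```

Running it prints `identical sequences: true`. Re-creating the generator in afterRestore, as the handler above does, breaks the tie: each restored environment then builds fresh state after the snapshot.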

Real benchmark: SnapStart vs cold start vs Provisioned Concurrency

# Measured on Spring Boot 3 Lambda, 512MB, us-east-1, Java 21

# Without SnapStart:
# Init duration:    5,840ms
# Cold start p50:   6,200ms
# Cold start p99:   8,900ms
# Monthly cost (100 cold starts/day): $0.00 extra (init phase not billed under the pricing used here; verify against current Lambda pricing)

# With SnapStart:
# Init duration:    5,840ms (same, but runs once when the version is published)
# Restore duration: 180ms
# Cold start p50:   210ms
# Cold start p99:   380ms
# Improvement:      96% reduction in cold start latency

# With Provisioned Concurrency (no cold starts at all):
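# Cold start:       none for requests served by provisioned (pre-initialized) instances
# Extra cost:       ~$14/month per 1 PC unit (1 always-warm instance; scales with memory size and region)

# Verdict:
# SnapStart removes ~96% of cold-start latency (PC removes it entirely) at $0 extra cost for Java
# Use Provisioned Concurrency only when you cannot tolerate any init or restore latency
# (SnapStart and Provisioned Concurrency cannot be combined on the same function version)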
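The 96% headline follows directly from the cold-start rows above. Here is a quick check of the arithmetic, using only this benchmark's numbers:

```java
import java.util.Locale;

final class ColdStartReduction {
    /** Percentage reduction from a baseline latency to a new latency. */
    static double reductionPercent(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs * 100.0;
    }

    public static void main(String[] args) {
        // Cold-start latencies from the benchmark above (ms): without vs with SnapStart.
        System.out.println(String.format(Locale.ROOT, "p50: %.1f%%", reductionPercent(6_200, 210)));
        System.out.println(String.format(Locale.ROOT, "p99: %.1f%%", reductionPercent(8_900, 380)));
    }
}
```

This prints `p50: 96.6%` and `p99: 95.7%`. The rounded 96% is the median figure, and the tail improves slightly less.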

SnapStart with Quarkus and Micronaut (better than Spring)

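// Quarkus with SnapStart: even better results because Quarkus's build-time processing
// moves much of the framework init work to build time (JVM mode; GraalVM native images
// run on custom runtimes, which SnapStart does not support)

// pom.xml dependency for the Quarkus Lambda extension (see the Quarkus docs for its SnapStart settings):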
// <dependency>
//   <groupId>io.quarkus</groupId>
//   <artifactId>quarkus-amazon-lambda</artifactId>
// </dependency>

// Quarkus + SnapStart benchmark (512MB, Java 21):
// Init:    1,200ms (vs 5,800ms Spring: ~4.8x less init time)
// Restore: 140ms
// p99 cold start: 180ms

// Micronaut benchmark (512MB, Java 21):
// Init:    800ms
// Restore: 110ms
// p99 cold start: 150ms

// If you're starting a new Lambda, these numbers rank Micronaut > Quarkus > Spring for SnapStart performance
// If you're already on Spring: SnapStart still saves 96% — don't rewrite just for this
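Here are the init figures above expressed as ratios. This is plain arithmetic on the quoted numbers, using the 5,800ms Spring baseline from this section. It shows why lighter frameworks matter most before the snapshot. After restore, the gap narrows to the 110–140ms restore range.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class InitWorkComparison {
    // Spring init time quoted in this section (ms).
    static final double SPRING_INIT_MS = 5_800;

    /** How many times less init time a framework needs compared with the Spring baseline. */
    static double timesLessThanSpring(double initMs) {
        return SPRING_INIT_MS / initMs;
    }

    public static void main(String[] args) {
        Map<String, Double> initMs = new LinkedHashMap<>();
        initMs.put("Quarkus", 1_200.0);
        initMs.put("Micronaut", 800.0);
        initMs.forEach((name, ms) -> System.out.println(String.format(
            Locale.ROOT, "%s: %.2fx less init time than Spring", name, timesLessThanSpring(ms))));
    }
}
```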

SnapStart limitations you must know

  • ✅ Supported runtimes: Java 11, Java 17, Java 21 managed runtimes (AWS has since added Python 3.12+ and .NET 8+); not Node.js or OS-only/custom runtimes
  • ✅ Only works on published versions and aliases — not on $LATEST
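  • ✅ Free for Java: no additional cost beyond normal Lambda pricing (Python and .NET SnapStart add caching and restore charges)
  • ⚠️ Snapshots are encrypted and cached; the first restores after a deployment can be slower than steady-state restores (~200ms extra)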
  • ⚠️ Must handle uniqueness issues with CRaC hooks (random, connections, timestamps)
  • ⚠️ Not available in all regions — check AWS docs for current availability
  • ❌ Does NOT work with Lambda@Edge, Provisioned Concurrency, Amazon EFS, ephemeral storage above 512 MB, or container images
  • ❌ Concurrent restores each get their own snapshot copy — memory is not shared
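A couple of these rules can be checked before you deploy. Below is a minimal preflight sketch. The class name, method, and string inputs are hypothetical illustrations, not an AWS API. It covers only two rules from the list above: the Java runtime bullet and the published-version rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SnapStartPreflight {
    // Java runtime identifiers from the supported-runtimes bullet above
    // (assumption: update this set as AWS adds runtimes).
    static final Set<String> SUPPORTED_JAVA_RUNTIMES = Set.of("java11", "java17", "java21");

    /**
     * Checks a hypothetical function description against two rules from the list above.
     * Returns human-readable problems; an empty list means neither rule was violated.
     */
    static List<String> check(String runtime, String qualifier) {
        List<String> problems = new ArrayList<>();
        // Set.of(...) rejects contains(null), so test for null first.
        if (runtime == null || !SUPPORTED_JAVA_RUNTIMES.contains(runtime)) {
            problems.add("runtime '" + runtime + "' is not a SnapStart-supported Java runtime");
        }
        if (qualifier == null || qualifier.equals("$LATEST")) {
            problems.add("invoke a published version or alias, not $LATEST");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("java21", "live"));        // prints []
        System.out.println(check("nodejs20.x", "$LATEST")); // reports both problems
    }
}
```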

Production CloudFormation configuration

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: my-java-api
      CodeUri: .            # Assumed: project root containing the Maven/Gradle build
      Handler: com.example.Handler::handleRequest
      Runtime: java21
      MemorySize: 1024      # More memory = more CPU allocated, which can speed up restore
      Timeout: 30
      SnapStart:
        ApplyOn: PublishedVersions
      AutoPublishAlias: live  # SAM points the event source below at this alias, not $LATEST
      Environment:
        Variables:
          JAVA_TOOL_OPTIONS: "-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
          # TieredStopAtLevel=1 limits JIT compilation to the C1 compiler; C2 never runs
          # C1 compiles quickly, cutting JIT overhead during init and the first requests
          # Trade-off: lower peak throughput for long-running, CPU-heavy invocations
      Events:
        ProxyApi:
          Type: Api
          Properties:
            RestApiId: !Ref ApiGateway
            Path: /{proxy+}
            Method: ANY

  # API Gateway invokes the function through the "live" alias (via AutoPublishAlias)
  ApiGateway:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
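To confirm the `JAVA_TOOL_OPTIONS` flags actually reached the JVM, you can read the effective values back through the HotSpot diagnostic MXBean. This assumes a HotSpot-based JVM such as Corretto. Logging them once per environment is a cheap sanity check. The class name `JitFlags` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class JitFlags {
    /** Returns the effective value of a HotSpot VM option, e.g. "TieredStopAtLevel". */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // HotSpot's default is 4 (full C2); 1 means the C1-only setting from the template took effect.
        System.out.println("TieredCompilation=" + vmOption("TieredCompilation"));
        System.out.println("TieredStopAtLevel=" + vmOption("TieredStopAtLevel"));
    }
}
```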

SnapStart is the most impactful free optimization available for Java Lambdas; combine it with the general Lambda cold start guide for a complete optimization strategy. For monitoring SnapStart restore times, the CloudWatch Logs Insights queries include a restore duration query that distinguishes @initDuration from restore events. Official reference: AWS Lambda SnapStart documentation.



Discover more from CheatCoders
