Design a Production Rate Limiter: Algorithms Seniors Actually Use in Interviews and at Work

Rate limiting is one of those system design topics where knowing the algorithm name is the easy part. Everyone knows “token bucket” and “sliding window”. What separates a senior answer from a junior one is knowing which to use when, how to implement them correctly in a distributed environment, and which edge cases will bite you in production.

⚡ TL;DR: Use token bucket for API rate limiting (handles bursts gracefully). Use sliding window log for strict per-user limits. Use fixed window for simple quota enforcement. All three need Redis in production — in-memory implementations don’t survive restarts or horizontal scaling.

Algorithm 1: Token Bucket (Best for API Rate Limiting)

Each user gets a bucket with a maximum capacity of N tokens. Tokens are added at a fixed rate (e.g., 10/second). Each request consumes 1 token. If the bucket is empty, reject the request. If the user makes no requests, tokens accumulate up to the bucket capacity.

// Redis Lua script — atomic token bucket (no race conditions)
// Run with: redis.call('EVAL', script, 1, key, capacity, rate, now)

const tokenBucketScript = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])  -- max tokens
local rate = tonumber(ARGV[2])      -- tokens per second
local now = tonumber(ARGV[3])       -- current timestamp (ms)

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Calculate tokens to add since last request
local elapsed = (now - last_refill) / 1000  -- seconds
local new_tokens = math.min(capacity, tokens + (elapsed * rate))

if new_tokens >= 1 then
  -- Allow request, consume 1 token
  redis.call('HMSET', key, 'tokens', new_tokens - 1, 'last_refill', now)
  redis.call('EXPIRE', key, 3600)  -- auto-expire inactive users
  return {1, math.floor(new_tokens - 1)}  -- {allowed, remaining}
else
  -- Reject request
  redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
  redis.call('EXPIRE', key, 3600)
  return {0, 0}  -- {rejected, remaining}
end
`;

// Node.js implementation
const redis = require('ioredis');
const client = new redis();

async function tokenBucket(userId, capacity = 100, rate = 10) {
  const key = `ratelimit:token:${userId}`;
  const now = Date.now();
  const [allowed, remaining] = await client.eval(
    tokenBucketScript, 1, key, capacity, rate, now
  );
  return {
    allowed: allowed === 1,
    remaining: parseInt(remaining),
    retryAfter: allowed ? null : Math.ceil(1000 / rate) // ms until next token
  };
}

// Express middleware
app.use(async (req, res, next) => {
  const result = await tokenBucket(req.user.id);
  res.set('X-RateLimit-Remaining', result.remaining);
  if (!result.allowed) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: result.retryAfter + 'ms'
    });
  }
  next();
});

Algorithm 2: Sliding Window Log (Most Accurate)

System design and architecture

→ System Design Complete Guide (Udemy) — Full rate limiting module — token bucket, Redis, and distributed design.

Sponsored links. We may earn a commission at no extra cost to you.

// Stores timestamp of each request in a sorted set
// Counts requests in the rolling window (e.g., last 60 seconds)
// Most accurate but uses the most memory (O(requests) per user)

async function slidingWindowLog(userId, limit = 100, windowMs = 60000) {
  const key = `ratelimit:swl:${userId}`;
  const now = Date.now();
  const windowStart = now - windowMs;

  const pipeline = client.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart); // Remove old entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`); // Add current request
  pipeline.zcard(key);                             // Count requests in window
  pipeline.expire(key, Math.ceil(windowMs / 1000));
  const results = await pipeline.exec();

  const requestCount = results[2][1];
  const allowed = requestCount <= limit;

  if (!allowed) {
    // Remove the request we just added (rejected)
    await client.zremrangebyscore(key, now, now);
  }

  return {
    allowed,
    remaining: Math.max(0, limit - requestCount),
    resetAt: windowStart + windowMs
  };
}
// Advantage: perfectly accurate, no boundary spikes
// Disadvantage: Redis memory grows with request volume

Algorithm 3: Fixed Window Counter (Simplest, Most Common)

// Simplest algorithm — count requests per time window
// Gotcha: boundary spike — user can make 2x limit at window boundary

async function fixedWindow(userId, limit = 100, windowSec = 60) {
  const window = Math.floor(Date.now() / 1000 / windowSec); // Current window ID
  const key = `ratelimit:fw:${userId}:${window}`;

  const count = await client.incr(key);
  if (count === 1) {
    // First request in this window — set expiry
    await client.expire(key, windowSec * 2);
  }

  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: (window + 1) * windowSec * 1000 // Next window start
  };
}

// The boundary spike problem:
// Window: 12:00:00 → 12:01:00
// User sends 100 requests at 12:00:59 (allowed — new window starts at :00)
// User sends 100 more at 12:01:01 (allowed — new window)
// Effectively 200 requests in 2 seconds — NOT 100/minute
// Fix: use sliding window log OR sliding window counter for strict limits

Distributed Rate Limiting — The Production Gotchas

// Problem 1: Multiple API gateway nodes without Redis coordination
// Node A and Node B each have their own in-memory counter
// User gets 100 req/min per node = 100 * N nodes effective limit
// ✅ Fix: Always use Redis (or Memcached) for shared state

// Problem 2: Redis connection failure = no rate limiting or total outage
// ✅ Fix: Fail open (allow requests) on Redis error, log it, alert
async function rateLimitWithFallback(userId) {
  try {
    return await tokenBucket(userId);
  } catch (err) {
    console.error('Rate limiter Redis error:', err);
    // Fail open — better than taking down the service
    return { allowed: true, remaining: -1, error: true };
  }
}

// Problem 3: Different limits per endpoint or tier
// ✅ Fix: Include resource in key
const key = `ratelimit:${userId}:${endpoint}:${tier}`;
// E.g.: "ratelimit:u123:/api/search:free"

// Problem 4: Rate limit headers for good UX
res.set({
  'X-RateLimit-Limit': limit,
  'X-RateLimit-Remaining': result.remaining,
  'X-RateLimit-Reset': result.resetAt,   // Unix timestamp
  'Retry-After': result.retryAfter / 1000 // Seconds (RFC 7231)
});

Algorithm Comparison

Algorithm	Burst Handling	Accuracy	Memory	Use When
Token Bucket	✅ Allows bursts	Good	O(1)	API rate limiting, billing
Sliding Window Log	No bursts	✅ Perfect	O(requests)	Strict per-user limits
Fixed Window	Boundary spike	Good	✅ O(1)	Simple quota enforcement
Sliding Window Counter	Controlled	Very good	✅ O(1)	Balance of all factors

Rate Limiter Cheat Sheet

✅ Always use Redis for distributed rate limiting — never in-memory only
✅ Use Lua scripts for atomicity — prevents race conditions
✅ Set proper rate limit headers on every response including 429s
✅ Fail open on Redis errors — log and alert but don’t block users
✅ Include endpoint + user tier in the Redis key
❌ Never use fixed window for strict limits — boundary spike is real
❌ Never rate limit by IP alone — easy to bypass, hurts shared IPs

Rate limiting is foundational system design — see how it connects to the DynamoDB single-table pattern for storing rate limit configuration per user tier. On the infrastructure side, AWS Lambda has built-in throttling via reserved concurrency that acts as a server-side rate limiter — useful to understand alongside application-level limiting. Official reference: Redis rate limiting patterns.

Recommended resources

System Design Interview Vol 2 by Alex Xu — Chapter 4 is entirely dedicated to rate limiter design — token bucket, sliding window, and distributed rate limiting with Redis. The most complete treatment of this topic in interview prep format.
Designing Data-Intensive Applications — The replication and distributed systems chapters explain the consistency guarantees you need for distributed rate limiting to be correct.

Disclosure: This post contains affiliate links. If you purchase through these links, CheatCoders earns a small commission at no extra cost to you. We only recommend tools and books we genuinely find valuable.

Discover more from CheatCoders

Subscribe to get the latest posts sent to your email.

Design a Production Rate Limiter: Algorithms Seniors Actually Use in Interviews and at Work

Algorithm 1: Token Bucket (Best for API Rate Limiting)

Algorithm 2: Sliding Window Log (Most Accurate)

Algorithm 3: Fixed Window Counter (Simplest, Most Common)

Distributed Rate Limiting — The Production Gotchas

Algorithm Comparison

Rate Limiter Cheat Sheet

Recommended resources

Like this:

Related

Discover more from CheatCoders

Algorithm 1: Token Bucket (Best for API Rate Limiting)

Algorithm 2: Sliding Window Log (Most Accurate)

Algorithm 3: Fixed Window Counter (Simplest, Most Common)

Distributed Rate Limiting — The Production Gotchas

Algorithm Comparison

Rate Limiter Cheat Sheet

Recommended resources

🚀 Don’t Miss the Next Cheat Code

Share this:

Like this:

Related

Discover more from CheatCoders