Rate limiting is one of those system design topics where knowing the algorithm name is the easy part. Everyone knows “token bucket” and “sliding window”. What separates a senior answer from a junior one is knowing which to use when, how to implement them correctly in a distributed environment, and which edge cases will bite you in production.
⚡ TL;DR: Use token bucket for API rate limiting (handles bursts gracefully). Use sliding window log for strict per-user limits. Use fixed window for simple quota enforcement. All three need Redis in production — in-memory implementations don’t survive restarts or horizontal scaling.
Algorithm 1: Token Bucket (Best for API Rate Limiting)
Each user gets a bucket with a maximum capacity of N tokens. Tokens are added at a fixed rate (e.g., 10/second). Each request consumes 1 token. If the bucket is empty, reject the request. If the user makes no requests, tokens accumulate up to the bucket capacity.
// Redis Lua script — atomic token bucket (no race conditions)
// Run with: redis.call('EVAL', script, 1, key, capacity, rate, now)
const tokenBucketScript = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1]) -- max tokens
local rate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3]) -- current timestamp (ms)
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
-- Calculate tokens to add since last request
local elapsed = (now - last_refill) / 1000 -- seconds
local new_tokens = math.min(capacity, tokens + (elapsed * rate))
if new_tokens >= 1 then
-- Allow request, consume 1 token
redis.call('HMSET', key, 'tokens', new_tokens - 1, 'last_refill', now)
redis.call('EXPIRE', key, 3600) -- auto-expire inactive users
return {1, math.floor(new_tokens - 1)} -- {allowed, remaining}
else
-- Reject request
redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {0, 0} -- {rejected, remaining}
end
`;
// Node.js implementation
const redis = require('ioredis');
const client = new redis();
async function tokenBucket(userId, capacity = 100, rate = 10) {
const key = `ratelimit:token:${userId}`;
const now = Date.now();
const [allowed, remaining] = await client.eval(
tokenBucketScript, 1, key, capacity, rate, now
);
return {
allowed: allowed === 1,
remaining: parseInt(remaining),
retryAfter: allowed ? null : Math.ceil(1000 / rate) // ms until next token
};
}
// Express middleware
app.use(async (req, res, next) => {
const result = await tokenBucket(req.user.id);
res.set('X-RateLimit-Remaining', result.remaining);
if (!result.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: result.retryAfter + 'ms'
});
}
next();
});
Algorithm 2: Sliding Window Log (Most Accurate)
System design and architecture
→ System Design Complete Guide (Udemy) — Full rate limiting module — token bucket, Redis, and distributed design.
Sponsored links. We may earn a commission at no extra cost to you.
// Stores timestamp of each request in a sorted set
// Counts requests in the rolling window (e.g., last 60 seconds)
// Most accurate but uses the most memory (O(requests) per user)
async function slidingWindowLog(userId, limit = 100, windowMs = 60000) {
const key = `ratelimit:swl:${userId}`;
const now = Date.now();
const windowStart = now - windowMs;
const pipeline = client.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart); // Remove old entries
pipeline.zadd(key, now, `${now}-${Math.random()}`); // Add current request
pipeline.zcard(key); // Count requests in window
pipeline.expire(key, Math.ceil(windowMs / 1000));
const results = await pipeline.exec();
const requestCount = results[2][1];
const allowed = requestCount <= limit;
if (!allowed) {
// Remove the request we just added (rejected)
await client.zremrangebyscore(key, now, now);
}
return {
allowed,
remaining: Math.max(0, limit - requestCount),
resetAt: windowStart + windowMs
};
}
// Advantage: perfectly accurate, no boundary spikes
// Disadvantage: Redis memory grows with request volume
Algorithm 3: Fixed Window Counter (Simplest, Most Common)
// Simplest algorithm — count requests per time window
// Gotcha: boundary spike — user can make 2x limit at window boundary
async function fixedWindow(userId, limit = 100, windowSec = 60) {
const window = Math.floor(Date.now() / 1000 / windowSec); // Current window ID
const key = `ratelimit:fw:${userId}:${window}`;
const count = await client.incr(key);
if (count === 1) {
// First request in this window — set expiry
await client.expire(key, windowSec * 2);
}
return {
allowed: count <= limit,
remaining: Math.max(0, limit - count),
resetAt: (window + 1) * windowSec * 1000 // Next window start
};
}
// The boundary spike problem:
// Window: 12:00:00 → 12:01:00
// User sends 100 requests at 12:00:59 (allowed — new window starts at :00)
// User sends 100 more at 12:01:01 (allowed — new window)
// Effectively 200 requests in 2 seconds — NOT 100/minute
// Fix: use sliding window log OR sliding window counter for strict limits
Distributed Rate Limiting — The Production Gotchas
// Problem 1: Multiple API gateway nodes without Redis coordination
// Node A and Node B each have their own in-memory counter
// User gets 100 req/min per node = 100 * N nodes effective limit
// ✅ Fix: Always use Redis (or Memcached) for shared state
// Problem 2: Redis connection failure = no rate limiting or total outage
// ✅ Fix: Fail open (allow requests) on Redis error, log it, alert
async function rateLimitWithFallback(userId) {
try {
return await tokenBucket(userId);
} catch (err) {
console.error('Rate limiter Redis error:', err);
// Fail open — better than taking down the service
return { allowed: true, remaining: -1, error: true };
}
}
// Problem 3: Different limits per endpoint or tier
// ✅ Fix: Include resource in key
const key = `ratelimit:${userId}:${endpoint}:${tier}`;
// E.g.: "ratelimit:u123:/api/search:free"
// Problem 4: Rate limit headers for good UX
res.set({
'X-RateLimit-Limit': limit,
'X-RateLimit-Remaining': result.remaining,
'X-RateLimit-Reset': result.resetAt, // Unix timestamp
'Retry-After': result.retryAfter / 1000 // Seconds (RFC 7231)
});
Algorithm Comparison
| Algorithm | Burst Handling | Accuracy | Memory | Use When |
|---|---|---|---|---|
| Token Bucket | ✅ Allows bursts | Good | O(1) | API rate limiting, billing |
| Sliding Window Log | No bursts | ✅ Perfect | O(requests) | Strict per-user limits |
| Fixed Window | Boundary spike | Good | ✅ O(1) | Simple quota enforcement |
| Sliding Window Counter | Controlled | Very good | ✅ O(1) | Balance of all factors |
Rate Limiter Cheat Sheet
- ✅ Always use Redis for distributed rate limiting — never in-memory only
- ✅ Use Lua scripts for atomicity — prevents race conditions
- ✅ Set proper rate limit headers on every response including 429s
- ✅ Fail open on Redis errors — log and alert but don’t block users
- ✅ Include endpoint + user tier in the Redis key
- ❌ Never use fixed window for strict limits — boundary spike is real
- ❌ Never rate limit by IP alone — easy to bypass, hurts shared IPs
Rate limiting is foundational system design — see how it connects to the DynamoDB single-table pattern for storing rate limit configuration per user tier. On the infrastructure side, AWS Lambda has built-in throttling via reserved concurrency that acts as a server-side rate limiter — useful to understand alongside application-level limiting. Official reference: Redis rate limiting patterns.
Recommended resources
- System Design Interview Vol 2 by Alex Xu — Chapter 4 is entirely dedicated to rate limiter design — token bucket, sliding window, and distributed rate limiting with Redis. The most complete treatment of this topic in interview prep format.
- Designing Data-Intensive Applications — The replication and distributed systems chapters explain the consistency guarantees you need for distributed rate limiting to be correct.
Disclosure: This post contains affiliate links. If you purchase through these links, CheatCoders earns a small commission at no extra cost to you. We only recommend tools and books we genuinely find valuable.
Free Weekly Newsletter
🚀 Don’t Miss the Next Cheat Code
You just read something most developers never learn. Get more secrets like this delivered every week — JavaScript internals, Python optimizations, AWS architectures, system design, and AI workflows.
Join 1,000+ senior developers who actually level up. Zero fluff, pure signal.
Discover more from CheatCoders
Subscribe to get the latest posts sent to your email.
