The URL shortener is the “hello world” of system design interviews — and the most commonly botched one. Everyone knows the basics: hash the URL, store it, redirect on lookup. What separates senior answers is the full chain of decisions: encoding choice, collision handling, cache strategy, read/write ratio optimization, analytics, and abuse prevention.
⚡ TL;DR: Use base62 encoding of an auto-increment ID for short code generation — simpler than hashing, with zero collision risk. Cache aggressively (redirects are ~99% of traffic). Use a separate analytics pipeline to avoid write contention. At the traffic estimated below, that works out to roughly 870M requests/day served by a handful of stateless servers.
Requirements Scoping First
// Before designing, establish numbers:
// Write: 100 URLs shortened per second = 8.6M/day
// Read: 10,000 redirects per second = 864M/day (100:1 read:write ratio)
// Storage: 500 bytes per URL * 8.6M/day * 365 days * 5 years = ~8TB
// Short code length: 62^6 ≈ 56.8 billion combinations (roughly 18 years of IDs at 100 writes/sec)
// Key insight: this is an extremely READ-HEAVY system
// 99% of traffic is redirects, not shortening
// Design optimizes for fast reads above all else
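The estimates above are reproducible with quick arithmetic. A back-of-envelope sketch (the constants are the numbers from this section, not measured values):

```javascript
// Back-of-envelope capacity math for the requirements above.
const SECONDS_PER_DAY = 86_400;

const writesPerSec = 100;
const readsPerSec = 10_000;

const writesPerDay = writesPerSec * SECONDS_PER_DAY; // 8.64M shortens/day
const readsPerDay = readsPerSec * SECONDS_PER_DAY;   // 864M redirects/day
const readWriteRatio = readsPerSec / writesPerSec;   // 100:1

// Storage: 500 bytes per row, retained for 5 years.
const bytesPerUrl = 500;
const storageTB = (bytesPerUrl * writesPerDay * 365 * 5) / 1e12; // ~7.9 TB

// Keyspace: 6-char base62 codes.
const keyspace = 62 ** 6; // 56,800,235,584

console.log({ writesPerDay, readsPerDay, readWriteRatio, storageTB, keyspace });
```

Running this confirms the headline numbers: ~8 TB of URL storage over five years and a keyspace comfortably larger than the write volume.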
Short Code Generation: Base62 vs Hashing
// Option 1: Hash the URL (MD5/SHA) — DON'T do this
// Pros: deterministic, same URL = same code
// Cons: truncating the hash to 6 chars makes collisions near-certain at scale (birthday paradox), so every write needs a check-and-retry loop
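The collision risk from truncated hashes is not hypothetical. Using the standard birthday approximation p ≈ 1 − e^(−n²/2N), where N = 62^6 possible 6-char codes, the math looks like this (illustrative numbers, not from the original post):

```javascript
// Birthday-bound collision probability for 6-char truncated hashes.
// p ≈ 1 - exp(-n^2 / (2N)), where N is the size of the code keyspace.
const N = 62 ** 6; // ~56.8 billion possible 6-char codes

function collisionProbability(n) {
  return 1 - Math.exp(-(n * n) / (2 * N));
}

console.log(collisionProbability(100_000));   // ~8.4% at 100k URLs
console.log(collisionProbability(1_000_000)); // ~99.98% at 1M URLs
```

By one million stored URLs a collision is essentially guaranteed, which is why the hash approach forces a store-and-check step on every write.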
// Option 2: Base62 encode an auto-increment ID — DO THIS
const CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function toBase62(num) {
  let result = '';
  while (num > 0) {
    result = CHARS[num % 62] + result;
    num = Math.floor(num / 62);
  }
  return result.padStart(6, 'a'); // Always 6 chars ('a' encodes 0, so padding is safe)
}

function fromBase62(str) {
  return str.split('').reduce((acc, char) => acc * 62 + CHARS.indexOf(char), 0);
}
// ID 1 → 'aaaaab'
// ID 1,000,000 → 'aaemjc'
// ID 56,800,235,583 (the 6-char maximum, 62^6 - 1) → '999999'; 6 chars cover all IDs for ~18 years at 100 writes/sec
// Advantages:
// - Zero collision risk
// - O(1) decode — just convert base62 to integer, lookup by primary key
// - Predictable, no birthday paradox
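A quick standalone sanity check that encode and decode round-trip (the two helpers are repeated from above so the snippet runs on its own):

```javascript
// Round-trip check: fromBase62(toBase62(id)) must return the original id.
const CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function toBase62(num) {
  let result = '';
  while (num > 0) {
    result = CHARS[num % 62] + result;
    num = Math.floor(num / 62);
  }
  return result.padStart(6, 'a');
}

function fromBase62(str) {
  return str.split('').reduce((acc, char) => acc * 62 + CHARS.indexOf(char), 0);
}

for (const id of [1, 61, 62, 1_000_000, 56_800_235_583]) {
  console.log(id, toBase62(id), fromBase62(toBase62(id)) === id); // always true
}
```

Padding with 'a' is what makes this safe: 'a' encodes zero, so leading pad characters do not change the decoded value.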
Database Schema
-- Core URL table (MySQL/PostgreSQL)
CREATE TABLE urls (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  short_code VARCHAR(10), -- base62(id), written by the app after insert (MySQL generated columns can't reference AUTO_INCREMENT)
  long_url TEXT NOT NULL,
  user_id BIGINT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NULL,
  click_count BIGINT DEFAULT 0, -- Denormalized for fast display
  is_active BOOLEAN DEFAULT TRUE,
  UNIQUE INDEX idx_short_code (short_code),
  INDEX idx_user_id (user_id)
);
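Because MySQL generated columns can't reference AUTO_INCREMENT columns, the application derives short_code after the insert. A sketch of that write path, assuming a mysql2-style promise client (the client and call shape are assumptions; the helper is repeated so the snippet stands alone):

```javascript
// Shorten write path: INSERT first, then derive short_code from the new id.
const CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function toBase62(num) {
  let result = '';
  while (num > 0) {
    result = CHARS[num % 62] + result;
    num = Math.floor(num / 62);
  }
  return result.padStart(6, 'a');
}

async function createShortUrl(db, longUrl, userId) {
  // mysql2-style: query() resolves to [result, fields]; INSERT exposes insertId
  const [result] = await db.query(
    'INSERT INTO urls (long_url, user_id) VALUES (?, ?)',
    [longUrl, userId]
  );
  // The id is known only after the insert, so the code is backfilled here
  const shortCode = toBase62(result.insertId);
  await db.query('UPDATE urls SET short_code = ? WHERE id = ?', [
    shortCode,
    result.insertId,
  ]);
  return shortCode;
}
```

Two statements per shorten is fine at 100 writes/sec; wrap them in a transaction if a crash between the two would matter for your consistency story.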
-- Separate analytics table (write-heavy, partitioned by date)
CREATE TABLE url_clicks (
  id BIGINT AUTO_INCREMENT,
  url_id BIGINT NOT NULL,
  clicked_at DATETIME DEFAULT CURRENT_TIMESTAMP, -- DATETIME, not TIMESTAMP: MySQL only allows UNIX_TIMESTAMP() on TIMESTAMP partition columns
  ip_hash CHAR(64), -- Hashed for privacy
  country VARCHAR(2),
  device VARCHAR(20),
  referrer VARCHAR(500),
  PRIMARY KEY (id, clicked_at) -- MySQL requires partitioning columns in every unique key
) PARTITION BY RANGE (YEAR(clicked_at) * 100 + MONTH(clicked_at));
Caching Strategy: The Key to 864M Redirects/Day
// Redis cache for hot URLs (20% of URLs get 80% of traffic)
// On redirect request:
async function redirect(shortCode) {
  // 1. Check Redis first (sub-millisecond)
  const cached = await redis.get(`url:${shortCode}`);
  if (cached) {
    logClickAsync(shortCode); // Fire-and-forget analytics
    return cached; // ~90%+ of requests end here (see hit rate below)
  }

  // 2. Cache miss — query database (mysql2-style client: rows come back as an array)
  const [rows] = await db.query(
    'SELECT long_url, expires_at, is_active FROM urls WHERE short_code = ?',
    [shortCode]
  );
  const url = rows[0];
  // Guard the NULL case: a never-expiring URL has expires_at = NULL,
  // and `null < new Date()` is true in JS, so an unguarded check would
  // treat it as expired
  if (!url || !url.is_active || (url.expires_at && url.expires_at < new Date())) {
    return null; // Unknown, disabled, or expired — caller returns 404
  }

  // 3. Populate cache with TTL
  await redis.setex(`url:${shortCode}`, 86400, url.long_url); // 24hr TTL
  logClickAsync(shortCode);
  return url.long_url;
}
// Cache eviction: LRU with 24hr TTL
// Cache size: 10GB Redis = ~20M cached URLs
// Hit rate: ~90%+ for popular URLs
// DB load: reduced by 10x
System Architecture at Scale
// Component breakdown for ~870M requests/day:
// 1. Load Balancer (nginx/CloudFront)
// - Routes /:shortCode to redirect service
// - Routes /api/* to shortener service
// - Terminates SSL
// 2. Redirect Service (stateless, 3+ instances)
// - GET /:shortCode → Redis lookup → DB fallback → 302 redirect (301 only if you can live without per-click analytics, since browsers cache 301s)
// - Stateless = easily horizontally scaled
// - Target: < 50ms p99 response time
// 3. Shortener Service (lower traffic, 1-2 instances)
// - POST /api/shorten → generate ID → write DB → return short URL
// - Rate limited per user
// 4. MySQL Primary + 2 Read Replicas
// - Writes go to primary
// - Reads (cache misses) go to replicas
// 5. Redis Cluster (3 nodes)
// - Short code → long URL cache
// - Rate limiting counters
// 6. Analytics Queue (Kafka/SQS)
// - Click events written to queue async
// - Consumer processes batch inserts to ClickHouse
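The logClickAsync call used in the redirect handler can be as thin as a fire-and-forget push onto this queue. A sketch assuming a generic producer with an async send(topic, message) method (the producer API is illustrative, not a specific Kafka/SQS client; in the redirect service it would be a module-level singleton rather than a parameter):

```javascript
// Fire-and-forget click logging: never await this on the redirect path.
// `producer` is any object with an async send(topic, message) method.
function logClickAsync(producer, shortCode, req = {}) {
  const event = {
    shortCode,
    clickedAt: new Date().toISOString(),
    referrer: req.referrer || null,
  };
  // Errors are swallowed deliberately: losing one click event is better
  // than delaying or failing a redirect.
  producer
    .send('url_clicks', JSON.stringify(event))
    .catch((err) => console.error('click log failed', err.message));
}
```

The consumer on the other side batches these events into the analytics store, so the redirect path never touches the url_clicks table directly.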
URL Shortener Design Cheat Sheet
- ✅ Use base62(auto_increment_id) for short codes — no collisions, O(1) decode
- ✅ Cache aggressively in Redis — 90%+ of redirects should never hit the DB
- ✅ Separate analytics pipeline — never block redirects on analytics writes
- ✅ Use 301 for permanent redirects, 302 if you need click analytics (301 gets cached by browser)
- ✅ Partition analytics table by month — queries stay fast as data grows
- ❌ Don't hash URLs — base62 IDs are simpler with zero collision risk
- ❌ Don't write analytics synchronously — it adds latency to every redirect
The Redis caching layer here uses the same principles as the rate limiter design guide — atomic Redis operations for both use cases. For the database layer, the PostgreSQL query optimization guide covers how to keep the URL lookup index performing at scale. External reference: System Design One — URL shortener deep dive.