The URL shortener is the “hello world” of system design interviews — and the most commonly botched one. Everyone knows the basics: hash the URL, store it, redirect on lookup. What separates senior answers is the full chain of decisions: encoding choice, collision handling, cache strategy, read/write ratio optimization, analytics, and abuse prevention.
⚡ TL;DR: Use base62 encoding of an auto-increment ID for short code generation — simpler than hashing, with zero collision risk. Cache aggressively (redirects are ~99% of traffic). Use a separate analytics pipeline to avoid write contention. At the traffic estimated below, that works out to roughly 870M requests/day served by a handful of stateless servers.
Requirements Scoping First
// Before designing, establish numbers:
// Write: 100 URLs shortened per second = 8.6M/day
// Read: 10,000 redirects per second = 864M/day (100:1 read:write ratio)
// Storage: 500 bytes per URL * 8.6M/day * 365 days * 5 years = ~8TB
// Short code length: 62^6 ≈ 56.8 billion combinations (roughly 18 years of IDs at 100 writes/sec)
// Key insight: this is an extremely READ-HEAVY system
// 99% of traffic is redirects, not shortening
// Design optimizes for fast reads above all else
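The estimates above are reproducible with quick arithmetic. A back-of-envelope sketch (the constants are the numbers from this section, not measured values):

```javascript
// Back-of-envelope capacity math for the requirements above.
const SECONDS_PER_DAY = 86_400;

const writesPerSec = 100;
const readsPerSec = 10_000;

const writesPerDay = writesPerSec * SECONDS_PER_DAY; // 8.64M shortens/day
const readsPerDay = readsPerSec * SECONDS_PER_DAY;   // 864M redirects/day
const readWriteRatio = readsPerSec / writesPerSec;   // 100:1

// Storage: 500 bytes per row, retained for 5 years.
const bytesPerUrl = 500;
const storageTB = (bytesPerUrl * writesPerDay * 365 * 5) / 1e12; // ~7.9 TB

// Keyspace: 6-char base62 codes.
const keyspace = 62 ** 6; // 56,800,235,584

console.log({ writesPerDay, readsPerDay, readWriteRatio, storageTB, keyspace });
```

Running this confirms the headline numbers: ~8 TB of URL storage over five years and a keyspace comfortably larger than the write volume.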
Short Code Generation: Base62 vs Hashing
// Option 1: Hash the URL (MD5/SHA) — DON'T do this
// Pros: deterministic, same URL = same code
// Cons: truncating the hash to 6 chars makes collisions near-certain at scale (birthday paradox), so every write needs a check-and-retry loop
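The collision risk from truncated hashes is not hypothetical. Using the standard birthday approximation p ≈ 1 − e^(−n²/2N), where N = 62^6 possible 6-char codes, the math looks like this (illustrative numbers, not from the original post):

```javascript
// Birthday-bound collision probability for 6-char truncated hashes.
// p ≈ 1 - exp(-n^2 / (2N)), where N is the size of the code keyspace.
const N = 62 ** 6; // ~56.8 billion possible 6-char codes

function collisionProbability(n) {
  return 1 - Math.exp(-(n * n) / (2 * N));
}

console.log(collisionProbability(100_000));   // ~8.4% at 100k URLs
console.log(collisionProbability(1_000_000)); // ~99.98% at 1M URLs
```

By one million stored URLs a collision is essentially guaranteed, which is why the hash approach forces a store-and-check step on every write.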
// Option 2: Base62 encode an auto-increment ID — DO THIS
const CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function toBase62(num) {
  let result = '';
  while (num > 0) {
    result = CHARS[num % 62] + result;
    num = Math.floor(num / 62);
  }
  return result.padStart(6, 'a'); // Always 6 chars ('a' encodes 0, so padding is safe)
}

function fromBase62(str) {
  return str.split('').reduce((acc, char) => acc * 62 + CHARS.indexOf(char), 0);
}
// ID 1 → 'aaaaab'
// ID 1,000,000 → 'aaemjc'
// ID 56,800,235,583 (the 6-char maximum, 62^6 - 1) → '999999'; 6 chars cover all IDs for ~18 years at 100 writes/sec
// Advantages:
// - Zero collision risk
// - O(1) decode — just convert base62 to integer, lookup by primary key
// - Predictable, no birthday paradox
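A quick standalone sanity check that encode and decode round-trip (the two helpers are repeated from above so the snippet runs on its own):

```javascript
// Round-trip check: fromBase62(toBase62(id)) must return the original id.
const CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function toBase62(num) {
  let result = '';
  while (num > 0) {
    result = CHARS[num % 62] + result;
    num = Math.floor(num / 62);
  }
  return result.padStart(6, 'a');
}

function fromBase62(str) {
  return str.split('').reduce((acc, char) => acc * 62 + CHARS.indexOf(char), 0);
}

for (const id of [1, 61, 62, 1_000_000, 56_800_235_583]) {
  console.log(id, toBase62(id), fromBase62(toBase62(id)) === id); // always true
}
```

Padding with 'a' is what makes this safe: 'a' encodes zero, so leading pad characters do not change the decoded value.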
Database Schema
-- Core URL table (MySQL/PostgreSQL)
CREATE TABLE urls (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  short_code VARCHAR(10), -- base62(id), written by the app after insert (MySQL generated columns can't reference AUTO_INCREMENT)
  long_url TEXT NOT NULL,
  user_id BIGINT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NULL,
  click_count BIGINT DEFAULT 0, -- Denormalized for fast display
  is_active BOOLEAN DEFAULT TRUE,
  UNIQUE INDEX idx_short_code (short_code),
  INDEX idx_user_id (user_id)
);
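Because MySQL generated columns can't reference AUTO_INCREMENT columns, the application derives short_code after the insert. A sketch of that write path, assuming a mysql2-style promise client (the client and call shape are assumptions; the helper is repeated so the snippet stands alone):

```javascript
// Shorten write path: INSERT first, then derive short_code from the new id.
const CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function toBase62(num) {
  let result = '';
  while (num > 0) {
    result = CHARS[num % 62] + result;
    num = Math.floor(num / 62);
  }
  return result.padStart(6, 'a');
}

async function createShortUrl(db, longUrl, userId) {
  // mysql2-style: query() resolves to [result, fields]; INSERT exposes insertId
  const [result] = await db.query(
    'INSERT INTO urls (long_url, user_id) VALUES (?, ?)',
    [longUrl, userId]
  );
  // The id is known only after the insert, so the code is backfilled here
  const shortCode = toBase62(result.insertId);
  await db.query('UPDATE urls SET short_code = ? WHERE id = ?', [
    shortCode,
    result.insertId,
  ]);
  return shortCode;
}
```

Two statements per shorten is fine at 100 writes/sec; wrap them in a transaction if a crash between the two would matter for your consistency story.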
-- Separate analytics table (write-heavy, partitioned by date)
CREATE TABLE url_clicks (
  id BIGINT AUTO_INCREMENT,
  url_id BIGINT NOT NULL,
  clicked_at DATETIME DEFAULT CURRENT_TIMESTAMP, -- DATETIME, not TIMESTAMP: MySQL only allows UNIX_TIMESTAMP() on TIMESTAMP partition columns
  ip_hash CHAR(64), -- Hashed for privacy
  country VARCHAR(2),
  device VARCHAR(20),
  referrer VARCHAR(500),
  PRIMARY KEY (id, clicked_at) -- MySQL requires partitioning columns in every unique key
) PARTITION BY RANGE (YEAR(clicked_at) * 100 + MONTH(clicked_at));
Caching Strategy: The Key to 864M Redirects/Day
// Redis cache for hot URLs (20% of URLs get 80% of traffic)
// On redirect request:
async function redirect(shortCode) {
  // 1. Check Redis first (sub-millisecond)
  const cached = await redis.get(`url:${shortCode}`);
  if (cached) {
    logClickAsync(shortCode); // Fire-and-forget analytics
    return cached; // ~90%+ of requests end here (see hit rate below)
  }

  // 2. Cache miss — query database (mysql2-style client: rows come back as an array)
  const [rows] = await db.query(
    'SELECT long_url, expires_at, is_active FROM urls WHERE short_code = ?',
    [shortCode]
  );
  const url = rows[0];
  // Guard the NULL case: a never-expiring URL has expires_at = NULL,
  // and `null < new Date()` is true in JS, so an unguarded check would
  // treat it as expired
  if (!url || !url.is_active || (url.expires_at && url.expires_at < new Date())) {
    return null; // Unknown, disabled, or expired — caller returns 404
  }

  // 3. Populate cache with TTL
  await redis.setex(`url:${shortCode}`, 86400, url.long_url); // 24hr TTL
  logClickAsync(shortCode);
  return url.long_url;
}
// Cache eviction: LRU with 24hr TTL
// Cache size: 10GB Redis = ~20M cached URLs
// Hit rate: ~90%+ for popular URLs
// DB load: reduced by 10x
System Architecture at Scale
// Component breakdown for ~870M requests/day:
// 1. Load Balancer (nginx/CloudFront)
// - Routes /:shortCode to redirect service
// - Routes /api/* to shortener service
// - Terminates SSL
// 2. Redirect Service (stateless, 3+ instances)
// - GET /:shortCode → Redis lookup → DB fallback → 302 redirect (301 only if you can live without per-click analytics, since browsers cache 301s)
// - Stateless = easily horizontally scaled
// - Target: < 50ms p99 response time
// 3. Shortener Service (lower traffic, 1-2 instances)
// - POST /api/shorten → generate ID → write DB → return short URL
// - Rate limited per user
// 4. MySQL Primary + 2 Read Replicas
// - Writes go to primary
// - Reads (cache misses) go to replicas
// 5. Redis Cluster (3 nodes)
// - Short code → long URL cache
// - Rate limiting counters
// 6. Analytics Queue (Kafka/SQS)
// - Click events written to queue async
// - Consumer processes batch inserts to ClickHouse
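The logClickAsync call used in the redirect handler can be as thin as a fire-and-forget push onto this queue. A sketch assuming a generic producer with an async send(topic, message) method (the producer API is illustrative, not a specific Kafka/SQS client; in the redirect service it would be a module-level singleton rather than a parameter):

```javascript
// Fire-and-forget click logging: never await this on the redirect path.
// `producer` is any object with an async send(topic, message) method.
function logClickAsync(producer, shortCode, req = {}) {
  const event = {
    shortCode,
    clickedAt: new Date().toISOString(),
    referrer: req.referrer || null,
  };
  // Errors are swallowed deliberately: losing one click event is better
  // than delaying or failing a redirect.
  producer
    .send('url_clicks', JSON.stringify(event))
    .catch((err) => console.error('click log failed', err.message));
}
```

The consumer on the other side batches these events into the analytics store, so the redirect path never touches the url_clicks table directly.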
URL Shortener Design Cheat Sheet
- ✅ Use base62(auto_increment_id) for short codes — no collisions, O(1) decode
- ✅ Cache aggressively in Redis — 90%+ of redirects should never hit the DB
- ✅ Separate analytics pipeline — never block redirects on analytics writes
- ✅ Use 301 for permanent redirects, 302 if you need click analytics (301 gets cached by browser)
- ✅ Partition analytics table by month — queries stay fast as data grows
- ❌ Don't hash URLs — base62 IDs are simpler with zero collision risk
- ❌ Don't write analytics synchronously — it adds latency to every redirect
The Redis caching layer here uses the same principles as the rate limiter design guide — atomic Redis operations for both use cases. For the database layer, the PostgreSQL query optimization guide covers how to keep the URL lookup index performing at scale. External reference: System Design One — URL shortener deep dive.