Hash Tables Explained: How They Work, Collision Resolution, and Real-World Use Cases

Hash Tables Explained: How They Work, Collision Resolution, and Real-World Use Cases

Hash tables are the most used data structure in software engineering. Python dicts, JavaScript objects and Maps, Java HashMaps, Redis — all built on hash tables. Understanding how they work — not just how to use them — unlocks deep insight into performance, collision handling, and why some operations are O(1) average but not O(1) worst case.

TL;DR: Hash table = array + hash function. Hash function converts key → array index. Collisions (two keys → same index) handled by chaining or open addressing. Load factor (items/capacity) determines when to resize. Python dicts and JS Maps are hash tables with open addressing and chaining respectively.

How hash tables work

// Hash table: array indexed by hash(key)
class HashMap {
  constructor(size = 16) {
    this.buckets = new Array(size).fill(null).map(() => []);
    this.size = size;
    this.count = 0;
  }

  #hash(key) {
    // Convert key to array index
    let hash = 0;
    for (const char of String(key)) {
      hash = (hash * 31 + char.charCodeAt(0)) % this.size;
    }
    return hash; // O(key length) — treated as O(1) for fixed-length keys
  }

  set(key, value) {
    const idx = this.#hash(key);
    const bucket = this.buckets[idx];
    const existing = bucket.find(([k]) => k === key);
    if (existing) { existing[1] = value; return; }
    bucket.push([key, value]); // Chaining: multiple entries per bucket
    this.count++;
    if (this.count / this.size > 0.75) this.#resize(); // Load factor threshold
  }

  get(key) {
    const bucket = this.buckets[this.#hash(key)];
    return bucket.find(([k]) => k === key)?.[1];
  }

  #resize() {
    const old = this.buckets;
    this.size *= 2;
    this.buckets = new Array(this.size).fill(null).map(() => []);
    this.count = 0;
    old.forEach(bucket => bucket.forEach(([k, v]) => this.set(k, v)));
  }
}

Collision resolution: chaining vs open addressing

// Chaining: each bucket is a linked list / array
// Multiple keys can map to same index — stored in the list
// Pro: simple, handles high load factors well
// Con: linked list pointer overhead, cache unfriendly
// Used by: Java HashMap, most language default hash tables

// Open addressing: find next empty slot in array
// Linear probing: idx, idx+1, idx+2, ...
// Quadratic probing: idx, idx+1, idx+4, idx+9, ...
// Double hashing: idx, idx+hash2(key), ...
// Pro: cache friendly (all data in one array), no pointer overhead
// Con: clustering (many collisions near same index), needs low load factor
// Used by: Python dict, Java's newer hash tables, Redis

// Load factor = count / capacity
// Chaining: resize at load factor > 0.75 (Python uses 0.6667)
// Open addressing: resize at load factor > 0.5 (more collisions at high load)

// Real performance:
// O(1) average: when load factor is low, hash is good
// O(n) worst case: when everything hashes to same bucket (terrible hash fn)

Perfect hash functions — what makes a good one

// Good hash function properties:
// 1. Deterministic: same key always → same hash
// 2. Uniform distribution: spreads keys evenly across buckets
// 3. Fast: O(1) or O(key length)
// 4. Avalanche effect: small key change → completely different hash

// FNV-1a (fast, good distribution):
function fnv1a(key) {
  let hash = 2166136261; // FNV offset basis
  for (const byte of Buffer.from(String(key))) {
    hash ^= byte;
    hash = (hash * 16777619) >>> 0; // FNV prime, keep 32-bit
  }
  return hash;
}

// MurmurHash3: non-cryptographic, excellent distribution, used in Redis
// MD5/SHA256: cryptographic — use only when security matters, slow for hash tables

// Bad hash function (causes clustering):
function badHash(key) { return key.length % capacity; } // Only uses length!
// All 3-letter words → same bucket!

Hash table time complexity

  • Average case: O(1) insert, O(1) lookup, O(1) delete — with good hash + low load factor
  • ⚠️ Worst case: O(n) — if all keys collide into one bucket (theoretical, rare with good hash)
  • Space: O(n) where n = number of stored entries
  • Resize cost: O(n) amortized — resize is O(n) but rare, averages to O(1) per insert
  • ✅ Use Map over plain objects in JavaScript — correct O(1) operations + preserves insertion order
  • ❌ Never use objects as hash map keys in plain JS objects — they all become “[object Object]”

Hash tables underpin the Big O analysis — O(1) lookup is the hash table’s defining characteristic. The WeakMap guide covers a specialized hash table that integrates with JavaScript’s garbage collector. External reference: Hash table — Wikipedia.

Recommended Reading

Designing Data-Intensive Applications — Essential for every senior developer. Distributed systems, databases, and production architecture.

The Pragmatic Programmer — Timeless engineering wisdom for writing better code at any level.

Affiliate links. We earn a small commission at no extra cost to you.

Free Weekly Newsletter

🚀 Don’t Miss the Next Cheat Code

Join 1,000+ senior developers getting expert-level JS, Python, AWS, system design and AI secrets every week. Zero fluff, pure signal.

✓ No spam✓ Unsubscribe anytime✓ Expert-level only

Discover more from CheatCoders

Subscribe to get the latest posts sent to your email.

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply