PostgreSQL Query Planner Secrets: Make Your JOINs 10x Faster Without Indexes

Most developers treat slow PostgreSQL queries the same way: add an index, check EXPLAIN, move on. But the query planner is making dozens of decisions your index doesn’t control — join order, scan strategy, parallelism, statistics staleness. Understanding these decisions is the difference between a DBA and a developer who occasionally makes queries faster.

TL;DR: Run EXPLAIN (ANALYZE, BUFFERS) — not just EXPLAIN. Look for Seq Scans on large tables, nested loop joins with high row estimates, and stale statistics. Fix with ANALYZE, partial indexes, and enable_* planner settings. Most “index problems” are actually statistics problems.

EXPLAIN vs EXPLAIN (ANALYZE, BUFFERS) — Always Use the Full Form

-- ❌ EXPLAIN alone — estimates only, no actual execution
EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id WHERE o.status = 'pending';

-- ✅ EXPLAIN ANALYZE — runs the query, shows actual vs estimated rows
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders o JOIN users u ON o.user_id = u.id WHERE o.status = 'pending';

-- Key fields to read:
-- "actual rows=X loops=Y"  → what actually happened
-- "rows=X"                 → what planner estimated
-- If estimate and actual differ widely (either direction) → statistics are stale (run ANALYZE)
-- "Buffers: shared hit=X read=Y" → cache hits vs disk reads
-- "Seq Scan on orders"     → full table scan (usually bad for large tables)
-- "Planning Time: X ms"    → time to build the plan
-- "Execution Time: X ms"   → actual execution time

-- Example bad output to watch for (from EXPLAIN ANALYZE):
-- Nested Loop  (cost=0.00..45000.00 rows=1 width=64)
--              (actual time=0.08..950.21 rows=10000 loops=1)
--   ->  Seq Scan on orders  (cost=0.00..25000.00 rows=1 width=32)
--                           (actual time=0.03..420.10 rows=10000 loops=1)
--         Filter: (status = 'pending')
--         Rows Removed by Filter: 490000
-- This says: the planner expected 1 row but got 10K, and it scanned
-- all 500K rows to find them
-- Fix: partial index on (status) WHERE status = 'pending'

Statistics: The Hidden Reason Queries Go Wrong

-- Check when statistics were last updated
SELECT schemaname, relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

-- If last_analyze is NULL or weeks old → planner is guessing
-- Force statistics update:
ANALYZE orders;           -- One table
ANALYZE;                  -- Entire database

-- Check planner's row estimates vs reality
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
-- n_distinct: negative = fraction of rows are distinct (-0.001 = 0.1%)
-- correlation: 1.0 = perfectly sorted, 0 = random (affects index scan cost)
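The sign convention for n_distinct is easy to misread, so here is the same rule restated as a tiny plain-Python illustration (no database needed; the function name is ours, not anything PostgreSQL ships):

```python
def estimate_distinct(n_distinct: float, n_rows: int) -> int:
    """Interpret pg_stats.n_distinct the way the planner does.

    Positive -> an absolute count of distinct values.
    Negative -> a fraction: abs(n_distinct) * total rows are distinct
                (used when distinct values scale with table size).
    """
    if n_distinct >= 0:
        return int(n_distinct)
    return int(-n_distinct * n_rows)

# A status column with 3 fixed values, regardless of table size:
print(estimate_distinct(3, 10_000_000))       # 3
# n_distinct = -0.001 on a 10M-row table -> 0.1% of rows are distinct:
print(estimate_distinct(-0.001, 10_000_000))  # 10000
```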

-- Increase statistics target for columns with high cardinality
ALTER TABLE orders ALTER COLUMN user_id SET STATISTICS 500;
-- Default is 100 (samples ~30K rows). Increase for better estimates on large tables.
ANALYZE orders;

Partial Indexes — The Most Underused PostgreSQL Feature

-- Instead of indexing the entire column, index only the rows you query

-- ❌ Full index — indexes all 10 million orders including 9.9M completed ones
CREATE INDEX idx_orders_status ON orders(status);

-- ✅ Partial index — only indexes the 100K pending orders you actually query
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- ~100x smaller and cheaper to maintain; the planner uses it only when the
-- query's WHERE clause implies status = 'pending'

-- Partial unique index — enforce uniqueness only under conditions
CREATE UNIQUE INDEX idx_users_active_email ON users(email)
WHERE deleted_at IS NULL;
-- Allows duplicate emails for soft-deleted users, unique only for active

-- Verify planner uses your partial index:
EXPLAIN SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 10;
-- Should show: Index Scan Backward using idx_orders_pending (backward because of DESC)

JOIN Strategies — Help the Planner Choose Correctly

-- PostgreSQL has 3 join strategies:
-- 1. Nested Loop: fast for small inner sets, terrible for large
-- 2. Hash Join: builds hash table of smaller set, scans larger — great for big tables
-- 3. Merge Join: requires both inputs sorted — efficient when indexes provide sort

-- When planner picks wrong strategy, override temporarily:
SET enable_nestloop = OFF;   -- Force hash/merge join
SET enable_hashjoin = OFF;   -- Force merge/nested
SET enable_seqscan = OFF;    -- Force index scans (careful!)

-- Or use hints via pg_hint_plan extension:
/*+ HashJoin(o u) */
SELECT * FROM orders o JOIN users u ON o.user_id = u.id;

-- Check JOIN row estimates accuracy:
EXPLAIN (ANALYZE) SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
-- "actual rows" ≠ "rows" by more than 10x → statistics problem
-- Run ANALYZE on both tables, then EXPLAIN again

-- For complex multi-table JOINs, set join_collapse_limit
SET join_collapse_limit = 1;  -- Force planner to use your written JOIN order
-- Useful when you know the optimal order better than the planner

The VACUUM Problem — Dead Tuples Kill Performance

-- PostgreSQL doesn't delete rows immediately — it marks them as dead
-- Dead tuples (bloat) cause index scans to read more pages than needed

-- Check table bloat:
SELECT relname,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  pg_size_pretty(pg_relation_size(relid)) AS table_size,
  n_dead_tup,
  n_live_tup,
  round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;

-- If dead_pct > 10-20% → run VACUUM
VACUUM ANALYZE orders;      -- Reclaim dead tuple space, update stats
VACUUM FULL orders;         -- Fully rewrite table (locks table — use carefully)
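The dead_pct arithmetic from the bloat query above, restated in plain Python as a sanity check (the function names and the 15% threshold are our illustrative choices — a midpoint of the 10–20% rule of thumb; autovacuum applies its own trigger thresholds):

```python
def dead_pct(n_live_tup: int, n_dead_tup: int) -> float:
    """Same formula as the SQL bloat query: dead tuples as % of all tuples."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return 0.0
    return round(100.0 * n_dead_tup / total, 2)

def needs_vacuum(n_live_tup: int, n_dead_tup: int, threshold: float = 15.0) -> bool:
    # threshold is an illustrative midpoint of the 10-20% guideline above
    return dead_pct(n_live_tup, n_dead_tup) > threshold

print(dead_pct(900_000, 100_000))      # 10.0
print(needs_vacuum(700_000, 300_000))  # True (30% dead -> VACUUM)
```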

-- Check if autovacuum is keeping up:
SELECT schemaname, relname, last_vacuum, last_autovacuum,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'orders';

Query Optimization Cheat Sheet

  • ✅ Always use EXPLAIN (ANALYZE, BUFFERS) — never plain EXPLAIN
  • ✅ Check estimate vs actual rows — big gap = stale statistics
  • ✅ Run ANALYZE table_name after bulk inserts/deletes
  • ✅ Use partial indexes for commonly filtered subsets
  • ✅ Check n_dead_tup — high value means run VACUUM
  • ✅ Increase STATISTICS target for high-cardinality columns
  • ❌ Don’t blindly add indexes — wrong index can make things slower
  • ❌ Don’t use VACUUM FULL on production during peak traffic

PostgreSQL query optimization connects directly to application-level patterns: if you’re running on AWS Lambda, connection pool sizing dramatically affects query planner behaviour under concurrent load. For Node.js applications hitting Postgres, the event loop blocking guide explains why synchronous query execution blocks your entire server. Official reference: PostgreSQL Performance Tips documentation.

Recommended resources

  • PostgreSQL: Up and Running (3rd Edition) — Regina Obe and Leo Hsu cover EXPLAIN output interpretation, index types, and query tuning with the same practical depth as this post, but across the full PostgreSQL feature set.
  • Designing Data-Intensive Applications — The storage and retrieval chapter explains why B-tree indexes exist, how the query planner uses statistics, and what “cardinality” means at a deep level.

Disclosure: This post contains affiliate links. If you purchase through these links, CheatCoders earns a small commission at no extra cost to you. We only recommend tools and books we genuinely find valuable.

