A Postgres query that ran fine for months suddenly takes 30 seconds. Users are complaining. You need answers fast — but the wrong diagnostic approach adds load to an already-struggling database. These techniques find the problem without making things worse.
⚡ TL;DR: Use `pg_stat_activity` for live query inspection, `pg_stat_statements` for historical slow-query data, `auto_explain` for automatic EXPLAIN output on slow queries, and `pg_locks` for lock contention. None of these require downtime or significant additional load.
Step 1: See What’s Running Right Now
```sql
-- See all active queries sorted by duration
SELECT
    pid,
    now() - query_start AS duration,
    query,
    state,
    wait_event_type,
    wait_event
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start IS NOT NULL
ORDER BY duration DESC;
```
```sql
-- Queries running for over 30 seconds
SELECT pid, duration, query
FROM (
    SELECT pid, now() - query_start AS duration, query
    FROM pg_stat_activity
    WHERE state = 'active'
) t
WHERE duration > interval '30 seconds';
```
```sql
-- Cancel a specific runaway query (never terminate in production unless certain)
SELECT pg_cancel_backend(PID);       -- Sends SIGINT, allows graceful cancel
-- SELECT pg_terminate_backend(PID); -- Sends SIGTERM; only if cancel doesn't work
```
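Before digging into individual queries, it can help to get a quick overview of where backends are spending their time. A minimal sketch using only standard `pg_stat_activity` columns:

```sql
-- Count backends by state and wait event type for a high-level overview
SELECT state, wait_event_type, count(*) AS backends
FROM pg_stat_activity
GROUP BY state, wait_event_type
ORDER BY backends DESC;
```

A pile-up of backends with `wait_event_type = 'Lock'` points you straight at Step 2.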
Step 2: Find Locks Causing Blocking
```sql
-- Find blocking queries and their victims
SELECT
    blocked_locks.pid       AS blocked_pid,
    blocked_activity.query  AS blocked_query,
    blocking_locks.pid      AS blocking_pid,
    blocking_activity.query AS blocking_query,
    now() - blocked_activity.query_start AS blocked_duration
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
 AND blocking_locks.relation = blocked_locks.relation
 AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
  AND blocking_locks.granted;  -- The blocker must actually hold the lock
```
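On PostgreSQL 9.6 and later, the built-in `pg_blocking_pids()` function gives a shorter route to the same answer. A minimal sketch using only standard catalog views:

```sql
-- For each waiting backend, list the PIDs of the sessions blocking it
SELECT pid, pg_blocking_pids(pid) AS blocked_by, left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```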
Common causes of blocking:

1. A long-running transaction holding a lock
2. `ALTER TABLE` (takes an `ACCESS EXCLUSIVE` lock)
3. `VACUUM FULL` (takes an `ACCESS EXCLUSIVE` lock)
4. A missing index causing a full table scan while row locks are held
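To keep a stray `ALTER TABLE` from queueing every other query on the table behind it, you can make DDL fail fast instead of waiting. A sketch using the standard `lock_timeout` setting (the table and column names here are hypothetical):

```sql
-- Abort the DDL if the lock isn't acquired within 2 seconds,
-- instead of silently blocking all subsequent queries on the table
SET lock_timeout = '2s';
ALTER TABLE orders ADD COLUMN note text;  -- Hypothetical example DDL
RESET lock_timeout;
```

If the timeout fires, retry during a quieter window rather than letting the queue build.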
Step 3: pg_stat_statements — Historical Slow Queries
```sql
-- Enable pg_stat_statements: add the line below to postgresql.conf.
-- Note: changing shared_preload_libraries requires a server restart, not a reload.
-- shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```
```sql
-- Top 10 slowest queries by total time
SELECT
    round(total_exec_time::numeric, 2) AS total_ms,
    calls,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    round(stddev_exec_time::numeric, 2) AS stddev_ms,
    rows,
    round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total,
    left(query, 100) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
```sql
-- Queries with the highest variance (inconsistently slow)
SELECT left(query, 100) AS query, calls, mean_exec_time, stddev_exec_time
FROM pg_stat_statements
WHERE calls > 100
ORDER BY stddev_exec_time DESC
LIMIT 10;
```

A high stddev means a query is sometimes fast and sometimes slow; typical culprits are lock waits, index bloat, or plans that change with parameter values.
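Because `pg_stat_statements` aggregates everything since its last reset, months of history can drown out this week's regression. To measure a fresh window (superuser or an appropriately privileged role required):

```sql
-- Clear the accumulated statistics, then let production traffic
-- build a fresh baseline before re-running the queries above
SELECT pg_stat_statements_reset();
```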
Step 4: auto_explain — Capture Plans of Slow Queries Automatically
```sql
-- Load auto_explain into the current session only (superuser required)
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 1000;      -- Log queries over 1 second
SET auto_explain.log_analyze = true;           -- Include actual row counts (adds overhead)
SET auto_explain.log_buffers = true;           -- Include buffer hit/miss stats
SET auto_explain.log_timing = true;            -- Per-node timing (more overhead)
SET auto_explain.log_nested_statements = true; -- Explain statements inside functions
```

These settings only affect the current session. To capture slow queries from every session, set `session_preload_libraries = 'auto_explain'` in postgresql.conf (a config reload is enough) or add it to `shared_preload_libraries` (requires a restart), and put the `auto_explain.*` settings in postgresql.conf as well.
Now any query taking over 1 second gets its EXPLAIN ANALYZE output logged automatically. Check the PostgreSQL logs for entries like:

```
LOG:  duration: 2341.123 ms  plan:
  Seq Scan on orders  (cost=0.00..250000.00 rows=5000000 ...)
    (actual time=0.027..2340.891 rows=4999999 ...)
```
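On a busy system, `log_analyze` and `log_timing` add measurable overhead to every eligible query. The `auto_explain.sample_rate` setting (available in modern PostgreSQL versions) lets you pay that cost for only a fraction of queries; a sketch:

```sql
-- Explain only ~10% of queries that cross the duration threshold,
-- reducing the overhead of log_analyze/log_timing on busy systems
SET auto_explain.sample_rate = 0.1;
```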
Step 5: Index Bloat — The Silent Performance Killer
```sql
-- Biggest indexes with their usage stats (a starting point;
-- measuring true bloat requires the pgstattuple extension)
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
```
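To measure bloat directly rather than inferring it from size, the contrib module `pgstattuple` reports leaf density and fragmentation per b-tree index. A sketch, assuming an index named `idx_orders_status` exists:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Low avg_leaf_density or high leaf_fragmentation suggests the
-- index would benefit from a rebuild
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_orders_status');
```

Note that `pgstatindex` reads the whole index, so run it off-peak on very large indexes.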
```sql
-- Unused indexes (never scanned since statistics were last reset;
-- check replicas too, since these stats are tracked per server)
SELECT indexrelid::regclass AS index_name, relid::regclass AS table_name, idx_scan
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
  AND NOT indisunique;  -- Exclude unique constraints
```

Drop unused indexes: they slow down writes for zero read benefit.
```sql
-- Rebuild a bloated index without blocking reads or writes (PostgreSQL 12+).
-- CONCURRENTLY still takes brief locks at the start and end of the rebuild.
REINDEX INDEX CONCURRENTLY idx_orders_status;
```
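If a `REINDEX CONCURRENTLY` (or `CREATE INDEX CONCURRENTLY`) is interrupted, it can leave an invalid index behind that consumes space and write overhead without serving any reads. Worth checking afterwards:

```sql
-- Invalid indexes left over from failed concurrent (re)builds
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
```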
Production Postgres Debug Cheat Sheet
- ✅ `pg_stat_activity` — see live queries and wait events
- ✅ `pg_stat_statements` — historical aggregates of query performance
- ✅ `auto_explain` — automatic EXPLAIN on slow queries in production logs
- ✅ Lock detection query — find blocking + blocked query pairs
- ✅ `pg_cancel_backend(pid)` to cancel gracefully, `pg_terminate_backend` as a last resort
- ✅ `REINDEX CONCURRENTLY` to rebuild indexes without blocking writes
- ❌ Never run `VACUUM FULL` on production during peak traffic
- ❌ Never drop an index without checking `idx_scan` first
This guide pairs directly with the PostgreSQL query planner optimization guide: that one covers why queries are slow, this one covers how to find which queries are slow in production. For the application layer, the API debugging guide shows how to isolate whether the slowness lives in the database or in the application code. Official reference: the PostgreSQL statistics collector documentation.