Python’s GIL Explained: Why Your Threads Aren’t Actually Parallel (And the Fix)

The Global Interpreter Lock is the most misunderstood thing in Python. Developers hit it, blame Python for being slow, and reach for multiprocessing or Go. But the GIL isn’t a bug — it’s a deliberate trade-off, and once you understand exactly what it does and doesn’t prevent, you’ll use the right concurrency tool every time.

TL;DR: The GIL prevents two Python threads from executing Python bytecode simultaneously. It does NOT affect I/O-bound threads (they release the GIL while waiting). For CPU-bound parallelism: use multiprocessing. Python 3.13+ has experimental no-GIL mode. Here’s when each matters.

What the GIL Actually Does

The GIL is a mutex (mutual exclusion lock) that protects CPython’s internal state — particularly reference counts. Every Python object has a reference count. When the count hits zero, the object is freed. Without the GIL, two threads could simultaneously modify the same reference count, corrupting memory.
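You can watch reference counts from Python itself — a stdlib-only sketch (the exact numbers are CPython implementation details and can vary by version):

```python
import sys

x = object()
# getrefcount() reports one extra reference: its own argument
print(sys.getrefcount(x))   # typically 2: 'x' plus the temporary argument

y = x                       # a second name for the same object
print(sys.getrefcount(x))   # typically 3 now
```

It's exactly these counts that two unsynchronized threads could corrupt — the GIL exists so CPython never has to take a finer-grained lock around every one of them.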

import threading
import time

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1

# The GIL serializes bytecode execution, protecting CPython's internals —
# but 'counter += 1' is several bytecode instructions, so increments can
# still be lost when a thread switch lands mid-update.
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # Often 2,000,000 on recent CPython, but NOT guaranteed
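Because `counter += 1` compiles to separate load/add/store bytecodes, the GIL alone doesn't make it atomic — a switch mid-update can lose increments. An explicit `threading.Lock` makes the total deterministic (a minimal stdlib sketch):

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:              # serializes the whole read-modify-write
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000 — guaranteed by the lock on any interpreter version
```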

# But this comes at a cost for CPU-bound work:

The Benchmark That Shows Exactly When the GIL Hurts

import threading
import multiprocessing
import time

def cpu_bound(n):
    """Pure CPU work — no I/O"""
    return sum(i * i for i in range(n))

def io_bound(n):
    """I/O simulation — releases GIL during wait"""
    time.sleep(0.1)  # GIL released during sleep
    return n

N = 10_000_000

# Test 1: Single thread
start = time.time()
cpu_bound(N)
cpu_bound(N)
print(f"Single thread: {time.time()-start:.2f}s")
# Result: ~1.2s

# Test 2: Two threads (CPU-bound — GIL kills parallelism)
start = time.time()
t1 = threading.Thread(target=cpu_bound, args=(N,))
t2 = threading.Thread(target=cpu_bound, args=(N,))
t1.start(); t2.start(); t1.join(); t2.join()
print(f"2 threads (CPU): {time.time()-start:.2f}s")
# Result: ~1.3s  ← SLOWER than single thread! GIL overhead.

# Test 3: Two processes (CPU-bound — true parallelism)
# NOTE: on platforms that spawn workers (Windows, macOS), Pool code
# must run under an `if __name__ == "__main__":` guard.
start = time.time()
with multiprocessing.Pool(2) as pool:
    pool.map(cpu_bound, [N, N])
print(f"2 processes (CPU): {time.time()-start:.2f}s")
# Result: ~0.65s  ← ~2x faster, real parallelism

# Test 4: Two threads (I/O-bound — GIL released, works fine)
start = time.time()
t1 = threading.Thread(target=io_bound, args=(N,))
t2 = threading.Thread(target=io_bound, args=(N,))
t1.start(); t2.start(); t1.join(); t2.join()
print(f"2 threads (I/O): {time.time()-start:.2f}s")
# Result: ~0.1s  ← Both sleep concurrently, GIL released

Fix 1: multiprocessing — True CPU Parallelism

from multiprocessing import cpu_count
from concurrent.futures import ProcessPoolExecutor

data = list(range(1_000_000))

# ProcessPoolExecutor — cleaner API than Pool
def process_chunk(chunk):
    return [x ** 2 for x in chunk]

# Split data into chunks, process in parallel
chunk_size = len(data) // cpu_count()
chunks = [data[i:i+chunk_size] for i in range(0, len(data), chunk_size)]

# In a real script, run this under `if __name__ == "__main__":` (spawn-safe)
with ProcessPoolExecutor(max_workers=cpu_count()) as executor:
    results = list(executor.map(process_chunk, chunks))

flat = [item for sublist in results for item in sublist]

# When to use ProcessPoolExecutor:
# ✅ CPU-bound: data processing, ML inference, image manipulation
# ❌ I/O-bound: use ThreadPoolExecutor or asyncio instead
# ⚠️  Data must be picklable (lambdas and local functions can't be pickled)
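The pickling caveat matters because worker processes receive their tasks serialized with `pickle`, which stores functions by importable name. A quick stdlib-only demonstration (using `math.sqrt` as the picklable example):

```python
import math
import pickle

# Named, importable functions pickle fine (stored by reference)
payload = pickle.dumps(math.sqrt)
print(pickle.loads(payload)(9.0))  # 3.0

# Lambdas have no importable name, so pickling fails
try:
    pickle.dumps(lambda x: x * x)
    print("pickled a lambda?!")
except Exception as exc:           # pickle.PicklingError in CPython
    print("lambda failed to pickle:", type(exc).__name__)
```

The same failure hits nested functions and bound instance state — define worker functions at module level.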

Fix 2: asyncio — I/O Concurrency Without Threads

import asyncio
import aiohttp  # pip install aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.json()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        # All requests fire concurrently — GIL irrelevant (all I/O)
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

urls = [f"https://api.example.com/item/{i}" for i in range(100)]
# results = asyncio.run(fetch_all(urls))  # one call drives all 100 requests

# asyncio vs threading for I/O:
# Both work. asyncio is more efficient (no thread overhead)
# Threading is simpler for existing synchronous code
# asyncio requires async/await throughout the call chain
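The same fan-out pattern works without third-party libraries — here's a self-contained sketch where `asyncio.sleep` stands in for a network round-trip:

```python
import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(0.1)      # stands in for an HTTP request
    return i * 2

async def main():
    # 20 "requests" run concurrently on one thread, one event loop
    return await asyncio.gather(*(fake_fetch(i) for i in range(20)))

start = time.time()
results = asyncio.run(main())
print(f"{len(results)} results in {time.time()-start:.2f}s")  # ~0.1s, not 2s
```

All 20 coroutines wait simultaneously, so total time is one sleep, not twenty — the GIL never gets in the way because nothing is executing bytecode while waiting.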

Fix 3: Python 3.13 Free-Threaded Mode (No GIL)

# Python 3.13 experimental: run Python without the GIL
# Install free-threaded Python 3.13:
# pyenv install 3.13t  (t = free-threaded build)

# Check whether the GIL is active (sys._is_gil_enabled() was added in 3.13):
import sys
print(sys._is_gil_enabled())  # False in a free-threaded build with the GIL off

# With free-threaded Python, CPU-bound threads now run in parallel:
# threading is now as fast as multiprocessing for CPU work
# BUT: many C extensions assume GIL exists — may crash or corrupt data
# Production use: not yet (2025). Watch Python 3.14 for stability.

# To launch a free-threaded build with the GIL explicitly disabled (3.13+, unstable):
# PYTHON_GIL=0 python script.py
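Since `sys._is_gil_enabled()` doesn't exist before 3.13, portable code should guard the call. A sketch that runs on any CPython (`Py_GIL_DISABLED` is the build-config flag for free-threaded builds):

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 only in free-threaded builds (3.13+)
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() appeared in 3.13; older versions always have a GIL
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```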

Decision Tree: Which Concurrency Tool to Use

  • 🔵 I/O-bound + many connections → asyncio + aiohttp/aiofiles
  • 🟢 I/O-bound + existing sync code → ThreadPoolExecutor
  • 🔴 CPU-bound → ProcessPoolExecutor or multiprocessing.Pool
  • 🟡 CPU-bound + shared memory → multiprocessing.shared_memory or numpy with C extensions
  • Simple background task → threading.Thread (fine for I/O)
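For the "existing sync code" branch, `ThreadPoolExecutor` needs no rewrite to async. A stdlib-only sketch where `time.sleep` stands in for a blocking call like `requests.get`:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(i):
    time.sleep(0.1)          # blocking I/O releases the GIL while waiting
    return i * 10

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(blocking_io, range(8)))
print(f"{results} in {time.time()-start:.2f}s")  # ~0.1s total, not 0.8s
```

Eight waits overlap, so the wall-clock time is one sleep — the same win asyncio gives, at the cost of one OS thread per concurrent task.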

The memory efficiency concepts from Python __slots__ matter even more when using multiprocessing — each worker gets its own copy of the parent's objects (fork is nominally copy-on-write, but CPython's reference-count updates dirty the shared pages), so smaller objects mean faster worker startup and lower total memory. For deploying Python concurrency on serverless, see the AWS Lambda cold start guide — Lambda’s concurrency model is architecturally similar to multiprocessing.
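When workers need to share a large buffer instead of copying it, `multiprocessing.shared_memory` gives both sides a block they can map by name. A minimal single-process sketch of the create/attach/cleanup lifecycle (in real code the second handle would be opened inside a worker process):

```python
from multiprocessing import shared_memory

# Create a named block; another process could attach using the same name
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# A second handle on the same block sees the data without any copy
view = shared_memory.SharedMemory(name=shm.name)
data = bytes(view.buf[:5])
print(data)  # b'hello'

view.close()
shm.close()
shm.unlink()   # the creator is responsible for freeing the block
```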

Recommended resources

  • Fluent Python (2nd Edition) — Chapter 19 covers concurrency models including threading, multiprocessing, and asyncio with the clearest GIL explanation in print.
  • Python Tricks — The concurrency tricks chapter covers practical patterns for working around the GIL without reaching for multiprocessing unnecessarily.

Disclosure: This post contains affiliate links. If you purchase through these links, CheatCoders earns a small commission at no extra cost to you. We only recommend tools and books we genuinely find valuable.

