Python List Comprehension, Generators, and Itertools: The Complete Performance Guide

Most Python developers learn list comprehensions early and stop there — missing two more powerful layers. Generator expressions give you the same syntax with O(1) memory. The itertools module gives you C-level performance for the most common iteration patterns. Knowing when to use each is the difference between Python code that works and Python code that scales.

TL;DR: List comprehension when you need all results at once and size is manageable. Generator expression when data is large or you only need to iterate once. itertools when you need combinations, permutations, grouping, chaining, or sliding windows — always in O(1) memory.

List comprehensions — fast, readable, eager

# Basic: [expression for item in iterable if condition]
squares = [x**2 for x in range(1000) if x % 2 == 0]

# Nested: equivalent to nested for loops
matrix = [[i*j for j in range(5)] for i in range(5)]

# Dict comprehension
word_lengths = {word: len(word) for word in ['hello', 'world']}

# Set comprehension — automatic deduplication
words = ['hello', 'world', 'hey']
unique_lengths = {len(word) for word in words}  # {3, 5}: duplicates collapse

# Benchmark vs for loop (1M elements):
# for loop: 180ms
# list comprehension: 90ms (~2x faster: no per-iteration .append() lookup)

# When to use: need list, size < a few million, transform + filter

Generator expressions — lazy, O(1) memory

# Same syntax as list comp but with () instead of []
# Does NOT compute all values immediately

# Memory comparison on 10M elements:
import sys
list_comp = [x**2 for x in range(10_000_000)]  # hundreds of MB counting the int objects
gen_expr = (x**2 for x in range(10_000_000))   # a tiny constant-size object

print(sys.getsizeof(list_comp))  # ~89,000,000 bytes (the list's pointer array alone)
print(sys.getsizeof(gen_expr))   # ~120 bytes regardless of range size (varies by CPython version)

# Generators are consumed once
gen = (x**2 for x in range(5))
list(gen)  # [0, 1, 4, 9, 16]
list(gen)  # [] — exhausted!
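
If you genuinely need to traverse the same generated stream twice, `itertools.tee` can split it into independent iterators. A minimal sketch (caveat: `tee` buffers every item one copy has seen but the other has not, so fully draining one copy first costs O(n) memory):

```python
import itertools

gen = (x**2 for x in range(5))
a, b = itertools.tee(gen)  # two independent iterators over one underlying stream

first = list(a)   # [0, 1, 4, 9, 16]
second = list(b)  # [0, 1, 4, 9, 16], replayed from tee's internal buffer
```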

# Pipeline: chain generators for O(1) memory throughout
def read_logs(filename):
    with open(filename) as f:
        yield from f  # file objects already iterate lazily, line by line

errors = (line for line in read_logs('app.log') if 'ERROR' in line)
severe = (line for line in errors if 'CRITICAL' in line)
# Process 100GB log file in O(1) memory

itertools — the standard library's iteration toolkit

import itertools

# chain: flatten iterables without creating intermediate list
result = list(itertools.chain([1,2], [3,4], [5,6]))
# [1, 2, 3, 4, 5, 6]

# chain.from_iterable: flatten nested iterable
nested = [[1,2],[3,4],[5,6]]
result = list(itertools.chain.from_iterable(nested))  # [1,2,3,4,5,6]

# islice: take N items from any iterable (like Python slice but lazy)
first_100 = list(itertools.islice(huge_generator(), 100))  # huge_generator: any lazy source

# groupby: group consecutive items by key (MUST be sorted first)
data = [('a',1),('a',2),('b',3),('b',4)]
for key, group in itertools.groupby(data, key=lambda x: x[0]):
    print(key, list(group))  # a [('a', 1), ('a', 2)]  then  b [('b', 3), ('b', 4)]
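Because groupby only merges adjacent equal keys, unsorted data must be sorted by the same key first. A small sketch of that two-step pattern:

```python
import itertools

unsorted_pairs = [('b', 3), ('a', 1), ('b', 4), ('a', 2)]

# Sort by the same key groupby will use, otherwise non-adjacent runs stay split
by_key = sorted(unsorted_pairs, key=lambda pair: pair[0])
grouped = {k: [v for _, v in g]
           for k, g in itertools.groupby(by_key, key=lambda pair: pair[0])}

print(grouped)  # {'a': [1, 2], 'b': [3, 4]}
```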

# product: cartesian product (nested loops without nesting)
for x, y, z in itertools.product(range(3), repeat=3):
    print(x, y, z)  # 27 tuples (the full 3x3x3 Cartesian product)

# combinations and permutations
list(itertools.combinations('ABC', 2))  # [('A','B'), ('A','C'), ('B','C')]
list(itertools.permutations('ABC', 2))  # [('A','B'), ('A','C'), ('B','A'), ('B','C'), ('C','A'), ('C','B')]

# pairwise: sliding window of 2 (Python 3.10+)
list(itertools.pairwise([1,2,3,4]))  # [(1,2),(2,3),(3,4)]

# accumulate: running aggregation
list(itertools.accumulate([1,2,3,4]))  # [1, 3, 6, 10] running sum (the default; pass a function for other ops)
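pairwise is fixed at width 2. For an arbitrary window size, the itertools recipes in the Python docs suggest a deque-based generator; here is a sketch along those lines (the name `sliding_window` follows the recipe, it is not a built-in):

```python
import collections
import itertools

def sliding_window(iterable, n):
    """Yield overlapping length-n tuples, advancing one item at a time."""
    it = iter(iterable)
    window = collections.deque(itertools.islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in it:
        window.append(item)  # maxlen drops the oldest item automatically
        yield tuple(window)

print(list(sliding_window([1, 2, 3, 4, 5], 3)))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```

Like everything above, it stays lazy: only n items live in memory at once, however long the input is.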

Performance benchmark: list vs generator vs itertools

import timeit, itertools

data = list(range(1_000_000))

# Sum of squares of even numbers

# List comprehension (materializes full list)
def with_list():
    return sum([x**2 for x in data if x%2==0])

# Generator (no intermediate list)
def with_gen():
    return sum(x**2 for x in data if x%2==0)

# itertools.compress (most explicit, C-level)
def with_itertools():
    evens = itertools.compress(data, (x%2==0 for x in data))
    return sum(x**2 for x in evens)

# Illustrative results (exact times vary by machine and Python version):
# list comprehension: ~280ms, fastest, but materializes a ~500k-element intermediate list
# generator: ~320ms, O(1) memory; slightly slower per item, big memory win
# itertools.compress: ~380ms, O(1) memory; the extra pass costs here, but compress shines in larger pipelines

# Rule: generators usually win on memory at similar speed for single-pass work

Decision guide

  • List comprehension: need random access, multiple passes, small-medium data, return value
  • Generator expression: large data, single pass, memory-constrained, pipeline
  • itertools: combinatorics, grouping, chaining, windowing — always C-level performance
  • Generator function (yield): complex stateful iteration logic that's hard to express as expression
  • ❌ Never use list comprehension to build a list just to immediately iterate over it once
  • ❌ Never nest list comprehensions more than 2 levels — use explicit for loops
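
The generator-function bullet deserves a concrete shape: a function with yield carries state across iterations that a one-line expression cannot. A minimal sketch, using a hypothetical `batches` helper that flushes fixed-size chunks:

```python
def batches(iterable, size):
    """Stateful generator: accumulate items, yield them in fixed-size lists."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

print(list(batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

(Python 3.12 ships itertools.batched, which yields tuples and covers this exact pattern at C level.)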

These patterns build directly on Python performance optimization — generator pipelines are one of the biggest wins for memory-constrained systems. For the stateful generator pattern, see the JavaScript generators guide — the concepts are identical across languages. External reference: Python itertools documentation.

