A Python generator produces values lazily — one at a time, on demand. This single property enables processing terabyte datasets in kilobytes of memory, building efficient data pipelines, and implementing cooperative multitasking. This guide covers generators from basic yield to the full coroutine protocol.
⚡ TL;DR: Generator function uses yield instead of return — produces one value then pauses. Generator expression is (x for x in …). Chain generators into pipelines for O(1) memory. Use send() to pass values back. Use yield from to delegate to sub-generators.
Generator functions — yield and pause
# Regular function: computes all values, returns full list
def squares_list(n):
    return [i**2 for i in range(n)]  # Creates full list in memory

# Generator function: yields one value, pauses, resumes on next()
def squares_gen(n):
    for i in range(n):
        yield i**2  # Pauses here, returns i**2 to caller
        # Resumes here on next call to next()

import sys
print(sys.getsizeof(squares_list(10_000_000)))  # ~80 MB (the list object alone)
print(sys.getsizeof(squares_gen(10_000_000)))   # ~200 bytes, regardless of n (exact size varies by Python version)
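The TL;DR above also mentions generator expressions, the inline (x for x in ...) form. A minimal sketch of how they behave (the names here are illustrative, not from the article):

```python
import sys

# Generator expression: same laziness as a generator function, inline syntax
squares = (i**2 for i in range(10_000_000))
print(sys.getsizeof(squares) < 1000)  # True: tiny, regardless of range size
print(next(squares))                  # 0
print(next(squares))                  # 1

# Feeds directly into any consumer that takes an iterable
# (no extra parentheses needed inside a call)
total = sum(i**2 for i in range(100))
print(total)  # 328350
```

Use a generator expression for a single stateless transform; reach for a generator function when you need loops, conditionals, or local state.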
Generator pipelines — compose lazy stages
# Each stage is a generator — entire pipeline is O(1) memory
def read_csv_lines(filename):
    with open(filename) as f:
        next(f)  # Skip header
        for line in f:
            yield line.strip()

def parse_csv(lines):
    for line in lines:
        parts = line.split(',')
        yield {'id': parts[0], 'name': parts[1], 'amount': float(parts[2])}

def filter_large(records, threshold=1000):
    for record in records:
        if record['amount'] > threshold:
            yield record

def format_output(records):
    for record in records:
        yield f"{record['name']}: ${record['amount']:,.2f}"

# Pipeline: each stage is lazy, only pulls what it needs
lines = read_csv_lines('transactions.csv')  # Works on a 5 GB file
records = parse_csv(lines)
large = filter_large(records, 10000)
output = format_output(large)

for line in output:
    print(line)  # Memory: ~1 record at a time
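For simple stages, the same pipeline shape can be written with chained generator expressions. A sketch using a small in-memory sample in place of the CSV file (the sample rows are made up for illustration):

```python
# In-memory sample standing in for the CSV file (header + 3 rows)
raw = ["id,name,amount", "1,Alice,25000.5", "2,Bob,900.0", "3,Carol,12000.0"]

lines = (line.strip() for line in raw[1:])  # skip header
records = ({'name': p[1], 'amount': float(p[2])}
           for p in (line.split(',') for line in lines))
large = (r for r in records if r['amount'] > 10000)
output = (f"{r['name']}: ${r['amount']:,.2f}" for r in large)

# Nothing runs until the final consumer pulls values
print(list(output))  # ['Alice: $25,000.50', 'Carol: $12,000.00']
```

The function-based version above is usually clearer once a stage needs more than one line, but the two styles compose freely.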
The send() protocol — two-way generators
# Generators can receive values too — making them coroutines
def running_average():
    total = count = 0
    avg = None
    while True:
        value = yield avg  # Yield average, receive next value
        if value is None:
            break
        total += value
        count += 1
        avg = total / count

calc = running_average()
next(calc)     # Prime the generator (advance to first yield)
calc.send(10)  # → 10.0
calc.send(20)  # → 15.0
calc.send(30)  # → 20.0
yield from — delegate to sub-generator
# yield from: delegate to another generator transparently
def flatten(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # Recursively flatten
        else:
            yield item

list(flatten([1, [2, 3, [4, 5]], 6]))  # [1, 2, 3, 4, 5, 6]

# yield from also passes send() and throw() to the sub-generator
# — essential for building coroutine chains
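A sketch of that pass-through behavior, reusing the running-average idea from the previous section (the function names here are illustrative):

```python
def averager():
    total = count = 0
    avg = None
    while True:
        value = yield avg  # Receives whatever is sent to the OUTER generator
        total += value
        count += 1
        avg = total / count

def delegator():
    # yield from forwards next(), send(), and throw() straight to averager()
    yield from averager()

gen = delegator()
next(gen)            # Prime: advances through delegator into averager's first yield
print(gen.send(10))  # 10.0 — the value reached averager() unchanged
print(gen.send(30))  # 20.0
```

Without yield from, delegator() would have to loop over averager() manually and re-implement the send() forwarding by hand.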
- ✅ Generator function for stateful lazy sequences
- ✅ Generator expression (x for x in …) for simple transforms
- ✅ Chain generators into pipelines — each stage pulls from previous
- ✅ yield from to delegate to sub-generators cleanly
- ✅ send() for two-way communication (coroutines)
- ❌ Never convert generator to list unless you need all values at once
- ❌ Never call next() manually when for loop works
Generator pipelines are the foundation of Python's itertools patterns — the itertools functions are lazy iterators implemented in C, so they compose with generators at C speed. External reference: Python yield expressions documentation.
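A brief sketch of mixing itertools with generator pipelines (islice and takewhile are real itertools functions; naturals is a helper defined here for illustration):

```python
from itertools import islice, takewhile

def naturals():
    """Infinite stream: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

squares = (n * n for n in naturals())  # infinite lazy stream of squares

# islice pulls exactly 5 values, then stops — the stream stays lazy
print(list(islice(squares, 5)))  # [0, 1, 4, 9, 16]

# takewhile consumes until the predicate fails
small = list(takewhile(lambda x: x < 100, (n * n for n in naturals())))
print(small)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because every stage is lazy, infinite streams like naturals() are safe as long as some downstream stage bounds the iteration.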