Python Generators and Yield: Lazy Evaluation for Memory-Efficient Data Processing

A Python generator is a function that produces values lazily — one at a time, on demand. This single property enables processing terabyte datasets in kilobytes of memory, building efficient data pipelines, and implementing cooperative multitasking. This guide covers generators from basic yield to the full coroutine protocol.

⚡ TL;DR: Generator function uses yield instead of return — produces one value then pauses. Generator expression is (x for x in …). Chain generators into pipelines for O(1) memory. Use send() to pass values back. Use yield from to delegate to sub-generators.

Generator functions — yield and pause

# Regular function: computes all values, returns full list
def squares_list(n):
    return [i**2 for i in range(n)]  # Creates full list in memory

# Generator function: yields one value, pauses, resumes on next()
def squares_gen(n):
    for i in range(n):
        yield i**2  # Pauses here, returns i**2 to caller
                    # Resumes here on next call to next()

import sys
print(sys.getsizeof(squares_list(10_000_000)))  # 80MB
print(sys.getsizeof(squares_gen(10_000_000)))   # 120 bytes!

Generator pipelines — compose lazy stages

# Each stage is a generator — entire pipeline is O(1) memory

def read_csv_lines(filename):
    with open(filename) as f:
        next(f)  # Skip header
        for line in f:
            yield line.strip()

def parse_csv(lines):
    for line in lines:
        parts = line.split(',')
        yield {'id': parts[0], 'name': parts[1], 'amount': float(parts[2])}

def filter_large(records, threshold=1000):
    for record in records:
        if record['amount'] > threshold:
            yield record

def format_output(records):
    for record in records:
        yield f"{record['name']}: ${record['amount']:,.2f}"

# Pipeline: each stage lazy, only pulls what it needs
lines = read_csv_lines('transactions.csv')  # 5GB file
records = parse_csv(lines)
large = filter_large(records, 10000)
output = format_output(large)

for line in output:
    print(line)  # Memory: ~1 record at a time

The send() protocol — two-way generators

# Generators can receive values too — making them coroutines
def running_average():
    total = count = 0
    avg = None
    while True:
        value = yield avg  # Yield average, receive next value
        if value is None:
            break
        total += value
        count += 1
        avg = total / count

calc = running_average()
next(calc)        # Prime the generator (advance to first yield)
calc.send(10)     # → 10.0
calc.send(20)     # → 15.0
calc.send(30)     # → 20.0

yield from — delegate to sub-generator

# yield from: delegate to another generator transparently
def flatten(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # Recursively flatten
        else:
            yield item

list(flatten([1, [2, 3, [4, 5]], 6]))  # [1, 2, 3, 4, 5, 6]

# yield from also passes send() and throw() to sub-generator
# Essential for building coroutine chains

✅ Generator function for stateful lazy sequences
✅ Generator expression (x for x in …) for simple transforms
✅ Chain generators into pipelines — each stage pulls from previous
✅ yield from to delegate to sub-generators cleanly
✅ send() for two-way communication (coroutines)
❌ Never convert generator to list unless you need all values at once
❌ Never call next() manually when for loop works

Generator pipelines are the foundation of Python itertools patterns — itertools functions are C-level generators. External reference: Python yield expressions documentation.

🚀 Don’t Miss the Next Cheat Code

Join 1,000+ senior developers getting expert JS, Python, AWS and system design secrets weekly.

✓ No spam✓ Unsubscribe anytime

Discover more from CheatCoders

Subscribe to get the latest posts sent to your email.

Python Generators and Yield: Lazy Evaluation for Memory-Efficient Data Processing

Generator functions — yield and pause

Generator pipelines — compose lazy stages

The send() protocol — two-way generators

yield from — delegate to sub-generator

🚀 Don’t Miss the Next Cheat Code

Like this:

Related

Discover more from CheatCoders

Comments

Leave a ReplyCancel reply

Generator functions — yield and pause

Generator pipelines — compose lazy stages

The send() protocol — two-way generators

yield from — delegate to sub-generator

🚀 Don’t Miss the Next Cheat Code

Share this:

Like this:

Related

Discover more from CheatCoders

Comments

Leave a ReplyCancel reply