Loading a 10GB file into memory crashes your Node.js process. Streaming it keeps memory usage in the kilobytes. Streams are the solution for large data — files, network responses, database cursors, real-time events. This guide covers every stream type and the patterns that make production data pipelines reliable.
⚡ TL;DR: Use stream.pipeline(), not .pipe(). Use for await...of as the cleanest API. Handle backpressure or memory explodes. Readable produces, Writable consumes, Transform does both.
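A minimal sketch of all three types in one pipeline (the doubling transform and the demo() wrapper are made-up examples, not part of any API):

const { Readable, Writable, Transform } = require('stream');
const { pipeline } = require('stream/promises');

async function demo() {
  // Readable produces: Readable.from wraps any iterable
  const source = Readable.from([1, 2, 3]);
  // Transform does both: consumes numbers, produces doubles
  const double = new Transform({
    objectMode: true,
    transform(n, enc, cb) { cb(null, n * 2); }
  });
  // Writable consumes: logs each value
  const sink = new Writable({
    objectMode: true,
    write(n, enc, cb) { console.log(n); cb(); } // 2, 4, 6
  });
  await pipeline(source, double, sink);
}
demo().catch(console.error);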
The problem streams solve
const fs = require('fs');
// WRONG: tries to load all 10GB into RAM (exceeds the Buffer limit or OOM-kills the process)
const data = fs.readFileSync('huge.csv');

// RIGHT: constant ~64KB of memory regardless of file size
const stream = fs.createReadStream('huge.csv', { highWaterMark: 64 * 1024 });
for await (const chunk of stream) processChunk(chunk); // processChunk is your handler
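To verify the claim on your own machine, a quick sketch using process.memoryUsage() (huge.csv is a placeholder path; the measure() helper is made up for illustration):

const fs = require('fs');

async function measure(path) {
  let bytes = 0, peakHeap = 0;
  for await (const chunk of fs.createReadStream(path)) {
    bytes += chunk.length;
    peakHeap = Math.max(peakHeap, process.memoryUsage().heapUsed);
  }
  // heap stays flat no matter how large the file is
  console.log(`${bytes} bytes read, peak heap ${(peakHeap / 1e6).toFixed(1)} MB`);
}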
pipeline() — compose streams correctly
const { pipeline } = require('stream/promises');
const fs = require('fs');
const zlib = require('zlib');
const { Transform } = require('stream');

let leftover = '';
const csvToJson = new Transform({
  transform(chunk, enc, cb) {
    // chunks rarely end on a newline, so carry the partial last line over
    const lines = (leftover + chunk.toString()).split('\n');
    leftover = lines.pop();
    for (const line of lines) {
      if (!line) continue;
      const [id, name, email] = line.split(',');
      this.push(JSON.stringify({ id, name, email }) + '\n');
    }
    cb();
  },
  flush(cb) { // emit whatever is left when the input ends
    if (leftover) {
      const [id, name, email] = leftover.split(',');
      this.push(JSON.stringify({ id, name, email }) + '\n');
    }
    cb();
  }
});

// pipeline: handles errors, cleanup, backpressure automatically
// (run inside an async function or an ES module; CommonJS has no top-level await)
await pipeline(
  fs.createReadStream('users.csv'),
  csvToJson,
  zlib.createGzip(),
  fs.createWriteStream('users.json.gz')
);
// Entire 10GB processed with ~200KB peak memory
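pipeline() also accepts an AbortSignal as a final option, which makes long-running pipelines cancellable. A sketch reusing the requires above (the 5-second timeout and file names are arbitrary):

const controller = new AbortController();
setTimeout(() => controller.abort(), 5000); // give up after 5 seconds

try {
  await pipeline(
    fs.createReadStream('users.csv'),
    zlib.createGzip(),
    fs.createWriteStream('users.csv.gz'),
    { signal: controller.signal }
  );
} catch (err) {
  if (err.name === 'AbortError') console.log('pipeline cancelled');
  else throw err;
}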
Async iteration — cleanest stream API
const readline = require('readline');
const fs = require('fs');

async function processFile(path) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity // treat \r\n as a single line break
  });
  let count = 0;
  for await (const line of rl) {
    await processLine(line); // your async handler; backpressure is automatic
    count++;
  }
  return count;
}
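Async iteration also composes with pipeline(): on recent Node versions, any middle stage can be an async generator function, which gets backpressure for free. A sketch (the uppercasing step and file names are made-up examples):

const { pipeline } = require('stream/promises');
const fs = require('fs');

await pipeline(
  fs.createReadStream('input.txt'),
  async function* (source) {
    // source is the upstream as an async iterable; yield pushes downstream
    for await (const chunk of source) {
      yield chunk.toString().toUpperCase();
    }
  },
  fs.createWriteStream('output.txt')
);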
Backpressure — prevent memory explosions
// WRONG: ignore backpressure
readable.on('data', chunk => {
  writable.write(chunk); // write() may return false = buffer full!
});

// RIGHT: pause/resume
readable.on('data', chunk => {
  if (!writable.write(chunk)) {
    readable.pause();
    writable.once('drain', () => readable.resume());
  }
});

// BEST: pipeline handles all of this automatically
await pipeline(readable, transform, writable);
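The same rule applies when writing manually from a for await loop; a sketch using events.once to wait for 'drain' (the copy() helper is made up for illustration):

const { once } = require('events');

async function copy(readable, writable) {
  for await (const chunk of readable) {
    if (!writable.write(chunk)) {
      await once(writable, 'drain'); // buffer full: wait until it empties
    }
  }
  writable.end();
}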
Streams cheat sheet
- ✅ Always use pipeline() from stream/promises — not .pipe()
- ✅ for await…of for readable streams — cleanest API
- ✅ objectMode: true for object streams (JSON records) — see the sketch after this list
- ✅ highWaterMark to tune buffer size (default 16KB; fs streams use 64KB)
- ❌ Never buffer entire stream content — defeats the purpose
- ❌ Never ignore write() return value — false = buffer full
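On the objectMode point: a sketch of a Transform that passes whole JavaScript objects instead of bytes (the record shape and output file name are made up):

const { Readable, Transform } = require('stream');
const { pipeline } = require('stream/promises');
const fs = require('fs');

const toRow = new Transform({
  objectMode: true, // chunks are objects, and highWaterMark counts objects (default 16)
  transform(record, enc, cb) {
    cb(null, `${record.id},${record.name}\n`); // object in, CSV line out
  }
});

await pipeline(
  Readable.from([{ id: 1, name: 'Ada' }, { id: 2, name: 'Linus' }]),
  toRow,
  fs.createWriteStream('rows.csv')
);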
Node.js streams build on the event loop — I/O streams use non-blocking reads under the hood. For the full API, see the Node.js Streams documentation.