Node.js Streams: Process Gigabytes of Data Without Running Out of Memory

Loading a 10GB file into memory crashes your Node.js process; streaming the same file takes only kilobytes of RAM. Streams are the solution for large data: files, network responses, database cursors, and real-time events. This guide covers every stream type and the patterns that make production data pipelines reliable.

TL;DR: Use stream.pipeline() not .pipe(). Use for await...of as the cleanest API. Handle backpressure or memory explodes. Readable produces, Writable consumes, Transform does both.
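
As a quick orientation, here is a minimal sketch of those three stream types wired together with pipeline(); the data and the uppercase step are purely illustrative:

const { Readable, Writable, Transform } = require('stream');
const { pipeline } = require('stream/promises');

// Readable: produces chunks
const source = Readable.from(['alpha\n', 'beta\n', 'gamma\n']);

// Transform: consumes chunks and produces new ones
const upper = new Transform({
  transform(chunk, enc, cb) { cb(null, chunk.toString().toUpperCase()); }
});

// Writable: consumes chunks
const sink = new Writable({
  write(chunk, enc, cb) { process.stdout.write(chunk); cb(); }
});

pipeline(source, upper, sink).catch(console.error);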

The problem streams solve

const fs = require('fs');

// WRONG: pulls all 10GB into RAM
const data = fs.readFileSync('huge.csv'); // OutOfMemory!

// RIGHT: roughly constant 64KB of memory regardless of file size
async function processCsv(path) {
  const stream = fs.createReadStream(path, { highWaterMark: 64 * 1024 });
  for await (const chunk of stream) processChunk(chunk); // processChunk = your per-chunk handler
}

pipeline() — compose streams correctly

const { pipeline } = require('stream/promises');
const fs = require('fs');
const zlib = require('zlib');
const { Transform } = require('stream');

const csvToJson = new Transform({
  objectMode: true,
  transform(chunk, enc, cb) {
    // Buffer a partial line that may be split across chunk boundaries
    const lines = ((this.rest || '') + chunk.toString()).split('\n');
    this.rest = lines.pop(); // possibly incomplete; carried into the next chunk
    for (const line of lines.filter(Boolean)) {
      const [id, name, email] = line.split(',');
      this.push(JSON.stringify({ id, name, email }) + '\n');
    }
    cb();
  },
  flush(cb) {
    if (this.rest) { // final record when the file has no trailing newline
      const [id, name, email] = this.rest.split(',');
      this.push(JSON.stringify({ id, name, email }) + '\n');
    }
    cb();
  }
});

// pipeline() handles errors, cleanup, and backpressure automatically
// (call it from an async function or an ESM module with top-level await)
await pipeline(
  fs.createReadStream('users.csv'),
  csvToJson,
  zlib.createGzip(),
  fs.createWriteStream('users.json.gz')
);
// The entire 10GB file is processed with a small, roughly constant memory footprint
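
pipeline() from stream/promises also rejects on failure and accepts an AbortSignal, so a long-running pipeline can be cancelled cleanly. A sketch under those assumptions (the file names and function name are placeholders):

const { pipeline } = require('stream/promises');
const fs = require('fs');
const zlib = require('zlib');

async function compressFile(src, dest, signal) {
  try {
    await pipeline(
      fs.createReadStream(src),
      zlib.createGzip(),
      fs.createWriteStream(dest),
      { signal } // optional AbortSignal; aborting destroys every stream in the chain
    );
  } catch (err) {
    // Any stream error (or an abort) lands here; pipeline has already cleaned up
    console.error('pipeline failed:', err);
    throw err;
  }
}

Calling abort() on the matching AbortController tears down the whole chain and rejects the promise with an AbortError.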

Async iteration — cleanest stream API

const readline = require('readline');
const fs = require('fs');

async function processFile(path) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity
  });
  let count = 0;
  for await (const line of rl) {
    await processLine(line); // async per line, backpressure automatic
    count++;
  }
  return count;
}
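
pipeline() also accepts async generator functions as intermediate stages, which combines this iteration style with pipeline()'s error handling and backpressure. A minimal sketch (the uppercase step and file names are illustrative):

const { pipeline } = require('stream/promises');
const fs = require('fs');

async function uppercaseFile(src, dest) {
  await pipeline(
    fs.createReadStream(src),
    // An async generator acts as a transform stage: it receives the upstream
    // stream as an async iterable and yields whatever should flow downstream
    async function* (source) {
      source.setEncoding('utf8');
      for await (const chunk of source) {
        yield chunk.toUpperCase();
      }
    },
    fs.createWriteStream(dest)
  );
}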

Backpressure — prevent memory explosions

// WRONG: ignore backpressure
readable.on('data', chunk => {
  writable.write(chunk); // write() may return false = buffer full!
});

// RIGHT: pause/resume
readable.on('data', chunk => {
  if (!writable.write(chunk)) {
    readable.pause();
    writable.once('drain', () => readable.resume());
  }
});

// BEST: pipeline handles all of this automatically
await pipeline(readable, transform, writable);
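
The same rule applies when your own code is the producer rather than another stream: stop calling write() once it returns false and wait for 'drain'. A sketch using events.once, where generateRow() is a hypothetical function that builds one line of output:

const fs = require('fs');
const { once } = require('events');

async function writeManyRows(path) {
  const out = fs.createWriteStream(path);
  for (let i = 0; i < 1_000_000; i++) {
    const row = generateRow(i); // hypothetical: returns one '\n'-terminated line
    if (!out.write(row)) {
      await once(out, 'drain'); // internal buffer full; wait before writing more
    }
  }
  out.end();
  await once(out, 'finish'); // all data flushed to the file
}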

Streams cheat sheet

  • ✅ Always use pipeline() (callback or stream/promises form) instead of .pipe()
  • ✅ for await…of for readable streams — cleanest API
  • ✅ objectMode: true for object streams (JSON records); see the sketch after this list
  • ✅ highWaterMark to tune buffer size (fs streams default to 64 KB)
  • ❌ Never buffer entire stream content — defeats the purpose
  • ❌ Never ignore write() return value — false = buffer full
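
For the objectMode bullet above, a minimal sketch of an object-mode Transform; it assumes an upstream stage (readline or a line splitter) that emits one NDJSON line per chunk:

const { Transform } = require('stream');

// In object mode the internal buffer is counted in objects, not bytes
// (the objectMode default highWaterMark is 16 objects)
const parseNdjson = new Transform({
  objectMode: true,
  transform(line, enc, cb) {
    try {
      cb(null, JSON.parse(line.toString())); // emit a plain JS object downstream
    } catch (err) {
      cb(err); // a malformed line fails the stream (and the surrounding pipeline)
    }
  }
});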

Node.js streams build on the event loop — I/O streams use non-blocking reads under the hood. External reference: Node.js Streams documentation.
