Node Streams are Awesome

I’ve been using Node JS off and on for the past few years, ever since we used it in webOS, but recently I’ve really gotten to go deep. As part of my learning I’ve finally started digging into Streams, perhaps one of the coolest little-known features of Node.

If you don’t already know, Node JS is a server side framework built on JavaScript running in the V8 engine, the JS engine from Chrome, combined with libuv, a fast cross-platform IO library written in C. Node JS is single threaded, but this doesn’t cause a problem since most server side tasks are IO bound, or at least the ones people use Node for (you can bind to C++ code if you really need to).

Node does its magic by making almost all IO function calls asynchronous. When you call an IO function like readFile() you must also give it a callback function, or else attach some event handlers. The native side then performs the IO work and calls your code back when it’s done.

This callback system works reasonably well, but for complex IO operations you may end up with hard-to-understand, deeply nested code, known in the Node world as ‘callback hell’. There are some third party utilities that can help, such as the ‘async’ module, but for pure IO another option is streams.
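
Here’s a contrived sketch of what that nesting looks like (the file names are just placeholders):

var fs = require('fs');
// each async step nests inside the previous callback,
// so the code marches steadily to the right
fs.readFile('a.txt', function(err, a) {
    fs.readFile('b.txt', function(err, b) {
        fs.writeFile('c.txt', Buffer.concat([a, b]), function(err) {
            // ...and so on, with error handling needed at every level
        });
    });
});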

A stream is just what it sounds like in any other language: a sequence of data that you operate on as it arrives or is requested. Here’s a quick example. To copy a file you could do this:

var fs = require('fs');
fs.readFile('a.txt', function(err, data) {
    if (err) throw err;
    fs.writeFile('b.txt', data, function(err) {
        if (err) throw err;
    });
});

That will work, but all of the data has to be loaded into memory first. For a large file you’ll be wasting massive amounts of memory and increasing latency if you’re trying to send that file on to a client. Instead you could do it with events:

var fs = require('fs');
var infile = fs.createReadStream('a.jpg');
var outfile = fs.createWriteStream('b.jpg');
infile.on('data', function(data) {
    outfile.write(data);
});
infile.on('end', function() {
    outfile.end();
});

Now we are processing the data in chunks, but that’s still a lot of boilerplate code to write. Streams can do this for you with the pipe function:

fs.createReadStream('a.jpg').pipe(fs.createWriteStream('b.jpg'));

All of the work will be done asynchronously and we have no extra variables floating around. Even better, the pipe function is smart enough to handle backpressure and buffer properly. If the reader or writer is slow (network latency, perhaps), it will only read as much data as needed at the time. You can pretty much just set it and forget it.
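
To appreciate what pipe is doing for you, here’s a rough sketch of that flow control done by hand, the classic pause/resume dance (pipe also handles details this leaves out, like cleaning up on errors):

var fs = require('fs');
var infile = fs.createReadStream('a.jpg');
var outfile = fs.createWriteStream('b.jpg');
infile.on('data', function(data) {
    // write() returns false when the writer's buffer is full
    if (!outfile.write(data)) {
        infile.pause(); // stop reading until the writer drains
    }
});
outfile.on('drain', function() {
    infile.resume(); // safe to read again
});
infile.on('end', function() {
    outfile.end();
});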

There’s one really cool thing about streams. Well, actually two. First, more and more Node APIs are starting to support streams. You can stream to or from a socket, or from an HTTP GET request to a POST on another server. You can add transform streams for compression or encryption. There are even utility libraries that can perform regex transformations on your streams of data. It’s really quite handy.
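
For example, here’s a sketch of that GET-to-POST forwarding (the host name and path are hypothetical):

var http = require('http');

// hypothetical destination server
var post = http.request({
    hostname: 'other.example.com',
    path: '/upload',
    method: 'POST'
});

http.get('http://foo.com/bigfile.tar.gz', function(res) {
    // forward the download straight into the upload;
    // pipe will call post.end() when the response ends
    res.pipe(post);
});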

The second cool thing is that you can still use events with piped streams. Let’s get into some more useful examples:

Suppose I want to download a file from a web server. I can do it like this:

var fs = require('fs');
var http = require('http');

var req = http.get('http://foo.com/bigfile.tar.gz');
req.on('response', function(res) {
    res.pipe(fs.createWriteStream('bigfile.tar.gz'));
});

That will stream the GET response right into a file on disk.

Now suppose we want to uncompress the file as well. Easy peasy:

var zlib = require('zlib');
var tar = require('tar');

var req = http.get('http://foo.com/bigfile.tar.gz');
req.on('response', function(res) {
    res
        .pipe(zlib.createGunzip())
        .pipe(tar.Extract({ path: '/tmp', strip: 1 }));
});

Note that zlib is a built-in Node module, but tar is an open source one you’ll need to install with npm (npm install tar).

Now suppose you want to print the progress while it happens. We can get the file size from the Content-Length header, then add a listener for data events:

var req = http.get('http://foo.com/bigfile.tar.gz');
req.on('response', function(res) {
    var total = parseInt(res.headers['content-length'], 10); // total byte length
    var count = 0;
    res.on('data', function(data) {
        count += data.length;
        console.log(count / total * 100);
    })
    .pipe(zlib.createGunzip())
    .pipe(tar.Extract({ path: '/tmp', strip: 1 }))
    .on('close', function() {
        console.log('finished downloading');
    });
});

Streams and pipes are really awesome. For more details and other libraries that can do cool things with streams, check out the Streams Handbook.

Talk to me about it on Twitter

Posted June 25th, 2014

Tagged: node