Deep Streams

Usually when I think of Node, I think about web servers. That’s mostly what I use it for when writing code: setting up a simple test server for a proof of concept, or bringing in Express and its ecosystem for more production-ready projects.

But in reality, I use Node for a whole lot more. Of course, just about anything npm-related is a use of Node, but it also powers all the awesome developer tools we don’t even really need to think about much anymore. When a minifier runs over your JavaScript code before you push to production, Node is probably doing that magic. Same for a bundler. It’s behind the scenes in quite a bit of frontend workflow these days. I rely on those tools all the time, but I hadn’t had much chance to actually write any of them until recently.

The task was pretty simple: I had a .js file with a bunch of arrays and objects someone else had entered. The formatting had been mangled somewhere along the way, leaving long runs of spaces everywhere. I wanted to strip those out while leaving single spaces alone (to preserve multi-word strings). Now, I know any halfway decent code editor has a search-and-replace feature that could handle this, but it seemed like a nice little utility to write in Node. That way I could run it over an entire directory if necessary (it probably won’t ever be necessary, but I really wanted to do this short little side project).

My first iteration was a super simple script using the fs module. First, a nice path-splitter utility in a separate file. It takes a file path, splits it at the dots, and inserts a '-' plus whatever new string you want just before the extension. That prevents overwriting the original file (though this part would be unnecessary if you do want to overwrite: just pass the same file name to the write function):

// path-splitter.js
'use strict';

module.exports = function(filePath, additionalName) {
  const filePathArray = filePath.split('.');
  // The segment just before the extension
  const lastItemInPath = filePathArray[filePathArray.length - 2];
  filePathArray[filePathArray.length - 2] = `${lastItemInPath}-${additionalName}`;
  return filePathArray.join('.');
};
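To make the intent concrete, here’s what the splitter produces. The function is inlined below so the snippet runs on its own, and the 'data/entries.js' path is just a made-up example:

```javascript
// The splitter from above, inlined for a standalone check
function splitPath(filePath, additionalName) {
  const parts = filePath.split('.');
  // Tack the new suffix onto the segment just before the extension
  parts[parts.length - 2] = `${parts[parts.length - 2]}-${additionalName}`;
  return parts.join('.');
}

console.log(splitPath('data/entries.js', 'stripped')); // data/entries-stripped.js
```

One caveat: it assumes the file name actually has an extension; with no dot in the path at all, the suffix silently isn’t applied.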

Then we can use that in our script to strip multi-spaces and write out a new file:

// strip-fs.js
'use strict';

const fs = require('fs');
const pathSplitter = require('./path-splitter');

const filePath = process.argv[2];

fs.readFile(filePath, (err, data) => {
  if (err) throw err;
  // Collapse runs of two or more spaces down to one; /\s+/ would also
  // match newlines and flatten the whole file onto a single line.
  const stripped = data.toString().replace(/ {2,}/g, ' ');
  fs.writeFile(pathSplitter(filePath, 'stripped'), stripped, (err) => {
    if (err) throw err;
  });
});
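A note on the whitespace regex, since it’s easy to grab too much: /\s+/ matches newlines and tabs as well as spaces, so replacing it would flatten the whole file onto one line. Matching only runs of two or more spaces keeps line breaks and single spaces (and therefore multi-word strings) intact. A quick comparison on a made-up sample:

```javascript
const sample = 'const names = [ "John Smith",      "Jane Doe" ];\nconst n =   2;';

// Collapse only runs of two or more spaces: the line break and single spaces survive
console.log(sample.replace(/ {2,}/g, ' '));
// const names = [ "John Smith", "Jane Doe" ];
// const n = 2;

// /\s+/ also eats the newline, flattening everything onto one line
console.log(sample.replace(/\s+/g, ' '));
// const names = [ "John Smith", "Jane Doe" ]; const n = 2;
```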

All very cool. But I really like the notion of streams in Node. What if the file I need to manipulate is really large? With the fs version, the entire file has to be read into memory before anything gets written back out. But that’s what streams are for! So I rewrote the script with a custom transform stream. It wasn’t all that difficult, once I realized that the required method on a custom stream (one that extends the Transform class) has to be named _transform. If you leave out the underscore, it will not work: Node treats the transform method as missing and the stream errors out complaining that _transform() is not implemented.

Again, in a separate file (small modules for the win!), I defined my custom stream:

// multi-space-strip.js
'use strict';

const { Transform } = require('stream');

class MultiSpaceStrip extends Transform {
  _transform(chunk, encoding, callback) {
    // Same regex as the fs version: collapse space runs, keep newlines
    this.push(chunk.toString().replace(/ {2,}/g, ' '));
    callback();
  }
}

// Exporting a single instance is fine for a one-shot script
// (a stream can't be reused once it has ended).
module.exports = new MultiSpaceStrip();

Then it was just a matter of importing that stream along with the path-splitting utility from the original fs version (code reuse for the win!) and wiring it between two of Node’s built-in file streams, fs.createReadStream and fs.createWriteStream, which create a readable or writable stream from a file path automatically:

// strip-stream.js
'use strict';

const fs = require('fs');
const spaceStripStream = require('./multi-space-strip');
const pathSplitter = require('./path-splitter');

const filePath = process.argv[2];

fs.createReadStream(filePath)
  .pipe(spaceStripStream)
  .pipe(fs.createWriteStream(pathSplitter(filePath, 'stripped')));

Both methods (fs and stream) are simple and concise. Both have their place. Creating a custom transform stream was probably unnecessary for this task, but would be very useful for very large files. Either way, it was a fun quick dive into some other corners of Node!
