Pulling the strings

So, back to the puppet show.

Actually, I guess this is going to be a bit more of a tangent, but it did come up when working on the new Puppeteer project (to crawl our Angular application and store server rendered pages in a local cache), so it still counts…

We wanted to have both a command line option and a graphic UI option to run the crawl service.  The UI would need a backend that keeps a websocket open and broadcast out updates when a new url was crawled (to show success or failure, and give stats at the end).  Socket.io works great for this- just install it on your Node project, and you can socket.emit messages and data to your frontend, which can listen with socket.on (use the same name for both emit and on to coordinate the two).

However, in the command line option, there would be no socket.  With this configuration, a user would just run the command and the messages/data should print to the console.  So we have a shared program that needs two different interfaces.  I had already created a “crawl-runner.js” file with my main “run” function.  It would handle the browser init, page creation, and navigation in headless Chrome (using Puppeteer).  It also handles storing and responding with the results.  It was set up to use a simple native Node EventEmitter- which worked fine for interfacing with websockets.  In fact, we could probably just cut out the middleman and eliminate the EventEmitter- just socket.emit directly from the crawler.

But either way, we will have to switch to console.log when using the command line option.  How to reuse the logic from crawl-runner.js in the command line version?  We can pass the emitter as an optional argument to “run” and if it’s not there, alias that name to console.log:

When the program is run in interactive, UI mode (via a dashboard on our Angular app), crawlEmitter is passed to run, and the socket interface works.  When it’s run as a command line application, we still call “crawlEmitter.emit” with the message and data we want to send, but the check at the top of the function will call “console.log” whenever “crawlEmitter.emit” is called (because there is no crawlEmitter in this case).

Another option would to be simply passing the function we want to use as a broadcaster into run.  So, pass crawlEmitter.emit as the 2nd argument for the dashboard version, or console.log for the command line version.  That might be a better, more readable solution, so I’m thinking about switching (haven’t tested this yet- but I don’t see any reason it shouldn’t work).

One of the most fun things about programming is how many roads you can take to one final product.  The trick is finding the balance between most efficient and most understandable – and always being open to finding a new route!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s