Pulling the strings

So, back to the puppet show.

Actually, I guess this is going to be a bit more of a tangent, but it did come up when working on the new Puppeteer project (to crawl our Angular application and store server-rendered pages in a local cache), so it still counts…

We wanted to have both a command line option and a graphical UI option to run the crawl service.  The UI would need a backend that keeps a websocket open and broadcasts updates whenever a new URL is crawled (to show success or failure, and give stats at the end).  Socket.io works great for this: just install it in your Node project, and you can socket.emit messages and data to your frontend, which can listen with socket.on (use the same event name for both emit and on to coordinate the two).

However, in the command line option, there would be no socket.  With this configuration, a user would just run the command and the messages/data should print to the console.  So we have a shared program that needs two different interfaces.  I had already created a “crawl-runner.js” file with my main “run” function.  It handles the browser init, page creation, and navigation in headless Chrome (using Puppeteer).  It also handles storing and responding with the results.  It was set up to use a simple native Node EventEmitter, which worked fine for interfacing with websockets.  In fact, we could probably cut out the middleman, eliminate the EventEmitter, and just socket.emit directly from the crawler.

But either way, we will have to switch to console.log when using the command line option.  How to reuse the logic from crawl-runner.js in the command line version?  We can pass the emitter as an optional argument to “run” and if it’s not there, alias that name to console.log:
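Here's a minimal sketch of that idea.  It is not the actual crawl-runner.js: the event names ('crawled', 'done') and the payload shapes are made up for illustration, and the Puppeteer work is left out so only the emitter fallback is visible.

```javascript
const { EventEmitter } = require('events');

// Hypothetical, stripped-down stand-in for the "run" function in crawl-runner.js.
// The real one drives Puppeteer; this one only shows the emitter fallback.
function run(urls, crawlEmitter) {
    // Command line mode: no emitter was passed, so swap in an object whose
    // "emit" is just console.log.
    if (!crawlEmitter) {
        crawlEmitter = { emit: console.log };
    }

    for (const url of urls) {
        // (crawl the page here, then report the result)
        crawlEmitter.emit('crawled', { url, success: true });
    }
    crawlEmitter.emit('done', { total: urls.length });
}

// Dashboard mode: pass a real emitter and forward its events to the socket.
const crawlEmitter = new EventEmitter();
crawlEmitter.on('crawled', data => { /* e.g. socket.emit('crawled', data) */ });
```

Calling run(urls, crawlEmitter) goes through the emitter, while run(urls) on its own prints each message and its data to the console.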

When the program is run in interactive UI mode (via a dashboard on our Angular app), crawlEmitter is passed to run, and the socket interface works.  When it’s run as a command line application, we still call “crawlEmitter.emit” with the message and data we want to send, but the check at the top of the function makes sure “console.log” is what actually runs whenever “crawlEmitter.emit” is called (because there is no crawlEmitter in this case).

Another option would be to simply pass the function we want to use as a broadcaster into run.  So, pass crawlEmitter.emit as the 2nd argument for the dashboard version (bound with crawlEmitter.emit.bind(crawlEmitter), since emit relies on “this” and throws if it’s called detached from its emitter), or console.log for the command line version.  That might be a better, more readable solution, so I’m thinking about switching (haven’t tested this yet, but I don’t see any reason it shouldn’t work).
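A rough sketch of that variant, with the same caveats as before: the event names and payloads are invented for illustration, and “run” here is a hypothetical stand-in rather than the real crawler.  The one thing to watch is that an EventEmitter's emit depends on “this”, so it has to be bound before it's handed over as a bare function.

```javascript
const { EventEmitter } = require('events');

// Hypothetical variant of "run": it takes a broadcast function instead of an
// emitter, and defaults to console.log for the command line version.
function run(urls, broadcast = console.log) {
    for (const url of urls) {
        // (crawl the page here, then report the result)
        broadcast('crawled', { url, success: true });
    }
    broadcast('done', { total: urls.length });
}

const crawlEmitter = new EventEmitter();

// emit looks up its listeners on "this", so bind it to the emitter first.
const dashboardBroadcast = crawlEmitter.emit.bind(crawlEmitter);
```

The dashboard would call run(urls, dashboardBroadcast) and the command line version would call run(urls).  Passing crawlEmitter.emit without the bind fails as soon as run tries to broadcast.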

One of the most fun things about programming is how many roads you can take to one final product.  The trick is finding the balance between most efficient and most understandable – and always being open to finding a new route!

Multiple Method Madness

Arguments are generally important.

Different languages have different ways of dealing with them, but a couple have really interested me lately.  I work mostly in JavaScript, which generally doesn’t care at all about your arguments.  Sure, you might have a function that accepts “error” and “callback”, but JS doesn’t actually care if you pass them or not.

That can be kind of useful, though sometimes a bit confusing.  Sometimes, I’ll have a function that has a couple different possible outcomes.  Simple example: In our Angular application, there are multiple methods that get info from the database (using observables).  Sometimes, we will need a final callback to fire after the data is retrieved.  You can pass 3 ‘blocks’ to a subscription: the success, error, and finally blocks (RxJS itself calls them next, error, and complete).  That finally block is a great place to perform a task like hiding a ‘loading’ animation, with one caveat: it runs when the observable completes, not after an error, so the error block needs its own cleanup.

But in one case, the method calling the .subscribe function was on a shared service, and the function that needed to be called in the ‘finally’ block had to remain on the component itself (it created a d3.js chart and needed access to the DOM).  So, we added an argument to the method on the service for callback.  Something like:

getData(callback) {
    backendService.get(url).subscribe(response => {
        // set local variables and manipulate the data response here
        this.myData = response.map(d => d);
    }, error => {
        // handle errors here
    }, () => {
        // clean up, then call the callback if one was passed and data came in
        if (this.myData && this.myData.length && callback) {
            callback(this.myData);
        }
    });
}

The ‘finally’ block doesn’t take any arguments; you just perform tasks.  In this case, I check that a callback has been provided (see below for details) and that the data came in (check the length, because an empty array is still truthy in JS, while a length of 0 is falsy).  If both hold, I fire the callback (sending the data to the chart and drawing it to the screen).

To call the method, we have a couple options.  Pass a callback function or don’t.  Like I said- JS don’t care.  If I don’t have a graph to draw but need to get data, I just don’t pass a callback.  The check in the ‘finally’ block to make sure ‘callback’ exists before it’s called prevents any errors.  If a callback is provided, it gets called when the data is available.  Pretty cool!
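To see the optional callback on its own, here's a small sketch with the Angular and observable parts stripped out.  The function below is a hypothetical stand-in for the service method: it takes the rows directly instead of fetching them, so it runs anywhere.

```javascript
// Hypothetical stand-in for the service method. The data is passed in
// directly so the sketch runs without Angular or a backend.
function getData(rows, callback) {
    const myData = rows.map(d => d);

    // Only fire the callback if one was provided and data actually came in.
    if (myData.length && callback) {
        callback(myData);
    }
    return myData;
}

// With a callback (e.g. drawing a chart):
getData([1, 2, 3], data => { /* draw the chart with data */ });

// Without one: same function, nothing extra happens.
getData([1, 2, 3]);
```

Both calls are valid, and the second one never touches the missing callback because of the check.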

In a statically typed language like C#, things are a bit different.  C# definitely cares about arguments.  If you try to pass nothing to a function that has arguments, you get an error.  If you try to pass an argument to a function that takes none, you get an error.  If you pass a string argument to a function that takes an int argument, you get an error.

You get the idea.

Strict, but no less useful than the JS method; it just requires a different mindset.  You sacrifice flexibility for security.  You know what that function takes and it will not run with anything else.  Probably leads to fewer bugs down the road.

And you have different options for flexibility.  In C#, you can have two functions with the same name (this is called method overloading), as long as they take different parameter lists:

public int addStuff()
{
    return 2 + 2;
}

public int addStuff(int y)
{
    return 2 + y;
}

Stupid, simple example, I know.  But still a pretty cool concept.  You get the same flexibility at the call site as you would with JS.  The addStuff function doesn’t care if I pass an int or no int (as long as I have both methods accounted for above).  It will still give me what I want.  The above example wouldn’t be very useful, but we did use it for a method in our live application.  I’m a bit new to the .NET world, but it’s still cool to see the parallels with something familiar (and the differences).
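For comparison, here's roughly how the same call-site flexibility might look back in JS.  JS has no overloading (a second function with the same name just replaces the first), so this sketch leans on a default parameter instead; it's one way to mirror the C# example, not the only one.

```javascript
// Hypothetical JS counterpart to the two addStuff overloads above:
// one function, with a default standing in for the no-argument version.
function addStuff(y = 2) {
    return 2 + y;
}
```

Here addStuff() gives 4 and addStuff(5) gives 7, same as the two C# methods.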