Pulling the strings

So, back to the puppet show.

Actually, I guess this is going to be a bit more of a tangent, but it did come up when working on the new Puppeteer project (to crawl our Angular application and store server rendered pages in a local cache), so it still counts…

We wanted to have both a command line option and a graphical UI option for running the crawl service.  The UI would need a backend that keeps a websocket open and broadcasts updates whenever a new URL is crawled (to show success or failure, and give stats at the end).  Socket.io works great for this- just install it in your Node project, and you can socket.emit messages and data to your frontend, which listens with socket.on (use the same event name for both emit and on to coordinate the two).
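For illustration, a minimal sketch of that pairing (the crawl-update event name and the payload shape are placeholders, not our actual code):

// Server side (Node)- attach socket.io to an existing http server
const http = require('http');
const server = http.createServer();
const io = require('socket.io')(server);

io.on('connection', (socket) => {
  // Push an update out after each crawl
  socket.emit('crawl-update', { url: 'https://example.com/page', status: 'success' });
});

server.listen(3000);

// Client side- listen with the same event name (assumes the socket.io client script is loaded)
const socket = io();
socket.on('crawl-update', (data) => {
  console.log(`${data.url}: ${data.status}`);
});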

However, in the command line option, there would be no socket.  With this configuration, a user would just run the command and the messages/data should print to the console.  So we have a shared program that needs two different interfaces.  I had already created a “crawl-runner.js” file with my main “run” function.  It handles the browser init, page creation, and navigation in headless Chrome (using Puppeteer), and it also handles storing and responding with the results.  It was set up to use a simple native Node EventEmitter, which worked fine for interfacing with websockets.  In fact, we could probably cut out the middleman and eliminate the EventEmitter entirely- just socket.emit directly from the crawler.

But either way, we have to switch to console.log when using the command line option.  How do we reuse the logic from crawl-runner.js in the command line version?  We can pass the emitter as an optional argument to “run”, and if it’s not there, alias that name to console.log:
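Roughly like this, anyway (a simplified sketch- the real run function does a lot more with Puppeteer than shown here):

// crawl-runner.js (simplified sketch)
async function run(urls, crawlEmitter) {
  // Command line mode: no emitter was passed in, so alias 'emit' to console.log
  if (!crawlEmitter) {
    crawlEmitter = { emit: (event, data) => console.log(event, data) };
  }

  for (const url of urls) {
    // ...Puppeteer browser init, page creation, navigation, caching...
    crawlEmitter.emit('crawled', { url, status: 'success' });
  }

  crawlEmitter.emit('done', { total: urls.length });
}

module.exports = { run };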

When the program is run in interactive, UI mode (via a dashboard on our Angular app), crawlEmitter is passed to run, and the socket interface works.  When it’s run as a command line application, we still call “crawlEmitter.emit” with the message and data we want to send, but the check at the top of the function will call “console.log” whenever “crawlEmitter.emit” is called (because there is no crawlEmitter in this case).

Another option would be to simply pass the function we want to use as a broadcaster into run.  So, pass crawlEmitter.emit as the second argument for the dashboard version, or console.log for the command line version.  That might be a better, more readable solution, so I’m thinking about switching (I haven’t tested this yet, but I don’t see any reason it shouldn’t work).
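Something like this, in other words (untested, and the names here are just illustrative):

// crawl-runner.js- accept any broadcast function, defaulting to console.log
async function run(urls, broadcast = console.log) {
  for (const url of urls) {
    // ...crawl the url with Puppeteer...
    broadcast('crawled', { url, status: 'success' });
  }
}

// Dashboard version: bind so 'this' still points at the emitter
run(urls, crawlEmitter.emit.bind(crawlEmitter));

// Command line version
run(urls);

One small caveat: passing crawlEmitter.emit bare would lose its this binding, so it needs a bind (or a little arrow-function wrapper).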

One of the most fun things about programming is how many roads you can take to one final product.  The trick is finding the balance between most efficient and most understandable – and always being open to finding a new route!

Deep Streams

Usually when I think of Node, I think about web servers.  That’s mostly what I use it for when writing code- setting up a simple test server for a proof of concept, or bringing in Express and its ecosystem for more production-ready projects.

But in reality, I use Node for a whole lot more.  Of course, just about anything NPM-related is a use of Node- but it also powers all the awesome developer tools that we don’t even really need to think about much anymore.  When a minifier runs over your Javascript code before you push to production, Node is probably doing that magic.  Same for a bundler.  It’s behind the scenes in quite a bit of frontend workflow these days.  I rely on those tools all the time, but I hadn’t had much chance to actually write any of them until recently.

The task was pretty simple- I had a js file with a bunch of arrays and objects someone else had entered.  The formatting had been mangled somewhere along the way- there were long runs of spaces everywhere- I wanted to strip them out, but leave any single spaces (to preserve multi-word strings).  Now I know: any halfway decent code editor has a search/replace feature that can handle this, but I could see this being a nice little utility to write in Node.  That way, I could run it over an entire directory if necessary (it probably won’t ever be necessary, but I really wanted to do this short little side project).

My first iteration was a super simple script using the fs module.  First, a nice path splitter utility in a separate file.  This takes a string, splits it out by path and extension, and inserts ‘-‘ and whatever new string you want.  This prevents overwriting the original file (though this part would be unnecessary if you do want to overwrite- just pass the same file name to the write function):
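Something along these lines (a sketch of the idea, not the exact file- the addSuffix name is mine):

// path-splitter.js (sketch)
const path = require('path');

// 'data/config.js' + 'clean'  ->  'data/config-clean.js'
function addSuffix(filePath, suffix) {
  const ext = path.extname(filePath); // '.js'
  const base = ext ? filePath.slice(0, -ext.length) : filePath; // 'data/config'
  return `${base}-${suffix}${ext}`;
}

module.exports = addSuffix;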

Then we can use that in our script to strip multi-spaces and return a new file:
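Roughly (the 'clean' suffix and the error handling are illustrative):

// strip-spaces.js (fs version- simplified sketch)
const fs = require('fs');
const addSuffix = require('./path-splitter');

const inputFile = process.argv[2];

fs.readFile(inputFile, 'utf8', (err, data) => {
  if (err) throw err;
  // Collapse any run of two or more spaces down to a single space
  const cleaned = data.replace(/ {2,}/g, ' ');
  // Write to a new '-clean' file so the original isn't overwritten
  fs.writeFile(addSuffix(inputFile, 'clean'), cleaned, (writeErr) => {
    if (writeErr) throw writeErr;
    console.log('Done!');
  });
});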

All very cool.  But I really like the notion of streams in Node.  What if the file I need to manipulate is really large?  With my current setup, it might take a lot of memory to read the entire file, then write it out.  But that’s what streams are for!  So I rewrote the script with a custom transform stream.  It wasn’t all that difficult- as soon as I realized that the required method on your custom stream (the one that extends the Transform class) has to be named _transform.  If you leave out the underscore, it won’t work (Node complains that _transform is not implemented).

Again, in a separate file (small modules for the win!), I defined my custom stream:
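It looked something like this (again, a sketch- but note the underscore):

// strip-spaces-transform.js (sketch)
const { Transform } = require('stream');

class StripSpaces extends Transform {
  // Must be named _transform- the stream machinery calls it for each chunk
  _transform(chunk, encoding, callback) {
    const cleaned = chunk.toString().replace(/ {2,}/g, ' ');
    callback(null, cleaned);
  }
}

module.exports = StripSpaces;

(One caveat with a quick sketch like this: a long run of spaces could in theory be split across two chunks.  For a formatting cleanup it isn’t worth worrying about.)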

Then it was just a matter of importing that and the path splitting utility created for the original fs version (code reuse for the win!) and running a couple Node-included streams (createReadStream and createWriteStream) that can make a stream from a file automatically:
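The wiring is just a pipe chain (file names are placeholders):

// run-stream.js (sketch)
const fs = require('fs');
const StripSpaces = require('./strip-spaces-transform');
const addSuffix = require('./path-splitter');

const inputFile = process.argv[2];

fs.createReadStream(inputFile)
  .pipe(new StripSpaces())
  .pipe(fs.createWriteStream(addSuffix(inputFile, 'clean')))
  .on('finish', () => console.log('Done!'));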

Both methods (fs and stream) are simple and concise. Both have their place. Creating a custom transform stream was probably unnecessary for this task, but would be very useful for very large files. Either way, it was a fun quick dive into some other corners of Node!

Static Signal

Express is awesome.

I’ve worked with a few backend technologies, but Node will probably always be my favorite because JS was the first language I really picked up.  Python (Django) and C# (.NET) are both cool, but being able to write JS on the server is great- no more context switching and accidentally ending my Python lines with semicolons!  You can create some really cool stuff with Node if you mind the event loop.  If you’re used to coding for a browser, it shouldn’t be too difficult- the principle of not blocking the loop applies in both places!

Anyway, back to Express.  It is a very useful wrapper for Node- making it easy to handle requests and send responses, set headers, and even bring in security packages (hello helmet!).

One of the best parts is the static file serving process.  As a learning exercise, I once wrote a raw Node process to serve static files- it works, but it’s not pretty, and I’m sure it’s not as robust as what Express offers.  Just do the normal Express app setup, then call:

app.use(express.static(path.join(__dirname, 'directory-path-here')));

If you want your code to be a little more flexible, use Node’s __dirname global.  It gives you the directory name of the current module- very useful to make sure your code is portable (for running in dev vs production, for example).

However, there is one little “gotcha” with serving static via express- particularly when creating a single page application.  express.static can be passed an options object as its second argument.  Without anything passed, it uses sensible defaults- one of which is looking for an ‘index’ file in the directory it’s given.  Makes sense- it’s a good place to start when looking for what file to serve.  If you’re creating a single page application, you probably want to return index.html for just about everything.

But we wanted to include a nice alternate homepage for people using very old browsers (think IE9 and below).  Those won’t load our fancy Angular application, but we didn’t want them to just see a page of gibberish.  So we created a simple HTML page with some vanilla JS interactions on a contact form (felt like the good old days!).  But when we went to configure the routing, we just couldn’t get the server to return anything but index.html- even when we set a conditional in our route to check the browser name/version and return a different file.

The answer was in the documentation (of course).  That options object can be passed an “index” property.  It defaults to ‘index.html’, but if you set it to false, Express won’t automatically serve up the index.  You can control what gets served- just remember that now you have to return index.html manually (after any other possible files, depending on your setup).

app.use(express.static(path.join(__dirname, '/'), {index: false}));

There we go!  Now people forced to use IE8 (and I can’t think of anyone who would voluntarily use it these days…) can still view our super important content!

Node Forever!

I really like working with Node on the server side.  That might be because I feel more comfortable in Javascript than Python or PHP or .NET.  I chose Node as the backend for a super simple internal marketing app we created a couple of months ago.  We only had a couple days to make it work, and Node/Express seemed like the quickest way to get it going.  There would only be a couple routes, so that was easy- but it would also require logging in, so MongoDB would be a necessity.  The trickiest part (I think this was outlined in a previous post) was installing the bcrypt library for hashing user passwords before they hit the database.

But we figured it out and everything was running smoothly.  Until someone hit the little red ‘X’ in the top right of the command prompt that was running the Node server.  That kicked off some emails marked ‘Urgent’ and made for an early morning for one developer (me).

So I looked into how we could keep the Node server running even if that prompt was closed.  Keep it running forever, even.  And, of course, the answer was Forever- the npm module.

It’s a useful little tool that runs your server as a daemon- a background process that keeps going even after the command prompt is closed.  It works great, is easy to install, and I could go back to sleep.

But there is one gotcha- particularly for a fumble fingers typist low on sleep.  The command to start a server with Forever is simple: ‘forever start server.js’ (or whatever your server file is called).  But if you forget the ‘start’ part, and just go for ‘forever server.js’, Forever starts the process and appears to fail to assign an id to it.  So it can’t be stopped.  EVER.  That might not be terrible, but it was also preventing me from logging in, meaning it was preventing the boss from logging in, and that was trouble.  Internal marketing numbers were dropping fast! (I have no idea what that means- I just make this stuff work, I don’t know how to sell it).
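For the record, the usage I meant to follow (as I understand the forever CLI- double-check against forever --help) is:

forever start server.js
forever list
forever stop server.js

The first starts the script as a monitored background process, the second lists running processes with their ids, and the third stops it by script name (an id from forever list works too).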

Of course, nothing is completely unstoppable.  Ctrl-Alt-Delete is your Windows friend!  After trying every command listed by forever --help, I opened the task manager and found a stray Node.js background process running.  Force quit that and the server stopped.

Moral of the story: type carefully and read the docs.  Powerful modules like Forever are a great help, but they can bite you when you make a mistake.  I also updated my package.json file with some foolproof scripts so this doesn’t happen again.  ‘npm start’ now runs ‘net start mongodb && forever start server.js’ to make sure Mongo and Node are running.  ‘npm stop’ does the opposite.
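The relevant chunk of package.json now looks roughly like this (the stop script is my reconstruction of “the opposite”):

"scripts": {
  "start": "net start mongodb && forever start server.js",
  "stop": "forever stop server.js && net stop mongodb"
}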

I mean it

Getting the new project going with C#/.NET has been a bit of an adventure.  There are multiple reasons: I had no prior experience with the language or framework, few people on the team really did either, and we decided to use .NET Core 1.0- the latest and greatest version.

So, there have been some hurdles, but it’s been fun to learn.  Visual Studio 2015 is both great and frustrating.  It’s an amazing program that helps you quickly check what methods are available and what each one does (IntelliSense), it provides a built-in test server for debugging, and it even manages dependencies for you.  However, it also slows to a crawl sometimes while debugging, occasionally crashes for no apparent reason, and publishing a project can be very complicated (unless it’s directly to Azure).

It can be complicated even to start a project when you try to include a frontend framework (like Angular 2).  We eventually got one working (with the help of a nice example using Yeoman).  I’m not trying to rip on VS2015- it’s amazing, but sometimes it’s nicer to work with something a little lighter.

Anyway, I’m still much more comfortable with Javascript.  So when a side project came in that would let me set up a simple server and login system to host an internal use app, I jumped on it.  I got to choose the stack to use, and went with Node on the server.  I know there are critics, but I’ve been learning quite a bit about Node recently and wanted to give it a try.

It’s been great so far.  The frontend of this app is also Angular 2, so we can just work in JS the entire time.  My favorite experience so far has been implementing the persistent login system.  Why?  Because I don’t really know if we’ve found a creative solution or just a terrible idea, but I hope to find out.

We needed the app to be secure, but also to allow a user to stay logged in across multiple Angular routes and even a page refresh if needed.  The first part is easy- use a CanActivate guard to protect everything except the login screen (there’s no api at the moment, but obviously that would need to be protected on the backend as well).  A locally stored ‘isLoggedIn’ boolean determines whether access is allowed.  It’s initially false- a user logs in and their username/password is sent to the (Node) server.  There, it’s checked against the actual user data (currently in an insecure json object for testing, soon to be in MongoDB).  If a match on the user/password is found, a json response is sent back containing a success notice and a json web token, which is stored in sessionStorage (more on that in a minute).
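On the Node side, the login route boils down to something like this (a simplified sketch using the jsonwebtoken package- the secret, route name, and user lookup are placeholders, and the real passwords won’t live in plain text):

// server.js (simplified sketch)
const express = require('express');
const bodyParser = require('body-parser');
const jwt = require('jsonwebtoken');

const app = express();
app.use(bodyParser.json());

const SECRET = 'replace-with-a-real-secret';
const users = [{ username: 'test', password: 'test' }]; // stand-in for the real user store

app.post('/login', (req, res) => {
  const { username, password } = req.body;
  const match = users.find(u => u.username === username && u.password === password);
  if (!match) {
    return res.status(401).json({ success: false });
  }
  // Sign a token for the frontend to stash in sessionStorage
  const token = jwt.sign({ username }, SECRET, { expiresIn: '1h' });
  res.json({ success: true, token });
});

app.listen(3000);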

The frontend then sets isLoggedIn to true- this will keep the user logged in while ‘inside’ the angular app.  But what if a reload happens?  It shouldn’t, but definitely could.  That would reset the app, and isLoggedIn reverts to false.

In steps the json web token.  Accepted practice is to confirm this token on any request to a protected route on the server, but we only really have two (login and home- it will be a simple display page).  It seemed unnecessary to contact the server again, unless the page is reloaded.  So, in the root component of the Angular 2 app, we added a check in OnInit.  It looks to see if there is a ‘token’ property in sessionStorage.  If there is, it sends the token value to the server.  On the server, that token is decoded and checked against a valid active token.  If a match is found, that user is automatically logged back in.  If not, back to the login screen!
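On that same sketch server, the reload check is just one more route that verifies the token (names and shapes are illustrative):

// Reload case: the Angular root component posts the stored token here
app.post('/verify', (req, res) => {
  jwt.verify(req.body.token, SECRET, (err, decoded) => {
    if (err) {
      return res.status(401).json({ valid: false }); // back to the login screen
    }
    res.json({ valid: true, username: decoded.username });
  });
});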

It seems to work well, and seems fairly secure.  The JWT is stored in sessionStorage, which can be inspected, but only from the user’s own browser- it’s scoped to the origin and cleared when the tab closes.  And the token is signed rather than encrypted- its contents are readable, but it can’t be forged, and it has to be sent back to the server and verified before access is allowed.

We’re still working on it, so I’m sure there will be changes/improvements, but so far it’s been a great experience.  Before this, I’d only heard about the benefits of full stack JS (backends I’ve worked on range from PHP to Python and now C#), but it really is nice.