Pulling the strings

So, back to the puppet show.

Actually, I guess this is going to be a bit more of a tangent, but it did come up when working on the new Puppeteer project (to crawl our Angular application and store server rendered pages in a local cache), so it still counts…

We wanted to have both a command line option and a graphic UI option to run the crawl service.  The UI would need a backend that keeps a websocket open and broadcasts updates whenever a new url is crawled (to show success or failure, and give stats at the end).  Socket.io works great for this- just install it on your Node project, and you can socket.emit messages and data to your frontend, which can listen with socket.on (use the same event name for both emit and on to coordinate the two).

However, in the command line option, there would be no socket.  With this configuration, a user would just run the command and the messages/data should print to the console.  So we have a shared program that needs two different interfaces.  I had already created a “crawl-runner.js” file with my main “run” function.  It would handle the browser init, page creation, and navigation in headless Chrome (using Puppeteer).  It also handles storing and responding with the results.  It was set up to use a simple native Node EventEmitter- which worked fine for interfacing with websockets.  In fact, we could probably just cut out the middleman and eliminate the EventEmitter- just socket.emit directly from the crawler.

But either way, we will have to switch to console.log when using the command line option.  How to reuse the logic from crawl-runner.js in the command line version?  We can pass the emitter as an optional argument to “run” and if it’s not there, alias that name to console.log:
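(A minimal sketch- the first parameter and the event names here are just placeholders.)

//crawl-runner.js (sketch)- the emitter argument is optional
async function run(config, crawlEmitter) {
    //command line mode: nothing was passed in, so alias the name to a console.log wrapper
    if (!crawlEmitter) {
        crawlEmitter = { emit: (event, data) => console.log(event, data) };
    }

    //...browser init, page creation, navigation, etc...
    crawlEmitter.emit('crawled', { url: 'https://your-url.com', success: true });
}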

When the program is run in interactive, UI mode (via a dashboard on our Angular app), crawlEmitter is passed to run, and the socket interface works.  When it’s run as a command line application, we still call “crawlEmitter.emit” with the message and data we want to send, but the check at the top of the function will call “console.log” whenever “crawlEmitter.emit” is called (because there is no crawlEmitter in this case).

Another option would be to simply pass the function we want to use as a broadcaster into run.  So, pass crawlEmitter.emit as the 2nd argument for the dashboard version, or console.log for the command line version.  That might be a better, more readable solution, so I’m thinking about switching (haven’t tested this yet- but I don’t see any reason it shouldn’t work).
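If we go that way, the call sites would look roughly like this (config stands in for whatever else run needs; the one wrinkle is that emit would need to be bound so it keeps its EventEmitter context):

//dashboard/websocket version
run(config, crawlEmitter.emit.bind(crawlEmitter));

//command line version
run(config, console.log);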

One of the most fun things about programming is how many roads you can take to one final product.  The trick is finding the balance between most efficient and most understandable – and always being open to finding a new route!

Access-Control-Allow-Google

Quick tangent this week, then back to the puppet show!

We launched a website this past week.  I’d love to say we were able to use all the cool new deployment tools, but for various reasons (stack choice, database connectors, developing and deploying on Windows, major version and architecture updates), it just wasn’t possible.  So, we had to shut down the old server, remotely log in, make some upgrades and update some versions, cross our fingers, and re-start.

Always a risky procedure.  Particularly with a two-person team.  Really a one-and-a-half person team.  I do a bit of dev ops, but am mostly a Javascript guy.  Apache config files aren’t really my strong suit.

My partner on this was updating Django versions, then trying to install the right db connector to work with SqlServer.  It was tricky, but eventually he got there.  I updated Node, built our Angular application, and deployed the front end server.  For the most part, it worked, but there were a few ‘gotchas’.

One involved enabling https.  This was something we didn’t test in development- we have one SSL certificate and it was in use for the production site.  Enabling it on the Express server was simple enough, but I didn’t realize that there are 3 important parts: the key, the certificate (we had both of those), and the chain.  Most modern browsers only need the first two, but some require the chain (I’m looking at you, Firefox on Mac).  The chain needs to be split on newlines in Node to work.  But we got past that hurdle after a bit of frantic Googling.
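For reference, the Node side ends up looking something like this sketch- the paths are placeholders and app is the usual Express app:

const express = require('express');
const https = require('https');
const fs = require('fs');

const app = express();

//the chain ships as one bundled .pem- Node's ca option wants an array,
//so split the bundle back into its individual certificates
const chainLines = fs.readFileSync('/path/to/chain.pem', 'utf8').split('\n');
const chain = [];
let current = [];
chainLines.forEach(line => {
    current.push(line);
    if (/-END CERTIFICATE-/.test(line)) {
        chain.push(current.join('\n'));
        current = [];
    }
});

const options = {
    key: fs.readFileSync('/path/to/private.key'),
    cert: fs.readFileSync('/path/to/certificate.crt'),
    ca: chain
};

https.createServer(options, app).listen(443);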

The other interesting one involved server side rendering our Angular application.  Like most modern JS apps, ours uses some 3rd party libraries.  We could never get Angular’s built in SSR to work (though we are still trying), so we’re using a 3rd party service to do the render.  Our server contacts that 3rd party’s servers, gets the rendered version of a page, and returns it (and then the SPA loads and takes over).  Google’s spiders are happy.  But our users aren’t- they just had to wait over 2 seconds to get a page.  However, we can store those renders in a local cache, so on any subsequent visit, it’s blazing fast again (I’m documenting the crawl and storing of those in the Puppeteer series- see entry one from last week- it’s super fun!).

Back to the related launch issue: we were getting cross origin request errors.  But only if the url included ‘www’.  For example: if I went to domain.tld, our site loaded, it made a delayed https call to our internal news article api, loaded those, and everything was great.  But if I went to http://www.domain.tld, boom- CORS error.  The error referenced a disallowed header.

The lesson this reinforced was to use your dev tools.  I hit F12 and checked the network tab.  Clicked the red response and inspected the request and response headers.  Then I made a successful (no ‘www’) request and compared those.  The ‘www’ request was getting an extra header related to the prerendering service (still not sure why it wasn’t present on all requests).  So, on our Apache server (running the news API- which just returns JSON), we just had to add that header to the allowed list.  I definitely don’t have that command memorized, but once I knew exactly what I was Googling for, it took about 2 minutes to find and implement.

Use your dev tools.  Words to live by in this business.  When something doesn’t work, there’s always an answer- particularly in this world where everything is online.  Dev tools help narrow down that search.  I went from a panic search of “site works without ‘www’ but not with ‘www’” to “apache configuration allow headers”.  The first got me nothing useful.  The second solved my problem almost immediately.

 

Master of Puppets

I’ve been working with Google’s new Puppeteer library the past week or so.  It’s basically the next generation headless browser API- you can use it for testing, scraping, performance profiling- their list is better than anything I can post here.

I’m using it to crawl our Angular application.  We have a service to generate a server-side rendered version of our components (we are working on a custom process for this, but it turns out to be very tricky if you’re using 3rd party libraries).  However, using this service requires a call out to their server to create the pre-rendered version (so search engines can index more than just an empty body tag for our site!).  This extra call adds a noticeable lag to our page load.

We do have a local cache on our own server- once a page is rendered, it’s stored in that cache for a while and all future requests are nice and fast.  Ideally, however, no actual user should see that initial slow load if the cache doesn’t have the page they’re looking for.

Side note: I know this is a bit of a small problem.  It will affect very few users – specifically, only people who find a “deep” page in our SPA via a Google search or link and that page hasn’t been visited for a while so it’s not in our cache.  However, we are at the point where we are trying to optimize as much as possible pre-launch, and this project’s time has come!

So, we need a reliable, preferably automated way to crawl all our routes.  Puppeteer is perfect for this operation.  It has a fairly large API, but performing the basics is really simple:

//be sure to npm install puppeteer
const puppeteer = require('puppeteer');

//create a browser instance
const browser = await puppeteer.launch();

//open a page
const page = await browser.newPage();

//navigate
await page.goto('https://your-url.com');

//important for some sites (like SPAs)- make sure we wait until the page is fully loaded
await page.waitForNavigation({ waitUntil: 'networkidle' });

//evaluate gives you access to the DOM- use good old querySelector type methods!
const myLinks = await page.evaluate(async () => {
    const links = Array.from(document.querySelectorAll('htmlselector'));
    return Promise.resolve(links.map(link => link.href));
});

Note that everything is using async/await – which is really great. It makes asynchronous code easier to read and reason about (depending on who you ask…).  However, you will need Node 7.6 or higher, and the awaits above need to live inside an async function.

After the above code, you should have an array of links from whatever selector you passed to querySelectorAll.  The functional programmer in me wanted to just forEach that array, visit each in a new puppeteer ‘goto’ command, and call it good.  However, that doesn’t appear to work.  With async/await, you need to use a for/of loop and await each crawl:

for(let link of myLinks) {
    try {
        await crawlPage(link);
    } catch(err) {
        handleError(link, err);
    }
}

Try/catch is the preferred (only?) error handling mechanism for async/await. Also, at this point, I had abstracted my crawl and error processes out into their own methods. Basically, crawlPage just opens a new page, goes to it, saves the result to a log, emits the result (more on that in the next post!), and returns the result of page.close()- which is itself a promise (fitting nicely with crawlPage being an async function).

handleError records and emits the error source (url) and message.

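//note: resultGroup (an array collecting results) and crawlEmitter are defined at the module level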
async function crawlPage(url) {
    const page = await browser.newPage();
    const pageResult = await page.goto(url);
    let resultObj = { success: pageResult.ok, url };
    resultGroup.push(resultObj);
    crawlEmitter.emit('crawled', resultObj);
    return page.close();
}

function handleError(url, err) {
    const errorObj = { success: false, url, err };
    resultGroup.push(errorObj);
    crawlEmitter.emit('error', errorObj);
}

All very cool and useful stuff.  We can run this process over our application to refresh the local cache at a regular interval- preventing an actual user from ever experiencing the slow load.

Next week will be part 2 of this saga.  I wanted two options: run this from the command line or as a cron job, and have a nice UI that will show the last crawl and stats, and allow a non-technical user to run a crawl (including crawling by different categories- our url routing comes partly from a JSON config file).  This meant I would need to log to the console sometimes (running as a command line tool), or return data to a browser other times.  This is where crawlEmitter comes in- it’s a custom Node EventEmitter (not hard to create) that sends data to a frontend via a websocket.  The fun part was aliasing my crawlEmitter- if it doesn’t exist when the main function starts, it falls back to a good old console.log!

Little Johnny Angular Has Trouble Focusing in Class

Short post this week, but working on a longer one in the next couple weeks regarding Google’s new Puppeteer project.  Spoiler- I think it’s pretty cool.

I have worked with latest-generation Angular for a couple years now.  It’s generally great, occasionally frustrating, and often interesting.  However, until this past week, I hadn’t really had a need to work with directives.  Specifically, attribute directives- and it turns out they can be very useful!

The challenge: I was creating a search feature for a new website.  I had the actual search function working well (and the data being searched over was small enough to include in an in-memory JSON structure, so it’s a nice and fast search-as-you-type setup).  All that was left was to put a UI on this thing.  I put together a nice full screen display with a large input element that fires the search function as a user types.  It shows and hides based on an ngIf conditional.  All simple enough in Angular land- and something we do all the time (by the way, I also tried this with the [hidden] element attribute, but ran into the same difficulty described below).

So, click the search icon, the background fades in and search input slides down.  The user can search and results pop right up.  All great- but the input field autofocus was not cooperating.  It would fire properly the very first time ngIf became true, but never again after that.  If a user closed the search display, then opened again for another search, no autofocus.

Not a huge deal- but an inconvenience for users, and something that I should be able to make work.  It makes sense- the built in autofocus attribute was probably only ever designed to fire on a page load- but that only happens once (ideally) with an Angular application.  I tried firing a .focus() event in my controller, but it didn’t seem to have any effect.

The answer was a directive.  Specifically, an attribute directive.  Directives were all over Angular v1, but in the latest generation, you can usually get away with just having components and services.  In this case, I could just make a super simple directive that grabs a reference to the element (the Angular way!) and fires the .focus() in the right lifecycle hook (some references said it would work in OnInit, but I had to use AfterViewInit).
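The whole thing is only a few lines- roughly this (the selector name is just what I’d call it; remember to add the directive to the module’s declarations and then drop the attribute on the search input):

import { Directive, ElementRef, AfterViewInit } from '@angular/core';

@Directive({
    selector: '[appAutofocus]' //used as an attribute: <input appAutofocus ...>
})
export class AutofocusDirective implements AfterViewInit {
    constructor(private el: ElementRef) {}

    //onInit was too early for me- the element wasn't ready yet
    ngAfterViewInit() {
        this.el.nativeElement.focus();
    }
}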

Boom- the search input element gets focused every time the search display opens!

 

Deep Streams

Usually when I think of Node, I think about web servers.  That’s mostly what I use it for when writing code- setting up a simple test server for a proof of concept, or bringing in Express and its ecosystem for more production-ready projects.

But in reality, I use Node for a whole lot more.  Of course, just about anything NPM related is a use of Node- but it also powers all the awesome developer tools that we don’t even really need to think about much anymore.  When a minifier runs over your Javascript code before you push to production- Node is probably doing that magic.  Same for a bundler.  It’s behind the scenes on quite a bit of frontend workflow these days.  I use it for such all the time, but I hadn’t had much chance to really write any of those dev tools until recently.

The task was pretty simple- I had a js file with a bunch of arrays and objects someone else had entered.  The formatting had been mangled somewhere along the way- there were long strings of spaces everywhere- I wanted to strip them out, but leave any single spaces (to preserve multi word strings).  Now I know: any halfway good code editor will have the search/replace feature to handle this, but I could see this being a nice little utility to write in Node.  That way, I could run it over an entire directory if necessary (probably won’t ever be necessary, but I really wanted to do this short little side project).

My first iteration was a super simple script using the fs module.  First, a nice path splitter utility in a separate file.  This takes a string, splits it out by path and extension, and inserts ‘-‘ and whatever new string you want.  This prevents overwriting the original file (though this part would be unnecessary if you do want to overwrite- just pass the same file name to the write function):
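(A minimal sketch- the file and function names here are my own.)

//path-splitter.js
const path = require('path');

//insert a suffix between the file name and its extension,
//e.g. splitPath('data/mangled.js', 'clean') -> 'data/mangled-clean.js'
function splitPath(filePath, newString) {
    const ext = path.extname(filePath);
    const base = filePath.slice(0, filePath.length - ext.length);
    return `${base}-${newString}${ext}`;
}

module.exports = splitPath;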

Then we can use that in our script to strip multi-spaces and return a new file:
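(Again a sketch- the regex just collapses any run of two or more spaces down to a single space.)

//strip-spaces.js
const fs = require('fs');
const splitPath = require('./path-splitter');

const inputFile = process.argv[2]; //e.g. node strip-spaces.js mangled.js

fs.readFile(inputFile, 'utf8', (err, contents) => {
    if (err) throw err;
    const cleaned = contents.replace(/ {2,}/g, ' ');
    fs.writeFile(splitPath(inputFile, 'clean'), cleaned, err => {
        if (err) throw err;
        console.log('Done!');
    });
});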

All very cool. But I really like the notion of streams in Node. What if the file I need to manipulate is really large? With my current setup, it might take a lot of memory to read the entire file, then write it out. But that’s what streams are for! So I rewrote the script with a custom transform stream. It wasn’t all that difficult- as soon as I realized that the required method on your custom stream (that extends the Transform class) has to be named _transform. If you leave out the underscore, it will not work (recognizes Transform as undefined).

Again, in a separate file (small modules for the win!), I defined my custom stream:
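(Another sketch- and note that a run of spaces split across two chunks could slip through, which is fine for this use.)

//strip-spaces-stream.js
const { Transform } = require('stream');

class StripSpaces extends Transform {
    //the method has to be named _transform (note the underscore)
    _transform(chunk, encoding, callback) {
        const cleaned = chunk.toString().replace(/ {2,}/g, ' ');
        callback(null, cleaned);
    }
}

module.exports = StripSpaces;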

Then it was just a matter of importing that and the path splitting utility created for the original fs version (code reuse for the win!) and using a couple of Node’s built-in fs methods (createReadStream and createWriteStream) that create streams from files automatically:
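(Same placeholder file names as above.)

//run-stream.js
const fs = require('fs');
const StripSpaces = require('./strip-spaces-stream');
const splitPath = require('./path-splitter');

const inputFile = process.argv[2];

fs.createReadStream(inputFile)
    .pipe(new StripSpaces())
    .pipe(fs.createWriteStream(splitPath(inputFile, 'clean')));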

Both methods (fs and stream) are simple and concise. Both have their place. Creating a custom transform stream was probably unnecessary for this task, but would be very useful for very large files. Either way, it was a fun quick dive into some other corners of Node!

Static Signal

Express is awesome.

I’ve worked with a few backend technologies, but Node will probably always be my favorite because JS was the first language I really picked up.  Python (Django) and C# (.NET) are both cool, but being able to write JS on the server is great- no more context switching and ending my Python lines with semicolons!  You can create some really cool stuff with Node if you mind the event loop.  If you’re experienced coding for a browser, it shouldn’t be too difficult- the principle of not blocking the loop applies in both places!

Anyway, back to Express.  It is a very useful wrapper for Node- making it easy to handle requests and send responses, set headers, and even bring in security packages (hello helmet!).

One of the best parts is the static file serving process.  As a learning exercise, I wrote a Node process to serve static files- it works, but it’s not pretty, and I’m sure it’s not as robust as Express offers.  Just do the normal Express app setup, then call:

app.use(express.static(path.join(__dirname, 'directory-path-here')));

If you want your code to be a little more flexible, use Node’s __dirname global.  It gives you the directory name of the current module- very useful to make sure your code is portable (for running in dev vs production, for example).

However, there is one little “gotcha” with serving static via express- particularly when creating a single page application.  express.static can be passed an options object as its second argument.  Without anything passed, it uses sensible defaults- one of which is looking for an ‘index’ file in the directory it’s given.  Makes sense- it’s a good place to start when looking for what file to serve.  If you’re creating a single page application, you probably want to return index.html for just about everything.

But we wanted to include a nice alternate homepage for people using very old browsers (think IE9 and below).  Those won’t load our fancy Angular application, but we didn’t want them to just see a page of gibberish.  So we created a simple HTML page with some vanilla JS interactions on a contact form (felt like the good old days!).  But when we went to configure the routing, we just couldn’t get the server to return anything but index.html- even when we set a conditional in our route to check the browser name/version and return a different file.

The answer was in the documentation (of course).  That options object can be passed an “index” property.  This is a boolean that defaults to true, but if you set it to false, Express won’t automatically serve up the index.  You can control what gets served- just remember that now you have to return index.html manually (after any other possible files, depending on your setup).

app.use(express.static(path.join(__dirname, '/'), {index: false}));
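Then a catch-all route can decide what to send- something like this sketch, where the user-agent check and file names are made up:

app.get('*', (req, res) => {
    //very old IE gets the plain vanilla page
    if (/MSIE [5-9]\./.test(req.headers['user-agent'] || '')) {
        return res.sendFile(path.join(__dirname, 'legacy.html'));
    }
    //everyone else gets the single page app
    res.sendFile(path.join(__dirname, 'index.html'));
});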

There we go!  Now people forced to use IE8 (and I can’t think of anyone who would voluntarily use it these days…) can still view our super important content!

Living in a Material World

The more I work with React, the cooler it seems.  I know- I’m way behind.  React is already uncool, Vue is the new hotness.

However, every time I start writing custom HTML components in plain ol’ Javascript, I end up with a pattern that looks a lot like React.  I then say to myself: “Why am I recreating a crappy version of React instead of just using React?  People smarter than me have spent many hours making a much more optimized version of this monstrosity I’m building”.  Then I look around to make sure no one heard me talking to myself.

It’s at that point that I run the “create-react-app my-app” command and really get down to business.

I started a project with React almost a year and a half ago, but just couldn’t wrap my head around the router.  It’s not that React Router (v4) wasn’t well built (though I might have thrown some insults at it in a frustration-fueled rage), but I’d been using Angular at work, and the router philosophies are pretty different between those two.

I started a new project with React (integrating some Philips Hue lights into a UI) a couple weeks ago, and decided to give it all another shot.  The first few hours of figuring out the routing were still a struggle, but then it kind of clicked.  The <Router> declaration in React is a bit like the router-outlet in Angular, with the declaration of the route all in one.  I don’t think I’d really grasped that originally, and it led to a mess of an application.

But the router isn’t the point of this short post.  This one is about integrating Material-UI with React and how cool composable components are when you start to understand how to really use them (note that I said “start to understand”- I have a long way to go).

So, I’d already created some simple components to get everything working.  I’d also cooked up a simple Node server to serve as a fake API.  I hadn’t bought the smart lights yet- but wanted to test, so I just copied the JSON structure from the documentation, pasted it into a file on my computer, and set up a server on localhost to respond to the same endpoints the actual API would and serve up the JSON.  It works surprisingly well!
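A server like that can be just a few lines- here’s a sketch using Express (the endpoint path and file names are placeholders, and the real Hue endpoints differ a bit):

//fake-hue-api.js
const express = require('express');
const lights = require('./lights.json'); //JSON structure copied from the Hue docs

const app = express();

app.get('/api/lights', (req, res) => res.json(lights));

app.listen(3001, () => console.log('Fake Hue API listening on port 3001'));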

One of those components displays all the lights available (2 currently).  That LightPanel component is made up of LightSwitch components- one for each light.  The focus early on was to get them working.  Once I reached that point, the time had come to actually make them look presentable.

Too often this is where I end up burning a lot of time.  I have a habit of trying to create nice looking css on my own- but this time I learned my lesson.  I’d let the experts help out and use the Material UI library.  It integrated pretty easily.  As a test, I switched my LightSwitch JSX from simple custom radio buttons to the Toggle component and it instantly looked better.  Success!

But the really cool part was when I decided to use the Card element for my main LightPanel grid.  Again- this will show all the switches, their status, and allow a user to turn them on or off.  Each light gets its own Card in the grid.  The functionality to get this info and do these actions was already there, I just had to integrate it into the Card element.  I thought the best way would be to just embed my LightSwitch component into the Card’s ‘text’ section.  This worked, but didn’t look quite right.  I realized that it would really go best in the Card’s ‘title’ section.

But the title section is meant to be passed info as attributes- not have text inside tags.  Instead of <CardTitle>My text here</CardTitle>, it should be <CardTitle title="My text here" />.  I wanted to basically embed a custom React component into the “title” attribute.

Thinking “there’s no way this will work”, that’s exactly what I did:
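(A rough sketch- the LightSwitch props and surrounding markup are placeholders from my own component setup.)

//import { Card, CardTitle, CardText } from 'material-ui/Card';
//inside LightPanel's render()- one Card per light
<Card key={light.id}>
    <CardTitle title={<LightSwitch light={light} onToggle={this.toggleLight} />} />
    <CardText>
        {light.name}
    </CardText>
</Card>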

I saved and waited for my command line output to give me a wall of red text, but none appeared.  Fingers crossed, I opened my browser, and there it was, in all its glory.  The card with my custom switch component embedded as the title attribute- toggle functionality and all!

So thanks all around: to React, to Material UI, to the wonders of Javascript!

When in Rome

I really like learning new programming concepts and methods.  The beautiful and frustrating thing about software development is that it’s kind of endless.  No matter how much you learn, there’s almost always going to be something you don’t know.  As long as you don’t let that thought overwhelm you, and focus on the process instead of the outcome, it can be a fun ride.

And the internet is a great place for learning.  Don’t get me wrong- it’s also full of inaccurate, opinion-tainted drivel, but you can definitely find really good resources.  How to tell them apart?  The compiler (or browser in the case of the web) doesn’t lie.  Learn something, then try it on your own on a real project- if it works, you’ve found a good resource.

A couple months back I posted about Wes Bos’ great courses (seriously- if you want to learn, go get his stuff).  I also find Pluralsight very helpful (sometimes you feel like laying down and being lazy- why not watch videos too!).  Free Code Camp is another great avenue for learning.  I started that track a few years ago but drifted away due to work and time limitations.  When I went back a few weeks ago, I found even more great problems and projects to practice on.

One of them is the Roman Numeral Conversion algorithm problem.  It’s listed as an “intermediate” difficulty question, but the instructions are simple: “Convert the given number into a roman numeral”.  For some reason, this one was giving me a really hard time- I just couldn’t figure out how to do it without making a big map of numbers linked to their corresponding roman numerals.

So I decided to do exactly that.  I would solve the problem in the worst way I possibly could.  If my only idea was to make a big map, I’d just make a big map.  And something interesting happened along the way.  Patterns began to emerge.  Using my big map, I had a bunch of conditional logic, depending on the length of the number.  But I realized I could break that into a single loop.  I had a whole bunch of logic in that loop that really belonged inside a helper function, so that was the next step- abstract it out to a helper.  Finally, I realized that instead of the map (much smaller now), if I used an array, I could match the index up with the loop counter variable in my main function- passing the correct Roman Numeral letters without having to manually do so each time.
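Just to illustrate the general shape it ended up taking (a generic sketch, not my actual gist code):

//one row of letters per place value: [one, five, next ten]
const numerals = [
    ['I', 'V', 'X'], //ones
    ['X', 'L', 'C'], //tens
    ['C', 'D', 'M'], //hundreds
    ['M', '', '']    //thousands
];

//helper: convert a single digit using the letters for its place value
function convertDigit(digit, [one, five, ten]) {
    if (digit <= 3) return one.repeat(digit);
    if (digit === 4) return one + five;
    if (digit <= 8) return five + one.repeat(digit - 5);
    return one + ten; //9
}

function convertToRoman(num) {
    const digits = String(num).split('').map(Number);
    return digits
        .map((digit, i) => convertDigit(digit, numerals[digits.length - 1 - i]))
        .join('');
}

convertToRoman(2017); //'MMXVII'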

The code is available in some Github Gists- first, second, and final passes.  Apologies if there are errors- never used the Gist feature before, but it’s pretty cool!

I bet there are many ways to improve it- the logic in the helper seems unwieldy, but the best thing I learned from this exercise was the general process.  Get something working, see patterns, edit, repeat.

Fat Model in a Skinny View

Working with Django can be pretty fun.  I’m not quite as familiar with the patterns and recommended practices as with Javascript and its associated frameworks, but there is a cool mantra I learned recently: Fat Models – Skinny Views.

I’ve been working on creating an API for a product website/blog.  The front end is latest Angular, the back end uses Django to return JSON (in as close to a REST API as we can get).  Because my Python’s a bit rusty (our latest project used .NET as a backend), I was catching up with some Pluralsight videos.

Side note: Pluralsight is a great resource for learning more about many programming languages, frameworks, best practices, and so on.  I’m not an employee there, just a satisfied customer.  The series I was using to brush up on Django was called Django Fundamentals.

Anyway, the instructor mentioned a phrase that resonated with me: Fat models and skinny views.  Basically, keep your logic in methods on your models, and make your views responsible only for returning data as much as possible.  Until this point, we’d been haphazardly filling up our views with all the logic and it was a bit of a mess.

The fat models/skinny views pattern allows for more abstraction and reusability.  Example time!

One of the views is to simply get the blog data in JSON format for display.  If the request passes a ‘slug’ (the identifying string for an individual blog), we get just that blog entry.  Otherwise, we get all.  Nothing super complicated- but joins on a database table can be tricky in Django’s ORM (the layer on top of any SQL transaction).  The ORM is useful, but I just couldn’t get it to join our main blogs table with the category table and the authors table.  As a result, the frontend was just getting the author id and category id instead of the actual string name/title (because only the id was stored in the blogs table- in a Foreign Key relationship).

From a bit of digging into Django, I found that you can ‘prefetch’ that related info using the select_related method.  So our view started as:

def blogs(request, slug=""):
    if request.method == 'GET':
        if slug != "":
            entry = Blog.objects.select_related('category', 'author').get(slug=slug)
            # 'data' here stands for the JSON-serializable version of 'entry' (how we built it is omitted)
            return JsonResponse(json.dumps({"result": True, "msg": "Blogs Loaded", "data": data}), safe=False)
        else:
            blog_list = Blog.objects.filter(posted__lte=datetime.today()).order_by('-posted').select_related('category', 'author')
            # same idea- 'data' stands for the serialized version of 'blog_list'
            return JsonResponse(json.dumps({"result": True, "msg": "Current Blogs Loaded", "data": data}), safe=False)

We check the request (only GET is allowed on this unprotected route- there’s a different one with JWT auth for updating/adding entries) and then check for existence of a slug.  If one is passed, get the individual entry with the related info prefetched.

But there was an issue.  On the frontend, we logged the JSON response, but no author or category info was included.  Just the data from the blog table.  I went back to the Django side and printed out my “entry” and “blog_list” variables- everything was there!  From a little more research, it seems that the above method works just fine if you’re using the default Django view template, but we weren’t- we just wanted JSON data passed to our Angular components for display.  It looks like when the entry/blog_list collection was converted to JSON, only the original object info was coming along for the ride.

But the related data (category and author) was available on the original entry/blog_list object.  So, we could just loop through and grab all the properties we need, add them to a simple Python dict, and return that.  With that solution, our view officially passed from slightly chubby into full on fat.  The loop logic was repeated in multiple places and multiple views- not good for future developers who might have to update this code (probably me!).

So, we made a slight update.  As an example, for the above code, just after we get the Blog.objects collection (either the individual blog or group of blogs), we call a method on the model:

data = Blog.get_blog_and_join(entry)

That one’s for the individual entry. On the model, we created a new method to handle that logic:

def get_blog_and_join(entry):
    data = {"pk": entry.id, "title": entry.title, "slug": entry.slug, "author": entry.author.username, "body": entry.body, "posted": entry.posted.strftime("%Y-%m-%d")}
    if entry.category:
        data["category_name"] = entry.category.title
        data["category"] = entry.category.id
    else:
        data["category_name"] = "No Category"
    return data

This takes the entry passed (the Blog object returned by Django’s ORM with related author and category data) and assigns its values to properties in a new dict.  Note the author prop- it references entry.author.username.  This didn’t work in the original method of just returning a JSON encoded version of the Blog object returned by the ORM.  A little extra logic to check if a category exists (they’re optional in our setup), and now we have a simple object with all the info we need.  It can be returned as JSON and the front end team has their precious data!

Strippin’ Tags

A while back, I helped create a Django back end for an update of an existing website.  The website’s front end was AngularJs, and we didn’t really get how the two should integrate.  The result?  A bit of a mess.  Django template views inside Angular html template files.  We had to modify the bracket style used for Angular (as it conflicts with Django’s).  The folder structure was confusing, the code was confusing- it worked, but it wasn’t a great solution.

Now I understand what we should have done: use Django to create an API for the Angular front end.  Just return that sweet, sweet JSON to the app and have Angular do the templating.  We are in the process of this update (as well as migrating this one from AngularJS to Angular the Next Generation), and I ran into an interesting project.

The original site had a Django-driven blog.  This was not integrated into the main AngularJS app in any way- a user clicked the ‘news’ link and they were taken to a completely new page with a traditional Django blog setup.  It made creation and updating easy- we just used Django’s admin panel for new posts, and the default templating views to handle any sorting (by category, date, etc).

But it doesn’t really fit.  Having the blog as a module within the Angular (now to be v4.2.4) application makes more sense.  With our new API approach it will work, but it will take some extra work.

One aspect is the admin panel.  We won’t be using the built in Django admin panel- instead, we’ve created an Angular-driven admin panel.  Converting the data returned by Django’s ORM to JSON was a bit tricky at first, but it’s flowing smoothly at this point (might cover that in a future post, as it was a bit of a process).

Another aspect is searching through blog posts (on the user interface), and that’s where we come to the cool project of the week.  One of the benefits of doing this extra work is getting the hip “instant updates” feel of a single page app within our blog’s UI.  When someone types in the search bar, they see the blog list below filter immediately.

But I noticed some test posts were coming up in almost all search results.  They happened to be the test posts with images in them.  The reason?  We are using an HTML editor toolbar for the admin area.  When a site admin posts a new blog, they use this toolbar to format text, add links, or post images (not everyone posting will be a developer or have source code access).

The toolbar has a cool feature where it encodes any images uploaded to base-64 format.  According to Wikipedia, “Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation.”

I don’t really know what that means, but I do know it means I don’t have to set up an ‘images’ folder for my blog module and save a path to those images with a relation to the blog post in my database.  When it converts to base-64, it gives me a string of text that a modern browser can translate into an image.  That image src can be saved right along with the rest of the HTML body.

Very helpful – but back to the search feature.  It turns out that the search was iterating over everything stored in the blog’s body- including the html code.  I would have had to fix this anyway, but it became really apparent with the image issue.  Because the string of text representing an image was so long, it naturally contained matches to most strings I was searching (at least the simple ones people would start with).  It might not have mattered on a traditional request-response site- a person would type their whole search string then submit and avoid seeing the wrong results.  But in a live reload search, the problem was obvious.

My first thought was a terrible one: Why not store a plain text version of the blog body in the database?  That way, we just return it alongside the rest of the JSON data and use that for the search.  But that really doesn’t make sense- it’s just bloating our database tables with duplicate info and increasing the size of the JSON object returned on each request.

Then I remembered how cool Javascript can be- and that it comes with awesome built in array features like ‘map’.  So, why not return the data as we have been (with the body as HTML) and manipulate the body just within the search function?  We can search over that manipulated body, but display the original HTML.

Turns out that this works pretty well.  In our main blog component, we initialize a search array on a service:

this.blogViewService.searchArray = ['author', 'title', 'body_plain_text', 'category'];

This is a convention we’re using on a different project- the array contains the property names we want to search by (those property names appear within each object in the array we will be searching over). That searchArray is passed to our search service with some other info.  In this case, ‘body_plain_text’ doesn’t exist on the object- but we’re going to create it as we go.

For the search, we cheat a bit and use Angular’s built in form input subscription.  A form input field in Angular has an observable you can hook into to get the data as a user types (called valueChanges).  You can then subscribe and do any searching there.  All we need to do is make sure to transform each object a bit in the process:

this.blogViewService.searchSubscription = this.blogViewService.term.valueChanges
    .debounceTime(200)
    .subscribe(result => {
        let itemsToSearch = this.blogViewService.originalItems
            .map(item => {
                item['body_plain_text'] = String(item['body']).replace(/<[^>]+>/gm, ' ');
                return item;
            });
        //we pass to another service to do the actual filtering of originalItems here
    });

We start with assigning a reference to this subscription (searchSubscription) so we can unsubscribe on the component’s destruction (to avoid memory leaks).  Then we hook into the form input observable (valueChanges).  I put a slight delay on the process at this point with debounceTime(200)- when the news/blog section gets big enough that returning it all up front doesn’t make sense, we will have to hit the database in this search.  debounceTime(timeHereInMs) is a great RxJS built-in that handles debouncing your calls.

Finally, we get to the actual change- the originalItems array is mapped, but none of the properties within each object are actually changed.  Instead, we append a ‘body_plain_text’ property to each object that uses a regex to strip HTML tags.  Originally, it replaced with nothing, but then we had words joining together (if they were on the other side of tags), so replacing with a single blank space preserves the integrity of the search.  We never change the original ‘body’ property- this is where our HTML lives and is used for the actual display.
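As a quick illustration of what that regex does to a body string (made-up markup):

const body = '<p>New <strong>product</strong> launch</p><img src="data:image/png;base64,iVBOR...">';
body.replace(/<[^>]+>/gm, ' ');
//-> ' New  product  launch  '  (the tags- and the giant base-64 string inside the img tag- are gone, and the words stay separated)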

I’m sure there will be edge cases where this might not work and we have to tweak the process, but it’s a good start.  I also don’t think this is technically how you’re supposed to use .map- it’s a functional programming staple, and I’m using it to append a property to an object then return it back to the array.  Definitely not functional programming!