Strippin’ Tags

A while back, I helped create a Django back end for an update of an existing website.  The website’s front end was AngularJs, and we didn’t really get how the two should integrate.  The result?  A bit of a mess.  Django template views inside Angular html template files.  We had to modify the bracket style used for Angular (as it conflicts with Django’s).  The folder structure was confusing, the code was confusing- it worked, but it wasn’t a great solution.

Now I understand what we should have done: use Django to create an API for the Angular front end.  Just return that sweet, sweet JSON to the app and have Angular do the templating.  We are in the process of this update (as well as migrating this one from AngularJS to Angular the Next Generation, and I ran into an interesting project.

The original site had a Django-driven blog.  This was not integrated into the main AngularJS app in any way- a user clicked the ‘news’ link and they were taken to a completely new page with a traditional Django blog setup.  It made creation and updating easy- we just used Django’s admin panel for new posts, and the default templating views to handle any sorting (by category, date, etc).

But it doesn’t really fit.  Having the blog as a module within the Angular (now to be v4.2.4) application makes more sense.  With our new API approach it will work, but it will take some extra work.

One aspect is the admin panel.  We won’t be using the built in Django admin panel- instead, we’ve created an Angular-driven admin panel.  Converting the data returned by Django’s ORM to JSON was a bit tricky at first, but it’s flowing smoothly at this point (might cover that in a future post, as it was a bit of a process).

Another aspect is searching through blog posts (on the user interface), and that’s where we come to the cool project of the week.  One of the benefits of doing this extra work is getting the hip “instant updates” feel of a single page app within our blog’s UI.  When someone types in the search bar, they see the blog list below filter immediately.

But I noticed some test posts were coming up in almost all search results.  They happened to be the test posts with images in them.  The reason?  We are using an HTML editor toolbar for the admin area.  When a site admin posts a new blog, they use this toolbar to format text, add links, or post images (not everyone posting will be a developer or have source code access).

The toolbar has a cool feature where it encodes any images uploaded to base-64 format.  According to Wikipedia,  “Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation.

I don’t really know what that means, but I do know it means I don’t have to set up an ‘images’ folder for my blog module and save a path to those images with a relation to the blog post in my database.  When it converts to base-64, it gives me a string of text that a modern browser can translate into an image.  That image src can be saved right along with the rest of the HTML body.

Very helpful – but back to the search feature.  It turns out that the search was iterating over everything stored in the blog’s body- including the html code.  I would have had to fix this anyway, but it became really apparent with the image issue.  Because the string of text representing an image was so long, it naturally contained matches to most strings I was searching (at least the simple ones people would start with).  It might not have mattered on a traditional request-response site- a person would type their whole search string then submit and avoid seeing the wrong results.  But in a live reload search, the problem was obvious.

My first thought was a terrible one: Why not store a plain text version of the blog body in the database?  That way, we just return it alongside the rest of the JSON data and use that for the search.  But that really doesn’t make sense- it’s just bloating our database tables with duplicate info and increasing the size of the JSON object returned on each request.

Then I remembered how cool Javascript can be- and that it comes with awesome built in array features like ‘map’.  So, why not return the data as we have been (with the body as HTML) and manipulate the body just within the search function?  We can search over that manipulated body, but display the original HTML.

Turns out that this works pretty well.  In our main blog component, we initialize a search array on a service:

this.blogViewService.searchArray = ['author', 'title', 'body_plain_text', 'category'];

This is a convention we’re using on a different project- the array contains the property names we want to search by (those property names appear within each object in the array we will be searching over). That searchArray is passed to our search service with some other info.  In this case, ‘body_plain_text’ doesn’t exist on the object- but we’re going to create it as we go.

For the search, we cheat a bit and use Angular’s built in form input subscription.  A form input field in Angular has an observable you can hook into to get the data as a user types (called valueChanges).  You can then subscribe and do any searching there.  All we need to do is make sure to transform each object a bit in the process:

this.blogViewService.searchSubscription = this.blogViewService.term.valueChanges
    .subscribe(result => {
        let itemsToSearch = this.blogViewService.originalItems
            .map(item => {
                item['body_plain_text'] = String(item['body']).replace(/<[^>]+>/gm, ' ');
                return item;
        //we pass to another service to do the actual filtering of originalItems here

We start with assigning a reference to this subscription (searchSubscription) so we can unsubscribe on the component’s destruction (to avoid memory leaks).  Then we hook into the form input observable (valueChanges).  I put a slight delay on the process at this point with debounceTime(200)- when the news/blog section gets big enough that returning it all up front doesn’t make sense, we will have to hit the database in this search.  debounceTime(timeHereInMs) is a great rxJS built in that handles debouncing your calls.

Finally, we get to the actual change- the originalItems array is mapped, but none of the properties within each object are actually changed.  Instead, we append a ‘body_plain_text’ property to each object that uses a regex to strip HTML tags.  Originally, it replaced with nothing, but then we had words joining together (if they were on the other side of tags), so replacing with a single blank space preserves the integrity of the search.  We never change the original ‘body’ property- this is where our HTML lives and is used for the actual display.

I’m sure there will be edge cases where this might not work and we have to tweak the process, but it’s a good start.  I also don’t think this is technically how you’re supposed to use .map- it’s a functional programming staple, and I’m using it to append a property to an object then return it back to the array.  Definitely not functional programming!


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s