Strippin’ Tags

A while back, I helped create a Django back end for an update of an existing website.  The website’s front end was AngularJS, and we didn’t really get how the two should integrate.  The result?  A bit of a mess.  Django template views inside Angular HTML template files.  We had to modify the bracket style used for Angular (as it conflicts with Django’s).  The folder structure was confusing, the code was confusing- it worked, but it wasn’t a great solution.

Now I understand what we should have done: use Django to create an API for the Angular front end.  Just return that sweet, sweet JSON to the app and have Angular do the templating.  We are in the process of this update (as well as migrating this one from AngularJS to Angular the Next Generation), and I ran into an interesting project.

The original site had a Django-driven blog.  This was not integrated into the main AngularJS app in any way- a user clicked the ‘news’ link and they were taken to a completely new page with a traditional Django blog setup.  It made creation and updating easy- we just used Django’s admin panel for new posts, and the default templating views to handle any sorting (by category, date, etc).

But it doesn’t really fit.  Having the blog as a module within the Angular (now to be v4.2.4) application makes more sense.  With our new API approach it will work, but it will take some extra work.

One aspect is the admin panel.  We won’t be using the built in Django admin panel- instead, we’ve created an Angular-driven admin panel.  Converting the data returned by Django’s ORM to JSON was a bit tricky at first, but it’s flowing smoothly at this point (might cover that in a future post, as it was a bit of a process).

Another aspect is searching through blog posts (on the user interface), and that’s where we come to the cool project of the week.  One of the benefits of doing this extra work is getting the hip “instant updates” feel of a single page app within our blog’s UI.  When someone types in the search bar, they see the blog list below filter immediately.

But I noticed some test posts were coming up in almost all search results.  They happened to be the test posts with images in them.  The reason?  We are using an HTML editor toolbar for the admin area.  When a site admin posts a new blog, they use this toolbar to format text, add links, or post images (not everyone posting will be a developer or have source code access).

The toolbar has a cool feature where it encodes any images uploaded to base-64 format.  According to Wikipedia, “Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation.”

I don’t really know what that means, but I do know it means I don’t have to set up an ‘images’ folder for my blog module and save a path to those images with a relation to the blog post in my database.  When it converts to base-64, it gives me a string of text that a modern browser can translate into an image.  That image src can be saved right along with the rest of the HTML body.

Very helpful – but back to the search feature.  It turns out that the search was iterating over everything stored in the blog’s body- including the html code.  I would have had to fix this anyway, but it became really apparent with the image issue.  Because the string of text representing an image was so long, it naturally contained matches to most strings I was searching (at least the simple ones people would start with).  It might not have mattered on a traditional request-response site- a person would type their whole search string then submit and avoid seeing the wrong results.  But in a live reload search, the problem was obvious.

My first thought was a terrible one: Why not store a plain text version of the blog body in the database?  That way, we just return it alongside the rest of the JSON data and use that for the search.  But that really doesn’t make sense- it’s just bloating our database tables with duplicate info and increasing the size of the JSON object returned on each request.

Then I remembered how cool Javascript can be- and that it comes with awesome built in array features like ‘map’.  So, why not return the data as we have been (with the body as HTML) and manipulate the body just within the search function?  We can search over that manipulated body, but display the original HTML.

Turns out that this works pretty well.  In our main blog component, we initialize a search array on a service:

this.blogViewService.searchArray = ['author', 'title', 'body_plain_text', 'category'];

This is a convention we’re using on a different project- the array contains the property names we want to search by (those property names appear within each object in the array we will be searching over). That searchArray is passed to our search service with some other info.  In this case, ‘body_plain_text’ doesn’t exist on the object- but we’re going to create it as we go.

For the search, we cheat a bit and use Angular’s built in form input subscription.  A form input field in Angular has an observable you can hook into to get the data as a user types (called valueChanges).  You can then subscribe and do any searching there.  All we need to do is make sure to transform each object a bit in the process:

this.blogViewService.searchSubscription = this.blogViewService.term.valueChanges
    .debounceTime(200)
    .subscribe(result => {
        let itemsToSearch = this.blogViewService.originalItems
            .map(item => {
                item['body_plain_text'] = String(item['body']).replace(/<[^>]+>/gm, ' ');
                return item;
            });
        //we pass to another service to do the actual filtering of originalItems here
    });

We start with assigning a reference to this subscription (searchSubscription) so we can unsubscribe on the component’s destruction (to avoid memory leaks).  Then we hook into the form input observable (valueChanges).  I put a slight delay on the process at this point with debounceTime(200)- when the news/blog section gets big enough that returning it all up front doesn’t make sense, we will have to hit the database in this search, and the delay will keep us from firing a request on every keystroke.  debounceTime(timeHereInMs) is a great built-in RxJS operator that handles debouncing your calls.

Finally, we get to the actual change- the originalItems array is mapped, but none of the properties within each object are actually changed.  Instead, we append a ‘body_plain_text’ property to each object that uses a regex to strip HTML tags.  Originally, it replaced with nothing, but then we had words joining together (if they were on the other side of tags), so replacing with a single blank space preserves the integrity of the search.  We never change the original ‘body’ property- this is where our HTML lives and is used for the actual display.
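
Here is a minimal standalone sketch of the same idea in plain JavaScript, without Angular.  The sample posts and the helper names (stripTags, withPlainText, searchPosts) are made up for illustration, and matching with includes is an assumption- the real filtering happens in a separate service that isn’t shown in this post.  This version also copies each object instead of appending to it, so the original array is left alone:

```javascript
// Hypothetical sample data: 'body' holds the HTML saved by the editor toolbar.
const originalItems = [
  { title: 'First post', author: 'Sam', category: 'news',
    body: '<p>Hello<b>world</b></p><img src="data:image/png;base64,iVBORw0KGgoAAAANS">' },
  { title: 'Second post', author: 'Alex', category: 'updates',
    body: '<p>Nothing to see here</p>' },
];

// Replace every tag with a space so words on either side of a tag stay separate.
function stripTags(html) {
  return String(html).replace(/<[^>]+>/g, ' ');
}

// Add a searchable plain-text copy of the body; 'body' itself is left untouched.
function withPlainText(items) {
  return items.map(item => ({ ...item, body_plain_text: stripTags(item.body) }));
}

// Keep items where any listed property contains the term (case-insensitive).
function searchPosts(items, term, searchArray) {
  const needle = term.toLowerCase();
  return withPlainText(items).filter(item =>
    searchArray.some(prop => String(item[prop] || '').toLowerCase().includes(needle)));
}

const searchArray = ['author', 'title', 'body_plain_text', 'category'];
const hits = searchPosts(originalItems, 'world', searchArray);      // matches the first post
const imageHits = searchPosts(originalItems, 'ivbor', searchArray); // base64 text is ignored
```

Because the base-64 string lives inside the img tag, it disappears along with the tag, so a search for a fragment of it no longer matches anything.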

I’m sure there will be edge cases where this might not work and we have to tweak the process, but it’s a good start.  I also don’t think this is technically how you’re supposed to use .map- it’s a functional programming staple, and I’m using it to append a property to an object then return it back to the array.  Definitely not functional programming!

The Formatting Dance is Your Chance to Write Some Bugs

I love Javascript- but formatting numbers can be tricky.

Take money for example.  The Angular tracking application we’re working on has a grid to display data- and some of the columns contain dollar totals.  So, in our ngFor loop, we can just output the number in most cases.  You could even tack a dollar sign on the front: ${{data.money}}.  Works great if the amount is 19.99, but there are no trailing zeros allowed in numbers in JS, so if the amount is 20.00, the last two zeros are trimmed, leaving you with 20.

Not a big deal, but not exactly what people expect to see either- making the grid easy to scan and reducing confusion is very important in this app.  Not a problem- Angular comes with pipes.  It even has a built in pipe for currency.  Just pass your data, a ‘pipe’ character, and some config options in your template and you’re good to go:

{{data.money | currency:'USD':true:'1.2-2'}}

That would tell your template it’s US dollars, show the $ character, and display at least 1 digit to the left of the decimal and exactly 2 digits to the right (this will round your number so be careful!).
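
To see what that digit format means outside of a template, here is a rough plain JavaScript equivalent using the standard Intl.NumberFormat API.  This is only a sketch of the same formatting rules, not how the Angular pipe is implemented, and the variable names are made up for the example:

```javascript
// Rough plain-JS equivalent of the '1.2-2' digit format: at least one integer
// digit and exactly two fraction digits.
const twoDecimals = new Intl.NumberFormat('en-US', {
  minimumIntegerDigits: 1,
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
});

const plain = String(20.00);                    // '20' - trailing zeros are dropped
const padded = twoDecimals.format(20);          // '20.00' - zeros are padded back
const rounded = twoDecimals.format(19.999);     // '20.00' - rounding happens here
const withCommas = twoDecimals.format(2004100); // '2,004,100.00'
```

Note that every formatted value is a string, which matters as soon as you search or sort on it.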

Great!  Now it’s displaying as one would expect.  But we also have a search function on the grid.  Type in the input box and see your grid live update- thanks Angular magic!  But with a pipe, the data being displayed doesn’t match the data being searched through.  What was a number (5) is now a string (‘5.00’).  Search for ‘5’ and it works great, but a user may see the rest of the string and try searching for ‘5.00’- which will not work in this case.  No results found and a confused user.

Luckily, the awesome folks at Angular gave us an option for that as well.  Pipes can be used in a component instead of in a template.  Just import the pipe(s) you want to use at the top of your component file, inject them in your component’s constructor, and you can pipe away right in your methods.  It’s not quite as easy as just putting the logic in the template, but is a great option in this case.  We created a formatDataForSearch method:

formatDataForSearch(data) {
    //forEach (rather than map) since we are updating each item in place
    data.forEach(d => {
        //convert specific items that need it- others remain as-is
        d.rate = this.numberPipe.transform(d.rate, '1.4-4');
        d.originalCost = this.numberPipe.transform(d.originalCost, '1.2-2');
        d.costBasis = this.numberPipe.transform(d.costBasis, '1.2-2');
    });
    return data;
}

Works great- the data is formatted in the array that will be passed to our search function and everyone’s happy.  A user can type anything they see in the grid and search will find it, but the actual info stored in the database is the correct number type.

But wait- there’s one more problem.  We also have a sort function on the grid.  Click any table header and the grid live-sorts using that data.  I’ve chronicled my journey from buggy, terrible sort to less buggy, mediocre sort in a few blog posts, but the actual function is at a point where it works pretty well.  However, a string sort does not do the same thing as a number sort.  Pass [100, 2] to a numeric sorting function and you’ll get [2, 100] (assuming an ascending order sort).  Pass [‘100’, ‘2’] to a string sort and you’ll get [‘100’, ‘2’].  Why?  According to MDN’s entry on Array.prototype.sort(), “The default sort order is according to string Unicode code points”.

This means two things for our use case:
1) If you don’t pass any comparison function to .sort, you’re probably ok for strings, but will not get the correct order for numbers.  Be sure to pass at least a simple comparison function- (a, b) => a - b will do for simple numbers.
2) Even with #1 covered, we will need to convert the data back to a number.  Otherwise, 100 will come before 2 in our grid and chaos reigns!
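
Both points are easy to check in a console.  A quick sketch with made-up sample arrays (copied with the spread operator so the originals aren’t sorted in place):

```javascript
const numbers = [100, 2, 25];
const strings = ['100', '2', '25'];

// No comparator: elements are compared as strings, even when they are numbers.
const defaultNumbers = [...numbers].sort();                // [100, 2, 25]

// Numeric comparator: negative, zero, or positive result decides the order.
const numericNumbers = [...numbers].sort((a, b) => a - b); // [2, 25, 100]

// Strings of digits sort character by character, so '100' comes before '2'.
const defaultStrings = [...strings].sort();                // ['100', '2', '25']

// Converting inside the comparator gives numeric order while keeping strings.
const convertedStrings = [...strings].sort((a, b) => Number(a) - Number(b)); // ['2', '25', '100']
```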

So, one more edit to our orderby.service.ts (which houses our sorting logic) was in order (ha).  And it gave me one of the very few legitimate (I think) use cases for == I’ve had in a couple years of writing Javascript.  We added a check in our sorting method’s data formatting logic:

if(a[param] == +a[param]) {
    first = +a[param];
    second = +b[param];
}

The first and second variables are initialized a little further up in the code- this conditional comes in a block checking the type of the data being sorted.  a and b are the things being compared in each loop of the .sort function (objects- with the ‘param’ variable being the title text of the clicked column in the grid).  So, if the data in the sort column is loosely equal to that same data converted to a number (the ‘+’ operator is just a quick way of converting a string to a number), we convert both values to numbers and compare.  The original display remains a string, so we keep those padded zeros in $5.00, but the grid sorts properly.
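
Here is a self-contained sketch of that check inside a comparator.  The names looksNumeric and compareBy and the sample rows are hypothetical (the real orderby.service.ts isn’t shown in this post), and this version checks both values rather than just the first one:

```javascript
// True when the value loosely equals its own numeric conversion,
// e.g. '5.00' == 5. Strings such as 'abc' convert to NaN and fail the check.
function looksNumeric(value) {
  return value == +value; // intentional loose equality
}

// Hypothetical comparator factory: 'param' is the property being sorted on.
function compareBy(param) {
  return (a, b) => {
    let first = a[param];
    let second = b[param];
    if (looksNumeric(first) && looksNumeric(second)) {
      first = +first;
      second = +second;
    }
    if (first < second) return -1;
    if (first > second) return 1;
    return 0;
  };
}

const rows = [{ cost: '100.00' }, { cost: '5.00' }, { cost: '20.50' }];
const sorted = [...rows].sort(compareBy('cost')).map(r => r.cost);
// sorted is ['5.00', '20.50', '100.00']; the display strings keep their zeros

const commaCheck = looksNumeric('2,004,100.00'); // false - commas break the check
```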

I’ve always avoided the double equals check, as it doesn’t really do a true comparison.  It does some type coercion and can give you unexpected results (0 == false, for example).  It’s generally safer to use the strict comparison operator (===) but type coercion does seem to have its use case.

What started off as a simple edit request in my job queue (display this number data in this specific way) became a rabbit hole of conversions and back again. To be honest, it was pretty fun.  And the journey wasn’t quite over.  I was feeling pretty good after this victory (however small), when I got an update on this job: “The current shares column should be formatted with commas.  For example: 2,004,100.00.”

Next week- the saga continues!  I can use the currency pipe to convert and add the commas, but those commas are going to break my nice x == +x check in the sort function.  Spoiler: my solution is ugly, but it does work- stay tuned!

Still haven’t found what I’m searching for

Just a quick TIL post this week.  Don’t worry, I’ve got another rambler lined up for next week involving converting data for display then to another form for sorting, then back again- what a ride!

I thought the search function was complete.  It had a bit of complexity to format the content being searched correctly, but the ‘guts’ were fairly simple, using Javascript’s built in string.search method to identify the submitted substring within a larger object’s group of properties.  This is for an Angular application, so throw it all in an observable and watch your grid/list filter as you type!

Side note: we are only searching through content that’s already been loaded from the server, so it’s nice and fast, but if we were hitting the server on search, there’s a great RxJS operator- debounceTime.  Pass it a number and it will wait that period (in milliseconds) after your event before actually firing the action.  Basically, chain it to your observable before your .subscribe call and you’re good to go!  Super simple client side rate limiting!

Back on topic: the search worked, except when I tested with a special character.  The dollar symbol (“$”) in this case.  The grid had a column where this symbol would regularly appear in the text, but each time I tried to search for “MM$vx”, I would get no results- even if that text appeared in my grid.

Turns out- string.search doesn’t work the way I expected with some special characters inside a string.  The reason: string.search treats its argument as a regular expression, so a character like ‘$’ is read as regex syntax (an end-of-string anchor) instead of a literal character.  I tested this out in an awesome little tool at repl.it- it’s great for quickly checking if code works like you think it should.


const test = 'U$11M'.toLowerCase();
const nope = test.search('u$11m');
const works = test.includes('u$11m');
console.log(nope); // -1 not found
console.log(works); // true
console.log(test, 'u$11m', test === 'u$11m'); //same string

Interactive version at https://repl.it/H5AJ/2

You’ll see the solution in that snippet too- switch to string.includes.  That method doesn’t have any issues with any symbol I tested so far, so that’s what we’re using now.  I’ll be honest- I didn’t even know string had an “includes” method- that’s why we were using search!

One thing to note: string.includes returns a boolean, while string.search returns the index of your substring (or -1 if it isn’t found), so be sure to do your checks for the proper thing (particularly when using the all powerful === check).
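
If you do want to stick with string.search, the usual workaround is to escape the regex metacharacters first.  A small sketch comparing the options- escapeRegExp is a helper written for this example, not a built-in:

```javascript
// search() converts a string argument to a regular expression, so '$' is
// treated as an end-of-input anchor instead of a literal dollar sign.
const haystack = 'U$11M'.toLowerCase();

const viaSearch = haystack.search('u$11m');     // -1: the pattern can never match
const viaIncludes = haystack.includes('u$11m'); // true: plain substring check
const viaIndexOf = haystack.indexOf('u$11m');   // 0: plain substring index

// Escaping the regex metacharacters makes search() behave literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const viaEscapedSearch = haystack.search(escapeRegExp('u$11m')); // 0
```

For a plain “does this text contain that text” check, includes or indexOf is the simpler choice.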

Filter Fridays

Performance is a huge aspect of Angular 2.  It seems that one of the main complaints about version 1 was the performance.  While performance is important, it’s also important to remember that this team created a framework that has to constantly check your app, monitoring for changes, in order to maintain bindings and data properly.  I compare it to adding a function to the onscroll event – it’s going to fire on every single pixel scroll of a user’s screen.  Not efficient at all, but sometimes necessary for the effect/functionality you want.

So some overhead is to be expected- but with version 2, they did seem to take into account the criticisms of 1’s performance.  One result- they removed the built in sort/filter pipes from ngFor.  While those pipes were very useful, they also caused some performance issues (as well as adding to the size of the framework- another complaint being addressed in version 2).

The upshot is, if you want to sort or filter your list items, you have to create a service for it yourself.  Which was actually pretty fun.

First up: search/filter.  It seemed to work well as a service instead of a pipe, so that’s the direction we took.  The input field is just a simple text input with a binding to an ngFormControl element.  By using this built in element, we can hijack the Observable that comes with it for some cool shortcuts.

So, as a user types, the list is filtered according to the keys they press.  Done in a service, we set up a function to take the item (the string being searched for), group (the greater json array being searched through), and an optional property (allowing search by name or by email).  That function returns a promise- this promise uses the array.filter function, which loops over each entry in the group to check for a match.  First, though, we convert everything to lowercase (seemed like a case insensitive search would be best):

return new Promise(function (resolve) {
    //lowercase the search term once, outside the loop
    var lowerItem = item.toLowerCase();
    var filtered = group.filter((g) => {
        var lowerProp = g[property].toLowerCase();
        //keep the entry when the term is found anywhere in the property
        return lowerProp.search(lowerItem) !== -1;
    });
    resolve(filtered);
});

It would also work to regex search the string, but this seemed easier for now (than creating a regex with a variable input).  This gets passed back to that ngFormControl aspect.  We can use the valueChanges built in check- this will give us an Observable to subscribe to.  We can then pass that subscription on to the filter service function, and we are checking on each keystroke.  If there’s a match, the list is filtered.  If not, no results show (until the input is cleared, then all listings are back).

this.term.valueChanges
    .debounceTime(400)
    .subscribe(term => this._searchService.findItems(term, this.originalContacts, this.filterParam)
        .then((filteredContacts) => {
            if (term.length > 0) {
                this.contacts = filteredContacts;
            } else {
                this.contacts = this.originalContacts;
            }
        }));

There’s one more aspect that I thought was really cool.  In the original example of bad performance sometimes being necessary, I used the onscroll event.  But there’s a way to mitigate that performance hit: a debounce function.  Essentially, this puts a delay on a function, so instead of firing on every single pixel scroll, it waits until the events have stopped coming in for a set amount of time, then fires once.  This prevents unnecessary function calls.

Angular 2 Observables (like valueChanges on the form module) come with a built in debounce option (that’s the debounceTime chained in the above code).  Just pop it on and give it the number of milliseconds to wait before firing again.  So, this function will wait 400ms after the user stops typing before actually firing.
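
For anyone curious what a debounce looks like without RxJS, here is a minimal hand-rolled sketch using setTimeout.  It is not how debounceTime is implemented, just the same idea in plain JavaScript, and the names are made up for the example:

```javascript
// Returns a wrapper that delays calling fn until waitMs have passed without
// another call; each new call resets the timer.
function debounce(fn, waitMs) {
  let timerId = null;
  return function (...args) {
    if (timerId !== null) clearTimeout(timerId);
    timerId = setTimeout(() => {
      timerId = null;
      fn.apply(this, args);
    }, waitMs);
  };
}

// Hypothetical usage: three quick "keystrokes" result in a single search call.
const searched = [];
const debouncedSearch = debounce(term => searched.push(term), 400);
debouncedSearch('d');
debouncedSearch('dj');
debouncedSearch('django');
// Nothing has run yet; 400ms after the last call, searched becomes ['django'].
```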

Super useful stuff- thanks Angular team!  Next week: the orderBy pipe reborn as a service!

Non breaking spaces are breaking my balls

We’re still working on the search function.  The little preview text section seems to be coming through properly from the back end now, but I kept getting weird ‘character not found’ placeholders in the text.

You know- the graphic that shows when the browser doesn’t recognize the character.  It’s a ? inside a little black diamond.  In this case, it was the ‘&nbsp;’   – a non breaking space character.

I’d written a few regex checks in the php code to strip out certain things from the search results.  Anything inside style or script tags and any html code.  Ok, fine – in truth, I hacked together these regexes in a terrible fashion.  There were angle brackets and backslash body parts all over my monitor.

Then, I threw the code sanitized results through a html_entity_decode call to make sure everything looks right.  But those stupid ? diamonds were still there.

Naturally, I thought something was wrong with my regex checks.  I wasted too long trying to rewrite them before I thought to check into the default html_entity_decode function.  Turns out, it doesn’t decode the &nbsp; entity into a regular space.  Also turns out, I should have read the php official documentation a little more closely, as it states:

the ‘&nbsp;’ entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.

I don’t really know what that means, except that it meant my function wasn’t working.  The nice part- the fix is easy, just do a str_replace to remove them, and make sure to do it in the right order!
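
The actual fix here was a PHP str_replace call, which isn’t shown.  To keep the examples in one language, here is the same idea sketched in JavaScript- the function name and sample text are made up, and it assumes you want both the entity and the already-decoded character turned into ordinary spaces:

```javascript
// Turn both the '&nbsp;' entity and the decoded non-breaking space character
// (U+00A0) into ordinary spaces, then collapse runs of whitespace.
function normalizeSpaces(text) {
  return text
    .replace(/&nbsp;/g, ' ')
    .replace(/\u00a0/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

const cleaned = normalizeSpaces('Annual&nbsp;report\u00a0 2017&nbsp;');
// cleaned is 'Annual report 2017'
```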

Where were we?

That’s right- the search feature.

After a bit of a delay, we’re back to work.  The current issue: adding the placement of the search button.  We wanted 3 options- in the header, in the footer, and in the left menu.  This was complicated by the fact that there is a layout option that doesn’t include the left menu (the horizontal menu).  In the end, we decided that if the user has the horizontal menu enabled, and they try to select the left menu placement for the search button, it will simply default back to the header.  Should eliminate any confusion (probably not).

The database tables are updated, the php conditionals are in place- the location of the search button is now customizable by the user.  But it still doesn’t really work very well.  Currently, the function just uses a simple SQL ‘LIKE’ clause with the search keyword(s) inserted in dynamically.  But it returns the whole text area where the keyword is found.  We’re trying to find a way to limit it to a certain number of words- should be a simple matter of getting x number of characters before and after the keyword, and then finding the nearest space to cut it off at a word.  Actually, as I typed that, I realized there is a better way.  First, check to see if the keyword appears in the first sentence- if so, return that sentence.  If not, grab back to the previous punctuation and forward to the next punctuation (or end of the area) and display that.
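
A rough sketch of that “previous punctuation to next punctuation” idea, written in JavaScript rather than the site’s PHP.  The function name and sample text are hypothetical, and the sentence detection is deliberately naive (any ‘.’, ‘!’ or ‘?’ counts as a sentence end, so something like 5.00 would trip it up):

```javascript
// Returns the sentence containing the first occurrence of keyword, or null
// when the keyword is not present. Sentences end at '.', '!' or '?'.
function excerptAround(text, keyword) {
  const index = text.toLowerCase().indexOf(keyword.toLowerCase());
  if (index === -1) return null;

  // Walk back to just after the previous sentence-ending punctuation.
  let start = index;
  while (start > 0 && !'.!?'.includes(text[start - 1])) start--;

  // Walk forward to the next sentence-ending punctuation (or the end).
  let end = index + keyword.length;
  while (end < text.length && !'.!?'.includes(text[end])) end++;
  if (end < text.length) end++; // keep the closing punctuation

  return text.slice(start, end).trim();
}

const page = 'Welcome to the site. Our search feature is new! Try it out.';
const excerpt = excerptAround(page, 'search');        // 'Our search feature is new!'
const firstSentence = excerptAround(page, 'welcome'); // 'Welcome to the site.'
const missing = excerptAround(page, 'django');        // null
```

A keyword in the first sentence needs no special case here, since walking back simply stops at the start of the text.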

Time to try it out!

On another note- I’ve started experimenting with Django.  It seems really cool- the model to database table relationship seems quite intuitive, and the templating system has been fairly easy to use so far.  The difficult area (for me) has been the url mapping.  It looks like once it’s set up, it is a great feature, but figuring out the regex based system is not going well for me.

Searching

I find that I get sidetracked a bit too easily.

I should be working on an upgrade of the recordkeeping feature.  As stated before, we’re looking to add a lot more functionality, as well as a nicer user experience, and it’s all likely going to be pretty difficult with the existing code base.

But I thought we might start with a simple facelift on the search feature.  The current interface is clunky and confusing- opening results in an ugly new window on top of the site, fullscreen.  Clicking any result in the listing will reload the page behind the search screen- so in effect, it looks like it’s doing nothing at all.

So we started sprucing it up a bit- open in a small modal, AJAX requests for the results instead of leaving the page, term highlighting, and more.  But the results were off.  Searches were returning pages that didn’t have the term- why?  Turns out the search function was also searching any embedded HTML code.  Search for ‘document’, and you get a listing of every page with a jQuery function embedded.  Search for ‘style’ and get every page with custom CSS inlined (bad practice, I know, but having a WYSIWYG toolbar is a necessity).

How did the original developer get around this?  Simple- they didn’t.  The search results just returned a list of pages- no preview text.  Users would see the list and click on any link to a page, but there was no guarantee that their keyword would actually be on that page (it could just be a term in the source code somewhere).

So that’s been the bulk of the work so far- sanitizing the search results from the db to remove any code.  That way, a user only sees relevant results.  I naively searched for a way to only get non-code text from the db using some SQL magic, but (as anyone with a tiny bit of knowledge should know), that doesn’t seem to be possible.  So we’re reduced to getting all the results, then using some PHP regex checks to remove stuff that looks like code (anything between <script> or <style> tags, for starters).  The final loop over the results in the PHP file checks if the row includes the search keyword after the checks for code- if not, it reduces the count by 1 in order to keep the ‘total results’ number correct.
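
The real cleanup lives in PHP and isn’t shown here, but the approach can be sketched in JavaScript.  The function names and sample rows below are made up, and a regex cleanup like this is a rough heuristic, not a real HTML parser:

```javascript
// Remove <script> and <style> blocks (including their contents), then any
// remaining tags, replacing each with a space.
function stripCode(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ');
}

// Keep only rows whose visible text still contains the keyword; the length of
// the returned array is the corrected 'total results' count.
function filterRows(rows, keyword) {
  const needle = keyword.toLowerCase();
  return rows.filter(row => stripCode(row.content).toLowerCase().includes(needle));
}

// Hypothetical rows, as if both had matched a LIKE query for 'document'.
const dbRows = [
  { title: 'Forms', content: '<p>Download the document here.</p>' },
  { title: 'Home', content: '<script>$(document).ready(function(){});</script><p>Welcome</p>' },
];
const visibleRows = filterRows(dbRows, 'document');
// Only the 'Forms' row survives, so the total drops from 2 to 1.
```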

Is this the best or most efficient way to do it?  Probably not, but it works for now.  And we still have a lot of work to do before getting back on the recordkeeping track!