Searching

I find that I get sidetracked a bit too easily.

I should be working on an upgrade of the recordkeeping feature. As stated before, we’re looking to add a lot more functionality, as well as a nicer user experience, and all of it is likely to be pretty difficult with the existing code base.

But I thought we might start with a simple facelift on the search feature. The current interface is clunky and confusing: it opens as an ugly, full-screen window on top of the site. Clicking any result in the listing reloads the page behind the search screen, so in effect it looks like it’s doing nothing at all.

So we started sprucing it up a bit: opening the search in a small modal, fetching results with AJAX requests instead of leaving the page, highlighting search terms, and more. But the results were off. Searches were returning pages that didn’t contain the term. Why? It turns out the search function was also searching any embedded HTML code. Search for ‘document’, and you get a listing of every page with an embedded jQuery function. Search for ‘style’, and you get every page with custom CSS inlined (bad practice, I know, but having a WYSIWYG toolbar is a necessity).
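Term highlighting has one trap worth showing: if you escape the text for HTML first and then search the escaped string, a term like “lt” can land inside an entity such as `&lt;` and break it. Below is a minimal sketch in Python (the site itself is PHP and JavaScript; the `highlight` function is a hypothetical name), assuming the input is plain text whose markup has already been stripped. It matches on the raw text and escapes each piece separately.

```python
import html
import re


def highlight(text: str, term: str) -> str:
    """Return text as HTML-escaped markup with each match of term wrapped in <mark>.

    Matching is case-insensitive and the original casing is preserved.
    Escaping happens per piece, so a term like "lt" can never land inside "&lt;".
    """
    if not term:
        # An empty pattern would match between every character, so just escape.
        return html.escape(text)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))  # text before the match
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))  # text after the final match
    return "".join(parts)


print(highlight("Upload a Document & sign it", "document"))
# Upload a <mark>Document</mark> &amp; sign it
```

Because `re.escape` is applied to the term, user input such as `c++` or `(draft)` is treated literally rather than as a regex.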

How did the original developer get around this? Simple: they didn’t. The search results just returned a list of pages, with no preview text. Users would see the list and click through to a page, but there was no guarantee that their keyword would actually be on that page (it could just be a term somewhere in the source code).
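Once a page’s markup has been stripped down to plain text, preview text becomes possible: show a short window of context around the first match. Here is a minimal Python sketch under that assumption. The `snippet` function and its `radius` parameter are hypothetical and are not taken from the site’s PHP code.

```python
import re


def snippet(text: str, term: str, radius: int = 40):
    """Return about `radius` characters of context on each side of the first match.

    Whitespace runs (common after tags are stripped) are collapsed first.
    Returns None when the term does not occur in the text.
    """
    flat = " ".join(text.split())
    match = re.search(re.escape(term), flat, re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - radius)
    end = min(len(flat), match.end() + radius)
    # Mark truncation on either side so users know the preview is partial.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(flat) else ""
    return prefix + flat[start:end].strip() + suffix
```

A `None` result doubles as a relevance check. If the visible text has no match, the page has nothing to preview.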

So that’s been the bulk of the work so far: sanitizing the search results from the database to remove any code, so that a user only sees relevant results. I naively searched for a way to get only the non-code text from the database using some SQL magic, but (as anyone with a tiny bit of knowledge should know) that doesn’t seem to be possible. So we’re reduced to fetching all the results, then using some PHP regex checks to remove anything that looks like code (anything between <script> or <style> tags, for starters). The final loop over the results in the PHP file then checks whether each row still contains the search keyword once the code is stripped out. If not, it decrements the result count by 1 so that the ‘total results’ number stays correct.
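The strip-then-recheck approach can be sketched roughly as follows. This is Python rather than the site’s PHP, and `visible_text`, `filter_rows`, and the `(title, html)` row shape are hypothetical stand-ins for whatever the real query returns. It removes whole `<script>` and `<style>` blocks, then any remaining tags, so attribute values like `style="..."` stop matching. It then rechecks the keyword and decrements the total for rows that only matched inside code. Regex-based stripping is approximate: it does not handle HTML comments, entities, or a `>` inside an attribute value.

```python
import re

# Whole <script>...</script> and <style>...</style> blocks, including any attributes.
# DOTALL lets .*? span newlines; IGNORECASE catches <SCRIPT> as well as <script>.
CODE_BLOCK_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>",
    re.IGNORECASE | re.DOTALL,
)
# Any remaining tag, so attribute values such as style="..." are not searched either.
TAG_RE = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Approximate the text a reader sees: drop script/style blocks, then all tags."""
    without_code = CODE_BLOCK_RE.sub(" ", html)
    return TAG_RE.sub(" ", without_code)


def filter_rows(rows, keyword):
    """Keep (title, html) rows whose visible text contains keyword, ignoring case.

    Returns the kept rows and a total that starts at len(rows) and is
    decremented for every row whose only match was inside code.
    """
    total = len(rows)
    kept = []
    needle = keyword.lower()
    for title, html in rows:
        if needle in visible_text(html).lower():
            kept.append((title, html))
        else:
            total -= 1  # the keyword only appeared in markup or code
    return kept, total


rows = [
    ("Home", "<p>Welcome</p><script>$(document).ready(init);</script>"),
    ("Forms", "<p>Upload a document here.</p>"),
]
kept, total = filter_rows(rows, "document")
print([title for title, _ in kept], total)  # ['Forms'] 1
```

In this sketch the decremented `total` always ends up equal to `len(kept)`. Keeping a separate counter only matters if the total was originally reported by the database (for example, from a `COUNT(*)` query) and needs to be corrected afterwards.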

Is this the best or most efficient way to do it?  Probably not, but it works for now.  And we still have a lot of work to do before getting back on the recordkeeping track!
