Non-breaking spaces are breaking my balls

We’re still working on the search function.  The little preview text section seems to be coming through properly from the back end now, but I kept getting weird ‘character not found’ placeholders in the text.

You know, the graphic that shows when the browser doesn’t recognize the character.  It’s a ? inside a little black diamond.  In this case, the culprit was ‘&nbsp;’, the non-breaking space entity.

I’d written a few regex checks in the PHP code to strip certain things out of the search results: anything inside style or script tags, and any HTML markup.  OK, fine, in truth I hacked together these regexes in a terrible fashion.  There were angle brackets and backslash body parts all over my monitor.
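For the curious, here is a rough sketch of what that kind of cleanup can look like. It is not the code from the site: strip_markup is a made-up name, the pattern is just one way to do it, and it assumes PHP 7 or later for the type hints. It drops script and style blocks together with their contents, then lets PHP's built-in strip_tags remove whatever tags are left.

```php
<?php
// Hypothetical helper for illustration only; not the site's actual code.
function strip_markup(string $html): string
{
    // Drop <script> and <style> elements together with their contents.
    // Flags: "i" ignores case, "s" lets "." match across newlines.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html);

    // Remove the remaining tags but keep the text between them.
    return strip_tags($text);
}

echo strip_markup('<p>Hello <b>world</b></p><script>alert(1);</script>'), "\n";
// prints "Hello world"
```

Regexes are a blunt tool for HTML, so treat this as good enough for a search preview and not much else.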

Then I threw the sanitized results through an html_entity_decode call to make sure everything looked right.  But those stupid ? diamonds were still there.

Naturally, I thought something was wrong with my regex checks.  I wasted too long trying to rewrite them before I thought to look into the html_entity_decode function itself.  Turns out, it doesn’t turn &nbsp; into a regular space.  Also turns out, I should have read the official PHP documentation a little more closely, as it states:

the ‘&nbsp;’ entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.

I don’t really know what that means, except that it meant my function wasn’t working.  The nice part: the fix is easy.  Just do a str_replace to remove the &nbsp; entities, and make sure to do it in the right order!
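Here is a minimal sketch of that fix, with a few assumptions spelled out. clean_preview is a hypothetical name, not the real function. I am assuming "the right order" means doing the str_replace before html_entity_decode, since the literal text &nbsp; no longer exists once the entities are decoded. The sketch swaps in a plain space where the post removes the entity outright, so neighbouring words don't run together; use '' as the replacement if you really want them gone. The extra replace after decoding is my own addition, to catch non-breaking spaces that arrive as &#160; or as a raw character. The two bin2hex lines show what the documentation quote is getting at: the decoded entity is the byte 0xA0 in ISO 8859-1, which is not a valid character on its own in a UTF-8 page, and that is probably where the ? diamond came from.

```php
<?php
// What the decoded entity actually is, byte for byte.
echo bin2hex(html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1')), "\n"; // a0
echo bin2hex(html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8')), "\n";      // c2a0

// Hypothetical helper for illustration only; assumes the page is UTF-8.
function clean_preview($text)
{
    // 1. Replace the entity while it is still spelled "&nbsp;".
    $text = str_replace('&nbsp;', ' ', $text);

    // 2. Decode everything else. Naming the charset avoids relying on the
    //    default, which was ISO-8859-1 before PHP 5.4 and UTF-8 afterwards.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // 3. Optional extra (my assumption): catch non-breaking spaces that came
    //    in as "&#160;" or as a raw UTF-8 character (bytes 0xC2 0xA0).
    $text = str_replace("\xC2\xA0", ' ', $text);

    return trim($text);
}

echo clean_preview('&nbsp;Fish &amp; chips&nbsp;'), "\n"; // Fish & chips
```

If every page involved is UTF-8, passing 'UTF-8' to html_entity_decode may be enough to get rid of the diamonds by itself, but the result would still contain non-breaking spaces that trim() ignores.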
