UTF-8 issues in WordPress with update_post_meta and json_encode

Published on April 11th, 2012

PHP sucks. No question about it. Read more on @eevee's blog. WordPress tries and tries to get around that fact by escaping and converting values endlessly, just to make sure the programmer isn't storing anything stupid. Today it shone through for real. I wanted to store a set of values in a single DB row and thought: hey, JSON might be a good format for that. The values I wanted to store were image paths with international characters. That went well...

So my entire setup is UTF-8.

But PHP's string functions operate on raw bytes, and several of them assume ISO-8859-1 by default, while json_encode and json_decode work only with UTF-8 strings. No big issue.
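To make the encoding mismatch concrete, here is a minimal sketch, assuming PHP 7 or newer (for the \u{...} string escape). The variable names are only for illustration. It shows that strlen counts bytes rather than characters, and that json_encode refuses a string that is not valid UTF-8.

```php
<?php
// "\u{e5}" is the UTF-8 encoding of å (two bytes: 0xC3 0xA5).
$utf8 = "\u{e5}";
// The same character as a single ISO-8859-1 byte.
$latin1 = "\xE5";

$utf8Length = strlen($utf8);         // counts bytes, not characters
$utf8Json   = json_encode($utf8);    // non-ASCII is escaped as \u00e5
$latin1Json = json_encode($latin1);  // not valid UTF-8, so encoding fails

var_dump($utf8Length, $utf8Json, $latin1Json);
```

So anything handed to json_encode has to be UTF-8 already, and anything that went through an ISO-8859-1 conversion has to be converted back first.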

  1. Take the value from post
  2. Validate
  3. Sanitize
  4. json_encode
    So now the sanitized non-ASCII UTF-8 characters are encoded like "\u00e5" by json_encode. All fine and dandy, and should be storable everywhere, right?
  5. update_post_meta

Except update_post_meta from WordPress does its own validation and sanitizing: it expects slashed input and strips the backslashes from the value before storing it. So "\u00e5" gets converted to "u00e5", which json_decode can no longer turn back into the original character, and get_post_meta does not really reverse its counterpart's process, since it has no way of knowing where to put the backslash back in for the Unicode character.
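The failure can be reproduced outside WordPress. This is a rough sketch, not WordPress code: stripslashes stands in for the unslashing that happens on save (an assumption about the internals), and the image path is a made-up example.

```php
<?php
// Hypothetical image path containing å and æ (PHP 7+ escape syntax).
$original = "bilder/bl\u{e5}b\u{e6}r.jpg";

// Step 4 from the list: json_encode escapes "/" and all non-ASCII characters.
$json = json_encode($original);

// Stand-in for what happens inside update_post_meta: backslashes are stripped.
$stored = stripslashes($json);

// What comes back is still valid JSON, but the \u escapes are now plain text.
$decoded = json_decode($stored);

var_dump($json, $stored, $decoded === $original);
```

The decoded value is the literal text "blu00e5bu00e6r" instead of the original characters, so the round trip silently corrupts the path rather than raising an error.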

What I naively did from the start was:

Saving:

$value = json_encode(sanitize($_POST['value']));
update_post_meta($post_id, 'value', $value);

Reading:

// With the third argument set to true, get_post_meta returns the single stored string.
$value = get_post_meta($post->ID, 'value', true);
$value = json_decode(desanitize($value));

And here is the ugly hack that I ended up with:

Saving:

$value = json_encode(htmlentities(utf8_decode(sanitize($_POST['value']))));
update_post_meta($post_id, 'value', $value);

Reading:

// With the third argument set to true, get_post_meta returns the single stored string.
$value = get_post_meta($post->ID, 'value', true);
$value = desanitize(utf8_encode(html_entity_decode(json_decode($value))));
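Why the hack survives the round trip is easier to see in isolation. The sketch below makes a few assumptions: stripslashes again stands in for the unslashing on save, the sanitize and desanitize steps are left out, the path is a made-up example, and mb_convert_encoding (which needs the mbstring extension) is swapped in for utf8_decode and utf8_encode because those two are deprecated as of PHP 8.2.

```php
<?php
// Hypothetical image path containing å and æ (PHP 7+ escape syntax).
$original = "bilder/bl\u{e5}b\u{e6}r.jpg";

// Saving: UTF-8 -> ISO-8859-1 -> HTML entities, which are pure ASCII.
$latin1   = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');
$entities = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// json_encode now has no non-ASCII characters left to turn into \u escapes.
$json   = json_encode($entities);
$stored = stripslashes($json); // stand-in for the unslashing on save

// Reading: undo each step in reverse order.
$decoded    = json_decode($stored);
$latin1Back = html_entity_decode($decoded, ENT_QUOTES, 'ISO-8859-1');
$restored   = mb_convert_encoding($latin1Back, 'UTF-8', 'ISO-8859-1');

var_dump($entities, $restored === $original);
```

The only backslashes left in the JSON are the ones in front of "/", and losing those does not change the decoded value. A value containing a literal backslash would still not survive, and characters outside ISO-8859-1 are lost in the first conversion.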

I did a fair amount of googling and testing before reaching this conclusion, so I'm putting it out there for people's perusal. The sanitize and desanitize functions are my own closed-source functions for dealing with unsafe text.
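A less ugly route that may be worth trying is to add the slashes back before saving, so that the unslashing on save cancels out. Newer WordPress versions provide wp_slash() for this. The sketch below runs without WordPress and uses addslashes and stripslashes as stand-ins, so treat it as an illustration of the idea rather than tested WordPress code.

```php
<?php
// Hypothetical image path containing å and æ (PHP 7+ escape syntax).
$original = "bilder/bl\u{e5}b\u{e6}r.jpg";

$json = json_encode($original);

// In WordPress this would be roughly:
//   update_post_meta($post_id, 'value', wp_slash($json));
$slashed = addslashes($json);

// Stand-in for the unslashing on save: it removes exactly what was added.
$stored = stripslashes($slashed);

$restored = json_decode($stored);

var_dump($stored === $json, $restored === $original);
```

Because the stored string is identical to the json_encode output, json_decode gets its \u escapes back intact and no entity or charset detour is needed.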
