ZF-6777: Is it completely necessary to encode UTF-8 to \uXXXX during JSON encoding, when the JSON standard allows pure Unicode strings and (all?) browsers support them?

Description

Hi!

I'm not sure this is a valid feature request, but I've spent almost half a day googling and failed to find an answer. If I am right, it would be very nice if someone added a small, easy option to Zend_Json_Encoder to make this work.

So, as I understand it, the JSON standard (json.org) allows a string to consist of Unicode characters. Although the standard does not seem to specify an exact Unicode encoding, it is natural (from my point of view) to expect that in browser-server data transmission (which I think is a huge, if not the largest, class of JSON use-cases) "Unicode" means UTF-8, because today it is common for both sides (browser and server software) to provide full UTF-8 string support.

If that is true, then why do UTF-8 strings get converted to the ugly, bloated \uXXXX encoding, while both sides (client-side browser and server) support plain UTF-8 strings?

PHP's native json_encode() function produces \uXXXX escapes by default, and there is no way to alter this behavior. Until ZF 1.8, Zend_Json_Encoder didn't mangle UTF-8 and produced normal output, but since 1.8 (or 1.8.1? I didn't catch which) it also produces bloated \uXXXX strings :(
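For illustration, this is roughly what I mean (a minimal sketch; the exact escape sequences depend on the input, here a 2-byte UTF-8 character):

    <?php
    // Default behaviour of PHP's native encoder: every non-ASCII character
    // is turned into a \uXXXX escape sequence.
    $data = array('city' => 'Köln');   // "ö" is U+00F6, 2 bytes in UTF-8
    echo json_encode($data);           // {"city":"K\u00f6ln"}

    // What I would like to get instead (plain UTF-8 passed through):
    // {"city":"Köln"}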

What are the benefits of the 3-times-bloated and very slow \uXXXX encoding over plain, small, simple, fast UTF-8? Is \uXXXX encoding absolutely necessary to send data from a UTF-8 PHP script to a "modern" browser (like FF 2-3, IE 6-7, or Opera 9.64 - I tested them all with the old-style Zend_Json_Encoder)?
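To make the "3-times bloated" point concrete, a back-of-the-envelope check (assuming a typical 2-byte Cyrillic character in a UTF-8 source file):

    <?php
    // "п" (U+043F) occupies 2 bytes in UTF-8, but its escaped form
    // "\u043f" occupies 6 ASCII bytes - three times as much.
    echo strlen("п");        // 2
    echo strlen('\u043f');   // 6 (single quotes, so the backslash is literal)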

If it isn't necessary, could somebody add a small option to the Zend_Json_Encoder class to leave UTF-8 as it is, without converting to \uXXXX? The question is related to ZF-4054, whose effect this would essentially revert.
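Until such an option exists, the only workaround I can see is post-processing the encoder output and folding the \uXXXX escapes back into UTF-8. Something along these lines (a rough sketch; the function names are my own, and it handles only BMP characters, no surrogate pairs):

    <?php
    // Callback: pack one 16-bit code unit and convert UCS-2BE -> UTF-8.
    function _unescape_unicode_char($m)
    {
        return mb_convert_encoding(pack('H*', $m[1]), 'UTF-8', 'UCS-2BE');
    }

    // Fold \uXXXX escapes in already-encoded JSON back into plain UTF-8.
    function unescape_unicode($json)
    {
        return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/',
            '_unescape_unicode_char', $json);
    }

    echo unescape_unicode('{"city":"K\u00f6ln"}'); // {"city":"Köln"}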

Thanks for any answers...

Comments

I forgot to say: I know about the maximum-2-bytes-per-symbol limitation of UTF-8 support in browsers. But I think there are a lot of use-cases where it is possible to guarantee that at most 2-byte UTF-8 is used. And, by the way, \uXXXX encoding also supports only 2 bytes per symbol. I can't understand what the risks of using plain-text UTF-8 are.

I would also very much like to see such an option. ASCII is really on the way out, and UTF-8 is more or less standard in most applications today. Even though JSON is an exchange format, one has to read it by eye from time to time, and with these escapes that's harder. As I've understood it, a simple flag for turning off encodeUnicodeString() would do.
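Something as simple as this would be enough (purely hypothetical API - the $escapeUnicode flag does not exist in the framework, only Zend_Json_Encoder::encode() does):

    <?php
    // Hypothetical flag to keep UTF-8 as-is instead of \uXXXX escaping.
    Zend_Json_Encoder::$escapeUnicode = false;              // hypothetical
    echo Zend_Json_Encoder::encode(array('city' => 'Köln')); // {"city":"Köln"}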

http://stackoverflow.com/questions/583562/…