Full Unicode strings strawman
brendan at mozilla.com
Tue May 17 10:05:23 PDT 2011
On May 16, 2011, at 8:13 PM, Allen Wirfs-Brock wrote:
> I think it does. In another reply I also mentioned the possibility of tagging in a JS visible manner strings that have gone through a known encoding process.
Saw that, seems helpful. Want to spec it?
> If the strings you are combining from different sources have not been canonicalized to a common encoding, then you had better be damn careful how you combine them.
Programmers miss this, as you note, so arguably things are not much worse, and at best no worse, with your proposal.
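A minimal sketch of the canonicalization hazard Allen describes: the same user-visible character can arrive from different sources in different Unicode normalization forms, so naive comparison or combination goes wrong. (This uses the later-standardized `String.prototype.normalize` purely for illustration; it is not part of the strawman.)

```javascript
// "é" can be encoded as one precomposed code point (NFC)
// or as "e" plus a combining acute accent (NFD).
var precomposed = "\u00E9";   // U+00E9, NFC form
var decomposed  = "e\u0301";  // U+0065 U+0301, NFD form

// They render identically but are not equal as strings.
console.log(precomposed === decomposed);  // false

// Normalizing both to a common form makes them comparable.
console.log(precomposed.normalize("NFC") ===
            decomposed.normalize("NFC"));  // true
```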
Your strawman does change the game, though, hence the global or cross-cutting (non-modular) concern. I'm warm to it, after digesting. It's about time we get past the 90's!
> The DOM seems to canonicalize to UTF-16 (with some slop WRT invalid encodings that Boris and others have pointed out). I don't know about other sources such as XMLHttpRequest or the file APIs. However, in the long run JS in the browser is going to have to be able to deal with arbitrary encodings. You can hide such things from many programmers but not all. After all, people actually have to implement transcoders.
Transcoding to some canonical Unicode representation is often done by the browser upstream of JS, and that's a good thing. Declarative specification by authors, implementation by relative-few browser i18n gurus, sparing the many JS devs the need to worry. This is good, I claim.
That it means JS hackers are careless about Unicode is inevitable, and there are other reasons for that condition anyway. At least with your strawman there will be full Unicode flowing through JS and back into the DOM and layout.
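A sketch of what "full Unicode flowing through JS" would change: today a string is a sequence of 16-bit code units, so a character outside the Basic Multilingual Plane occupies two elements. (The `\u{}` escape and `codePointAt` shown here were standardized later and are used only to make the surrogate-pair behavior visible.)

```javascript
// An astral (non-BMP) character such as U+1F600 is stored as a
// UTF-16 surrogate pair in current JS strings.
var s = "\u{1F600}";

console.log(s.length);                       // 2 code units, 1 character
console.log(s.charCodeAt(0).toString(16));   // "d83d", the lead surrogate
console.log(s.codePointAt(0).toString(16));  // "1f600", the real code point
```

Under the strawman, `s.length` would be 1 and indexing would yield whole code points rather than surrogate halves.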