New full Unicode for ES6 idea
wes at page.ca
Tue Feb 21 05:29:17 PST 2012
On 21 February 2012 00:03, Brendan Eich <brendan at mozilla.com> wrote:
> These are byte-based encodings, no? What is the problem inflating them by
> zero extension to 16 bits now (or 21 bits in the future)? You can't make an
> invalid Unicode character from a byte value.
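Here is a minimal sketch of the zero extension described in the quote. It assumes the bytes arrive as an array of numbers, and the helpers zeroExtend and hasSurrogate are hypothetical names used only for illustration. Every byte value lands in U+0000..U+00FF, so no surrogate code unit can appear.

```javascript
// Hypothetical helper: zero-extend each byte to one 16-bit code unit.
// This is a Latin-1-style pun, not a real decode of the source charset.
function zeroExtend(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b); // 0x00..0xFF -> U+0000..U+00FF
  }
  return s;
}

// Hypothetical helper: true if the string contains any UTF-16 surrogate unit.
function hasSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdfff) return true;
  }
  return false;
}

// Try every possible byte value at once.
const allBytes = Array.from({ length: 256 }, (_, i) => i);
console.log(hasSurrogate(zeroExtend(allBytes))); // false
```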
One of my examples, GB 18030, is a Chinese government standard whose
encoding uses sequences of up to four bytes. It is a mapping onto Unicode,
but this mapping is table-driven rather than algorithm-driven like the UTF-* transport
formats. To provide a single example, Unicode 0x2259 maps onto GB 18030
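To make the "algorithm-driven" half of that contrast concrete, here is a sketch of UTF-8 encoding for BMP code points, where the bytes are computed from the code point's bits. The function name utf8EncodeBmp is hypothetical, and the GB 18030 side is not shown because it needs the standard's lookup tables, which are not reproduced here.

```javascript
// Hypothetical sketch: UTF-8 bytes are computed from the code point's bits.
// Restricted to BMP code points (U+0000..U+FFFF), excluding surrogates.
function utf8EncodeBmp(cp) {
  if (cp < 0 || cp > 0xffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("BMP non-surrogate code points only");
  }
  if (cp < 0x80) return [cp];                                 // 1 byte: 0xxxxxxx
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 2 bytes
  return [
    0xe0 | (cp >> 12),          // 1110xxxx: top 4 bits
    0x80 | ((cp >> 6) & 0x3f),  // 10xxxxxx: middle 6 bits
    0x80 | (cp & 0x3f),         // 10xxxxxx: low 6 bits
  ];
}

const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex(utf8EncodeBmp(0x2259))); // e2 89 99
```

A table-driven encoding has no such formula: each code point's bytes must be looked up.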
You're right about Big5 being byte-oriented; maybe this was a bad example,
although it is a double-byte charset. It works by putting ASCII down low and
making bytes above 0x7f escapes into code pages dereferenced by the next
byte. Each code point is encoded with one or two bytes, never more. If I
were developing with Big5 in JS, I would store the byte stream 4a 4b d8 00
c1 c2 4c as 004a 004b d800 c1c2 004c. This would allow me to use JS
regular expressions and so on.
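The storage scheme above can be sketched as follows. This assumes the input is an array of byte values. packDoubleByte and hexUnits are hypothetical names, and the code does no Big5 validation and no mapping to Unicode; it only packs lead/trail pairs into 16-bit units. Note that the pair d8 00 becomes the lone surrogate 0xD800, which is exactly the kind of code unit at issue with BRS.

```javascript
// Hypothetical sketch: pack a double-byte stream into 16-bit code units.
// Bytes <= 0x7f are zero-extended; a byte > 0x7f is treated as a lead byte
// and combined with the next byte as (lead << 8) | trail.
function packDoubleByte(bytes) {
  const units = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b <= 0x7f || i + 1 >= bytes.length) {
      units.push(b); // ASCII (or a dangling final lead byte) -> one unit
    } else {
      units.push((b << 8) | bytes[i + 1]); // lead << 8 | trail
      i++; // skip the trail byte we just consumed
    }
  }
  return String.fromCharCode(...units);
}

// Hypothetical helper: show each 16-bit code unit as four hex digits.
function hexUnits(str) {
  return Array.from({ length: str.length }, (_, i) =>
    str.charCodeAt(i).toString(16).padStart(4, "0")).join(" ");
}

const packed = packDoubleByte([0x4a, 0x4b, 0xd8, 0x00, 0xc1, 0xc2, 0x4c]);
console.log(hexUnits(packed));        // 004a 004b d800 c1c2 004c
console.log(/^JK.{2}L$/.test(packed)); // true: regexps still see "J", "K", "L"
```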
> Anyway, Big5 punned into JS strings (via a C or C++ API?) is *not* a strong
> use-case for ignoring invalid characters.
Agreed. I'm trying to see if I can stretch far enough to find a real
problem with BRS -- because I really want it.
But the data does not need to arrive from a C API -- it could easily be
delivered by an XHR request where, say, the remote end dumps database rows
into a transport format based around evaluating JS string literals (like
> Ball one. :-P
If I hit the batter, does he get to first base?
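To illustrate the XHR scenario, here is a sketch showing that string-literal text arriving over the network can carry a lone surrogate through a \u escape, with no C API involved. JSON stands in for the "JS string literal" transport format (an assumption), and the response body is made up for illustration.

```javascript
// Hypothetical response text, as it might arrive in an XHR body.
// The two-character JS escape "\\" puts a literal backslash in the text,
// so JSON.parse sees the escape \ud800.
const body = '{"name": "J\\ud800K"}';
const row = JSON.parse(body);

console.log(row.name.length);                     // 3
console.log(row.name.charCodeAt(1).toString(16)); // d800 (a lone surrogate)
```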
We still haven't talked about equality and normalization; I suppose that
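For context on the equality question, here is a small sketch using the standard String.prototype.normalize. Two canonically equivalent spellings of "é" compare unequal with === until both are normalized to the same form.

```javascript
const composed = "\u00e9";    // "é" as a single code point
const decomposed = "e\u0301"; // "e" + COMBINING ACUTE ACCENT

console.log(composed === decomposed);                                   // false
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
console.log(composed.length, decomposed.length);                        // 1 2
```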
Wesley W. Garland
Director, Product Development
+1 613 542 2787 x 102
More information about the es-discuss