UTF-16 Strings not-strawman

Mike Samuel mikesamuel at gmail.com
Thu May 19 13:42:30 PDT 2011


2011/5/19 Shawn Steele <Shawn.Steele at microsoft.com>:
> I don’t have time to make a real strawman, but what would people need if we
> went the UTF-16 route (instead of full Unicode)?  (This thread is to collect
> requirements, which are getting somewhat lost in the "merits of UTF-16 vs.
> 32-bit" thread.)  Basically, just replace UCS-2 with UTF-16, allowing irregular
> UTF-16 for compatibility.
>
> Things that come to mind immediately are:
>
> ·         Some sort of convenience notation for string literals and regular
> expressions.

> ·         Extend String.fromCharCode() to allow generating UTF-16 pairs for
> values 0x10000-0x10FFFF.

+1
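
For concreteness, here is a minimal sketch of the arithmetic such an
extension would need.  The name fromCodePoint is made up for this
example and is not a proposed API; it only assumes the existing
String.fromCharCode.

```javascript
// Hypothetical helper (the name is an assumption, not a proposed API).
// Returns the UTF-16 string for one code point in 0..0x10FFFF.
function fromCodePoint(codePoint) {
  if (codePoint < 0 || codePoint > 0x10FFFF ||
      Math.floor(codePoint) !== codePoint) {
    throw new RangeError("Invalid code point: " + codePoint);
  }
  if (codePoint <= 0xFFFF) {
    // Fits in a single 16-bit code unit.
    return String.fromCharCode(codePoint);
  }
  // Supplementary code point: split the 20-bit offset into two halves.
  var offset = codePoint - 0x10000;
  var lead = 0xD800 + (offset >> 10);     // high 10 bits
  var trail = 0xDC00 + (offset & 0x3FF);  // low 10 bits
  return String.fromCharCode(lead, trail);
}
```

With this, fromCodePoint(0x10000) yields the two code units
"\ud800\udc00".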

> ·         Something to allow values 0x10000-0x10FFFF from
> String.prototype.charCodeAt.  I assume it’d have to be a new function.

+1 for new function
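
A rough sketch of what the new function might compute, written as a
standalone helper.  The name codePointAtIndex and the choice to return
an unpaired surrogate unchanged are my assumptions; the latter just
mirrors the "allow irregular UTF-16" requirement.

```javascript
// Hypothetical helper (name and fallback behavior are assumptions).
// Returns the code point starting at `index`, combining a surrogate
// pair when one is present, and the raw code unit otherwise.
function codePointAtIndex(str, index) {
  var first = str.charCodeAt(index);
  if (first >= 0xD800 && first <= 0xDBFF && index + 1 < str.length) {
    var second = str.charCodeAt(index + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      return ((first - 0xD800) << 10) + (second - 0xDC00) + 0x10000;
    }
  }
  // BMP code unit, unpaired surrogate, or NaN when index is out of range.
  return first;
}
```

Existing callers of charCodeAt keep seeing 16-bit values; only code
that opts in to the new function sees values above 0xFFFF.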


> ·         Make encodeURIComponent and decodeURIComponent use UTF-8 instead
> of CESU-8.  (The current behavior actually breaks the specifications because
> CESU-8 as generated != UTF-8 as defined, but I’m not sure the bug can be
> fixed.)  So either fix the bug (probably too breaking?) or at least add a
> new “correctlyEncodeURIComponent”.  (I don’t think decoding is breaking).

+1
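
To make the difference concrete, the sketch below percent-encodes one
supplementary code point both ways.  All function names are made up
for this example, and it says nothing about what any particular
engine's encodeURIComponent emits; comparing the built-in's output
against these two strings is one way to check which form an
implementation produces.

```javascript
// Format one byte as %XX.
function pct(byteValue) {
  var hex = byteValue.toString(16).toUpperCase();
  return "%" + (hex.length < 2 ? "0" + hex : hex);
}

// Three-byte UTF-8 pattern, applied here to a single 16-bit code unit.
function threeByte(unit) {
  return pct(0xE0 | (unit >> 12)) +
         pct(0x80 | ((unit >> 6) & 0x3F)) +
         pct(0x80 | (unit & 0x3F));
}

// UTF-8: one four-byte sequence for a code point in 0x10000..0x10FFFF.
function utf8Supplementary(codePoint) {
  return pct(0xF0 | (codePoint >> 18)) +
         pct(0x80 | ((codePoint >> 12) & 0x3F)) +
         pct(0x80 | ((codePoint >> 6) & 0x3F)) +
         pct(0x80 | (codePoint & 0x3F));
}

// CESU-8: each surrogate of the pair encoded separately as three bytes.
function cesu8Supplementary(codePoint) {
  var offset = codePoint - 0x10000;
  return threeByte(0xD800 + (offset >> 10)) +
         threeByte(0xDC00 + (offset & 0x3FF));
}
```

For U+10000 the first gives "%F0%90%80%80" and the second gives
"%ED%A0%80%ED%B0%80".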


> Things I’m less certain about:
>
> ·         There is apparently some desire to walk a string by +=1 or +=2
> depending on whether it’s a surrogate pair or not.  I’m not sure it’s worth
> formalizing, as, to me, it’s more interesting to walk it by graphemes or
> other more appropriate text elements.  And most applications don’t seem to
> care much about whether they break strings.

Walking by code point would make it marginally easier to write
escaping functions for a few languages, HTML for example:

    "\ud800\udc00" -> "&#x10000;"

but instead of putting it in a UTF-16 strawman, we could just keep it
in mind as a criterion for judging any string-related proposals in the
loop/iterators/enumeration strawmen.
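
For reference, this is roughly what such an escaper looks like today
with a manual +=1 or +=2 walk.  The function name is made up, and it
only rewrites non-ASCII as numeric character references; it does not
escape "&" or "<".

```javascript
// Hypothetical example (the name is an assumption).  Replaces every
// non-ASCII code point with a hexadecimal numeric character reference.
function escapeNonAscii(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    var codePoint = str.charCodeAt(i);
    if (codePoint >= 0xD800 && codePoint <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Surrogate pair: combine the two units and step by 2.
        codePoint = ((codePoint - 0xD800) << 10) + (next - 0xDC00) + 0x10000;
        i++;
      }
    }
    out += codePoint < 0x80
        ? String.fromCharCode(codePoint)
        : "&#x" + codePoint.toString(16) + ";";
  }
  return out;
}
```

Most of the body is surrogate bookkeeping, which is the part a
code-point-aware iteration construct would take over.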

> ·         A strict mode that disallows the irregular UTF-16?

I think this is best left to JSLint.
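
The check itself is small.  Below is a sketch of the kind of test a
lint tool could apply to string literal values; I am not claiming
JSLint does this today, and the function name is made up.

```javascript
// Hypothetical lint-style check (name is an assumption).  Returns true
// if the string contains a surrogate code unit that is not part of a
// well-formed lead/trail pair.
function hasUnpairedSurrogate(str) {
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // Lead surrogate: must be followed by a trail surrogate.
      var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) {
        return true;
      }
      i++; // skip the trail half of a valid pair
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Trail surrogate with no preceding lead.
      return true;
    }
  }
  return false;
}
```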


> - Shawn
>
> http://blogs.msdn.com/shawnste

