Full Unicode strings strawman

Jungshik Shin (신정식, 申政湜) jungshik at google.com
Wed May 18 11:12:09 PDT 2011

On Tue, May 17, 2011 at 11:09 AM, Shawn Steele
<Shawn.Steele at microsoft.com> wrote:

> I would much prefer changing "UCS-2" to "UTF-16", thus formalizing that
> surrogate pairs are permitted.  That would be very unlikely to break any
> existing code and would still allow representation of everything reasonable
> in Unicode.
> That would enable Unicode, and allow extending string literals and regular
> expressions for convenience with the U+10FFFF style notation (which would be
> equivalent to the surrogate pair).  The character code manipulation
> functions could be similarly augmented without breaking anything (and maybe
> not needing different names?)
> You might want to qualify the UTF-16 as allowing, but strongly
> discouraging, lone surrogates for those people who didn't realize their
> binary data wasn't a string.
> The sole disadvantage would be that iterating through a string would
> require consideration of surrogates, same as today.  The same caution is
> also necessary to avoid splitting Ä (U+0041 U+0308) into its component A
> and ̈ (U+0308) parts.  I wouldn't be opposed to some sort of helper functions or
> classes that aided in walking strings, preferably with options to walk the
> graphemes (or whatever), not just the surrogate pairs.  FWIW: we have such a
> helper for surrogates in .Net and "nobody uses them".  The most common
> feedback is that it's not that helpful because it doesn't deal with the
> graphemes.
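
To make the surrogate-pair point above concrete, here is a minimal sketch of
what "equivalent to the surrogate pair" and "iterating with consideration of
surrogates" could look like. The helper names toSurrogatePair and codePoints
are hypothetical, invented for illustration; they are not part of any
proposal in this thread. Lone surrogates are tolerated and passed through
unchanged.

```javascript
// Hypothetical helper: split a supplementary code point (above U+FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: walk a string by code point instead of by
// 16-bit code unit. A well-formed surrogate pair yields one value;
// a lone surrogate is returned as its own code unit value.
function codePoints(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        result.push(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    result.push(unit);
  }
  return result;
}
```

Note that codePoints still returns A (U+0041) and the combining diaeresis
(U+0308) as two separate values, which is exactly the limitation described
in the quoted text: code-point iteration alone does not keep graphemes
together.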

Hmm... I proposed break iterators for 'character/grapheme', word, line, and
sentence as part of the i18n API, but the idea was "shot down" (at least for
version 0.5). Are you open to adding them now? Once this discussion is settled
and the proposal to support the full Unicode range is in place, we can revisit
the issue.
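
As a rough illustration of what a grapheme break iterator gives you, here is
a sketch that assumes a runtime providing Intl.Segmenter. That API did not
exist when this message was written, and it is not necessarily the shape of
the break iterators proposed for the i18n API; the wrapper name graphemes is
hypothetical.

```javascript
// Assumption: the runtime provides Intl.Segmenter.
// Hypothetical wrapper: return the grapheme clusters of a string as an array.
function graphemes(str, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // Each item yielded by segment() has a `segment` property holding the text.
  return Array.from(segmenter.segment(str), (s) => s.segment);
}
```

With this, "A\u0308" comes back as a single element rather than as A and a
separate combining mark, and a surrogate pair is likewise kept together.
Changing the granularity option to "word" or "sentence" would cover two of
the other iterator kinds mentioned above.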


> - Shawn
> Shawn.Steele at Microsoft.com
> Senior Software Design Engineer
> Microsoft Windows