New full Unicode for ES6 idea

Wes Garland wes at
Mon Feb 20 04:19:16 PST 2012

On 20 February 2012 00:45, Allen Wirfs-Brock <allen at> wrote:

> 2) Allow invalid unicode characters in strings, and preserve them over
> concatenation – ("\uD800" + "\uDC00").length == 2.

> I think 2) is the only reasonable alternative.

I think so, too -- especially as any sequence of Unicode code points --
including invalid and reserved code points -- constitutes a valid Unicode
string, according to my recollection of the Unicode specification.

In addition to the reasons you listed, it should also be noted that
- 2) is cheaper to implement
- 2) keeps more old code working; ignoring the examples where developers
use String as uint16[], there are also the cases where developers scan
strings for 0xD800. 0xD800 is a reserved code point.
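
As a minimal sketch of the behaviour option 2) preserves (the helper name
hasSurrogateCodeUnit is hypothetical, made up for illustration, and stands
in for the kind of scan old code might do):

```javascript
// Under option 2), lone surrogates are ordinary string elements and
// survive concatenation as two separate 16-bit code units.
var high = "\uD800";
var low = "\uDC00";
var joined = high + low;

// Hypothetical helper: scan a string for any code unit in the
// surrogate range 0xD800..0xDFFF, as legacy code sometimes does.
function hasSurrogateCodeUnit(s) {
  for (var i = 0; i < s.length; i++) {
    var unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDFFF) {
      return true;
    }
  }
  return false;
}
```

Here joined.length is 2 and hasSurrogateCodeUnit(joined) is true, so a scan
of this kind keeps working as long as strings remain sequences of 16-bit
units.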

> I don't think 1) would be a very good choice, if for no other reason the
> set of valid unicode characters is a moving target that you wouldn't want
> to hardwire into either the ES specification or implementations.

To play the devil's advocate, I could point out that the spec language
could say something about reserved code points.  Those code points are
reserved because, IIRC, they are not representable in UTF-16; they include
the ranges for the surrogate pairs.


Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102