Code points vs Unicode scalar values

Brendan Eich brendan at mozilla.com
Wed Sep 4 08:58:36 PDT 2013



> Anne van Kesteren <annevk at annevk.nl>
> September 4, 2013 7:48 AM
> ES6 introduces String.prototype.codePointAt() and
> String.codePointFrom()

String.fromCodePoint, rather.

> as well as an iterator (not defined). It struck
> me this is the only place in the platform where we'd expose code point
> as a concept to developers.
>
> Nowadays strings are either 16-bit code units (JavaScript, DOM, etc.)
> or Unicode scalar values (anytime you hit the network and use utf-8).
>
> I'm not sure I'm a big fan of having all three concepts around.

You can't avoid it: UTF-8 is a transfer format that can be observed via 
serialization. String.prototype.charCodeAt and String.fromCharCode are 
required for backward compatibility. And ES6 wants to expose code points 
as well, so three.

> We
> could have String.prototype.unicodeAt() and String.unicodeFrom()
> instead, and have them translate lone surrogates into U+FFFD. Lone
> surrogates are a bug and I don't see a reason to expose them in more
> places than just the 16-bit code units.

Sorry, I missed this: how else (other than the charCodeAt/fromCharCode 
legacy) are lone surrogates exposed?

/be

More information about the es-discuss mailing list