Code points vs Unicode scalar values
brendan at mozilla.com
Wed Sep 4 08:58:36 PDT 2013
> Anne van Kesteren <mailto:annevk at annevk.nl>
> September 4, 2013 7:48 AM
> ES6 introduces String.prototype.codePointAt() and
> as well as an iterator (not defined). It struck
> me this is the only place in the platform where we'd expose code point
> as a concept to developers.
> or Unicode scalar values (anytime you hit the network and use utf-8).
> I'm not sure I'm a big fan of having all three concepts around.
You can't avoid it: UTF-8 is a transfer format that can be observed via
serialization. String.prototype.charCodeAt and String.fromCharCode are
required for backward compatibility. And ES6 wants to expose code points
as well, so three.
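To make the three concepts concrete, here is a minimal sketch that views one string as UTF-16 code units, as code points, and as UTF-8 bytes. It assumes an ES2015+ engine with a global TextEncoder (current browsers and Node.js 11+ provide one). The variable names are illustrative only.

```javascript
// One string viewed three ways. U+1F600 lies outside the BMP, so UTF-16
// stores it as a surrogate pair.
const s = "a\u{1F600}";

// 1. 16-bit code units: what .length, charCodeAt and fromCharCode see.
const codeUnits = Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));

// 2. Code points: the ES6 string iterator steps over whole code points,
//    so the surrogate pair comes back as a single element.
const codePoints = Array.from(s, (ch) => ch.codePointAt(0));

// 3. UTF-8 bytes: what a serializer such as TextEncoder actually emits.
const encoder = new TextEncoder();
const utf8 = Array.from(encoder.encode(s));

// A lone surrogate is still visible as a code unit and as a "code point",
// but UTF-8 encoding replaces it with U+FFFD (EF BF BD), so only Unicode
// scalar values survive serialization.
const lone = "\uD800";
const loneCodePoints = Array.from(lone, (ch) => ch.codePointAt(0));
const loneUtf8 = Array.from(encoder.encode(lone));

const hex = (xs) => xs.map((x) => x.toString(16)).join(" ");
console.log(hex(codeUnits));      // 61 d83d de00
console.log(hex(codePoints));     // 61 1f600
console.log(hex(utf8));           // 61 f0 9f 98 80
console.log(hex(loneCodePoints)); // d800
console.log(hex(loneUtf8));       // ef bf bd
```

The same two-character-looking string has length 3 in code units, 2 in code points, and 5 in UTF-8 bytes, which is why all three views end up observable.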
> could have String.prototype.unicodeAt() and String.unicodeFrom()
> instead, and have them translate lone surrogates into U+FFFD. Lone
> surrogates are a bug and I don't see a reason to expose them in more
> places than just the 16-bit code units.
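The unicodeAt()/unicodeFrom() idea can be sketched roughly as follows. This is an assumption about the intended semantics, not a standard API: the helpers are written as hypothetical standalone functions rather than String.prototype/String methods, and they simply map any surrogate that codePointAt() or String.fromCodePoint() would otherwise expose to U+FFFD.

```javascript
// Hypothetical helpers modelled on the unicodeAt()/unicodeFrom() suggestion.
// Not part of any standard; names and semantics are assumptions.
const REPLACEMENT = 0xfffd;

function isSurrogate(cp) {
  return cp >= 0xd800 && cp <= 0xdfff;
}

// Like str.codePointAt(index), but returns U+FFFD wherever codePointAt would
// return a bare surrogate: a lone high or low surrogate, or an index that
// points at the second half of a valid pair.
function unicodeAt(str, index) {
  const cp = str.codePointAt(index);
  if (cp === undefined) return undefined; // index out of range
  return isSurrogate(cp) ? REPLACEMENT : cp;
}

// Like String.fromCodePoint(...codePoints), but every surrogate input becomes
// U+FFFD, so the result never contains an unpaired surrogate. Adjacent
// high+low inputs are replaced individually rather than joined into a pair.
function unicodeFrom(...codePoints) {
  return String.fromCodePoint(
    ...codePoints.map((cp) => (isSurrogate(cp) ? REPLACEMENT : cp))
  );
}

// Usage: "x", lone high surrogate, "y", then U+1F600 as a valid pair.
const sample = "x\uD800y\u{1F600}";
console.log(unicodeAt(sample, 1).toString(16)); // fffd
console.log(unicodeAt(sample, 3).toString(16)); // 1f600
console.log(unicodeFrom(0x61, 0xdc00) === "a\uFFFD"); // true
```

One design choice to note: indexing into the trailing half of a valid pair also yields U+FFFD in this sketch; that is one plausible reading of "translate lone surrogates", not the only one.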
Sorry, I missed this: how else (other than the charCodeAt/fromCharCode
legacy) are lone surrogates exposed?