Code points vs Unicode scalar values
Anne van Kesteren
annevk at annevk.nl
Wed Sep 4 09:06:17 PDT 2013
On Wed, Sep 4, 2013 at 4:58 PM, Brendan Eich <brendan at mozilla.com> wrote:
> String.fromCodePoint, rather.
Oops. Any reason this is not just String.from() btw? Give the better
method a nice short name?
>> I'm not sure I'm a big fan of having all three concepts around.
> You can't avoid it: UTF-8 is a transfer format that can be observed via
Yes, but it cannot encode lone surrogates. It can only deal in Unicode
scalar values.
> String.prototype.charCodeAt and String.fromCharCode are
> required for backward compatibility. And ES6 wants to expose code points as
> well, so three.
Unicode scalar values are code points sans surrogates, i.e. completely
compatible with what a UTF-8 encoder/decoder pair can handle.
Why do you want to expose surrogates?
> Sorry, I missed this: how else (other than the charCodeAt/fromCharCode
> legacy) are lone surrogates exposed?
"\udfff".codePointAt(0) == "\udfff"
It seems better if that returns "\ufffd", as you'd get with utf-8
(assuming it accepts code points as input rather than just Unicode
scalar values, in which case it'd throw).
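A quick sketch of the behavior being objected to: codePointAt() hands back lone surrogates as-is rather than replacing them or throwing, and only combines a well-formed surrogate pair into one code point.

```javascript
// A lone surrogate is exposed unchanged:
console.log("\udfff".codePointAt(0).toString(16)); // "dfff"

// A well-formed pair is combined into a single code point
// ("\ud83d\ude00" is U+1F600):
console.log("\ud83d\ude00".codePointAt(0).toString(16)); // "1f600"
```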
The indexing of codePointAt() is also kind of sad, as its index counts
UTF-16 code units just like charCodeAt(), which means for any serious
usage you need to use the iterator anyway. What's the reason
codePointAt() exists?
More information about the es-discuss mailing list