Code points vs Unicode scalar values

Anne van Kesteren annevk at annevk.nl
Wed Sep 4 09:06:17 PDT 2013


On Wed, Sep 4, 2013 at 4:58 PM, Brendan Eich <brendan at mozilla.com> wrote:
> String.fromCodePoint, rather.

Oops. Any reason this is not just String.from() btw? Give the better
method a nice short name?


>> I'm not sure I'm a big fan of having all three concepts around.
>
> You can't avoid it: UTF-8 is a transfer format that can be observed via
> serialization.

Yes, but it cannot encode lone surrogates. It can only deal in Unicode
scalar values.
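
A minimal sketch of what that looks like in practice, assuming a
runtime with a global TextEncoder (current browsers, recent Node.js).
TextEncoder takes a scalar value string as input, so the lone surrogate
is replaced with U+FFFD before any bytes are produced:

```javascript
// Encode a string consisting of a single lone trail surrogate as UTF-8.
// Assumption: TextEncoder is available as a global.
const bytes = Array.from(new TextEncoder().encode("\udfff"));

// Format the bytes as hex for inspection.
const hex = bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(hex); // "ef bf bd", i.e. the UTF-8 encoding of U+FFFD
```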


> String.prototype.charCodeAt and String.fromCharCode are
> required for backward compatibility. And ES6 wants to expose code points as
> well, so three.

Unicode scalar values are code points sans surrogates, i.e. completely
compatible with what a utf-8 encoder/decoder pair can handle.
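
To make the distinction concrete, here is a rough sketch of the check.
isScalarValue is a hypothetical helper name used for illustration, not
something proposed for ES6:

```javascript
// Hypothetical helper: true if cp is a Unicode scalar value, i.e. a
// code point in 0..0x10FFFF that is not a surrogate (0xD800..0xDFFF).
function isScalarValue(cp) {
  const isCodePoint = Number.isInteger(cp) && cp >= 0 && cp <= 0x10ffff;
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  return isCodePoint && !isSurrogate;
}

console.log(isScalarValue(0x1f4a9)); // true
console.log(isScalarValue(0xdfff)); // false, a lone surrogate
```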

Why do you want to expose surrogates?


> Sorry, I missed this: how else (other than the charCodeAt/fromCharCode
> legacy) are lone surrogates exposed?

"\udfff".codePointAt(0) == 0xDFFF

It seems better if that returns 0xFFFD (U+FFFD), as you'd get with
UTF-8 (assuming the encoder accepts code points as input rather than
just Unicode scalar values, in which case it'd throw).
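
Roughly what I have in mind, as a sketch only. scalarValueAt is a
hypothetical name, and it assumes codePointAt() returns the surrogate's
numeric value (0xDFFF here) when the surrogate is unpaired:

```javascript
// Hypothetical helper: like codePointAt(), but never exposes a
// surrogate. Any surrogate value that comes back (a lone surrogate, or
// an index that lands in the middle of a pair) is mapped to U+FFFD.
function scalarValueAt(str, index) {
  const cp = str.codePointAt(index);
  if (cp === undefined) return undefined; // index out of range
  return cp >= 0xd800 && cp <= 0xdfff ? 0xfffd : cp;
}

console.log("\udfff".codePointAt(0).toString(16)); // "dfff"
console.log(scalarValueAt("\udfff", 0).toString(16)); // "fffd"
```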

The indexing of codePointAt() is also kind of sad: it counts in code
units, same as charCodeAt(), which means for any serious usage you need
to use the iterator anyway. What's the reason codePointAt() exists?
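
A small illustration of the indexing problem, assuming codePointAt()
and the string iterator behave as currently drafted. U+1F4A9 occupies
two code units (0xD83D 0xDCA9), so index 2 lands in the middle of the
pair:

```javascript
const s = "a\u{1F4A9}b"; // code units: 0x61, 0xD83D, 0xDCA9, 0x62

// Indexing by code unit: index 2 is the trail surrogate on its own.
const atIndex2 = s.codePointAt(2);

// The string iterator steps by code point, so each piece is whole.
const viaIterator = Array.from(s, (ch) => ch.codePointAt(0));

console.log(atIndex2.toString(16)); // "dca9"
console.log(viaIterator.map((cp) => cp.toString(16)).join(" ")); // "61 1f4a9 62"
```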


-- 
http://annevankesteren.nl/

