Code points vs Unicode scalar values

Anne van Kesteren annevk at annevk.nl
Wed Sep 4 12:51:43 PDT 2013


On Wed, Sep 4, 2013 at 5:34 PM, Brendan Eich <brendan at mozilla.com> wrote:
> Because of String.fromCharCode precedent. Balanced names with noun phrases
> that distinguish the "from" domains are better than longAndPortly vs. tiny.

I kinda liked it as an analogue to what exists for Array, and since
developers should probably move away from fromCharCode, the
precedent does not matter that much.


> Sure, but you wanted to reduce "three concepts" and I don't see how to do
> that. Most developers can ignore UTF-8, for sure.

The three concepts are: 16-bit code units, code points, and Unicode
scalar values. JavaScript, DOM, etc. deal with 16-bit code units.
UTF-8 et al. deal with Unicode scalar values. Nothing, apart from this
API, deals in code points at the moment.
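
To make the three concepts concrete, here is a minimal sketch. It
assumes an engine that implements the string methods as they ended up
in ES2015; isScalarValue is a hypothetical helper written for this
illustration, not a built-in.

```javascript
// U+1F4A9 lies outside the BMP, so it is stored as two 16-bit code units.
const pile = "\uD83D\uDCA9";

// 16-bit code units: what length, indexing and charCodeAt() operate on.
const codeUnits = [];
for (let i = 0; i < pile.length; i++) {
  codeUnits.push(pile.charCodeAt(i));
}

// Code points: any value in 0..0x10FFFF, surrogates included.
// Array.from() uses the string iterator, which steps by code point.
const codePoints = Array.from(pile, (ch) => ch.codePointAt(0));

// Unicode scalar values: code points minus the surrogate range
// 0xD800..0xDFFF. Hypothetical helper, not part of any spec.
function isScalarValue(cp) {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10FFFF &&
    !(cp >= 0xD800 && cp <= 0xDFFF);
}

console.log(codeUnits.map((u) => u.toString(16)).join(" "));  // d83d dca9
console.log(codePoints.map((c) => c.toString(16)).join(" ")); // 1f4a9
console.log(isScalarValue(0x1F4A9), isScalarValue(0xD83D));   // true false
```

0xD83D is a valid code point but not a scalar value, which is exactly
the gap between the second and third concept.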


> Probably I just misunderstood what you meant, and you were simply pointing
> out that lone surrogates arise only from legacy APIs?

No, they arise from this API.
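
A short sketch of what that looks like in practice, assuming the
behavior as it shipped in ES2015. The TextEncoder part additionally
assumes an environment that exposes the Encoding Standard API as a
global (browsers, recent Node.js); it is not part of ECMAScript.

```javascript
// fromCodePoint accepts a surrogate code point and hands back a string
// containing a single, unpaired code unit.
const loneLead = String.fromCodePoint(0xD800);
console.log(loneLead.length);                     // 1
console.log(loneLead.charCodeAt(0).toString(16)); // d800

// codePointAt returns a surrogate whenever the pair is incomplete.
const unpaired = "\uD83Dx";
console.log(unpaired.codePointAt(0).toString(16)); // d83d

// UTF-8 works on scalar values only, so the lone surrogate cannot be
// encoded as-is; TextEncoder substitutes U+FFFD (bytes EF BF BD).
const utf8Bytes = Array.from(new TextEncoder().encode(loneLead));
console.log(utf8Bytes.map((b) => b.toString(16)).join(" ")); // ef bf bd
```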


> Here, from the latest ES6 draft, is 15.5.2.3 String.fromCodePoint (
> ...codePoints):
>
> No exposed surrogates here!

Mathias covered this.


> Here's the spec for String.prototype.codePointAt:
>
> 8. Let first be the code unit value of the element at index position in the
> String S.
> 11. If second < 0xDC00 or second > 0xDFFF, then return first.
>
> I take it you are objecting to step 11?

And step 8. The indexing is based on code units, so you cannot easily
index by code point. You'd need to use the iterator to iterate over a
string to get only code points out.
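
As a rough illustration of the difference, assuming the string
iterator and codePointAt as they ended up in ES2015 (nthCodePoint is a
hypothetical helper, not a proposed API):

```javascript
const mixed = "a\uD83D\uDCA9b"; // "a", U+1F4A9, "b": 3 code points, 4 code units

// codePointAt(i) takes a code unit index, so index 2 lands in the
// middle of the pair and yields the trail surrogate on its own.
const byIndex = [];
for (let i = 0; i < mixed.length; i++) {
  byIndex.push(mixed.codePointAt(i));
}
console.log(byIndex.map((c) => c.toString(16)).join(" ")); // 61 1f4a9 dca9 62

// The iterator steps by code point and never splits a valid pair.
const byIterator = [];
for (const ch of mixed) {
  byIterator.push(ch.codePointAt(0));
}
console.log(byIterator.map((c) => c.toString(16)).join(" ")); // 61 1f4a9 62

// Hypothetical helper: index by code point by walking the iterator.
// This costs O(n) per lookup rather than O(1).
function nthCodePoint(str, n) {
  let seen = 0;
  for (const ch of str) {
    if (seen === n) return ch.codePointAt(0);
    seen++;
  }
  return undefined;
}
console.log(nthCodePoint(mixed, 2).toString(16)); // 62
```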


>> The indexing of codePointAt() is also kind of sad as it just passes
>> through to charCodeAt(),
>
> I don't see that in the spec cited above.

How do you read step 8?
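
For reference, a sketch of how I read the algorithm, written on top of
charCodeAt. The function name codePointAtSketch is made up for this
illustration, only steps 8 and 11 are numbered in the quote above, and
the remaining checks are my reconstruction of the behavior rather than
the draft's wording.

```javascript
// Hypothetical re-implementation of codePointAt in terms of charCodeAt.
function codePointAtSketch(str, position) {
  const size = str.length;
  if (position < 0 || position >= size) return undefined;

  // Step 8: "first" is simply the code unit at that code unit index,
  // i.e. the same value charCodeAt(position) returns.
  const first = str.charCodeAt(position);

  // Not a lead surrogate, or nothing follows it: return it unchanged.
  if (first < 0xD800 || first > 0xDBFF || position + 1 === size) {
    return first;
  }

  // Step 11: a lead surrogate without a trail surrogate is returned as-is.
  const second = str.charCodeAt(position + 1);
  if (second < 0xDC00 || second > 0xDFFF) return first;

  // Only a well-formed pair is combined into a supplementary code point.
  return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
}

const sample = "a\uD83D\uDCA9b\uD83D";
for (let i = 0; i < sample.length; i++) {
  const viaUnits = sample.charCodeAt(i).toString(16);
  const viaSketch = codePointAtSketch(sample, i).toString(16);
  console.log(i, viaUnits, viaSketch);
}
```

The two columns differ only at index 1, where a lead surrogate is
followed by a trail surrogate; at every other index the result is the
charCodeAt value.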


-- 
http://annevankesteren.nl/


More information about the es-discuss mailing list