Code points vs Unicode scalar values

Brendan Eich brendan at mozilla.com
Wed Sep 4 14:28:58 PDT 2013


Anne van Kesteren wrote:
>> Here's the spec for String.prototype.codePointAt:
>>
>>  8. Let first be the code unit value of the element at index position in the
>>  String S.
>>  11. If second < 0xDC00 or second > 0xDFFF, then return first.
>>
>> I take it you are objecting to step 11?
>
> And step 8. The indexing is based on code units so you cannot actually
> do indexing easily. You'd need to use the iterator to iterate over a
> string getting only code points out.
>
>
>>> The indexing of codePointAt() is also kind of sad as it just passes
>>> through to charCodeAt(),
>>
>> I don't see that in the spec cited above.
>
> How do you read step 8?

8. Let first be the code unit value of the element at index position in 
the String S.

This does not "[pass] through to charCodeAt()" literally, which would 
mean a call to S.charCodeAt(position). I thought that's what you meant.
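
For illustration, here is a minimal sketch of the algorithm under discussion. Only steps 8 and 11 are quoted in this thread; the bounds check, the lead surrogate test, and the final combining formula are filled in as assumptions about the rest of the draft. The name codePointAtSketch is hypothetical, and charCodeAt is used only as a convenient way to read a code unit, not because the spec calls it.

```javascript
// Sketch only: steps other than 8 and 11 are assumptions, not quoted spec text.
// `position` is assumed to already be an integer code unit index.
function codePointAtSketch(s, position) {
  const size = s.length; // length in code units
  if (position < 0 || position >= size) return undefined;

  // Step 8: read the code unit at a code unit index.
  const first = s.charCodeAt(position);

  // Not a lead surrogate, or nothing follows it: return the code unit itself.
  if (first < 0xD800 || first > 0xDBFF || position + 1 === size) return first;

  const second = s.charCodeAt(position + 1);

  // Step 11: a lead surrogate with no trail surrogate after it stays unpaired.
  if (second < 0xDC00 || second > 0xDFFF) return first;

  // Combine the surrogate pair into a supplementary code point.
  return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
}
```

Note that an index pointing at the second half of a surrogate pair yields the lone trail surrogate, which is the consequence of indexing by code units.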

So you want a code point index, not a code unit index. That would not be 
useful for the lower-level purposes Allen identified. Again it seems 
you're trying to abstract away from all the details that probably will 
matter for string hackers using these APIs. But I summon Norbert at this 
point!
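
To make the two notions of index concrete, here is a rough sketch of what lookup by code point index could look like when built on the string iterator Anne mentions. The helper name codePointAtCodePointIndex is hypothetical and nothing like it is proposed in the draft; it assumes the iterator yields one string element per code point, with unpaired surrogates yielded on their own.

```javascript
// Hypothetical helper: index counts code points, not code units.
// Assumes for-of over a string iterates by code point.
function codePointAtCodePointIndex(s, index) {
  let i = 0;
  for (const ch of s) {
    // ch holds one code point (one or two code units).
    if (i === index) return ch.codePointAt(0);
    i++;
  }
  return undefined; // index is past the last code point
}
```

For the string "a\uD83D\uDE00b", code point index 2 gives 0x62 ("b"), while code unit index 2 passed to codePointAt gives 0xDE00, the trail surrogate. The sketch also walks the string from the start on every call, so it is linear in the index, which is part of why code unit indexing suits the lower-level uses.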

/be

