Full Unicode strings strawman

Allen Wirfs-Brock allen at wirfs-brock.com
Mon May 16 15:27:08 PDT 2011


See the section of the proposal about String.prototype.charCodeAt.
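Some arithmetic behind the charCodeAt question quoted below, as a minimal TypeScript sketch. A 21-bit element can hold any value in [0, 1 << 21), that is [0, 2097152). Every Unicode code point (at most 0x10FFFF) fits in that range, but so do 0x110000 through 0x1FFFFF, which are not code points. With today's 16-bit elements, a character outside the BMP occupies two elements (a surrogate pair). The helpers `elements16` and `elements21` are hypothetical names for illustration. `elements21` assumes that a full-width element would hold one whole code point; that is one reading of the strawman, not its text. What charCodeAt actually returns is whatever the proposal specifies.

```typescript
// Every Unicode code point fits in 21 bits.
const MAX_CODE_POINT = 0x10ffff;
const ELEMENT_LIMIT_21 = 1 << 21; // 2097152, the bound in the question

// Current model: a string is a sequence of 16-bit code units, so
// charCodeAt always returns a value in [0, 0xFFFF].
function elements16(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i));
  }
  return out;
}

// Hypothetical 21-bit model (an assumption, not proposal text): a
// well-formed surrogate pair collapses into a single element holding
// the full code point. Lone surrogates are passed through unchanged.
function elements21(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    const lo = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
    if (hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff) {
      out.push(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00));
      i++; // the low surrogate has been consumed as well
    } else {
      out.push(hi);
    }
  }
  return out;
}

const sample = "A\uD83D\uDE00"; // "A" followed by U+1F600 (outside the BMP)
console.log(elements16(sample)); // [ 65, 55357, 56832 ]
console.log(elements21(sample)); // [ 65, 128512 ]
console.log(MAX_CODE_POINT < ELEMENT_LIMIT_21); // true
```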

On May 16, 2011, at 2:20 PM, Mike Samuel wrote:

> Allen, could you clarify something.
> 
> When the strawman says without mentioning "codepoint"
> 
> "The String type is the set of all finite ordered sequences of zero or
> more 16-bit\b\b\b\b\b\b 21-bit unsigned integer values (“elements”)."
> 
> does that mean that String.charCodeAt(...) can return any value in the
> range [0, 1 << 21)?
> 
> 
> When the strawman says using "codepoint"
> 
> "SourceCharacter ::
> any Unicode codepoint"
> 
> that excludes the blocks reserved for surrogates?

Does the Unicode spec. refer to those surrogate codes as "codepoints"?  My understanding is that it does not, but I could be wrong.  My intent is that the answer is no.
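On the terminology question: the Unicode Standard does define U+D800 through U+DFFF as (surrogate) code points. The term for code points excluding surrogates is "Unicode scalar value". The following minimal sketch shows the distinction. It assumes the intended alphabet is the set of scalar values, meaning surrogates are excluded. `isCodePoint`, `isSurrogate`, and `isScalarValue` are hypothetical helpers.

```typescript
// Code points: the integers 0..0x10FFFF, surrogates included.
function isCodePoint(n: number): boolean {
  return Math.floor(n) === n && n >= 0 && n <= 0x10ffff;
}

// Surrogate code points: 0xD800..0xDFFF (high and low halves).
function isSurrogate(n: number): boolean {
  return n >= 0xd800 && n <= 0xdfff;
}

// Scalar values: code points minus the surrogate range. This models
// "any Unicode codepoint" read as excluding surrogates (an assumption).
function isScalarValue(n: number): boolean {
  return isCodePoint(n) && !isSurrogate(n);
}

const SCALAR_VALUE_COUNT = 0x110000 - 0x800; // 1112064

console.log(isCodePoint(0xd800), isScalarValue(0xd800)); // true false
console.log(isScalarValue(0x41), isScalarValue(0x10ffff)); // true true
console.log(isCodePoint(0x110000)); // false
```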

Note that this section is defining the input alphabet of the grammar. It has nothing to do with the actual character encodings used for source programs. The production essentially says that the input alphabet of ECMAScript is all defined Unicode characters. The actual encoding of source programs (both external and internal) is up to the implementation and the host environment. (The string input to eval is an exception to this.)
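The following minimal sketch illustrates the separation between the grammar's input alphabet and the encoding of a source file. The same four characters ("x=π😀") are stored as UTF-8 bytes and as UTF-32 units, and both decode to the same code point sequence, which is what the grammar consumes. `decodeUtf8` is a hypothetical helper that assumes well-formed input and does no validation. A real host would also have to reject malformed sequences.

```typescript
// Hypothetical helper: decode well-formed UTF-8 bytes into code points.
// No validation is done (overlong forms, stray continuation bytes, etc.).
function decodeUtf8(bytes: number[]): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let len: number;
    let cp: number;
    // The lead byte gives the sequence length and the top payload bits.
    if (lead < 0x80) {
      len = 1;
      cp = lead;
    } else if (lead >= 0xf0) {
      len = 4;
      cp = lead & 0x07;
    } else if (lead >= 0xe0) {
      len = 3;
      cp = lead & 0x0f;
    } else {
      len = 2;
      cp = lead & 0x1f;
    }
    // Each continuation byte (10xxxxxx) contributes its low 6 bits.
    for (let k = 1; k < len; k++) {
      cp = (cp << 6) | (bytes[i + k] & 0x3f);
    }
    out.push(cp);
    i += len;
  }
  return out;
}

// "x=π😀" in two encodings a host might choose for a source file.
const utf8Bytes = [0x78, 0x3d, 0xcf, 0x80, 0xf0, 0x9f, 0x98, 0x80];
const utf32Units = [0x78, 0x3d, 0x3c0, 0x1f600]; // UTF-32: one unit per code point

const fromUtf8 = decodeUtf8(utf8Bytes);
console.log(fromUtf8.join(",") === utf32Units.join(",")); // true
```

Either way, the grammar sees the same four characters. Which encoding a host actually reads is its own choice.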

Allen
