Full Unicode strings strawman

Mark Davis ☕ mark at macchiato.com
Mon May 16 15:36:01 PDT 2011


Mark

*Il meglio è l’inimico del bene* (the best is the enemy of the good)


On Mon, May 16, 2011 at 15:27, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:

> See the section of the proposal about String.prototype.charCodeAt
>
> On May 16, 2011, at 2:20 PM, Mike Samuel wrote:
>
> > Allen, could you clarify something.
> >
> > When the strawman says, without mentioning "codepoint",
> >
> > "The String type is the set of all finite ordered sequences of zero or
> > more ~~16-bit~~ 21-bit unsigned integer values (“elements”)."
> >
> > does that mean that String.charCodeAt(...) can return any value in the
> > range [0, 1 << 21)?
> >
> >
> > When the strawman says, using "codepoint",
> >
> > "SourceCharacter ::
> > any Unicode codepoint"
> >
> > does that exclude the blocks reserved for surrogates?
>
> Does the Unicode spec. refer to those surrogate codes as "codepoints"?  My
> understanding is that it does not, but I could be wrong.  My intent is that
> the answer is no.
>

Yes, it does. See my earlier message, which has a pointer to the Unicode glossary.
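Concretely, the Unicode glossary counts every integer from U+0000 through U+10FFFF as a code point, including the surrogate code points U+D800..U+DFFF; the term *Unicode scalar value* names the code points other than surrogates. So a production reading "any Unicode codepoint" admits surrogates unless it says "scalar value" instead. A small TypeScript sketch (the `classify` helper and its labels are hypothetical, for illustration only) makes the distinction explicit. It also shows that the range [0, 1 << 21) from the question above is wider than the code space: 1 << 21 is 0x200000, while the largest code point is 0x10FFFF.

```typescript
// Hypothetical labels for illustration; "scalar value" follows the Unicode
// glossary term (a code point that is not a surrogate).
type CodePointKind = "surrogate" | "scalar value";

const MAX_CODE_POINT = 0x10ffff;

function classify(cp: number): CodePointKind {
  // Anything outside U+0000..U+10FFFF is not a code point at all, even though
  // 21 bits can hold values up to 0x1FFFFF.
  if (!Number.isInteger(cp) || cp < 0 || cp > MAX_CODE_POINT) {
    throw new RangeError(`not a code point: ${cp}`);
  }
  // U+D800..U+DFFF are code points, but they are reserved for UTF-16
  // surrogate pairs and are therefore not Unicode scalar values.
  return cp >= 0xd800 && cp <= 0xdfff ? "surrogate" : "scalar value";
}

console.log(classify(0x0041));  // "scalar value"
console.log(classify(0xd83d));  // "surrogate"
console.log(classify(0x1f600)); // "scalar value"
```

The range check is kept separate from the surrogate check so that "not a code point" and "a code point, but not a scalar value" stay distinguishable.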


>
> Note that this section defines the input alphabet of the grammar.
> It has nothing to do with the actual character encodings used for source
> programs. The production essentially says that the input alphabet of
> ECMAScript is all defined Unicode characters.


> all defined Unicode characters.

That would also not be correct. The defined *characters* number only about
109K (more if you count private use), nowhere near the number of code points:
over 800K code points are reserved for the allocation of *future*
characters. For a breakdown, see
http://www.unicode.org/versions/Unicode6.0.0/#Character_Additions
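The scale of the gap follows from the fixed structure of the code space. The TypeScript sketch below (variable names are illustrative) sizes the whole code space and the categories whose extent is fixed by the standard: surrogates, the three private-use areas, and the 66 noncharacters. The figure of about 109K assigned characters comes from the message above and is not computed here.

```typescript
// Number of code points in an inclusive range.
const size = (lo: number, hi: number): number => hi - lo + 1;

const totalCodePoints = size(0x0000, 0x10ffff); // 1,114,112
const surrogates = size(0xd800, 0xdfff);        // 2,048
const privateUse =
  size(0xe000, 0xf8ff) +     // BMP Private Use Area: 6,400
  size(0xf0000, 0xffffd) +   // Supplementary Private Use Area-A (plane 15): 65,534
  size(0x100000, 0x10fffd);  // Supplementary Private Use Area-B (plane 16): 65,534
const noncharacters =
  size(0xfdd0, 0xfdef) +     // 32 noncharacters in the BMP
  17 * 2;                    // U+xxFFFE and U+xxFFFF in each of the 17 planes

// What remains is either an assigned (non-private-use) character or a code
// point still reserved for future allocation.
const assignedOrReserved =
  totalCodePoints - surrogates - privateUse - noncharacters; // 974,530

console.log({ totalCodePoints, surrogates, privateUse, noncharacters, assignedOrReserved });
```

Subtracting roughly 109K assigned characters from the 974,530 remaining code points leaves roughly 865K reserved ones, consistent with "over 800K".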

Sorry to seem picky, but we have found over time that you have to be very
careful about the use of terms. The term "character" is especially fraught
with ambiguities.
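One concrete source of the ambiguity: in ES5, a string is a sequence of 16-bit code units, so `length` counts a single supplementary code point such as U+1F600 as two. The TypeScript sketch below (helper names are hypothetical, not part of any proposal) encodes a code point as a UTF-16 surrogate pair and counts code points by pairing surrogates. A lone surrogate is counted as one code point, consistent with surrogates being code points. User-perceived characters (grapheme clusters) are yet a third measure and are not shown here.

```typescript
// Encode one code point (U+0000..U+10FFFF) as UTF-16 code units.
function encodeUtf16(cp: number): number[] {
  if (cp <= 0xffff) return [cp];           // BMP: a single code unit
  const v = cp - 0x10000;                  // 20-bit offset into planes 1..16
  return [0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff)]; // high, low surrogate
}

// Count code points in a string of 16-bit code units.
function countCodePoints(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
    // A high surrogate followed by a low surrogate is one code point;
    // any other unit (including a lone surrogate) is one code point by itself.
    if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      i++;
    }
    count++;
  }
  return count;
}

const pair = encodeUtf16(0x1f600);
const s = "A" + String.fromCharCode(...pair); // "A" followed by U+1F600

console.log(pair.map((u) => u.toString(16)).join(" ")); // "d83d de00"
console.log(s.length);           // 3 (code units)
console.log(countCodePoints(s)); // 2 (code points)
```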



>  The actual encoding of source programs (both external and internal) is
> up to the implementation and the host environment. (The string input to
> eval is an exception to this.)
>
> Allen
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>

