Full Unicode strings strawman

Allen Wirfs-Brock allen at wirfs-brock.com
Mon May 16 16:50:41 PDT 2011


On May 16, 2011, at 3:36 PM, Mark Davis ☕ wrote:

> > all defined Unicode characters.
> 
> That would also not be correct. The defined characters are only about 109K (more if you consider private use); nowhere near the number of code points, because there are over 800K code points that are reserved for the allocation of future characters. For a breakdown, see http://www.unicode.org/versions/Unicode6.0.0/#Character_Additions

Sorry about the terminology issues; I will work on fixing them.

I actually think "character" is the right term for use in:

SourceCharacter ::
  any Unicode character

This is defining the alphabet of the grammar.  The alphabet is composed of logical characters, not specific encodings.  The actual program might be encoded in EBCDIC or Hollerith card codes as long as there is a mapping of the characters actually used in that encoding to Unicode characters.
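As a rough illustration of that point, here is a minimal Python sketch. The sample source text and the helper name `to_source_characters` are made up for the example, and cp037 is just one EBCDIC code page chosen because Python ships a codec for it. It shows the same program text stored as different bytes but mapping to the same sequence of Unicode characters:

```python
# Hypothetical source text; any program text would do.
source = "var x = 1;"

# The same logical characters stored in two different byte encodings.
ebcdic_bytes = source.encode("cp037")   # cp037 is one EBCDIC code page
utf8_bytes = source.encode("utf-8")

def to_source_characters(data, encoding):
    """Decode bytes and return the Unicode code points the grammar would see.

    Illustrative helper only; not part of any specification.
    """
    return [ord(ch) for ch in data.decode(encoding)]

ebcdic_chars = to_source_characters(ebcdic_bytes, "cp037")
utf8_chars = to_source_characters(utf8_bytes, "utf-8")

# The byte sequences differ, but the logical characters are identical.
print(ebcdic_bytes != utf8_bytes)
print(ebcdic_chars == utf8_chars)
```

The grammar only ever needs the decoded list; how the bytes were laid out on disk (or on punched cards) is a concern of the mapping step.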

The intent is that any defined Unicode character can be used.  That is the 109K today, growing in the future as Unicode adopts additional characters.  In practice, there are actually very few places in the grammar where any SourceCharacter is allowed, but in those places we really do mean the valid logical characters defined by the current Unicode standard.
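To make the difference between "code point" and "defined character" concrete, here is a small Python sketch. The function name `is_defined_character` and the choice of which general categories count as "defined" are assumptions made for this example, not wording from any specification. The answer also depends on the Unicode version bundled with the Python build, so the count will not be exactly the 109K figure quoted above.

```python
import unicodedata

def is_defined_character(code_point, include_private_use=False):
    """Report whether a code point is an assigned character.

    Assumed interpretation for this sketch: unassigned code points (Cn)
    and surrogates (Cs) are not characters, and private-use code points
    (Co) count only when the caller asks for them.
    """
    category = unicodedata.category(chr(code_point))
    if category == "Cn":      # unassigned: reserved or noncharacter
        return False
    if category == "Cs":      # surrogate code points
        return False
    if category == "Co":      # private use
        return include_private_use
    return True

# Count assigned characters across the whole code space U+0000..U+10FFFF.
defined_count = sum(is_defined_character(cp) for cp in range(0x110000))

print(unicodedata.unidata_version, defined_count)
```

Running this on a newer Python reports a larger count than on an older one, which is the "growing in the future" part: the set of code points is fixed, while the set of defined characters grows with each Unicode version.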

Allen