Re: Question about the “full Unicode in strings” strawman

Allen Wirfs-Brock allen at
Tue Jan 24 09:33:24 PST 2012

Note that this proposal isn't currently under consideration for inclusion in, but the answer to your question is below.
On Jan 22, 2012, at 10:59 PM, Mathias Bynens wrote:

> states:
>> To address this issue, a new form of UnicodeEscapeSequence is added that is explicitly tagged as containing a variable number (up to 8) of hex digits. The new definition is:
>> UnicodeEscapeSequence ::
>> u HexDigit HexDigit HexDigit HexDigit
>> u{ HexDigit HexDigit_opt HexDigit_opt HexDigit_opt HexDigit_opt HexDigit_opt HexDigit_opt HexDigit_opt }
>> The \u{ } extended UnicodeEscapeSequence is a syntactic extension that is only recognized after explicit versioning opt-in to the extended “Harmony” syntax.
> Why up to 8 hex digits? Shouldn’t 6 hex digits suffice to represent
> every possible Unicode character (in the range from 0x0 to 0x10ffff)?
> Is this a typo or was this done intentionally to be future-compatible
> with potential Unicode additions?
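The two escape forms in the quoted grammar can be illustrated with a small decoder. This is a minimal sketch that follows the strawman grammar as quoted. The classic form takes exactly four hex digits. The braced form takes one to eight hex digits, and its value is treated as an arbitrary 32-bit unsigned integer, not a checked Unicode code point. The function name `decodeUnicodeEscape` and its return shape are hypothetical and are not part of any specification:

```javascript
// Hypothetical helper: decodes the part of an escape that follows the
// backslash, per the strawman grammar. Returns { value, length } where
// length is the number of source characters consumed, or null if malformed.
function decodeUnicodeEscape(src, pos) {
  if (src[pos] !== 'u') return null;
  const isHex = (c) => c !== undefined && /^[0-9A-Fa-f]$/.test(c);

  if (src[pos + 1] === '{') {
    // Extended form: u{ HexDigit HexDigit_opt ... } with 1 to 8 digits.
    let end = pos + 2;
    while (isHex(src[end])) end++;
    const digits = src.slice(pos + 2, end);
    if (digits.length < 1 || digits.length > 8 || src[end] !== '}') return null;
    // Per the proposal, any 32-bit unsigned value is accepted (no range check).
    return { value: parseInt(digits, 16), length: end + 1 - pos };
  }

  // Classic form: exactly four hex digits.
  const digits = src.slice(pos + 1, pos + 5);
  if (digits.length !== 4 || ![...digits].every(isHex)) return null;
  return { value: parseInt(digits, 16), length: 5 };
}
```

For example, `decodeUnicodeEscape("u{1F600}", 0)` yields the value 0x1F600 and consumes 8 characters. `"u{123456789}"` is rejected because it has nine digits.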

Just as the current definition of String specifies that a String is a sequence of 16-bit unsigned integer values, the proposal would specify that a String is a sequence of 32-bit unsigned integer values. In neither case is it required that the individual String elements be valid Unicode code points or code units. 8 hex digits are required to express the full range of unsigned 32-bit integers.
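The digit counts behind this answer can be checked directly. This short JavaScript sketch uses `hexDigits`, a hypothetical helper introduced only for illustration. It compares the largest Unicode code point with the largest 16-bit and 32-bit unsigned values:

```javascript
// Hypothetical helper: number of hex digits needed to write a non-negative integer.
const hexDigits = (n) => n.toString(16).length;

const maxCodePoint = 0x10FFFF; // largest Unicode code point
const maxUint16 = 0xFFFF;      // largest 16-bit String element (current spec)
const maxUint32 = 0xFFFFFFFF;  // largest 32-bit String element (this proposal)

console.log(hexDigits(maxCodePoint)); // 6
console.log(hexDigits(maxUint16));    // 4
console.log(hexDigits(maxUint32));    // 8
```

Six digits would cover every code point. The extra two are needed only because the proposal lets a String element hold any 32-bit unsigned value.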


More information about the es-discuss mailing list