Re: Question about the “full Unicode in strings” strawman

Mark Davis ☕ mark at macchiato.com
Wed Jan 25 11:11:55 PST 2012


(oh, and I agree with your other points)

Mark
*The best is the enemy of the good.*
[https://plus.google.com/114199149796022210033]



On Wed, Jan 25, 2012 at 11:11, Mark Davis ☕ <mark at macchiato.com> wrote:

> You can't use \u10FFFF as syntax, because that could be read as \u10FF
> followed by the literal characters FF. A better syntax is \u{...}, with 1
> to 6 hex digits, for values from 0 to 10FFFF.
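Here is a minimal JavaScript sketch of the ambiguity and of the braced form. It uses a hypothetical helper, `decodeBraceEscape`, which is not from any spec or library. The helper takes the hex digits between the braces and enforces the 1 to 6 digit, 0 to 10FFFF limits proposed above.

```javascript
// Hypothetical helper (illustrative only): decode the hex digits found
// between the braces of a proposed \u{...} escape.
function decodeBraceEscape(hexDigits) {
  // 1 to 6 hex digits, as proposed in the message above.
  if (!/^[0-9A-Fa-f]{1,6}$/.test(hexDigits)) {
    throw new SyntaxError(`bad \\u{} escape: ${hexDigits}`);
  }
  const cp = parseInt(hexDigits, 16);
  // Only values from 0 to 10FFFF are allowed.
  if (cp > 0x10ffff) {
    throw new RangeError(`code point out of range: ${hexDigits}`);
  }
  return String.fromCodePoint(cp);
}

// The fixed-length problem: in a JavaScript string literal, "\u10FFFF" is
// U+10FF followed by the two ordinary characters "F" and "F".
const fixedLength = "\u10FFFF";
console.log(fixedLength.length);   // 3
console.log(fixedLength.slice(1)); // FF

// With braces, the digits are delimited, so there is no ambiguity.
console.log(decodeBraceEscape("10FFFF").length); // 2 (a surrogate pair)
console.log(decodeBraceEscape("41"));            // A
```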
>
> On Wed, Jan 25, 2012 at 10:59, Gillam, Richard <gillam at lab126.com> wrote:
>
>> > The current 16-bit character strings are sometimes used to store
>> > non-Unicode binary data and can be used with non-Unicode character
>> > encodings with up to 16-bit chars. 21 bits is sufficient for Unicode but
>> > perhaps is not enough for other useful encodings. 32-bit seems like a
>> > plausible unit.
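This sketch makes the binary-data point concrete. It uses hypothetical helpers, `packBytes` and `unpackBytes`, which are illustrative and not a standard API. They store two arbitrary bytes in each 16-bit code unit. Because any 16-bit value is allowed, a unit such as 0xD800 can appear even though it is not a Unicode character by itself.

```javascript
// Hypothetical helpers: pack arbitrary bytes two per 16-bit code unit into a
// JavaScript string, and unpack them again. None of this is Unicode text.
function packBytes(bytes) {
  let s = "";
  for (let i = 0; i < bytes.length; i += 2) {
    // High byte first; an odd-length input is padded with a zero byte.
    const hi = bytes[i];
    const lo = i + 1 < bytes.length ? bytes[i + 1] : 0;
    s += String.fromCharCode((hi << 8) | lo);
  }
  return s;
}

function unpackBytes(s, byteLength) {
  const out = new Uint8Array(byteLength);
  for (let i = 0; i < byteLength; i++) {
    const unit = s.charCodeAt(i >> 1);
    // Even byte index = high byte of the unit, odd = low byte.
    out[i] = i % 2 === 0 ? unit >> 8 : unit & 0xff;
  }
  return out;
}

// Bytes 0xD8 0x00 become the lone surrogate U+D800: a legal 16-bit unit,
// but not a valid Unicode scalar value on its own.
const packed = packBytes(new Uint8Array([0xd8, 0x00, 0x41]));
console.log(packed.length);                     // 2
console.log(packed.charCodeAt(0).toString(16)); // d800
```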
>>
>> How would an eight-digit \u escape sequence work from an implementation
>> standpoint?  I'm assuming most implementations right now use 16-bit
>> unsigned values as the individual elements of a String.  If we allow
>> arbitrary 32-bit values to be placed into a String, how would you make that
>> work?  There seem to only be a few options:
>>
>> a) Change the implementation to use 32-bit units.
>>
>> b) Change the implementation to use either 16-bit or 32-bit units as
>> needed, with some sort of internal flag that specifies the unit size for an
>> individual string.
>>
>> c) Encode the 32-bit values somehow as a sequence of 16-bit values.
>>
>> If you want to allow full generality, it seems like you'd be stuck with
>> option a or option b.  Is there really enough value in doing this?
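Option b might look roughly like the following. `FlexString` is a hypothetical class, and the sketch assumes every unit value fits in an unsigned 32-bit integer.

```javascript
// Hypothetical sketch of option b: each string keeps its units in either a
// Uint16Array or a Uint32Array and records which width it chose.
class FlexString {
  constructor(units) {
    // Use 32-bit storage only if some unit does not fit in 16 bits.
    const wide = units.some((u) => u > 0xffff);
    this.unitSize = wide ? 32 : 16;
    this.units = wide ? Uint32Array.from(units) : Uint16Array.from(units);
  }

  get length() {
    return this.units.length;
  }

  unitAt(index) {
    return this.units[index];
  }

  concat(other) {
    // Combining a narrow and a wide string forces the result to be wide.
    return new FlexString([...this.units, ...other.units]);
  }
}

const narrow = new FlexString([0x48, 0x69]); // "Hi"
const wide = new FlexString([0x10ffff]);     // one 32-bit unit
console.log(narrow.unitSize, wide.unitSize); // 16 32
console.log(narrow.concat(wide).unitSize);   // 32
console.log(narrow.concat(wide).length);     // 3
```

The flag keeps narrow strings at 16 bits per unit. The cost is a width check, and possibly a copy, whenever strings are combined.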
>>
>> If, on the other hand, the idea is just to make it easier to include
>> non-BMP Unicode characters in strings, you can accomplish this by making a
>> long \u sequence just be shorthand for the equivalent sequence in UTF-16:
>>  \u10ffff would be exactly equivalent to \udbff\udfff.  You don't have to
>> change the internal format of the string, the indexes of individual
>> characters stay the same, etc.
>>
>> --Rich Gillam
>>  Lab126
>>
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>
>

