Full Unicode strings strawman

Mark Davis ☕ mark at macchiato.com
Mon May 16 15:06:08 PDT 2011


In practice, the surrogate code points (U+D800..U+DFFF) don't really cause
problems in Unicode strings. Most implementations just treat them as if they
were unassigned. The only important issue is that *when* a string containing
them is converted to UTF-8, UTF-16, or UTF-32 for storage or transmission,
they need to be handled, typically by converting each one to U+FFFD. They
should never just be deleted, which is a bad idea for security.
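
As a rough illustration of the conversion step, here is a minimal sketch in
Python, whose str type also allows isolated surrogate code points in a
string. The names to_utf8 and is_surrogate are hypothetical helpers made up
for this example, not part of any library or of the strawman.

```python
# Sketch only: replace isolated surrogate code points with U+FFFD
# before encoding, rather than deleting them.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(ch: str) -> bool:
    """Return True if ch is a code point in the surrogate range."""
    return 0xD800 <= ord(ch) <= 0xDFFF


def to_utf8(s: str) -> bytes:
    """Encode s as UTF-8, substituting U+FFFD for each surrogate code point.

    Hypothetical helper: the substitution keeps one output character per
    input code point, so nothing silently disappears from the string.
    """
    cleaned = "".join(REPLACEMENT if is_surrogate(ch) else ch for ch in s)
    return cleaned.encode("utf-8")


if __name__ == "__main__":
    sample = "a\ud800b"  # 'a', an isolated surrogate, 'b'
    print(to_utf8(sample))
```

Deleting the surrogate instead would turn "a\ud800b" into "ab", which is the
kind of silent change that can let filtered text slip past a check.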

Mark

*— Il meglio è l’inimico del bene —*


On Mon, May 16, 2011 at 14:46, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> On 5/16/11 5:16 PM, Mike Samuel wrote:
>
>> The strawman says
>>
>> "The String type is the set of all finite ordered sequences of zero or
>> more 21-bit unsigned integer values (“elements”)."
>>
>
> Yeah, that's not the same thing as an actual Unicode string, and requires
> handling of all sorts of "what if someone sticks non-Unicode in there?"
> issues...
>
> Of course people actually do use JS strings as immutable arrays of 16-bit
> unsigned integers right now (not just as byte arrays), so I suspect that we
> can't easily exclude the surrogate ranges from "strings" without breaking
> existing content...
>
>
> -Boris