Full Unicode strings strawman

Boris Zbarsky bzbarsky at MIT.EDU
Tue May 17 11:51:42 PDT 2011


On 5/17/11 2:24 PM, Allen Wirfs-Brock wrote:
>> In the substance of having strings in different encodings around at
>> the same time. If that doesn't force developers to worry about
>> encodings, what does, exactly?
>
> This already occurs in JS. For example, the encodeURI function produces
> a string whose characters are the UTF-8 encoding of a UTF-16 string
> (including recognition of surrogate pairs).

Last I checked, encodeURI output a pure ASCII string.  Am I just missing 
something?  The ASCII string happens to be the %-escaping of the UTF-8 
representation of the Unicode string you get by assuming that the 
initial JS string is a UTF-16 representation of said Unicode string. 
But at no point here is the author dealing with UTF-8.
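
A minimal sketch of that point, assuming any ES5-or-later engine. The sample
string and the variable names are picked here for illustration and are not
from the thread. It shows that the result of encodeURI contains only ASCII
characters, namely the %-escapes of the UTF-8 bytes:

```javascript
// U+1D306 is stored in a JS string as the surrogate pair D834 DF06.
const s = "\uD834\uDF06";

// encodeURI reads the pair as one code point, converts it to UTF-8
// (F0 9D 8C 86) and %-escapes each byte.
const encoded = encodeURI(s);

// Every character of the result is in the ASCII range.
const isPureAscii = /^[\x00-\x7F]*$/.test(encoded);

console.log(encoded, isPureAscii);
```

The author only ever sees the escaped ASCII form; the UTF-8 bytes themselves
never appear as string elements.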

>> OK, but still allows sticking non-Unicode gunk into the strings,
>> right? So they're still vectors of "something". Whatever that
>> something is.
>
> Conceptually unsigned 32-bit values. The actual internal representation
> is likely to be something else.

I don't care about the internal representation; I'm interested in the 
author-observable behavior.

> Interpretation of those values is left to the functions (both built-in
> and application) that operate upon them.

OK.  That includes user-written functions, of course, which currently 
only have to deal with UTF-16 (and maybe UCS-2 if you want to be very 
pedantic).
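
To illustrate what "dealing with UTF-16" means for a user-written function
today, here is a rough sketch. countCodePoints is a hypothetical helper
invented for this example, not something proposed in the thread. It walks
the 16-bit units and treats a valid surrogate pair as one character:

```javascript
// Hypothetical helper: counts code points in a string of 16-bit units,
// treating a high surrogate followed by a low surrogate as one code point.
// An unpaired surrogate is counted as one item on its own.
function countCodePoints(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate, it belongs to this code point
      }
    }
    count++;
  }
  return count;
}

const sample = "a\uD834\uDF06b";
console.log(sample.length, countCodePoints(sample));
```

Code like this hard-wires the assumption that string elements are UTF-16
code units, which is the assumption that would no longer hold for every
string if elements could be arbitrary 32-bit values.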

> Most built-in string methods do not apply any interpretation and will
> happily process strings as vectors of arbitrary uint32 values. Some
> built-ins (encodeURI/decodeURI, toUpperCase/toLowerCase) explicitly deal
> with Unicode characters or various Unicode encodings and these have to
> be explicitly defined to deal with non-Unicode character values or
> invalid encodings.

That seems fine.  This is not where problems lie.

> These functions already are defined for ES5 in this
> manner WRT the representation of strings as vectors of arbitrary uint16
> values.

Yes, sure.
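
For reference, a small sketch of the existing uint16 behavior being
described, assuming an ES5-conforming engine. throwsURIError is a
hypothetical helper written for this example. A string can hold an unpaired
surrogate; the encoding-aware built-ins reject it with a URIError, while
toUpperCase passes it through unchanged:

```javascript
// Hypothetical helper: returns true if calling fn throws a URIError.
function throwsURIError(fn) {
  try {
    fn();
  } catch (e) {
    return e instanceof URIError;
  }
  return false;
}

// An unpaired high surrogate: not valid UTF-16, but a legal JS string.
const loneHigh = "\uD800";

// encodeURI is defined to throw when it meets an unpaired surrogate.
const encodeRejects = throwsURIError(function () {
  return encodeURI(loneHigh);
});

// decodeURI is defined to throw on escapes that are not valid UTF-8.
const decodeRejects = throwsURIError(function () {
  return decodeURI("%FF");
});

// Methods that apply no interpretation just carry the value along.
const caseUnchanged = loneHigh.toUpperCase() === loneHigh;
const stillStorable = ("a" + loneHigh + "b").length === 3;

console.log(encodeRejects, decodeRejects, caseUnchanged, stillStorable);
```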

-Boris

