Re: Question about the “full Unicode in strings” strawman

Allen Wirfs-Brock
Wed Jan 25 12:42:55 PST 2012

On Jan 25, 2012, at 12:25 PM, John Tamplin wrote:

> On Wed, Jan 25, 2012 at 2:55 PM, Allen Wirfs-Brock wrote:
> The primary intent of the proposal was to extend ES Strings to support a uniform representation of all Unicode characters, including non-BMP characters. That means that any Unicode character should occupy exactly one element position within a String value. Interpreting \u{10ffff} as a UTF-16 encoding does not satisfy that objective. In particular, under that approach "\u{10ffff}".length would be 2, while a uniform character representation should yield a length of 1.
> When this proposal was originally floated, much of the debate seemed to be about whether such a uniform character representation was desirable or even useful.  See the thread starting at also and  
> That seems highly likely to break existing code that assumes a UTF-16 representation of JS strings.

The proposal was designed not to break any existing JavaScript code. Just to be clear, ES5.1 and earlier editions do not perform UTF-16 encoding of non-BMP characters in the course of normal string processing. Any UTF encoding of non-BMP characters is done either by user code, by the built-in decodeURI functions, or by host-provided functions (for example, XDR?). None of those should break under my proposal. (For external libraries such as XDR that may depend on internal implementation details, compatibility is really up to the platform implementation.) Nothing in the proposal prevents application-level UTF-16 string encodings using 32-bit String elements. This is completely analogous to how UTF-8 encodings are sometimes performed using the current 16-bit ECMAScript string elements.
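To make the application-level encoding point concrete, here is a minimal JavaScript sketch of encoding a code point as UTF-16 code units in user code. It runs on today's engines, where String elements are 16 bits, not on the 32-bit elements the proposal describes. `toUtf16Units` is a hypothetical helper, not part of the proposal or any standard library, and the example uses ES2015+ features (spread, string iteration, codePointAt). Under the proposal, the same surrogate arithmetic could be applied with each surrogate stored in its own 32-bit element.

```javascript
// Hypothetical helper (not from the proposal): encode one Unicode code point
// as UTF-16 code units. BMP code points map to a single unit; code points
// above U+FFFF map to a high/low surrogate pair.
function toUtf16Units(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  if (codePoint <= 0xffff) {
    return [codePoint];
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits -> low surrogate
  return [high, low];
}

// With 16-bit string elements, the encoded non-BMP character occupies two
// element positions, which is the non-uniformity the proposal aims to remove.
const units = toUtf16Units(0x10ffff);
const s = String.fromCharCode(...units);
console.log(units.map((u) => u.toString(16))); // [ 'dbff', 'dfff' ]
console.log(s.length);                         // 2 (16-bit elements)
console.log([...s].length);                    // 1 (string iteration is by code point)
```

The length of 2 versus the code-point count of 1 is exactly the gap between a UTF-16 representation and the uniform one-element-per-character representation discussed above.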


> -- 
> John A. Tamplin
> Software Engineer (GWT), Google


More information about the es-discuss mailing list