New full Unicode for ES6 idea

Brendan Eich brendan at
Mon Feb 20 12:32:38 PST 2012

Allen Wirfs-Brock wrote:
> On Feb 20, 2012, at 10:52 AM, Brendan Eich wrote:
>> Allen Wirfs-Brock wrote:
>> ...
>>> Another way to express what I see as the problem with what you are 
>>> proposing about imposing such string semantics:
>>> Could the revised ECMAScript be used to implement a language that 
>>> had similar but not identical semantic rules to those you are 
>>> suggesting for ES strings?  My sense is that if we went down the path 
>>> you are suggesting, such an implementation would have to use binary 
>>> data arrays for all of its internal string processing and could not 
>>> use ES string functions to process them.
>> If you mean a metacircular evaluator, I don't think so. Can you show 
>> a counterexample?
>> If you mean a UTF-transcoder, then yes: binary data / typed arrays 
>> are required. That's the right answer.
> Not necessarily, could be support for any language 
> that imposes different semantic rules on string elements.

In that case, binary data / typed arrays, definitely.

> You are essentially saying that a compiler targeting ES for a language 
> X that includes a string data type that does not conform to your 
> rules (for example, by allowing occurrences of surrogate code points 
> within string data)
First, as a point of order: yes, JS strings as full Unicode do not 
want stray surrogate pair-halves. Does anyone disagree?

Second, binary data / typed arrays stand ready for any such 
not-full-Unicode use-cases.
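
As an illustration only (not part of the original message), here is a minimal sketch of the distinction in present-day JS, where strings are sequences of 16-bit code units. The helper name `hasLoneSurrogate` and the variable `guestString` are hypothetical. The sketch assumes a guest language that permits arbitrary 16-bit string elements and keeps them in a `Uint16Array` rather than in a JS string.

```javascript
// Hypothetical helper: report whether a string contains an unpaired
// surrogate code unit (a "stray pair-half").
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: well-formed only if a low surrogate follows.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the second half of the well-formed pair
        continue;
      }
      return true;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

// Assumed guest-language data: 'A', a lone high surrogate, 'B'.
// A typed array stores these 16-bit values without claiming they are text.
const guestString = new Uint16Array([0x0041, 0xD800, 0x0042]);
```

The typed array makes no promise that its elements form valid Unicode, which is the property such a guest language would need.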

> could not use ES strings as the target representation of its string 
> data type.  It also could not use the built-in ES string functions in 
> the implementation of language X's built-in functions.

Not if this hypothetical source language being compiled to JS wants 
something other than full Unicode, no.

Why is this a problem, even hypothetically? Such a use-case has binary 
data and typed arrays standing ready, and if it really could use 
String.prototype.* methods I would be greatly surprised.

>  It could not leverage any optimizations that an ES engine may apply to 
> strings and string functions.

Emscripten already compiles LLVM source languages (C, C++, and 
Objective-C at least) to JS and does a very good job (getting better day 
by day). The utility of string functions today (including uint16 indexing 
and length) is immaterial. Typed arrays are quite important, though.
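
For readers unfamiliar with the phrase, "uint16 indexing and length" can be made concrete with a short sketch. This is illustrative only and not from the original message; the variable names are hypothetical.

```javascript
// U+1F4A9 lies outside the Basic Multilingual Plane, so a JS string
// stores it as two 16-bit code units (a surrogate pair).
const astral = "\uD83D\uDCA9";

const unitCount = astral.length;         // counts code units, not code points
const firstUnit = astral.charCodeAt(0);  // high surrogate half
const secondUnit = astral.charCodeAt(1); // low surrogate half
```

Here `unitCount` is 2 even though the string holds a single code point, and each index yields one half of the pair.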

> Also, values of X's string type can not be directly passed in foreign 
> calls to ES functions. Etc.

Emscripten does have a runtime that maps browser functionality exposed 
to JS to the guest language. It does not AFAIK need to encode surrogate 
pairs in JS strings by hand, let alone make pair-halves.
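
To show what "encoding surrogate pairs by hand" would involve, here is a minimal sketch of the standard UTF-16 arithmetic. It is an illustration, not something taken from Emscripten or from the original message, and the helper name `toSurrogatePair` is hypothetical.

```javascript
// Hypothetical helper: split a supplementary code point
// (U+10000..U+10FFFF) into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point: " + codePoint);
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// Build a JS string from the two halves.
const halves = toSurrogatePair(0x1F4A9);
const encoded = String.fromCharCode(halves[0], halves[1]);
```

Using only one element of `halves` in a string would produce exactly the stray pair-half discussed above.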

