New full Unicode for ES6 idea
brendan at mozilla.com
Mon Feb 20 12:32:38 PST 2012
Allen Wirfs-Brock wrote:
> On Feb 20, 2012, at 10:52 AM, Brendan Eich wrote:
>> Allen Wirfs-Brock wrote:
>>> Another way to express what I see as the problem with what you are
>>> proposing about imposing such string semantics:
>>> Could the revised ECMAScript be used to implement a language that
>>> had similar but not identical semantic rules to those you are
>>> suggesting for ES strings? My sense is that if we went down the path
>>> you are suggesting, such an implementation would have to use binary
>>> data arrays for all of its internal string processing and could not
>>> use ES string functions to process them.
>> If you mean a metacircular evaluator, I don't think so. Can you show
>> a counterexample?
>> If you mean a UTF-transcoder, then yes: binary data / typed arrays
>> are required. That's the right answer.
> Not necessarily, metacircular...it could be support for any language
> that imposes different semantic rules on string elements.
In that case, binary data / typed arrays, definitely.
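
As a rough illustration of what "binary data / typed arrays" could mean for such a language, here is a minimal sketch that keeps a guest-language string as raw 16-bit units in a Uint16Array, so lone surrogates are just data. The helper names (guestString, guestConcat, guestIndexOf) are hypothetical and not part of any proposal in this thread.

```javascript
// Hypothetical guest-language string: a sequence of raw 16-bit units.
// Nothing here interprets the units as Unicode, so a lone surrogate
// such as 0xD800 is stored like any other value.
function guestString(units) {
  return new Uint16Array(units);
}

// Concatenate two guest strings into a freshly allocated array.
function guestConcat(a, b) {
  var out = new Uint16Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

// Linear search for a single 16-bit unit; returns -1 when absent.
function guestIndexOf(haystack, unit) {
  for (var i = 0; i < haystack.length; i++) {
    if (haystack[i] === unit) return i;
  }
  return -1;
}

// "A", an unpaired high surrogate, then "B".
var s = guestConcat(guestString([0x0041, 0xD800]), guestString([0x0042]));
```

The guest runtime supplies its own length, indexing and search here, which is the cost being debated: none of String.prototype is involved.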
> You are essentially saying that a compiler targeting ES for a language
> X that includes a string data type that does not conform to your
> rules (for example, by allowing occurrences of surrogate code points
> within string data)
First, as a point of order: yes, JS strings as full Unicode do not
want stray surrogate pair-halves. Does anyone disagree?
Second, binary data / typed arrays stand ready for any such
> could not use ES strings as the target representation of its string
> data type. It also could not use the built-in ES string functions in
> the implementation of language X's built-in functions.
Not if this hypothetical source language being compiled to JS wants
other than full Unicode, no.
Why is this a problem, even hypothetically? Such a use-case has binary
data and typed arrays standing ready, and if it really could use
String.prototype.* methods I would be greatly surprised.
> It could not leverage any optimizations that an ES engine may apply to
> strings and string functions.
Emscripten already compiles LLVM source languages (C, C++, and
Objective-C at least) to JS and does a very good job (getting better day
by day). The utility of string functions today (including uint16 indexing
and length) is immaterial. Typed arrays are quite important, though.
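
For readers unfamiliar with "uint16 indexing and length", the following sketch shows how strings behave in current engines: a character outside the BMP occupies two elements. The countCodePoints helper is a hypothetical name introduced only for this illustration.

```javascript
// U+1F600 lies outside the BMP, so it is stored as two 16-bit
// code units (a surrogate pair): 0xD83D followed by 0xDE00.
var face = "\uD83D\uDE00";

// Count code points by treating each well-formed pair as one element.
// Unpaired surrogates are counted as one element each.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of the pair
      }
    }
    count++;
  }
  return count;
}
```

Here face.length reports code units and face.charCodeAt(0) returns only the high half, while countCodePoints(face) treats the pair as a single character.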
> Also, values of X's string type can not be directly passed in foreign
> calls to ES functions. Etc.
Emscripten does have a runtime that maps browser functionality exposed
to JS to the guest language. It does not AFAIK need to encode surrogate
pairs in JS strings by hand, let alone make pair-halves.
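
To make "encode surrogate pairs by hand" and "pair-halves" concrete, here is a minimal sketch of both. It is not taken from Emscripten; toSurrogatePair and hasLoneSurrogate are hypothetical helper names used only for illustration.

```javascript
// Encode a supplementary code point (U+10000..U+10FFFF) as a
// two-unit string using the standard UTF-16 arithmetic.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point");
  }
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);   // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return String.fromCharCode(high, low);
}

// Report whether a string contains a stray pair-half: a high
// surrogate not followed by a low one, or a low surrogate that
// does not follow a high one.
function hasLoneSurrogate(str) {
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // well-formed pair, skip the low half
      } else {
        return true;
      }
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return true;
    }
  }
  return false;
}
```

For example, toSurrogatePair(0x1F600) yields "\uD83D\uDE00", which hasLoneSurrogate accepts as well formed, while either half on its own is flagged.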