Close review of Language Overview whitepaper
Maciej Stachowiak
mjs at apple.com
Wed Nov 14 14:03:25 PST 2007
Hello ES4 fans,
I have now read the recently posted whitepaper. I marked up my printed
copy with many comments in the margins, and I am sharing them with the
list now.
Please note that this does not constitute an official Apple position,
just some personal off-the-cuff opinions. I have discussed the
proposal with some of my colleagues, including Geoff Garen who
attended the recent f2f, but we have not figured out a consensus
overall position or anything. With the disclaimers out of the way,
here are my review comments:
Section I.
Goals: I strongly agree with the stated goals of compatibility and
enabling large software development. I wonder if perhaps performance
should be added as a goal. At the very least we want it to be possible
to achieve performance on par with ES3 engines, and ideally we want to
enable better performance.
Section II.
Programming in the small: "... make the writing and reading of
fragments of code simpler and more effortless." That is somewhat
dubious grammatically; I suggest (with additional style fixes) "make
the reading and writing of code fragments easier."
Portability: This section first says that the full language must be
supported - subset profiles are not desirable. Then it says that, to
allow ES4 to be practically implementable on small devices and in
hosted environments, certain features, like extensive compile-time
analysis and stack marks, cannot be part of the language. Then it says
those features are part of the language, but optional.
I hope the problems here are clear: first, the section plainly
contradicts itself. It argues against subsets and certain classes of
features, and then says the spec includes such features as optional,
thus defining a subset. So that needs to be fixed in the whitepaper.
More significantly, I think this may be an indication that the
language has failed to meet its design goals. My suggestion would be
to remove all optional features (though I could be convinced that
strict mode is a special case).
Section III.
Syntax: The new non-contextual keywords, and the resulting need to
specify dialect out of band, are a problem. I'll have more to say
about compatibility under separate cover.
Behavior:
- This section says that "variation among ES3 implementations
entails a license to specify behavior more precisely for ES4".
However, the example given is a case where behavior among two
implementations was already the same, due to compatibility
considerations. I actually think both convergence on a single behavior
where variation is allowed, and variation that leads to practical
compatibility issues, are license to spec more precisely.
- The RegExp change - is this really a bug fix? It's likely that this
is not a big compatibility issue (Safari's ES3 implementation had
things the proposed ES4 way for some time), but I think ES3's approach
may be better for performance, and generating a new object every time
does not seem especially helpful.
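To make the difference concrete, here is the sort of program where the
change would be observable (my reading of the proposal, so treat this
as a sketch):
function makeRE() { return /a/g; }
var r1 = makeRE();
var r2 = makeRE();
// ES3: the literal denotes a single shared RegExp, so r1 === r2 is
// true and lastIndex state carries over between calls.
// Proposed behavior as I read it: each evaluation of the literal
// creates a fresh object, so r1 === r2 is false.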
Impact: This section talks a lot about incompatibilities between ES4
and ES3, however I think incompatibilities with ES3 as specced are in
themselves almost irrelevant. What matters is incompatibilities with
existing implementations and the content that depends on them. This
section also appears to talk disparagingly about some implementations
prioritizing compatibility over ES3 compliance, implies that any
deviations may be due to "inadequate engineering practices", and
implies that only "some" implementations are not compatible with ES3.
Is there any significant implementation that anyone would claim is
100% free of ECMAScript 3 compliance bugs? I doubt it, and so I think
we should make this section less judgmental in tone.
The web: Here especially, the actual concern is real-world
compatibility, not compatibility with the ES4 spec. Furthermore, it
completely ignores forward compatibility (the ability to serve ES4 to
older browsers that do not support it). It implies that this is just
an issue of aligning the timing of implementations. Ignoring for the
moment how impractical it is to expect multiple implementations to
roll out major new features in tandem, I note that there were similar
theories behind XHTML, XSL, XHTML 2, and many other technologies that
have largely failed to replace their predecessors. Again, I'll say
more about compatibility (and in particular how the WHATWG approach to
compatibility can be applied to ES4) under separate cover.
Section IV.
Classes: If any of the new type system is worthwhile, surely this is.
The impedance mismatch between the class model used by most OO
languages and by specifications like the DOM, and ES3's prototype
model, is needlessly confusing to authors. So I approve of adding
classes in a reasonable and tasteful way.
Dynamic properties: the fact that the "dynamic" behavior is not
inherited makes class inheritance violate the Liskov Substitution
Principle. I think this is a problem. Subclassing should be subtyping
in the LSP sense. I am not sure offhand how to fix this.
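To illustrate the concern, here is a sketch using the whitepaper's
class syntax as best I understand it (details may be off):
dynamic class Base {}
class Derived extends Base {}      // "dynamic" is not inherited
var b: Base = new Base;
b.expando = 1;                     // allowed: Base instances accept new
                                   // properties
var d: Base = new Derived;
d.expando = 1;                     // error: Derived instances do not, so a
                                   // Derived cannot stand in for a Base here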
Virtual Properties: I wish the keyword for catchall getters and
setters was something other than "meta", which is a vague word that
doesn't mean much. Why not "catchall" or "fallback" or something along
similarly concrete lines? (I realize now upon re-reading my margin
comments that this is supposed to match meta invoke, but there too I
am not sure the relationship is worth the vagueness.)
Wrappers: The whitepaper implies that providing catchall getters and
setters for primitive types and skipping boxing isn't a compatibility
issue. However, it is possible in ES3 to capture an implicit wrapper:
var x;
// The method runs with the implicit String wrapper as "this", so it can
// both add a property to the wrapper and capture the wrapper itself:
String.prototype.myFunc = function() { this.foo = "foo"; x = this; };
"bar".myFunc();   // x now refers to the temporary wrapper for "bar"
Prototype hacking allows you to observe identity of the temporary
wrappers, save them for later, and store properties. Perhaps there is
evidence that practices relying on techniques like this are
exceedingly uncommon (I'd certainly believe it); if so, it should be
cited.
Literals:
- I am surprised to see a decimal type (a type that is not directly
supported in current mainstream hardware) even though generally
popular types like single-precision IEEE floating point and 64-bit
integers are not present.
- Since ints/uints overflow to doubles, then either all int math must
be performed in double space (requiring constant conversions when
working with int variables), or every operation must check for
overflow and possibly fall back to double space. Even when the final
result cannot overflow, certainly in many expressions the difference
between int and double intermediates can be observed. It seems likely,
then, that math on variables declared int will be slower than math on
variables declared double, which will surely be confusing to
developers. This seems pretty bogus. Is there any case where int math
using the normal operators can actually be efficient? Would it be
plausible to make ints *not* overflow to double unless there is an
actual double operand involved (in which case int constants would
always need a special suffix, or their type would perhaps have to be
determined contextually)?
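To make the cost concrete, here is the sort of code I am worried about
(a sketch, assuming the overflow-to-double rule as I read it):
var a: int = 2000000000;
var b: int = 2000000000;
var sum = a + b;        // 4000000000, which no longer fits in int, so the
                        // add must either run in double space or check for
                        // overflow and promote; it cannot be a bare 32-bit add
var c: double = 2000000000;
var d = c + c;          // plain double math, no check; ironically this may
                        // be the faster of the two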
Section V.
Record and array types: Structural types are confusingly similar to
yet different from classes. Mostly they offer a subset of class
functionality (though reading ahead I did see a few features limited
to them). Also, since we already have prototype-based objects and
class-based objects, it seems excessive to add yet a third way. I recommend
removing them and adding any features that are sorely missed as a
result to classes.
"Any": The spec explains vaguely that the "any" type is not identical
to the union (null, undefined, Object). How is it different? Is the
difference observable to ES4 programs or is it purely a matter
internal to the spec (in which case the difference is not relevant)?
Type definitions: Seeing the example of a type definition for a record
makes this feature seem even more redundant with classes.
Data Types: If structural types cannot be recursive, then one of the
canonical applications of record-like types, the linked list, cannot
be implemented this way. I assume it can be with classes. Yet another
reason to fold any interesting record features into classes.
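For example (a sketch, using the whitepaper's type-definition syntax as
best I recall, with * as the "any" type):
type Node = { value: *, next: Node };   // presumably rejected: the structural
                                        // type refers to itself
class ListNode {                        // whereas the nominal class version
    var value;                          // should be fine, since a class can
    var next: ListNode;                 // refer to its own name
}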
Nullability: Are non-nullable types really worth it? I am not sure.
Does any other explicit type system for a dynamic OO language have
such a concept? The whitepaper says that "the ability to store null is
occasionally the source of run-time errors", but won't dynamic checking
result in runtime errors anyway when assigning null to a non-nullable
variable (except in strict mode)?
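In other words, I would expect something like this (a sketch; I may
have the nullability annotation spelled wrong, and lookup is just a
hypothetical function that can return null):
var s: string! = "hello";   // "!" standing in for the non-nullable marker
s = lookup("key");          // if lookup returns null, this still fails at
                            // run time in standard mode; the error has
                            // moved, not disappeared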
"wrap": Seems like a version of this feature and/or "like" founded on
classes would work just as well.
Conversions: "In addition, any value in the language converts to a
member of AnyBoolean", but the conversions specified are all to the
more specific "boolean" type, so perhaps it should be expressed that
way to avoid confusion.
Section VI.
Predefined namespaces: ES4 predefines and automatically opens the
__ES4__ namespace. What will happen in ES5 (or ES4.1 or whatever)?
Will they still name the primary namespace __ES4__? Will it have
__ES5__ instead? Will it have both? I don't care that much about the
specifics as long as this has been thought through.
Bindings: The sheer number of constructs that bind names is a little
scary. I count 16 in the list. I don't think anyone has raised the
paucity of binding constructs as a critical flaw in ES3. Are all these
different constructs really necessary?
Binding objects and scopes: It seems like introducing lexical block
scopes makes things more challenging for online implementations.
Creating a dynamic scope object per block scope is clearly
unacceptable, but more work may be needed to build a per-function
symbol table that can properly accommodate block scope. Is block scope
worth it? Yes, "var" is a little weird, but having both "var" and
"let" increases conceptual footprint and may overall lead to more
author confusion.
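For reference, the distinction that the per-function symbol table has
to model (a sketch):
function f() {
    if (true) {
        var a = 1;    // function-scoped: hoisted, still visible after the block
        let b = 2;    // block-scoped: visible only inside the braces
    }
    return a;         // fine; returning b here would be an error
}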
package: Now that I have learned more about them, I think that
exposing packages and namespaces as separate user-level concepts is
confusing. Let's get this down to a single concept that developers
have to learn. Namespaces can just have a paired internal namespace
implicitly, I do not think it is helpful to give the public/internal
pair a special different name.
let, let const: Are expression let and block let really that useful,
other than to make old-school Lisp/Scheme hackers smile? To
programmers mainly used to imperative paradigms I think these will
come off as syntactic salt. See also my previous comments about
whether lexical block scope is worth adding to the language at all.
Program units:
- Is there any need for the concept of "unit" to be exposed in the
syntax? Why not just allow "use unit" at top level, and implicitly
make each file (or in the browser context each inline script) a unit?
- I think the difference between using units and importing packages is
going to be confusing to authors. Seriously, can anyone explain in one
sentence of 12 words or less how Joe Random Developer will decide
whether to use a namespace, import a package, or use a unit? Can we
get this down to only one kind of thing that needs to be mentioned in
the syntax? This would be a big win in reducing conceptual footprint.
Section VII.
Versioning: I am suspicious of versioning mechanisms, especially big
giant switch versioning. Is there any use of __ECMASCRIPT_VERSION__
that is not better handled by feature testing? (Maybe there is and I
am not thinking of it.)
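For comparison, the kind of feature testing I mean, in plain
ES3-compatible code (Map is just the example class from later in the
whitepaper):
var cache;
if (typeof Map !== "undefined") {
    cache = new Map();   // the specific feature we want is actually present
} else {
    cache = {};          // fall back to a plain object keyed by strings
}
// A big switch on __ECMASCRIPT_VERSION__ tells us what the implementation
// claims, not which features it actually got right.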
Type annotations and type checking: This section implies that type
annotations are not at all being added for performance reasons and may
indeed be harmful to performance. Wow! Seriously? I think runtime
assertions are interesting when debugging, but I would not want them
happening for every assignment statement in a release build of my C++
code. I am not sure why ECMAScript programmers would want that. Later
this section says "it is plausible" that typed programs will run
faster and not slower with enough analysis, but this issue seems far
too crucial to take such a blasé attitude. Unless we can show that
type annotations won't cause a performance hit in practice, and in
particular give a convincing argument that the relevant analysis can
be done with reasonable speed and without introducing an ahead-of-time
compile phase, then it is irresponsible to include type annotations as
currently designed. I am willing to believe that this is the case, but
I cannot sign on to an attitude that we don't care if typed programs
get faster or slower. Nor am I willing to take experience based on
ahead-of-time compilers as definitive.
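To be clear about what worries me, here is the sort of code where I
would expect those per-assignment checks to appear (my sketch):
function total(prices: Array): double {
    var sum: double = 0;
    for (var i: int = 0; i < prices.length; i++)
        sum += prices[i];   // prices[i] is untyped, so under the
                            // "checked at run time" reading every write to
                            // sum implies a dynamic check unless the
                            // implementation can prove it away
    return sum;             // and the return value is checked against double
}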
Pragmas: The "use decimal" pragma highlights how much complexity there
is to the decimal type. Seriously, is it worth it? Are the problems it
solves really that common?
"for each" statement: This seems like a convenient piece of syntactic
sugar.
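For readers who have not seen it, the sugar in question (iteration over
values rather than keys, E4X-style; my sketch):
var prices = { apple: 1, pear: 2 };
for (var k in prices)
    print(k);        // "apple", "pear" -- the keys, as in ES3
for each (var v in prices)
    print(v);        // 1, 2 -- the values, which is the new convenience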
Generators: Do ordinary programmers really understand coroutine
control flow? Is this really a significantly better paradigm than
passing a visitor function? Not really convinced by this one yet.
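The comparison I have in mind (generator per the whitepaper's yield
syntax, visitor in plain ES3):
function values(a) {
    for (var i = 0; i < a.length; i++)
        yield a[i];          // coroutine style: the caller pulls values out
}
function eachValue(a, visit) {
    for (var i = 0; i < a.length; i++)
        visit(a[i]);         // visitor style: the callee pushes values in
}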
Operator overloading through global multimethods: Overloading? Yikes.
Seems complicated. Aren't we worried that this could make the common
case of existing untyped code slower than it is already?
Tail calls:
- The whitepaper doesn't define very precisely what "accumulate
control stack" means (see the sketch after this list for the kind of
call I assume is covered). Are recursive calls allowed to accumulate
other kinds of space (in which case the usefulness of the requirement
is dubious)? Do functions that may be implemented in native code count
(so for instance if you eval an expression that calls your function in
tail position repeatedly, does the requirement apply)?
- "The use of procedural abstraction for iteration requires the use of
un-abstract control structures to consumption of control stack space,
among other things." This sentence seems to be buggy and has triggered
a parse error in my brain.
- It seems odd to mention goto here, since it is not a feature of the
language.
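The kind of call I have in mind (a sketch of the requirement as I
understand it):
function countdown(n) {
    if (n == 0)
        return "done";
    return countdown(n - 1);   // the recursive call is in tail position, so
                               // it must not accumulate control stack;
                               // countdown(1000000) should then not overflow
}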
"this": The most common reason that I know of for trying to copy this
into a variable is for lexically nested functions that are set as
event listeners or similar, and not called immediately by name. So I
don't think the this-passing feature actually addresses the common
likely use-case for such a thing, and so may be more confusing than
helpful.
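The pattern I mean, in today's ES3:
function Counter(button) {
    var self = this;             // copy "this" into a variable...
    this.count = 0;
    button.onclick = function () {
        self.count++;            // ...so this nested listener, called much
    };                           // later by the event system, can reach it
}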
"eval" operator and the "eval" function: This seems like a good
approach to sanitizing eval. Perhaps it should be highlighted that
splitting the eval function and eval operator is a potential
performance benefit through opening significant new optimization
opportunities.
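My (possibly wrong) understanding of the split, and why it could help:
function f(x) {
    return eval("x + 1");       // eval operator: can see the local x, so the
                                // compiler must keep f's locals inspectable
}
var indirectEval = eval;
function g(x) {
    return indirectEval("x + 1");   // eval function called through another
                                    // name: it no longer sees g's locals, so
                                    // g can be optimized like any other function
}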
arguments: It seems strange to both deprecate a feature and improve it
at the same time.
"typeof" operator: I think it's been decided to back out the typeof
"null" change so this may as well be dropped from the whitepaper.
Section VIII.
Strict:
- I would strongly prefer if strict mode did not alter behavior of
programs at all, except to reject those that do not pass the checks.
Otherwise, since strict mode is optional, this risks interop issues.
So I'm curious what the eval detail is. Perhaps strict mode could
remove the eval operator and allow only the eval function, with some
suitably named version made available ahead of time, if the difference
is just removing local eval behavior.
- I am somewhat concerned about having strict mode at all. It seems
like it could create the same kinds of problems we see today with
content that is served as application/xhtml+xml to some browsers and
text/html to others. It's not infrequent to see such content break
only in the browsers that really support XML, due to sloppy testing of
changes and the fact that the 78% browser doesn't support XHTML.
Verification:
- Does strict mode actually allow for any optimizations that couldn't
be done to the exact same program in standard mode?
Section IX.
"switch type" statement: I guess this beats switching on typeof, but
is it really significantly better than a series of "if" statements
using the "is" operator?
Expression closures: I actually find the examples hard to follow given
my expectation of ES3-like syntax. I think this may actually be
syntactic salt.
Array comprehensions: This seems pretty sugary to me but this kind of
syntax has proven useful for typical developers using Python.
Destructuring assignment and binding: I grudgingly accept that this
sort of construct has been proven in the context of Python and Perl.
"type": Are runtime meta-objects representing types ruly necessary?
What are they good for?
Slicing: This one I mildly object to. Array/String slicing is not, to
my knowledge, particularly common in ECMAScript code of today. I am
dubious that it merits its own operator syntax.
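For reference, what we can already write today versus the proposed
operator form (my reading of the proposal):
var a = [10, 20, 30, 40, 50];
var b = a.slice(1, 4);    // ES3 method call: [20, 30, 40]
var c = a[1:4];           // the proposed slice syntax for the same thing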
Semicolon insertion: I'd like more detail on the compatibility of the
return change. The do-while change adopts de facto reality and so is
good.
Trailing commas: Good to standardize this de facto extension.
Section X.
Map: Long overdue to have a real hashtable type.
Early binding, static type checking, and predictable behavior with
"intrinsic": Perhaps it should be highlighted more that this is a
potential significant performance improvement.
Reflection: This feature seems like it could be complex to implement
and potentially unnecessary for small implementations. I note that
J2ME omits reflection, which we can perhaps take as a sign that it is
not suitable for small implementations.
ControlInspector: I think an interface that's meant for debuggers and
similar tools, and not implementable in all interesting contexts, does
not need to be standardized; that would be better than having yet
another optional feature.
JSON: Sounds good.
DontEnum: Overloading a getter to sometimes also be a setter seems to
be in poor taste. (1) It's confusing. (2) It makes it impossible to
separately feature-test for existence of the setter. I suggest adding
setPropertyIsEnumerable instead. Why this design choice? Also: can
built-in properties that are naturally DontEnum be made enumerable?
That seems like annoying additional complexity?
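Concretely, the overloading I am objecting to versus the alternative
(setPropertyIsEnumerable being the hypothetical name I suggested above):
obj.propertyIsEnumerable("p");           // ES3: a pure query
obj.propertyIsEnumerable("p", false);    // the proposal as I read it: the
                                         // same method, given a second
                                         // argument, now also changes the flag
obj.setPropertyIsEnumerable("p", false); // my suggestion: a separate setter
                                         // that can be feature-tested on its own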
Math: I'm surprised to learn that the numeric value 4 is distinct in
int and double types, and yet int math still must (effectively) be
done in double space. This seems bad for performance all around. If
ints are to be a distinct type, then integer math should always be
done in int space.
uint-specific operations: This is syntactically ugly. Why can't
integer math just always work this way? Also, why only uint versions?
Surely it is desirable to do efficient math on signed integers as
well. Also, bitops already happen in integer math space, thus type-
specific versions should not be necessary since no floating point
conversion will need to occur if both operands of ^ or & are
statically typed as int or uint.
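On the bitop point, what I mean is (a sketch with type annotations; I
assume the ES3 semantics of ^ carry over):
var a: uint = 0xFFFF0000;
var b: uint = 0x0000FFFF;
var c = a ^ b;    // ^ already operates on 32-bit integer values in ES3, so
                  // with both operands statically uint the implementation can
                  // emit a plain integer xor -- no floating-point round trip
                  // and no need for a uint-specific operator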
Things I didn't see:
What about standardizing the de facto <!-- comment syntax that is
necessary for web compatibility?