May 24-26 rough meeting notes

Waldemar Horwat waldemar at google.com
Fri May 27 18:20:49 PDT 2011


On 05/27/11 16:00, Brendan Eich wrote:
> On May 27, 2011, at 12:27 PM, Waldemar Horwat wrote:
>
>>> Peter Hallam kindly offered to help come up with a new grammar formalism for the spec that can pass the "Waldemar test" (if that is possible; not as hard as the Turing test). IIRC Peter said he was (had, would) adding arrow support per the strawman to Traceur (http://code.google.com/p/traceur-compiler/). We talked about Narcissus support too, to get more user testing.
>>
>> If we need to come up with a new formalism, that's a very powerful signal that there's something seriously flawed in the design.
>
> Or the spec.
>
> LR(1) is good, I like it, but all the browser JS implementations, and Rhino, use top-down hand-crafted parsers, even though JS is not LL(1). That is a big disconnect between spec and reality.
>
> As you've shown these can look good but be future hostile or downright buggy, so we need a formalism that permits mechanical checking for ambiguities. We don't want two ways to parse a sentence in the language.
>
> But this does not mean we must stick with LR(1).
>
>
>> Even if it happens to work now, it will produce surprises down the road as we try to extend the expression or parameter grammar. The places where the C++ grammar is not LR(1) are some of the most frustrating and surprising ones for users to deal with, and C++ does not even have the feedback from the parser to the lexer. Perl does and its grammar is both ambiguous and undecidable as a result. Note that implementations of Perl exist, which in this case simply means that the documented Perl "spec" is not sound or faithful -- all implementations are in fact taking shortcuts not reflected in the spec.
>
> The problem is we are already cheating.
>
> /AssignmentExpression/ :
>     /ConditionalExpression/
>     /LeftHandSideExpression/ = /AssignmentExpression/
>     /LeftHandSideExpression/ /AssignmentOperator/ /AssignmentExpression/
>
> This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?

This is an entirely different issue.  The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it.  We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.
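
To make the distinction concrete, here is a minimal sketch in Python of the evaluation model described above, not the spec's actual algorithm. The tuple-shaped nodes, the `Reference` class and the helper names are all hypothetical. Both sides of the assignment are evaluated as ordinary expressions; only the right side goes through `get_value`, and the rejection of `42 = foo()` happens in `put_value`, after parsing.

```python
class Reference:
    """A resolved binding: an environment (a plain dict here) plus a name."""
    def __init__(self, env, name):
        self.env, self.name = env, name

def evaluate(node, env):
    """Evaluate a toy AST node: identifiers yield a Reference, literals a value."""
    kind, payload = node
    if kind == "id":
        return Reference(env, payload)
    if kind == "num":
        return payload
    raise ValueError(f"unknown node kind: {kind}")

def get_value(v):
    """Dereference a Reference; pass plain values through unchanged."""
    return v.env[v.name] if isinstance(v, Reference) else v

def put_value(v, w):
    """Store w through v; a target that is not a Reference is a semantic error."""
    if not isinstance(v, Reference):
        # ES5's PutValue throws a ReferenceError here; TypeError is a stand-in.
        raise TypeError("invalid assignment target")
    v.env[v.name] = w

def assign(lhs, rhs, env):
    """Evaluate both sides as expressions, but call get_value only on the right."""
    target = evaluate(lhs, env)
    value = get_value(evaluate(rhs, env))
    put_value(target, value)
    return value
```

With these definitions, `assign(("id", "x"), ("num", 42), {"x": 0})` stores 42, while `assign(("num", 42), ("id", "x"), {"x": 0})` is accepted structurally and fails only in `put_value`.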

> Building on this, destructuring assignment parses more of what was formerly rejected by semantic checking: {p: q} = o destructures o.p into q (which must be declared in Harmony -- it is an error if no such q was declared in scope).
>
> We can certainly write semantic rules for destructuring to validate the object literal as an object pattern; ditto arrays. But the LR(1) grammar does not by itself specify the valid sentences in the language, just as it did not all these years for assignment expressions.
>
> Now, for arrow functions (you already know this, just reciting for the es-discuss list) we could parse the /ArrowFormalParameters/ : /Expression/ and then write semantics to validate that comma expression as arrow function formal parameters.
>
> Right now, the expression grammar and the formal parameter list grammar are "close". They have already diverged in Harmony due to rest and spread not being lookalikes: spread (http://wiki.ecmascript.org/doku.php?id=harmony:spread) allows ... /AssignmentExpression/ while rest wants only ... /Identifier/.
>
> But we still can cope: the /Expression/ grammar is a cover grammar for /FormalParameterList/.
>
> Of course, the two sub-grammars may diverge in a way we can't parse via parsing a comma expression within the parentheses that come before the arrow. Guards seem like they will cause the parameter syntax to diverge, unless you can use them in expressions (not in the strawman).
>
> The conclusion I draw from these challenges, some already dealt with non-grammatically by ES1-5, is that we should not make a sacred cow out of LR(1). We should be open to a formalism that is as checkable for ambiguities, and that can cope with the C heritage we already have (assignment expressions), as well as new syntax.
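
The cover-grammar approach quoted above can be sketched as a post-parse validation pass. This is a toy in Python under stated assumptions, not anything from the strawman: the tuple-shaped nodes (`"id"`, `"num"`, `"comma"`, `"spread"`) and the function name are hypothetical. The parser is assumed to have already parsed the parenthesized text as an expression; the function then reinterprets that expression as a parameter list or rejects it.

```python
def to_formal_parameters(expr):
    """Reinterpret a parsed expression as an arrow parameter list, or raise SyntaxError."""
    kind, payload = expr
    # A comma expression covers a multi-parameter list; anything else is one candidate.
    items = payload if kind == "comma" else [expr]
    params = []
    for i, (k, p) in enumerate(items):
        if k == "id":
            params.append(p)
        elif k == "spread" and p[0] == "id" and i == len(items) - 1:
            # Spread accepts any expression, but rest wants only a trailing identifier.
            params.append("..." + p[1])
        else:
            raise SyntaxError(f"not a valid formal parameter: {k}")
    return params
```

The check for `"spread"` is where the rest/spread divergence shows up: the expression grammar accepts more than the parameter grammar, so the extra cases have to be rejected after the fact. If the parameter syntax grows forms that the expression grammar cannot parse at all, this style of validation no longer has anything to work from.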

Given that LR(1) is the most general grammar available before you start getting into serious complexity (it subsumes LALR and other commonly studied grammars), there is a big cliff here and I think it's foolish to plan to jump off it without completely understanding the consequences.  This is especially true because there are other paths available for compact function syntax that do not involve jumping off that cliff.

I realize that C++ and Perl put up with ambiguity, and it seriously bites them.  Quick, what's the difference between the following in C++?

   int x(int());
   int x(-int());

     Waldemar

