May 24-26 rough meeting notes

Waldemar Horwat waldemar at google.com
Tue May 31 14:30:26 PDT 2011


On 05/31/11 13:34, Brendan Eich wrote:
> On May 31, 2011, at 1:21 PM, Waldemar Horwat wrote:
>
>> On 05/27/11 19:36, Brendan Eich wrote:
>>> On May 27, 2011, at 6:42 PM, Brendan Eich wrote:
>>>
>>>> On May 27, 2011, at 6:20 PM, Waldemar Horwat <waldemar at google.com> wrote:
>>>>
>>>>>> This produces expressions such as 42 = foo(), which must be handled by semantic specification. Why can't we have a more precise grammar?
>>>>>
>>>>> This is an entirely different issue.  The LeftHandSideExpression is still evaluated as an expression; you just don't call GetValue on it.  We chose to prohibit 42 = foo(); we could equally well have chosen to prohibit foo = 42(), but neither situation has much to do with the grammar.
>>>>
>>>> That dodges the big "cover grammar" vs. Precise Grammar issue before us. It assumes the conclusion that semantics such as Reference internal types are the way to go, because LR(1) can't hack it.
>>>
>>> Again, real browser JS engines use top-down parsing and no Reference type, instead specializing the AST for LeftHandSideExpressions to be precise, with early errors per ES5 (and even in ES3 era) for nonsense-LHS-expressions.
>>
>> Top-down LL parsers are subsumed by LR parsers.  The hierarchy is:
>>
>> LL(0) < LL(1) < LR(1).  LR(0) < SLR(1) < LALR(1) < LR(1).
>
> Yes, I know -- but that is not in practice so helpful to implementors, since they have to (at least) tediously refactor to remove left recursion. The reason to use LR(1) in the spec is not to help implementors, as far as I can tell.
>
> The reason to use LR(1) that you have cited is to have a validated-unambiguous grammar using a well-known formalism. It's a good reason, but it applies to other approaches.
>
> Is there anything particularly compelling about LR(1) vs. say PEG (which is unambiguous by construction) -- linear time, memory proportional to PDA depth, or other? I have my own thoughts but I'd be interested in yours.

Yes.  I would not want to use anything like a PEG to standardize a grammar.  Here's why:

PEG being unambiguous by construction simply means that it resolves all ambiguities by picking the earliest rule.  This turns all rules following the first one into negative rules:  X matches Z only if it DOESN'T match a Y or a Q or a B or ....  You could pick the same strategy to disambiguate an LR(1) grammar, and it would be equally bad.

Negative rules are the bane of grammars and behind the majority of the problems with the C++ grammar, including the examples I listed earlier.  They make a grammar non-understandable because the order of the rules is subtly significant and makes it hard to reason about when an X matches a Z; a language extension might expand the definition of Y to make an X no longer match a Q, and you wouldn't know it just by looking at a grammar with negative rules.  In a positive-rule-only grammar you'd discover the problem right away because the grammar wouldn't compile.

Negative rules also interact badly with both semicolon insertion and division-vs-regexp lexer disambiguation.  One might naively think that semicolon insertion would be an ideal match for negative rules:  You first try to parse

   tokens-on-line1
   tokens-on-line2

as a single statement and, only if that fails, you move on to parsing it as two statements with a virtual semicolon between them.  That, however, doesn't work.  Here's a simple counterexample:

   a + b
   (c) = d

Negative rules would insert a virtual semicolon here because

   a + b(c) = d

is not a valid parse.  However, the correct ECMAScript behavior is not to insert a semicolon: ECMAScript inserts a semicolon only before an offending token that follows a line terminator, and the "(" beginning the second line is not offending (it legally continues the expression as the call b(c)), so no semicolon appears and the whole program is simply a syntax error.
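For the curious, here is a runnable Python sketch of the "one statement first, then fall back to two" strategy applied to this counterexample.  The grammar is a deliberately tiny stand-in of my own (identifiers, +, calls, parentheses, and = whose left side must be an Lhs), not the real ECMAScript grammar, and the token lists stand in for the two source lines:

   # Toy statement grammar (NOT the real ECMAScript grammar):
   #   Expr     := Lhs "=" Expr | Additive
   #   Additive := Lhs ("+" Lhs)*
   #   Lhs      := Primary ( "(" Expr ")" )*
   #   Primary  := Identifier | "(" Expr ")"
   # Each function returns the index just past what it matched, or None.

   def parses(tokens):
       """True if the whole token list parses as one toy statement."""
       return expr(tokens, 0) == len(tokens)

   def expr(toks, i):
       j = lhs(toks, i)
       if j is not None and j < len(toks) and toks[j] == "=":
           k = expr(toks, j + 1)
           if k is not None:
               return k
       return additive(toks, i)

   def additive(toks, i):
       j = lhs(toks, i)
       while j is not None and j < len(toks) and toks[j] == "+":
           j = lhs(toks, j + 1)
       return j

   def lhs(toks, i):
       j = primary(toks, i)
       while j is not None and j < len(toks) and toks[j] == "(":
           k = expr(toks, j + 1)
           j = k + 1 if k is not None and k < len(toks) and toks[k] == ")" else None
       return j

   def primary(toks, i):
       if i < len(toks) and toks[i].isalpha():
           return i + 1
       if i < len(toks) and toks[i] == "(":
           j = expr(toks, i + 1)
           if j is not None and j < len(toks) and toks[j] == ")":
               return j + 1
       return None

   line1 = ["a", "+", "b"]
   line2 = ["(", "c", ")", "=", "d"]

   # The negative-rule strategy tries the single statement first and takes
   # the two-statement fallback only when that fails -- which it does here:
   assert not parses(line1 + line2)           # a + b(c) = d: no parse
   assert parses(line1) and parses(line2)     # a + b ; (c) = d: parses
   # So the fallback wins and a virtual semicolon appears -- exactly the
   # wrong answer for ECMAScript.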

     Waldemar

