LR(1) grammar/parser and lookahead-restrictions

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Thu Jan 12 05:45:51 UTC 2017


The "early errors" are just parser errors which are enforced not by grammar
rules, but by additional validation algorithms which follow the parsed
node. E.g. `eval` cannot be assigned in strict mode: it's not a grammar
issue (grammar allows `eval` on LHS), and it's not a runtime issue, it's an
"early error" enforced by static semantics, which still issues it as a
`ParseError`. Example in handling it in a semantic action for a production
(on a simplified `AssignmentExpression`):

```
%%

AssignmentExpression
  : IDENTIFIER '=' Expression {
      const id = $1;

      // Static semantics: early parser error:

      if (mode.isStrict && (id === 'eval' || id == 'arguments')) {
        throw new ParseError(
          '`eval` and `arguments` cannot be assigned in strict mode.'
        );
      }

      return {
        type: 'AssigmentExpression',
        left: $1,
        right: $3,
      };
    }
```

Similarly other "early errors".

BTW, here is the example I mentioned above of how Flow has to do
backtracking in parsing in some edge cases, making the grammar completely
not LR(1) anymore:
https://github.com/babel/babel/issues/1991#issuecomment-121708242

Dmitry


On Wed, Jan 11, 2017 at 7:13 PM, Isiah Meadows <isiahmeadows at gmail.com>
wrote:

> Okay. I see now. How do early errors factor into this?
>
> On Wed, Jan 11, 2017, 21:49 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
> wrote:
>
>> Well, it's still context-free (contex-sensitive may have terminals on LHS
>> too, JS cannot). It requires extra cover grammar, and additional static
>> semantics, but still potentially can be parsed by a LALR/CLR parser - yes,
>> you'll have to do parsed nodes reinterpretation in semantics handlers (e.g.
>> validate and transform parsed ObjectLiteral into an ObjectPattern when
>> parsing destructuring), but should be doable in general. It won't be doable
>> if lookaheafs won't be enough, and only a backtracking can recover (like
>> the cases with Flow parser).
>>
>> Dmitry
>>
>>
>> On Wed, Jan 11, 2017 at 6:09 PM Isiah Meadows <isiahmeadows at gmail.com>
>> wrote:
>>
>> Heuristically, I doubt it's even context-free at this point, considering
>> the concept and widespread prevalence of early errors now. I suspect it's
>> mildly context-sensitive (maybe tree-adjoining?), but I'm no formal
>> language expert here.
>>
>>
>>
>> On Wed, Jan 11, 2017, 14:23 Dmitry Soshnikov <dmitry.soshnikov at gmail.com>
>> wrote:
>>
>> One yet has to check the pure ES6+ grammar to be LR(1) compatible (the
>> Cover grammar with nodes reinterpretations makes it harder), but for a very
>> practical use-case, e.g. combined with Flow, it'll not be easy to have an
>> automated parser (since Flow's parser uses backtracking for some edge cases
>> which is already not LR-parser).
>>
>> I have some subset of ES6 working as an example language for Syntax
>> tool[1] (since I just took a bunch of productions from JS), but I haven't
>> tried the full ES6 grammar (not having time to work on this project, since
>> it'll be not a straightforward process of just matching the grammar 1:1 to
>> the parser generator -- based on the above reasons).
>>
>> Long time ago I also had a brief discussion with authors of another
>> popular parser generators for JS. Both didn't try it as well, but mentioned
>> that even for ES5 it was hard to parse e.g. regexp, etc.
>>
>> Someone (with enough time for the project) should just sit, and spend
>> some time actually porting the grammar from the spec on some powerful
>> parsers generator.
>>
>> [1] Syntax: https://github.com/DmitrySoshnikov/syntax
>> [2] https://twitter.com/DmitrySoshnikov/status/666327209959751680
>>
>> On Wed, Jan 11, 2017 at 10:28 AM, Michael Dyck <jmdyck at ibiblio.org>
>> wrote:
>>
>> In the past, it has been said (usually by Brendan Eich) that TC39 intends
>> that the ECMAScript grammar be LR(1). Is that still the case? (I'm not so
>> much asking about the "1", but more about the "LR".)
>>
>>
>>
>>
>>
>> If so, I'm wondering how lookahead-restrictions (e.g., [lookahead <!
>> terminals]) fit into the LR approach. It seems like there are three
>> possibilities:
>>
>>
>>   - modify the grammar (before the parser is constructed),
>>
>>
>>   - modify the construction of the parser, and/or
>>
>>
>>   - modify the operation of the parser.
>>
>>
>> It's not clear to me that any of these (or any combination of these) will
>> work.
>>
>>
>>
>>
>>
>> Has anyone algorithmically generated an LR parser from the spec grammar?
>> If so, how did you you handle lookahead-restrictions?
>>
>>
>>
>>
>>
>> -Michael
>>
>>
>> _______________________________________________
>>
>>
>> es-discuss mailing list
>>
>>
>> es-discuss at mozilla.org
>>
>>
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>>
>>
>>
>>
>> _______________________________________________
>>
>>
>> es-discuss mailing list
>>
>>
>> es-discuss at mozilla.org
>>
>>
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170111/682ec400/attachment.html>


More information about the es-discuss mailing list