ES RegExp parser

Jason Orendorff jason.orendorff at gmail.com
Mon Mar 20 15:44:21 UTC 2017


> `{type: '+', expression: re}`

Sorry, I guess the real expression would be something more like:

    {
      type: 'Repetition',
      expression: re,
      quantifier: {
        type: '+',
        greedy: true
      }
    }

The point stands.

-j


On Mon, Mar 20, 2017 at 10:36 AM, Jason Orendorff <jason.orendorff at gmail.com
> wrote:

> The second approach, hands down.
>
> 1. With the first approach, you're setting up a situation where it's very
> easy to write buggy analysis code: if you forget to check `re.quantifier`
> anywhere, your code will run, but you have a bug. Much easier to only have
> to check `re.type`.
>
> 2. If you have a regexp `re` and you want to programmatically build a
> regexp that matches one or more repetitions of it, it's much easier to
> write `{type: '+', expression: re}` than to have to examine `re.quantifier`
> and (if it's already present) figure out how to modify it.
>
> 3. With the first approach, you don't have to represent `(?:)` group in
> the AST at all (rather like how Esprima drops redundant parentheses). With
> the latter, I think you have to, because it's possible for a single regexp
> "node" to have multiple quantifiers: `/(?:\d{4,6})?/`
>
> To me this is not even a question.
>
> -j
>
>
>
> On Sun, Mar 19, 2017 at 4:52 PM, Dmitry Soshnikov <
> dmitry.soshnikov at gmail.com> wrote:
>
>> I started working on a ECMAScript regular expressions parser[1] with an
>> AST format similar to Mozilla's Parser API. This might later be extended to
>> support more powerful constructs, like lookbehind assertions, multiline
>> regexes, groups naming, comments, etc.
>>
>> And while this is mostly an FYI post (probably someone will find it
>> useful for regexes analysis in source transformation tools, or source code
>> editors), I'd appreciate any feedback on the specification of AST nodes
>> (currently totally made up by myself).
>>
>> E.g. when we have a quantifier from ES spec for RegExp grammar, it
>> doesn't tell anything (and shouldn't of course) which AST node this
>> quantifier node produces.
>>
>> This leaves open questions like "whether a quantifier should be a part of
>> the parsed expression, or should it vice-versa be a main node itself, and
>> have the expression as a sub-node?":
>>
>> In other words, which format is more appropriate (taking into account AST
>> traversal tools in order to implement NFA/DFA engine for it later):
>>
>> ```
>> /a+?/
>> ```
>>
>> Char is main, quantifier is a sub-node:
>>
>> ```
>> {
>>   type: 'Char',
>>   value: 'a',
>>   quantifier: {
>>     type: '+',
>>     greedy: false,
>>   }
>> }
>> ```
>>
>> The quantifier is main (creating `Repetition` AST node), char is the
>> `expression` sub-node:
>>
>>
>> ```
>> {
>>   type: 'Repetition',
>>   expression: {
>>     type: 'Char',
>>     value: 'a'
>>   },
>>   quantifier: {
>>     type: '+',
>>     greedy: false,
>>   }
>> }
>> ```
>>
>> Currently I chose the second approach (with `Repetition` node) as more
>> practical when building an AST traversal -- it may have `onRepetition`
>> generic handler, and call `onChar` internally for its `expression`, instead
>> of making `onChar` (or any other node) to check, and handle its
>> `quantifier`, and do a repeat.
>>
>> Anyways, if you have any thought or feedback on AST nodes format, please
>> feel free to contact me.
>>
>> Dmitry
>>
>>  [1] https://www.npmjs.com/package/regexp-tree
>>
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20170320/46215ccf/attachment.html>


More information about the es-discuss mailing list