RegExp `x` flag

Isiah Meadows isiahmeadows at gmail.com
Wed Jun 5 08:05:09 UTC 2019


1. Very. A minifier can just strip the insignificant whitespace and
emit the non-`/x` version.
2. Whitespace is negligible in parsing performance, and regexps have a
fairly simple grammar to begin with. (It can be parsed with a single
character of lookahead easily, and the only thing that can nest more
than a single level is parentheses.) 90% of the actual time spent on
regexps is on compilation, and `/x` would have zero effect on that.
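
To make point 1 concrete, here is a rough sketch of what such a
stripping pass could look like. `stripFreeSpacing` is a hypothetical
helper, and the rules it follows (whitespace ignored outside character
classes, `#` starting a line comment, escapes kept as-is) are
assumptions borrowed from other languages' `x` flags, not anything
specified for JS:

```js
// Hypothetical helper: turn a free-spacing pattern source into an
// ordinary pattern source. Assumed rules (not specified anywhere for JS):
// whitespace is ignored outside character classes, `#` starts a comment
// that runs to the end of the line, and escapes are kept untouched.
function stripFreeSpacing(source) {
    let out = "";
    let inClass = false;
    for (let i = 0; i < source.length; i += 1) {
        const ch = source[i];
        if (ch === "\\") {
            // Copy the escape and the character it escapes verbatim.
            out += ch + (source[i + 1] || "");
            i += 1;
        } else if (inClass) {
            // Inside `[...]` everything is significant.
            if (ch === "]") inClass = false;
            out += ch;
        } else if (ch === "[") {
            inClass = true;
            out += ch;
        } else if (ch === "#") {
            // Skip the comment; the newline itself is dropped as whitespace.
            while (i + 1 < source.length && source[i + 1] !== "\n") i += 1;
        } else if (!/\s/.test(ch)) {
            out += ch;
        }
    }
    return out;
}

const rxNumber = new RegExp(stripFreeSpacing(String.raw`
    ^ (0 | [1-9][0-9]*)   # integer part
    (?: \. [0-9]+ )?      # optional fraction
    $
`));

console.log(rxNumber.source); // ^(0|[1-9][0-9]*)(?:\.[0-9]+)?$
```

It is a single pass over the source that only ever peeks one character
ahead, which is also roughly why I don't expect the parsing cost in
point 2 to matter.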

The issue of detection is actually pretty trivial: a `/` is assumed to
be division any time you can continue an expression, and regexps are
only consumed when no binary operator could potentially be expected.
A line that starts with `/` continuing the previous expression is a
rather obscure edge case often left out of ASI posts, and one I've yet
to hear of anyone relying on, although I could imagine it showing up
in code bases which use `cond && foo()` instead of `if (cond) foo()`
and `cond || foo()` instead of `if (!cond) foo()`.
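
A small illustration of that rule as it works today (the variable
names are made up for the example):

```js
const a = 12;
const foo = 3;
const g = 2;

// The previous line can still continue as an expression, so the leading
// `/` is division: this is `a / foo / g`, not `a` followed by `/foo/g`.
const quotient = a
/foo/g;

// After `=`, no binary operator can be expected, so `/` starts a regexp.
const rx = /foo/g;

console.log(quotient);              // 2
console.log(rx instanceof RegExp);  // true
```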

`new RegExp(multilineString)` *is* a valid fallback, something I
already use today quite a bit, but I'd prefer to use one or the other
consistently for static regexps.
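
For reference, one possible shape for that fallback is sketched below.
It is not necessarily how anyone's real code looks: the pattern and the
name `rxDecimal` are made up, and using `String.raw` to avoid doubling
every backslash is just one choice.

```js
// Build the pattern from one fragment per line, with ordinary JS
// comments standing in for the comments an `x` flag would allow.
const rxDecimal = new RegExp([
    String.raw`^(0|[1-9][0-9]*)`,   // integer part
    String.raw`(?:\.[0-9]+)?`,      // optional fraction
    String.raw`$`
].join(""));

console.log(rxDecimal.test("12.5")); // true
console.log(rxDecimal.test("012"));  // false
```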

-----

Isiah Meadows
contact at isiahmeadows.com
www.isiahmeadows.com

On Tue, Jun 4, 2019 at 12:36 AM kai zhu <kaizhu256 at gmail.com> wrote:
>
> 1. is this minifier-friendly?
> 2. is parsing-impact minimal enough to not affect load-times?  regexp-detection/bounding is among the most expensive/complex parts of javascript-parsing.
>
> those 2 nits aside, i'm on the fence.  regexp-spaghetti is a valid painpoint, and jslint's author has expressed desire for multiline regexp [1].  otoh, there is a good-enough solution by falling-back to constructor-form to improve readability:
>
> ```js
> // real-world spaghetti-regexp code in jslint.js
> const rx_token = /^((\s+)|([a-zA-Z_$][a-zA-Z0-9_$]*)|[(){}\[\],:;'"~`]|\?\.?|=(?:==?|>)?|\.+|[*\/][*\/=]?|\+[=+]?|-[=\-]?|[\^%]=?|&[&=]?|\|[|=]?|>{1,3}=?|<<?=?|!(?:!|==?)?|(0|[1-9][0-9]*))(.*)$/;
>
> // vs
>
> /*
>  * break JSON.stringify(rx_token.source)
>  * into multiline constructor-form for readability
>  */
> const rx_token = new RegExp(
>     "^("
>     + "(\\s+)"
>     + "|([a-zA-Z_$][a-zA-Z0-9_$]*)"
>     + "|[(){}\\[\\],:;'\"~`]"
>     + "|\\?\\.?"
>     + "|=(?:==?|>)?"
>     + "|\\.+"
>     + "|[*\\/][*\\/=]?"
>     + "|\\+[=+]?"
>     + "|-[=\\-]?"
>     + "|[\\^%]=?"
>     + "|&[&=]?"
>     + "|\\|[|=]?"
>     + "|>{1,3}=?"
>     + "|<<?=?"
>     + "|!(?:!|==?)?"
>     + "|(0|[1-9][0-9]*)"
>     + ")(.*)$"
> );
> ```
>
> [1] github jslint-issue #231 - ignore long regexp's (and comments)
> https://github.com/douglascrockford/JSLint/pull/231#issuecomment-421881426
>
>
>
> On Mon, Jun 3, 2019 at 10:27 PM Jacob Pratt <jhprattdev at gmail.com> wrote:
>>
>> Even if this flag were restricted to constructors instead of both constructors and literals, it could be worthwhile.
>>
>> On Mon, Jun 3, 2019, 23:19 Isiah Meadows <isiahmeadows at gmail.com> wrote:
>>>
>>> Let me clarify that previous message: I mean "newline restriction" in
>>> the sense that newlines are not permitted in regexp literals. A `/x`
>>> flag would make removing it practically required for it to have any
>>> utility.
>>>
>>> -----
>>>
>>> Isiah Meadows
>>> contact at isiahmeadows.com
>>> www.isiahmeadows.com
>>>
>>> On Mon, Jun 3, 2019 at 11:14 PM Isiah Meadows <isiahmeadows at gmail.com> wrote:
>>> >
>>> > I would personally love this (as well as interpolations in regexp
>>> > literals). I do have a concern about whether removing the newline
>>> > restriction creates ambiguities with division, but I suspect this is
>>> > *not* the case.
>>> >
>>> > -----
>>> >
>>> > Isiah Meadows
>>> > contact at isiahmeadows.com
>>> > www.isiahmeadows.com
>>> >
>>> > On Mon, Jun 3, 2019 at 10:03 PM Jacob Pratt <jhprattdev at gmail.com> wrote:
>>> > >
>>> > > Has there been any previous discussion of adding the `x` flag to JS? It exists in other languages, and can make relatively complicated regex _much_ easier to read. It also allows for comments, which are incredibly helpful when trying to understand some regexes.
>>> > >
>>> > > For prior art, XRegExp has this flag (though I've no idea how to figure out how frequently it's used), as do a few other languages.
>>> > >
>>> > > Quick overview: https://www.regular-expressions.info/freespacing.html
>>> > >
>>> > > Language references:
>>> > > Python: https://docs.python.org/3/library/re.html#re.X
>>> > > Rust: https://docs.rs/regex/1.1.6/regex/
>>> > > XRegExp: http://xregexp.com/xregexp/flags/#extended
>>> > > .NET: https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#regular-expression-options
>>> > >
>>> > > Jacob Pratt
>>> > > _______________________________________________
>>> > > es-discuss mailing list
>>> > > es-discuss at mozilla.org
>>> > > https://mail.mozilla.org/listinfo/es-discuss