Look-behind proposal in trouble
Claude Pache
claude.pache at gmail.com
Fri Oct 9 13:53:25 UTC 2015
> On 9 Oct 2015, at 15:00, Nozomu Katō <noz.ka at akenotsuki.com> wrote:
>
> Erik Corry wrote on Fri, 9 Oct 2015, at 10:52:09 +0200:
>> I made an implementation of .NET-style variable length lookbehinds. It's
>> not in a JS engine, but it's in a very simple (and very slow)
>> ES5-compatible regexp engine that is used in the tiny Dart implementation
>> named Fletch.
>>
>> No Unicode issues arise since this engine does not support /u, but I don't
>> expect any issues since it's not trying to second-guess the length of the
>> string matched by an expression.
>>
>> Needs a lot more tests, but it seems to work OK and was surprisingly simple
>> to do. Basically:
>>
>> * All steps in the input string are reversed, so if you would step forwards
>> you step backwards.
>> * Check for start of string instead of end of string.
>> * Test against the character to the left of the cursor instead of to the
>> right.
>> * The parts of the Alternative (see the regexp grammar in the standard) are
>> code-generated in reverse order.
>>
>> Code is here: https://codereview.chromium.org/1398033002/
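Erik's four steps can be illustrated with a toy backtracking matcher. This is a minimal sketch, not Fletch's code: the AST node shapes and the names `matchNode` and `search` are hypothetical, and only literal characters, sequences, alternation, `*` and lookbehind are supported. One recursive function runs left to right (`dir = +1`) or right to left (`dir = -1`); a lookbehind simply re-enters it backwards from the current position.

```javascript
// Hypothetical AST node shapes (illustration only):
//   {type:'char', c}, {type:'seq', parts}, {type:'alt', options},
//   {type:'star', node}, {type:'behind', node}
// dir is +1 (left to right) or -1 (right to left).
// k is the continuation: it receives the new cursor and returns a boolean.
function matchNode(node, str, pos, dir, k) {
  switch (node.type) {
    case 'char': {
      // Forward tests the character right of the cursor, backward the one
      // to its left; running out of input means hitting the start of the
      // string instead of the end.
      const i = dir > 0 ? pos : pos - 1;
      return i >= 0 && i < str.length && str[i] === node.c && k(pos + dir);
    }
    case 'seq': {
      // Backward: the parts of the Alternative are visited in reverse order.
      const parts = dir > 0 ? node.parts : [...node.parts].reverse();
      const step = (j, p) =>
        j === parts.length
          ? k(p)
          : matchNode(parts[j], str, p, dir, (q) => step(j + 1, q));
      return step(0, pos);
    }
    case 'alt':
      return node.options.some((o) => matchNode(o, str, pos, dir, k));
    case 'star': {
      // Greedy: try one more iteration first, then stop iterating.
      // An iteration that consumes nothing is rejected to avoid looping.
      const loop = (p) =>
        matchNode(node.node, str, p, dir, (q) => q !== p && loop(q)) || k(p);
      return loop(pos);
    }
    case 'behind':
      // Lookbehind: match the body right to left from the cursor, then
      // resume the outer match at the unchanged position.
      return matchNode(node.node, str, pos, -1, () => true) && k(pos);
    default:
      throw new Error(`unknown node type: ${node.type}`);
  }
}

// Index of the leftmost match, or -1 (the outer search runs forwards).
function search(node, str) {
  for (let start = 0; start <= str.length; start++) {
    if (matchNode(node, str, start, +1, () => true)) return start;
  }
  return -1;
}

// Hypothetical AST for /(?<=a(?:b|cd)*)x/, a variable-length lookbehind.
const ch = (c) => ({ type: 'char', c });
const pattern = {
  type: 'seq',
  parts: [
    {
      type: 'behind',
      node: {
        type: 'seq',
        parts: [
          ch('a'),
          {
            type: 'star',
            node: { type: 'alt', options: [ch('b'), { type: 'seq', parts: [ch('c'), ch('d')] }] },
          },
        ],
      },
    },
    ch('x'),
  ],
};

console.log(search(pattern, 'zzabcdbx')); // 7
```

Because the lookbehind body is matched backwards rather than by rewinding a fixed number of characters, the `*` inside it poses no difficulty; a fixed-length, Perl5-style lookbehind could not express this pattern.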
>
> Me too: I once implemented lookbehind assertions this way in SRELL, my
> C++ template library, whose engine is compatible with ECMAScript's
> RegExp but whose class design is compatible with C++'s std::regex [1].
>
> However, I later removed the code for such lookbehinds and adopted
> Perl5-style lookbehinds instead. The main reasons were:
>
> 1. Right-to-left matchers are used only in lookbehind assertions;
> 2. Nevertheless, they cannot share code with the normal (left-to-right)
> matchers and need their own optimizations.
>
> Thus, I came to feel that what I would gain did not justify the work
> required.
>
> As I understand it, the only features of .NET-style lookbehinds that
> are unavailable in Perl5-style lookbehinds, and cannot even be emulated
> there, are 1) backreferences and 2) quantifiers other than {n}.
> Everything else can be emulated one way or another.
>
> For example, the positive multiple-length lookbehind (?<=ab|cde) can be
> replaced by (?:(?<=ab)|(?<=cde)). The negative multiple-length
> lookbehind is even simpler: just write the assertions in succession;
> for example, (?<!ab|cde) can be written as (?<!ab)(?<!cde).
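The substitutions above can be spot-checked in JavaScript. Lookbehind assertions were not part of ECMAScript when this thread was written, so the snippet assumes a modern engine (ES2018 or later for lookbehind, ES2020 for `String.prototype.matchAll`). The helpers `matchIndices` and `sameMatches` and the sample strings are illustrative; agreement on a few samples is a sanity check, not a proof.

```javascript
// Start index of every match of a global RegExp in str.
function matchIndices(re, str) {
  return [...str.matchAll(re)].map((m) => m.index);
}

// True if both regexps match at exactly the same places in every sample.
function sameMatches(a, b, samples) {
  return samples.every(
    (s) => JSON.stringify(matchIndices(a, s)) === JSON.stringify(matchIndices(b, s)),
  );
}

const samples = ['abx', 'cdex', 'bx', 'dex', 'x', 'abx cdex zx'];

// Positive: one lookbehind with alternatives vs. an alternation of
// fixed-length lookbehinds.
console.log(sameMatches(/(?<=ab|cde)x/g, /(?:(?<=ab)|(?<=cde))x/g, samples)); // true

// Negative: (?<!ab|cde) succeeds only when neither branch matches,
// which is exactly what two assertions in succession express.
console.log(sameMatches(/(?<!ab|cde)x/g, /(?<!ab)(?<!cde)x/g, samples)); // true
```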
>
> I suspect that Oniguruma supports expressions like (?<=ab|cde) by doing
> such substitutions internally, but that is just a guess.
>
> So I came to feel that Perl5-style lookbehinds strike a reasonable
> balance, though they may not be the best option. In fact, the current
> lookbehind implementation in my library is far simpler: it shares code
> with lookaheads. If the number of characters to rewind is 0, the
> assertion is a lookahead; otherwise (1 or more), it is a lookbehind.
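The shared lookahead/lookbehind scheme described above can be sketched as follows. This is an illustration under stated assumptions, not SRELL's code: a sticky (`y`) RegExp stands in for the library's own forward matcher, and `assertAt` is a hypothetical name. The caller supplies the rewind count, which for a Perl5-style lookbehind is the fixed length of its body.

```javascript
// One forward-matching routine serves both assertion kinds (sketch):
//   rewind === 0      -> lookahead
//   rewind >= 1       -> fixed-length lookbehind: step back, then match forwards
// A sticky RegExp is a stand-in for the engine's forward matcher.
function assertAt(bodySource, str, pos, rewind) {
  const start = pos - rewind;
  if (start < 0) return false; // not enough text behind the cursor
  const body = new RegExp(bodySource, 'y'); // 'y' anchors the match at lastIndex
  body.lastIndex = start;
  const m = body.exec(str);
  if (m === null) return false;
  // For a lookbehind, the body must end exactly at the cursor; this guards
  // against a body whose match length differs from the rewind count.
  return rewind === 0 || start + m[0].length === pos;
}

// (?<=ab) at position 3 of 'xabc': rewind 2, then match 'ab' forwards.
console.log(assertAt('ab', 'xabc', 3, 2)); // true
// (?=ab) at position 1 of 'xabc': no rewind, a plain lookahead.
console.log(assertAt('ab', 'xabc', 1, 0)); // true
// (?<=ab) at position 2 fails: the two characters behind are 'xa'.
console.log(assertAt('ab', 'xabc', 2, 2)); // false
```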
>
> If we were to introduce .NET-style lookbehinds into ECMAScript's
> RegExp, someone would need to write right-to-left versions of most of
> the definitions under section 21.2 of the specification.
>
> Nozomu
>
> [1] http://www.akenotsuki.com/misc/srell/en/
Note that full-featured lookbehind assertions (à la .NET) are not the only case where backward matching is useful.
Consider, for instance, the following simple method:
```js
String.prototype.trimRight = function () {
  // Remove the run of whitespace at the end of the string.
  return this.replace(/\s+$/u, '');
};
```
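That implementation would be more efficient if we could instruct the regexp to be applied backwards: as written, a typical backtracking engine searches from the left, and every interior run of whitespace is scanned (once per starting position within the run) before `$` rejects it, even though only the trailing run can match.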
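For comparison, here is roughly what applying `/\s+$/u` right to left amounts to, written by hand. It is a sketch with a hypothetical name (`trimRightBackward`), not a proposed API, and it is a standalone function rather than a `String.prototype` patch: it starts at the end of the string and steps backwards while the character to the left of the cursor is whitespace, so its cost depends only on the length of the trailing whitespace.

```javascript
// Hand-written equivalent of str.replace(/\s+$/u, ''), scanning right to left.
function trimRightBackward(str) {
  const isSpace = /\s/u; // same character class as in /\s+$/u (no 'g', so no lastIndex state)
  let end = str.length;
  // Step backwards while the character left of the cursor is whitespace.
  // Every character matched by \s is a single UTF-16 code unit (all lie in
  // the BMP), so testing code units one at a time is sufficient.
  while (end > 0 && isSpace.test(str[end - 1])) end--;
  return str.slice(0, end);
}

console.log(JSON.stringify(trimRightBackward('  a  b \t\n'))); // "  a  b"
```

Leading and interior whitespace are never examined, which is the saving a backward-applied regexp would offer.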
—Claude
More information about the es-discuss mailing list