Look-behind proposal
Nozomu Katō
noz.ka at akenotsuki.com
Tue May 19 11:14:52 UTC 2015
Allen Wirfs-Brock wrote on Mon, 18 May 2015, at 15:08:13 -0700:
>
> On May 18, 2015, at 2:20 PM, Jason Orendorff wrote:
>
>> On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō wrote:
>>> It is a Syntax Error if Disjunction contains Quantifier ::
>>> QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.
>>
>> Backreferences must be ruled out, too.
Yes, I forgot to mention backreferences in my proposal. I have modified
my proposal to reflect this.
>>> 1. Let n be the exact number of the sequence of code points which
>>> Disjunction matches [NOTE].
>>
>> I don't think a NOTE is strong enough for spec purposes, so you can
>> improve the proposal by adding a "Static Semantics:" section that
>> formally specifies this.
>>
>> The spec uses attribute grammars for dozens of things like this, where
>> some piece of information has to be determined from the parse tree.
>> See "Static Semantics: ElisionWidth" for a simple example.
Thank you. This sample is helpful. I was unsure how to describe this
information.
> The proposal also needs to follow the conventions and terminology of
> the ES6 RegExp spec. In particular, the RegExp pattern matching
> algorithms use the term "character" with a specific meaning rather
> than using "code unit" or "code point"
> (see http://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics ).
> It does this so that the same algorithms can be use to describe the
> matching semantic of both "normal" and "unicode" patterns. Any
> extensions also will need t work with both kinds of patterns.
Based on these comments, I re-submit a modified proposal:
----
Abstract
Basically, the evaluation for the look-behind assertion is performed by
"rewind the target sequence and do the same thing as the look-ahead
assertion":
1. When compiling a regular expression pattern, count the total number
of PatternCharacter and . and \ AtomEscape and CharacterClass in
Disjunction per look-behind assertion. Each Atom may be followed by
{n}, but may not be followed by the other quantifiers.
2. When a look-behind assertion evaluated, rewind the target sequence
by that number and evaluate as if the look-ahead assertion from
that point.
--
Static Semantics: Early Errors
Assertion :: ( ? < = Disjunction )
Assertion :: ( ? < ! Disjunction )
It is a Syntax Error if Disjunction contains Quantifier ::
QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.
It is a Syntax Error if Disjunction contains AtomEscape ::
DecimalEscape.
When Disjunction is separated by the | regular expression, if the
number of the sequence of characters which the left Alternative
matches and the number of the sequence of characters which the right
Disjunction matches are different, then it is a Syntax Error.
Static Semantics: DisjunctionSize
1. return the total sum of the following Atoms in the Disjunction:
a. Atom :: PatternCharacter
b. Atom :: .
c. Atom :: \ AtomEscape except AtomEscape :: DecimalEscape
d. Atom :: CharacterClass
If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }",
then that Atom is counted as the number of DecimalDigits instead of
one. If the Disjunction contains an "Atom :: ( Disjunction )" or
"Atom :: ( ? : Disjunction )" that is followed by a "QuantifierPrefix
:: { DecimalDigits }", then the total number of the sequence of Atoms
in that inner-Disjunction is multiplied by the number of
DecimalDigits.
Assertion
The production Assertion :: ( ? < = Disjunction ) evaluates as follows:
1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
x and a Continuation c, and performs the following steps:
1. Let n be the result of DisjunctionSize for Disjunction.
2. Let xe be x's endIndex.
3. If xe < n, return failure.
4. Let xcap be x's captures List.
5. Let y be the State (xe-n, xcap).
6. Let d be a Continuation that always returns its State argument as a
successful MatchResult.
7. Call m(y, d) and let r be its result.
8. If r is failure, return failure.
9. Let z be r's State.
10. Let zcap be z's captures List.
11. Let a be the State (xe, zcap).
12. Call c(a) and return its result.
The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:
1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
x and a Continuation c, and performs the following steps:
1. Let n be the result of DisjunctionSize for Disjunction.
2. Let xe be x's endIndex.
3. If xe < n, then call c(x) and return its result.
4. Let xcap be x's captures List.
5. Let y be the State (xe-n, xcap).
6. Let d be a Continuation that always returns its State argument as a
successful MatchResult.
7. Call m(y, d) and let r be its result.
8. If r is not failure, return failure.
9. Call c(x) and return its result.
----
Nozomu
More information about the es-discuss
mailing list