Look-behind proposal

Nozomu Katō noz.ka at akenotsuki.com
Tue May 19 11:14:52 UTC 2015


Allen Wirfs-Brock wrote on Mon, 18 May 2015, at 15:08:13 -0700:
> 
> On May 18, 2015, at 2:20 PM, Jason Orendorff wrote:
> 
>> On Mon, May 18, 2015 at 8:50 AM, Nozomu Katō wrote:
>>> It is a Syntax Error if Disjunction contains Quantifier ::
>>> QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.
>>
>> Backreferences must be ruled out, too.

Yes, I forgot to mention backreferences in my proposal. I have modified
my proposal to reflect this.

>>>  1. Let n be the exact number of the sequence of code points which
>>>     Disjunction matches [NOTE].
>>
>> I don't think a NOTE is strong enough for spec purposes, so you can
>> improve the proposal by adding a "Static Semantics:" section that
>> formally specifies this.
>>
>> The spec uses attribute grammars for dozens of things like this, where
>> some piece of information has to be determined from the parse tree.
>> See "Static Semantics: ElisionWidth" for a simple example.

Thank you. This sample is helpful. I was unsure how to describe this
information.

> The proposal also needs to follow the conventions and terminology of
> the ES6 RegExp spec.  In particular, the RegExp pattern matching
> algorithms use the term "character" with a specific meaning rather
> than using "code unit" or "code point"
> (see http://people.mozilla.org/~jorendorff/es6-draft.html#sec-pattern-semantics ).
> It does this so that the same algorithms can be use to describe the
> matching semantic of both "normal" and "unicode" patterns. Any
> extensions also will need t work with both kinds of patterns.

Based on these comments, I re-submit a modified proposal:

----
Abstract

Basically, the evaluation for the look-behind assertion is performed by
"rewind the target sequence and do the same thing as the look-ahead
assertion":

  1. When compiling a regular expression pattern, count the total number
     of PatternCharacter and . and \ AtomEscape and CharacterClass in
     Disjunction per look-behind assertion. Each Atom may be followed by
     {n}, but may not be followed by the other quantifiers.
  2. When a look-behind assertion evaluated, rewind the target sequence
     by that number and evaluate as if the look-ahead assertion from
     that point.

--
Static Semantics: Early Errors

  Assertion :: ( ? < = Disjunction )
  Assertion :: ( ? < ! Disjunction )

  It is a Syntax Error if Disjunction contains Quantifier ::
  QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }.

  It is a Syntax Error if Disjunction contains AtomEscape ::
  DecimalEscape.

  When Disjunction is separated by the | regular expression, if the
  number of the sequence of characters which the left Alternative
  matches and the number of the sequence of characters which the right
  Disjunction matches are different, then it is a Syntax Error.


Static Semantics: DisjunctionSize
1. return the total sum of the following Atoms in the Disjunction:
     a. Atom :: PatternCharacter
     b. Atom :: .
     c. Atom :: \ AtomEscape except AtomEscape :: DecimalEscape
     d. Atom :: CharacterClass

   If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }",
   then that Atom is counted as the number of DecimalDigits instead of
   one. If the Disjunction contains an "Atom :: ( Disjunction )" or
   "Atom :: ( ? : Disjunction )" that is followed by a "QuantifierPrefix
   :: { DecimalDigits }", then the total number of the sequence of Atoms
   in that inner-Disjunction is multiplied by the number of
   DecimalDigits.


Assertion

The production Assertion :: ( ? < = Disjunction ) evaluates as follows:

1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
   x and a Continuation c, and performs the following steps:
  1. Let n be the result of DisjunctionSize for Disjunction.
  2. Let xe be x's endIndex.
  3. If xe < n, return failure.
  4. Let xcap be x's captures List.
  5. Let y be the State (xe-n, xcap).
  6. Let d be a Continuation that always returns its State argument as a
     successful MatchResult.
  7. Call m(y, d) and let r be its result.
  8. If r is failure, return failure.
  9. Let z be r's State.
  10. Let zcap be z's captures List.
  11. Let a be the State (xe, zcap).
  12. Call c(a) and return its result.

The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:

1. Evaluate Disjunction to obtain a Matcher m.
2. Return an internal Matcher closure that takes two arguments, a State
   x and a Continuation c, and performs the following steps:
  1. Let n be the result of DisjunctionSize for Disjunction.
  2. Let xe be x's endIndex.
  3. If xe < n, then call c(x) and return its result.
  4. Let xcap be x's captures List.
  5. Let y be the State (xe-n, xcap).
  6. Let d be a Continuation that always returns its State argument as a
     successful MatchResult.
  7. Call m(y, d) and let r be its result.
  8. If r is not failure, return failure.
  9. Call c(x) and return its result.

----

Nozomu


More information about the es-discuss mailing list