RegExp.escape()

Benjamin Gruenbaum benjamingr at gmail.com
Sat Jun 13 19:03:06 UTC 2015


What about that part in particular?

> That said - I'm very open to allowing implementations to escape _more_
than `SyntaxCharacter` in their implementations and to even recommend that
they do so in such a way that is consistent with their regular expressions.
What do you think about doing that?

If we go with `.escape` (and not `.tag` at this stage), should implementations
that extend the regexp syntax (which is apparently allowed?) with new
identifiers be allowed to add those identifiers to the set that is escaped?

This sounds like the biggest barrier at this point from what I understand.
I'm also considering a bit of `as if` to allow implementations to, for
example, not escape some characters inside `[...]` as long as the end
result is the same.
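
To make the discussion concrete, here is a rough sketch of what I have in
mind. It is not the proposed spec text: `escapeSyntaxCharacters` and its
`extra` parameter are hypothetical names I made up for illustration, and
`extra` only stands in for "whatever more an implementation decides to
escape".

```javascript
// Hypothetical helper, not the proposed API.
// ES2015 SyntaxCharacter :: one of ^ $ \ . * + ? ( ) [ ] { } |
function escapeSyntaxCharacters(text, extra = '') {
  // `extra` is an assumption: additional characters an implementation
  // might choose to escape (for example '-'). Escape them so they are
  // safe to splice into the character class built below.
  const extraClass = extra.replace(/[\\\]^-]/g, '\\$&');
  const pattern = new RegExp('[\\^$\\\\.*+?()[\\]{}|' + extraClass + ']', 'g');
  // Prefix every matched character with a backslash.
  return String(text).replace(pattern, '\\$&');
}

// Escaped input matches itself literally.
const literal = new RegExp(escapeSyntaxCharacters('(?:x)'));
console.log(literal.test('(?:x)')); // true

// ':' is not a SyntaxCharacter, so escaping alone does not stop ':x'
// from completing '(?' into a non-capturing group.
const data = escapeSyntaxCharacters(':x');
console.log(new RegExp('(?' + data + ')').test('x')); // true

// Escaping more than SyntaxCharacter, here '-', keeps 'a-z' from
// turning into a range inside a character class.
const cls = new RegExp('[' + escapeSyntaxCharacters('a-z', '-') + ']');
console.log(cls.test('m')); // false
```

The `as if` part would cover cases like `[\*]` versus `[*]`: both match
only a literal `*`, so an implementation could skip the backslash there
without anyone being able to tell the difference.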



On Sat, Jun 13, 2015 at 9:57 PM, Mark S. Miller <erights at google.com> wrote:

> On Sat, Jun 13, 2015 at 11:39 AM, Benjamin Gruenbaum <benjamingr at gmail.com
> > wrote:
>
>> On Sat, Jun 13, 2015 at 9:07 PM, Mark S. Miller <erights at google.com>
>> wrote:
>>
>>> On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <d at domenic.me> wrote:
>>>
>>>>  All of these should be building on top of RegExp.escape :P
>>>>
>>>
>>> It's funny how, by considering it as leading to a proposal, I quickly
>>> saw deep flaws that I was previously missing.
>>>
>>>
>> That was a big part of making a proposal out of it - to find these things
>> :)
>>
>
> Indeed! Much appreciated.
>
>
>
>>
>>
>>> the overall result does not do this. For example:
>>>
>>>     const data = ':x';
>>>     const rebad = RegExp.tag`(?${data})`;
>>>     console.log(rebad.test('x')); // true
>>>
>>> is nonsense. Since the RegExp grammar can be extended per platform, the
>>> same argument that says we should have the platform provide RegExp.escape
>>> says we should have the platform provide RegExp.tag -- so that they can
>>> consistently reflect these platform extensions.
>>>
>>>
>> This is a good point, I considered whether or not `-` should be included
>> for a similar reason. I think it is reasonable to only include syntax
>> identifiers and expect users to deal with parts of patterns of more than
>> one character themselves (by wrapping the string with `()` in the
>> constructor). This is what every other language does practically.
>>
>> That said - I'm very open to allowing implementations to escape _more_
>> than `SyntaxCharacter` in their implementations and to even recommend that
>> they do so in such a way that is consistent with their regular expressions.
>> What do you think about doing that?
>>
>> I'm also open to `.tag` wrapping with `()` to avoid these issues but I'm
>> not sure if we have a way in JavaScript to not make a capturing group out
>> of it.
>>
>
> Better or different escaping is not the issue of this first bullet, but
> rather, validating that a fragment is a valid fragment for that regexp
> grammar. For the std grammar, "(?" is not a valid fragment and the tag
> should have rejected the template with an error on that basis alone.
>
>
>
>
>>
>>
>>> * Now that we have modules, I would like to see us stop having each
>>> proposal for new functionality come at the price of further global
>>> namespace pollution. I would like to see us transition towards having most
>>> new std library entry points be provided by std modules. I understand why
>>> we haven't yet, but something needs to go first.
>>>
>>>
>> I think that doing this should be an eventual target but I don't think
>> adding a single much-asked-for static function to the RegExp function would
>> be a good place to start. I think the committee first needs to agree about
>> how this form of modularisation should be done - there are much bigger
>> targets first and I would not like to see this proposal tied and held back
>> by that (useful) goal.
>>
>
> I agree, but this will be true for any individual proposal.
>
> Perhaps we need a sacrificial "first penguin through the ice" proposal
> whose *only* purpose is to arrive as a std import rather than a std
> primordial.
> (Just kidding.)
>
>
>>
>>
>>> * ES6 made RegExp subclassable with most methods delegating to a common
>>> @exec method, so that a subclass only needs to consistently override a
>>> small number of things to stay consistent. Neither RegExpSubclass.escape
>>> nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because
>>> of the first bullet, RegExpSubclass.tag also cannot be derived from
>>> RegExpSubclass.escape. But having RegExpSubclass.escape delegating to
>>> RegExpSubclass.tag seems weird.
>>>
>>>
>> Right but it makes sense that `escape` does not play in this game since
>> it is a static method that takes a string argument - I'm not sure how it
>> could use @exec.
>>
>
> I agree that defining a class-side method to delegate to an instance-side
> method is unpleasant. But because we have class-side inheritance, static
> methods should be designed with this larger game in mind.
>
>
>
>>
>>
>>> * The instanceof below prevents this polyfill from working cross-frame.
>>> Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where
>>> RegExpSubclass2.source produces a regexp grammar string that
>>> RegExpSubclass1 does not understand, I have no idea what the composition
>>> should do other than reject with an error. But what if the strings happen
>>> to be mutually valid but with conflicting meaning between these subclasses?
>>>
>> This is hacky, but in my code I just did `argument.exec ? treatAsRegExp
>> : treatAsString`.
>>
>
> Yes, as with instanceof, that's the difference between the quality needed
> in a polyfill for personal use vs a proposed std.
>
>
>
> --
>     Cheers,
>     --MarkM
>