RegExps that don't modify global state?

Claude Pache claude.pache at gmail.com
Tue Sep 23 07:04:43 PDT 2014


On 16 Sept 2014, at 20:16, Domenic Denicola <domenic at domenicdenicola.com> wrote:

> I had a conversation with Jaswanth at JSConf EU that revealed that RegExps cannot be used in parallel JS because they modify global state, i.e. `RegExp.$0` and friends.
> 
> We were thinking it would be nice to find some way of getting rid of this wart. One idea would be to bundle the don't-modify-global-state behavior with the `/u` flag. Another would be to introduce a new flag to opt-out. The former is a bit more attractive since people will probably want to use `/u` all the time anyway. I imagine there might be other possibilities others can think of.
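
For illustration, here is a minimal sketch of the shared state in question. It assumes an engine that implements the legacy `RegExp.$1` to `RegExp.$9` static properties (they are widely implemented but not part of the ES5 specification); the two patterns and the variable names are made up for the example.

```javascript
// Two unrelated regexps share the legacy static properties on `RegExp`.
const date = /(\d{4})-(\d{2})/
const word = /([a-z]+)/

date.exec('2014-09')
const afterDate = RegExp.$1      // first capture of the date match

word.exec('hello')               // an unrelated match elsewhere in the program
const afterWord = RegExp.$1      // now holds the capture of the word match instead
```

Every successful match overwrites the same slots, which is why two workers matching concurrently would step on each other.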

Another idea is to define a variant of the `RegExp.prototype.exec()` method that does the Right Thing (it reads and writes no state on the RegExp instance, on the `RegExp` global, or anywhere else):

    RegExp.prototype.run (str, params):
    
        Do what is currently specified for `RegExp.prototype.exec`, except that:
        
            * global, sticky and lastIndex properties are read and written on `params` instead of `this`
            * implementations are not allowed to extend that method in order to mess with `RegExp`, etc.
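
To make the first bullet concrete, here is a rough userland approximation, written as a free function `run(re, str, params)` rather than a method. It is only a sketch: the helper, the private copy named `worker`, and the sample input are assumptions for illustration, and userland code cannot satisfy the second bullet, since the engine may still update `RegExp.$1` and friends when the copy is executed. The flags are listed explicitly on `params` because, in engines where `global` and `sticky` are prototype accessors, reading them through a plain object throws a TypeError.

```javascript
// Hypothetical stand-in for the proposed `RegExp.prototype.run`.
function run(re, str, params) {
    // Keep flags such as `i` and `m` from `re`, but take `g` and `y` from `params`.
    let flags = re.flags.replace(/[gy]/g, '')
    if (params.global) flags += 'g'
    if (params.sticky) flags += 'y'
    const worker = new RegExp(re.source, flags)   // private copy; `re` is never written
    worker.lastIndex = params.lastIndex
    const result = worker.exec(str)
    // `exec` only tracks a position when `g` or `y` is set.
    if (params.global || params.sticky) params.lastIndex = worker.lastIndex
    return result
}

const re = /\d+/                          // no `g` flag on the instance itself
const params = { lastIndex: 0, global: true, sticky: false }
const first = run(re, 'a1b22', params)    // starts searching at index 0
const second = run(re, 'a1b22', params)   // continues from params.lastIndex
```

The caller decides whether the search is global or sticky and owns the position, so two callers can walk the same regexp over different strings without interfering.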

All other (legacy) methods are rewritten in terms of `RE.p.run` (in the current ES6 draft, they are mostly written in terms of `RE.p.exec`).
For example:
            
    RegExp.prototype.exec (str):
    
        1. Check that `this` is a RegExp.
        2. Let `result = this.run(str, this)`.
        3. Populate `RegExp.$1`, etc.
        4. Return `result`.


    RegExp.prototype.split (str): — (somewhat simplified for expository purpose)
    
        1. Check that `this` is a RegExp.
        2. Coerce `str` to a string.
        3. Let `params = { lastIndex: 0, global: true, sticky: false }`.
        4. Do a series of calls to `this.run(str, params)` in order to find where to split the string.
        5. Return the split string.


More interestingly, it is now possible to write brand new methods based on `RegExp.prototype.run` that are not burdened by legacy behavior.
In particular, note that the following `RE.p.iterate` generator is not confused by unexpected changes of `this.lastIndex`, 
because it uses a locally scoped version of `lastIndex` instead:

    RegExp.prototype.iterate = function* (str) {
        if (!IsRegExp(this))
            throw new TypeError
        str = ToString(str)
        let params = { lastIndex: 0, global: true, __proto__: this }
        let result
        while ((result = this.run(str, params)) !== null) {
            yield result
            // An empty match leaves lastIndex where it was; step past it so that
            // the same match is not yielded twice.
            if (result[0] === '')
                params.lastIndex = result.index + 1
        }
    }

    String.prototype.replaceAll = function(rx, replacement) {
        var input = ToString(this)
        var result = ''
        var pos = 0
        for (let match of rx.iterate(input)) {
            // ... left as nontrivial exercise to the reader ...
        }
        return result
    }
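
One possible way to make the two functions above runnable today, offered as a sketch rather than as the intended solution to the exercise. Since `run` does not exist, a private copy of the regexp (named `worker` here) plays the role of `params`; the free-function signatures, the decision to ignore the sticky flag, and the treatment of `replacement` as a literal string (no `$1`-style substitution) are all simplifying assumptions.

```javascript
// Standalone approximation of `iterate`: position lives on a private copy.
function* iterate(re, str) {
    str = String(str)
    // Force `g` on the copy and drop `y`, roughly `{ lastIndex: 0, global: true }`.
    const worker = new RegExp(re.source, re.flags.replace(/[gy]/g, '') + 'g')
    let result
    while ((result = worker.exec(str)) !== null) {
        yield result
        // An empty match leaves lastIndex where it was; step past it.
        if (result[0] === '') worker.lastIndex = result.index + 1
    }
}

// Simplified `replaceAll`: `replacement` is inserted verbatim.
function replaceAll(input, rx, replacement) {
    input = String(input)
    let result = ''
    let pos = 0                               // end of the previous match
    for (const match of iterate(rx, input)) {
        result += input.slice(pos, match.index) + replacement
        pos = match.index + match[0].length
    }
    return result + input.slice(pos)          // text after the last match
}

const rx = /-/g
rx.lastIndex = 2                              // stale state left by some other code
const replaced = replaceAll('a-b-c', rx, '+')
```

The stale `rx.lastIndex` neither affects the outcome nor gets overwritten, because the loop never consults the original instance.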
    
—Claude



