"Approx-equal" operator

Dmitry Soshnikov dmitry.soshnikov at gmail.com
Sun Dec 18 10:43:10 PST 2011


On 17.12.2011 20:19, Lasse Reichstein wrote:
> On Sat, Dec 17, 2011 at 12:12 PM, Dmitry Soshnikov
> <dmitry.soshnikov at gmail.com>  wrote:
>> Hi,
>>
>> I was recently working with Ruby code again and was reminded how useful
>> its (actually Perl's) "approximately equal" operator is: =~
>>
>> The operator is just sugar for the `test' method of RegExp.
>>
>> if (/ecma/.test("ecmascript")) {
> ...
>> if ("ecmascript" ~= /ecma/) {
> So you save three characters (one, if you had paren-free invocation).
> I personally don't find it more readable.

Yep! (and the argument about three characters isn't essential) ;)

To me, ~= reads better than .test(), though perhaps that's just my
opinion and I can't insist. I just found it very convenient in other
languages.
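
For comparison, the proposed sugar can be approximated today with an ordinary function. This is only a sketch: `approxEqual` is a hypothetical name, not part of the proposal, and it assumes the right-hand side is always a RegExp.

```javascript
// Hypothetical helper emulating the proposed `str ~= re` sugar.
function approxEqual(str, re) {
  // Reset lastIndex so a regexp with the g (or y) flag does not carry
  // state over from an earlier call.
  re.lastIndex = 0;
  return re.test(str);
}

console.log(approxEqual("ecmascript", /ecma/)); // true
console.log(approxEqual("javascript", /ecma/)); // false
```

The reset matters only for regexps created with the g or y flag; without it, repeated calls on the same regexp object could alternate between true and false.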

>> And the other thing is "RegExp-substringing" with using bracket notation:
>> string[RegExp, startIndex].
>>
>> "ecmascript"[/ecma/, 0]; // "ecma"
> That's already valid syntax (stupid code, but valid). The result is "e".

Oh, my bad. Yes, it's already valid. Well, then we could consider other
options. I'll have to think about it.
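
The bracket form is already taken because of the comma operator: inside the brackets, `/ecma/, 0` is a single expression that evaluates the regexp, discards it, and yields 0. A small illustration in today's JavaScript (the variable names are only for the example):

```javascript
// The comma operator evaluates both operands and returns the last one,
// so the property key here is 0, not the regexp.
var viaComma = "ecmascript"[/ecma/, 0];
var viaIndex = "ecmascript"[0];

console.log(viaComma);              // "e"
console.log(viaComma === viaIndex); // true
```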

>> This is actually the sugar for:
>>
>> "ecmascript".match(/ecma/)[0]; // "ecma"
> You would want to handle the case where match returns null.
>
> Add:
>   String.prototype.get = function(re, n) {
>     var res = re.exec(this);
>     return res ? res[n] : null;
>   };
> and you have:
>
>    "ecmascript".get(/ecma/, 0) == "ecma"
>
> (feel free to make it non-enumerable).

My fault, I didn't describe it clearly. In string[regexp, startIndex], the
second argument is exactly a start index: the position in the string from
which to start the search. It's not related to the index into the `match'
result. Anyway, this syntax is already taken.
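
To make the intended start-index semantics concrete, here is a sketch of a plain function with the same behavior. `matchFrom` is a made-up name. It assumes the result should be the matched text, or null when nothing matches at or after `startIndex`, and it relies on `RegExp.prototype.flags`, which was only standardized later in ES2015.

```javascript
// Hypothetical equivalent of the proposed string[regexp, startIndex].
function matchFrom(str, re, startIndex) {
  // Copy the regexp with the g flag added so that exec() honors lastIndex,
  // leaving the caller's regexp object untouched.
  var flags = re.flags.indexOf("g") === -1 ? re.flags + "g" : re.flags;
  var copy = new RegExp(re.source, flags);
  copy.lastIndex = startIndex || 0;
  var res = copy.exec(str);
  return res ? res[0] : null;
}

console.log(matchFrom("ecmascript", /ecma/, 0));     // "ecma"
console.log(matchFrom("ecmascript", /ecma/, 1));     // null
console.log(matchFrom("ecmascript", /[a-z]{4}/, 4)); // "scri"
```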

>> E.g. a simple lexer:
>>
>> var code = "var a = 10;"
>> var cursor = 0;
>>
>> while (cursor < code.length) {
>>
>>     // sugar for slice: code.slice(cursor, code.length)
>>     var chunk = code[cursor .. -1];
>>
>>     if (identifier = chunk[/^([a-z]\w*)/, 1]) {
>>         // handle identifier token
>>     }
>>
>>     else if (number = chunk[/^([0-9]+)/, 1]) {
>>         // handle numbers
>>     }
> ...
>> Thoughts?
> I don't think the advantage of slightly shorter code is worth the
> extra syntactic complexity from adding two new constructions.

I love these arguments ;) But in fact, of course it's worth it,
especially if the brevity makes code easier and more convenient to write.

By the way, is there any syntactic complexity in the "~=" operator?

> Especially since they only work with RegExps. If it was more generic,
> in some way, it might be more reasonable to make operators for it.
>
> And it's not even more readable (IMO) than:
>
>     var chunk = code.substring(cursor);
>     if (identifier = getMatch(chunk, /^([a-z]\w*)/, 1)) {
>         // handle identifier token
>     } else if (number = getMatch(chunk, /^([0-9]+)/, 1)) {
>         // handle numbers
>     }

Of course, since you're already used to it. If people already had such
operators, nobody would write these function calls.

> and for efficiency, I'd avoid the substring, and use single
> invocations of global regexps.

That's another topic; you could still cache the regexps while using the
proposed operators.

> This seems like something that can easily be abstracted into a helper
> function, and come out looking even better.
>
>    var code;  // some string.
>    var cursor;  // a position.
>    var idMatch = /[a-z]\w*/ig;
>    var numMatch =  /[0-9]+/g;
>    // ...
>    function check(re, n) {
>      n = n || 0;
>      re.lastIndex = cursor;
>      var res = re.exec(code);
>      if (res) {
>        cursor = re.lastIndex;
>        return res[n];
>      }
>      return null;
>    }
>
>    // and inside some loop:
>    ...
>    if (identifier = check(idMatch, 0)) {
>       // handle identifier
>    } else if (numeral = check(numMatch, 0)) {
>       // handle numeral
>    }
>
>
> But this might come from me preferring to hide regexps away inside
> abstractions. Using a RegExp is an implementation detail - it's just
> one way to find something in a string, and there might be other, and
> you might want to change implementation over time. Hard-coding regexps
> into an interface gives them too much exposure, and making extra
> operators just for regexps also puts too much focus on them.

Yes, that's also true; in such cases it's usually better to abstract
things and provide some getters. But it was just an example to
illustrate the proposal, not a discussion of lexer implementation.
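
As a side note on the helper quoted above: a global regexp searches forward from lastIndex, so it can skip over characters between the cursor and the next match. A sketch of the same idea using the sticky flag `y` anchors each match at the cursor instead. The `y` flag was standardized in ES2015, after this thread, and the token names, the `patterns` table, and the `tokenize` function are illustrative assumptions only.

```javascript
// Token patterns; the y flag makes exec() match only at lastIndex.
var patterns = [
  { type: "space",      re: /\s+/y },
  { type: "identifier", re: /[a-z]\w*/iy },
  { type: "number",     re: /[0-9]+/y },
  { type: "punct",      re: /[=;]/y }
];

function tokenize(code) {
  var tokens = [];
  var cursor = 0;
  while (cursor < code.length) {
    var matched = false;
    for (var i = 0; i < patterns.length; i++) {
      var re = patterns[i].re;
      re.lastIndex = cursor;          // try this pattern exactly at the cursor
      var res = re.exec(code);
      if (res) {
        if (patterns[i].type !== "space") {
          tokens.push({ type: patterns[i].type, text: res[0] });
        }
        cursor = re.lastIndex;        // advance past the matched text
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError("Unexpected character at " + cursor);
    }
  }
  return tokens;
}

// Logs the token texts: var, a, =, 10, ;
console.log(tokenize("var a = 10;").map(function (t) { return t.text; }));
```

Keeping the regexps in a table like this also hides them behind the `tokenize` interface, which is the abstraction point made in the quoted message.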

> If the language is built as a text processor, like Perl being heavily
> influenced by AWK, it makes sense to have RegExps as a primary and
> preferred feature. In ECMAScript, which has a more general-purpose
> design, I don't think they should be given preferred treatment. A
> class with methods is perfectly fine for what they do.

Perhaps, but I don't see why we can't have strong and powerful regexp 
constructions too.

>   If ECMAScript
> had raw strings, the RegExp literal wouldn't even be necessary.

How so? Can you explain?
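
One possible reading of the raw-string remark, offered as an assumption rather than as Lasse's stated meaning: the RegExp literal mainly spares you from double-escaping backslashes inside an ordinary string. With a raw string form, such as the `String.raw` tag that ES2015 later added, the constructor could take the pattern as written.

```javascript
// Ordinary string: every backslash must be doubled.
var fromString = new RegExp("^\\d+\\.\\d+$");
// Raw template string: backslashes are kept verbatim.
var fromRaw = new RegExp(String.raw`^\d+\.\d+$`);
// Literal form for comparison.
var literal = /^\d+\.\d+$/;

console.log(fromRaw.source === literal.source);    // true
console.log(fromString.source === literal.source); // true
console.log(fromRaw.test("3.14"));                 // true
```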

Dmitry.

