RegExp.prototype.count

kai zhu kaizhu256 at gmail.com
Sun Jan 20 07:40:00 UTC 2019


+1 for string.count

i don’t think the g-flag is necessary in str.split, so the original performance claims are still valid:
- for counting regexp - use split + length
- for counting substring - use while + indexOf
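
a minimal sketch of the two patterns above, for reference. the helper names `countRegexp` and `countSubstring` are hypothetical (neither is a built-in nor a proposed api). the split version assumes the regexp has no capturing groups and never matches the empty string:

```javascript
// Hypothetical helpers for illustration only; not built-ins or proposed APIs.

// Count regexp matches via split. String.prototype.split ignores the g flag,
// so /\n/ and /\n/g give the same result. Assumes the pattern has no
// capturing groups (split would splice the captures into the result array)
// and never matches the empty string.
function countRegexp(str, re) {
    return str.split(re).length - 1;
}

// Count non-overlapping occurrences of a plain substring with indexOf.
function countSubstring(str, sub) {
    if (sub === "") {
        return 0; // guard: an empty needle would otherwise loop forever
    }
    var count = 0;
    var ii = str.indexOf(sub);
    while (ii !== -1) {
        count += 1;
        ii = str.indexOf(sub, ii + sub.length);
    }
    return count;
}

var text = "aa\nbb\ncc\n";
console.log(countRegexp(text, /\n/));    // 3
console.log(countRegexp(text, /\n/g));   // 3
console.log(countSubstring(text, "\n")); // 3
```

which of the two is faster will depend on the engine and the input, so treat the jsperf numbers quoted below as one data point rather than a guarantee.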

> On 20 Jan 2019, at 12:22 AM, Isiah Meadows <isiahmeadows at gmail.com> wrote:
> 
> Nit: you should use `.split(/\n/g)` to get all parts.
> 
> I like the benchmarks here. That's much appreciated, and after further
> investigation, I found a *giant* WTF:
> https://jsperf.com/regexp-counting-2/8
> 
> TL;DR: for string character counting, prefer `indexOf`.
> 
> For similar reasons to that JSPerf thing, I'd like it to be on the
> String prototype rather than the RegExp prototype, as in
> `str.count(/\n/)`.
> 
> -----
> 
> Isiah Meadows
> contact at isiahmeadows.com
> www.isiahmeadows.com
> 
> On Sun, Jan 20, 2019 at 12:33 AM kai zhu <kaizhu256 at gmail.com> wrote:
>> 
>> benchmarked @isiah’s while-loop test-case vs str.split vs str.replace for regexp counting on jsperf.com [1], and the results were surprising (for me).
>> 
>> benchmarks using a 1mb random ascii-string, from fastest to slowest:
>> 1. (fastest - 1,700 runs/sec) regexp-counting with ```largeCode.split(/\n/).length - 1```
>> 2. (40% slower - 1,000 runs/sec) regexp-counting with ```while-loop (/\n/g)```
>> 3. (60% slower - 700 runs/sec) regexp-counting with ```largeCode.replace((/[^\n]+/g), "").length```
>> 
>> looks like the go-to design-pattern for counting-regexp is ```str.split(<regexp>).length - 1```
>> 
>> [1] regexp counting 2
>> https://jsperf.com/regexp-counting-2
>> 
>> On 13 Jan 2019, at 9:15 PM, Isiah Meadows <isiahmeadows at gmail.com> wrote:
>> 
>> If performance is an issue, regular expressions are likely to be too slow to begin with. But you could always do this to count the number of lines in a particular string:
>> 
>> ```js
>> var count = 0
>> var re = /\n|\r\n?/g
>> while (re.test(str)) count++
>> console.log(count)
>> ```
>> 
>> Given it's already this easy to iterate something with a regexp, I'm not convinced it's necessary to add this property/method.
>> 
>> On Sat, Jan 12, 2019 at 17:29 kai zhu <kaizhu256 at gmail.com> wrote:
>>> 
>>> a common use-case i have is counting newlines in largish (> 200kb) embedded-js files, like this real-world example [1].  ultimately meant for line-number-preservation purposes in auto-lint/auto-prettify tasks (which have been getting slower due to complexity).
>>> 
>>> would a new RegExp count-method like ```(/\n/g).count(largeCode)``` be significantly more efficient than existing ```largeCode.split("\n").length - 1``` or ```largeCode.replace((/[^\n]+/g), "").length```?
>>> 
>>> -kai
>>> 
>>> [1] calculating and reproducing line-number offsets when linting/autofixing files
>>> https://github.com/kaizhu256/node-utility2/blob/2018.12.30/lib.jslint.js#L7377
>>> https://github.com/kaizhu256/node-utility2/blob/2018.12.30/lib.jslint.js#L7586
>> 
>> 