Regexp microbenchmarks

Lars Hansen lhansen at adobe.com
Mon Jun 2 05:45:12 PDT 2008


> -----Original Message-----
> From: tamarin-devel-bounces at mozilla.org 
> [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of Thomas Reilly
> Sent: 29. mai 2008 18:53
> To: Mike Shaver; David Mandelin
> Cc: tamarin-devel
> Subject: RE: Regexp microbenchmarks
> 
> 
> Our hallway discussions around RE can be summarized as:
> 
> 5) Wouldn't it rock the kazbah to have the RE engine compile 
> its RE's to LIR and use TT's vm for RE execution?

It might.  Or it might not -- depending on how well the VM
handles typical compiled RE code, which does not look like
typical compiled JS code (a lot of tail calls, and then some
potentially really deep recursions).  

And of course it won't do any good at all unless the RE compiler
performs a number of back-end independent optimizations.  So I
would definitely not pursue going to native code before many
other things were in place.  Presumably the spidermonkey RE
engine performs some of these already.

--lars

> 
> It would be awesome to have an intern make #4 a reality.  If 
> he did #5 he should get a medal (and some job offers).
> 
> -----Original Message-----
> From: tamarin-devel-bounces at mozilla.org on behalf of Mike Shaver
> Sent: Thu 5/29/2008 6:19 AM
> To: David Mandelin
> Cc: tamarin-devel
> Subject: Re: Regexp microbenchmarks
>  
> 2008/5/28 David Mandelin <dmandelin at mozilla.com>:
> > An intern who arrives in about 3 weeks might be interested 
> in working 
> > on TT regular expressions, so I've been doing a bit of prep work to 
> > understand the problem. In particular, I ran a couple of 
> > microbenchmarks testing certain regular expressions using both the 
> > standard regular expression libraries and a hand-coded matching 
> > routine. I'm posting the code and results for just one, as 
> the results were similar in both.
> >
> >   regexp:      /\"([^\"\\]|\\\"|\\\\)+\"/
> >                (simplified C string literal token)
> >   text:        "This is \"my\" string." And this is not.
> >   iterations:  100,000
> >
> >   Time taken in seconds:
> >                              Code
> >   Runtime       Regexp Lib             Hand Coded
> >
> >   SM              .685                    .615
> >     TT            29.5                      .545
> 
> Given those numbers, would it not be perhaps be more 
> efficient to simply replace TT's regexp engine with SM's?  
> There are improvements yet to come to SM, so the gap to 
> hand-coded might close more, but there's also a lot of work 
> that goes into making an ECMA-and-web-compatible regexp 
> engine -- do we really want to undertake a from-scratch 
> rewrite when most of the gain is available through a refactor 
> and integration of SM's engine?
> 
> Mike
> _______________________________________________
> Tamarin-devel mailing list
> Tamarin-devel at mozilla.org
> https://mail.mozilla.org/listinfo/tamarin-devel
> 
> _______________________________________________
> Tamarin-devel mailing list
> Tamarin-devel at mozilla.org
> https://mail.mozilla.org/listinfo/tamarin-devel
> 


More information about the Tamarin-devel mailing list