Regexp microbenchmarks

Thomas Reilly treilly at
Mon Jun 2 06:10:31 PDT 2008

The VM handles recursion well, not sure about tail calls, I suspect that could be handled by the RE compiler (just generate a JUMP instead of NEST).

Anyways, yeah, the kazbah isn't rocked unless said RE implementation blows the doors off the state of the art and that might require additional VM work.

-----Original Message-----
From: Lars Hansen
Sent: Mon 6/2/2008 5:45 AM
To: Thomas Reilly; Mike Shaver; David Mandelin
Cc: tamarin-devel
Subject: RE: Regexp microbenchmarks
> -----Original Message-----
> From: tamarin-devel-bounces at 
> [mailto:tamarin-devel-bounces at] On Behalf Of Thomas Reilly
> Sent: 29. mai 2008 18:53
> To: Mike Shaver; David Mandelin
> Cc: tamarin-devel
> Subject: RE: Regexp microbenchmarks
> Our hallway discussions around RE can be summarized as:
> 5) Wouldn't it rock the kazbah to have the RE engine compile 
> its RE's to LIR and use TT's vm for RE execution?

It might.  Or it might not -- depending on how well the VM
handles typical compiled RE code, which does not look like
typical compiled JS code (a lot of tail calls, and then some
potentially really deep recursions).  

And of course it won't do any good at all unless the RE compiler
performs a number of back-end independent optimizations.  So I
would definitely not pursue going to native code before many
other things were in place.  Presumably the spidermonkey RE
engine performs some of these already.


> It would be awesome to have an intern make #4 a reality.  If 
> he did #5 he should get a medal (and some job offers).
> -----Original Message-----
> From: tamarin-devel-bounces at on behalf of Mike Shaver
> Sent: Thu 5/29/2008 6:19 AM
> To: David Mandelin
> Cc: tamarin-devel
> Subject: Re: Regexp microbenchmarks
> 2008/5/28 David Mandelin <dmandelin at>:
> > An intern who arrives in about 3 weeks might be interested 
> in working 
> > on TT regular expressions, so I've been doing a bit of prep work to 
> > understand the problem. In particular, I ran a couple of 
> > microbenchmarks testing certain regular expressions using both the 
> > standard regular expression libraries and a hand-coded matching 
> > routine. I'm posting the code and results for just one, as 
> the results were similar in both.
> >
> >   regexp:      /\"([^\"\\]|\\\"|\\\\)+\"/
> >                (simplified C string literal token)
> >   text:        "This is \"my\" string." And this is not.
> >   iterations:  100,000
> >
> >   Time taken in seconds:
> >                              Code
> >   Runtime       Regexp Lib             Hand Coded
> >
> >   SM              .685                    .615
> >     TT            29.5                      .545
> Given those numbers, would it not be perhaps be more 
> efficient to simply replace TT's regexp engine with SM's?  
> There are improvements yet to come to SM, so the gap to 
> hand-coded might close more, but there's also a lot of work 
> that goes into making an ECMA-and-web-compatible regexp 
> engine -- do we really want to undertake a from-scratch 
> rewrite when most of the gain is available through a refactor 
> and integration of SM's engine?
> Mike
> _______________________________________________
> Tamarin-devel mailing list
> Tamarin-devel at
> _______________________________________________
> Tamarin-devel mailing list
> Tamarin-devel at

More information about the Tamarin-devel mailing list