Proposal to make gloda fulltext tokenizer treat '_' as punctuation without schema bump

Andrew Sutherland asutherland at
Tue Jul 17 17:15:43 UTC 2012

On 07/17/2012 03:34 AM, Tanstaafl wrote:
> But wouldn't it be better to simply write the index/reindex code so 
> that it simply doesn't and *can* not consume all CPU cycles? Is there 
> no way to throttle it so that it never uses more than, say, 20%? 

The code does use adaptive scheduling to try and detect how much 
CPU/system time it is using, as well as to notice when the system 
appears to be under load (many thanks to rkent for this!) in order to 
limit its activities so it doesn't harsh the system. Unfortunately, this 
is a tricky thing to do given the limited platform facilities at hand 
and how much stuff happens and needs to happen on the main thread in 
Thunderbird.  It is possible that virus checkers are making this much 
worse on Windows, but I don't have any hard numbers.

Right now the CPU targets are for 50% utilization while the user is 
using Thunderbird and 83% while the user is not using Thunderbird.
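
To make those two targets concrete, here is a minimal sketch of how a utilization target can be turned into a pause between bursts of work. This is not the actual gloda scheduler code: the function name, the constants, and the simple duty-cycle formula are assumptions made for illustration, and the real code also has to cope with load from other processes.

```python
# Hypothetical duty-cycle sketch; names and formula are illustrative only.
ACTIVE_TARGET = 0.50  # target from the post: user is using Thunderbird
IDLE_TARGET = 0.83    # target from the post: user is not using Thunderbird


def pause_after_work(work_seconds: float, user_active: bool) -> float:
    """Return how long to yield after a burst of work to hit the target."""
    if work_seconds < 0:
        raise ValueError("work_seconds must be non-negative")
    target = ACTIVE_TARGET if user_active else IDLE_TARGET
    # utilization = work / (work + pause)
    # solving for pause gives: pause = work * (1 - target) / target
    return work_seconds * (1.0 - target) / target
```

Under this model, at the 50% target a burst of work is followed by an equally long pause, while at the 83% target the pause is only about a fifth of the burst. A measured CPU time per burst would be fed in as work_seconds.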

This is an area where I would be very happy to work with someone who has 
the time to get some actual numbers by using profilers like Xperf and/or 
augmenting our telemetry reports and delving through them.


