Proposal to make gloda fulltext tokenizer treat '_' as punctuation without schema bump

Tanstaafl tanstaafl at
Tue Jul 17 10:34:50 UTC 2012

On 2012-07-17 6:23 AM, Gervase Markham <gerv at> wrote:
> On 17/07/12 01:27, Andrew Sutherland wrote:
>> I don't like bumping the gloda schema rev because it has the very bad UX
>> of "I upgraded Thunderbird and now Thunderbird is using a lot of my CPU
>> and if I do gloda searches right now, they might not find anything".
>> The argument for making the fix and not bumping the schema is that
>> treating underscores as part of the word is arguably messed up right now.

> Are there any other schema-breaking changes on the horizon which you
> could roll in to the same update? I'd say this one is worth waiting up
> to 6 months for if we can eliminate a second change later.

Good idea...

> Can we make the user more informed about what's happening - e.g. a
> "Database reindexing (X% complete)" status bar message?

As one who got bitten really, really badly when GLODA was first
implemented and screamed bloody murder -

(I had 20+ IMAP accounts, many holding several GB of messages and many
with dozens of folders. Most folders were *not* set to offline mode; I
had only a very few carefully configured selective offline folders.
When this change happened, all of my explicitly defined offline
settings were trashed, *all* folders were set to full offline mode, and
Thunderbird was ... well, it was just toast. It took me a while
googling to figure out what had happened) -

Yes, some kind of status message would imho be very desirable for 
*anything* that has a potential for impacting performance.

But wouldn't it be better to write the index/reindex code so that it
doesn't, and *cannot*, consume all CPU cycles? Is there no way to
throttle it so that it never uses more than, say, 20%?
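
To make the throttling idea concrete, here is a rough sketch of a duty-cycle throttle: after each unit of indexing work, the indexer sleeps long enough that its busy time stays at or below a chosen fraction of wall-clock time. This is not gloda's actual code (gloda is JavaScript, and I have not checked how its indexer schedules work); the names pause_for, run_throttled and max_fraction are made up for illustration, and Python is used only to keep the example short.

```python
import time


def pause_for(busy_seconds, max_fraction):
    """Return how long to sleep after busy_seconds of work so that
    busy / (busy + sleep) equals max_fraction."""
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    return busy_seconds * (1 - max_fraction) / max_fraction


def run_throttled(items, process, max_fraction=0.2,
                  clock=time.perf_counter, sleep=time.sleep):
    """Call process(item) for each item, pausing after each one.

    clock and sleep are injectable so the behaviour can be tested
    without actually waiting. Hypothetical helper, not a gloda API.
    """
    for item in items:
        start = clock()
        process(item)                 # one unit of indexing work
        busy = clock() - start        # how long that unit took
        sleep(pause_for(busy, max_fraction))
```

With max_fraction=0.2, a batch that takes 1 second of work is followed by a 4 second pause. Note the limits of this approach: it caps the indexer's own share of one core on average, it is not a hard limit enforced by the OS, and it makes a full reindex take proportionally longer.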
