Proposal to make gloda fulltext tokenizer treat '_' as punctuation without schema bump

Andrew Sutherland asutherland at
Tue Jul 17 17:32:36 UTC 2012

On 07/17/2012 03:23 AM, Gervase Markham wrote:
> Are there any other schema-breaking changes on the horizon which you
> could roll in to the same update? I'd say this one is worth waiting up
> to 6 months for if we can eliminate a second change later.

Oops, forgot to cover this aspect.  Unless a new contributor shows up, 
no, there are no schema-breaking changes on the horizon.  If there were, 
the path would be clearer.

> Can we make the user more informed about what's happening - e.g. a
> "Database reindexing (X% complete)" status bar message?

Yes and no.  Gloda reports its activity to the activity manager, but its
progress is usually not that precise; it typically knows how many
folders have been indexed out of the total, plus progress within the
current folder.  We could count the number of messages gloda knows about
before blowing away the database, although that could cause bad jank
because a COUNT(*) query actually has to count the rows, and that would
currently happen synchronously at startup.  One possibility would
be to synchronously move the database to the side via renaming it, 
create our new database like normal, and then run the COUNT 
asynchronously after some number of seconds (to give the rest of 
Thunderbird time to start up), then nuke the old database.
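
As a rough illustration of that rename-then-count idea, here is a minimal
sketch in Python using the standard sqlite3 module.  It is not gloda code:
the function names, the ".old" suffix, the "messages" table name, and the
delay value are all assumptions made for the example, and it ignores any
journal files that may sit next to the database.

```python
import os
import sqlite3
import threading


def move_aside_and_recreate(db_path, create_schema):
    """Rename the old database, then create a fresh one at the same path.

    Both names are hypothetical; create_schema is a caller-supplied
    function that builds the new tables on an open connection.
    """
    old_path = db_path + ".old"  # assumed naming convention
    os.replace(db_path, old_path)  # a rename, so no rows are read here
    fresh = sqlite3.connect(db_path)
    create_schema(fresh)
    fresh.commit()
    fresh.close()
    return old_path


def count_then_delete(old_path, on_count, delay_seconds=10.0):
    """After a delay, count the old rows off the main thread, then delete.

    on_count receives the total so a caller could use it as the
    denominator for a percent-complete message.
    """
    def work():
        conn = sqlite3.connect(old_path)
        try:
            # "messages" is an assumed table name for this sketch.
            (total,) = conn.execute("SELECT COUNT(*) FROM messages").fetchone()
        finally:
            conn.close()
        on_count(total)
        os.remove(old_path)  # nuke the old database once counted

    timer = threading.Timer(delay_seconds, work)
    timer.start()
    return timer
```

The only synchronous work at startup is the rename and the creation of the
empty database; the row scan runs later on the timer's thread.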

The activity manager currently reports all activity except for gloda on 
the status bar.  This is primarily because gloda's event-driven indexing
gets triggered by most user activity, so reporting it would be
distracting and would tend to hide the status bar reports for the
actions the user actually took.  A higher-level status, such as a
reindexing message, could certainly be whitelisted/logged to the status bar.


More information about the tb-planning mailing list