Proposal to make gloda fulltext tokenizer treat '_' as punctuation without schema bump
asutherland at asutherland.org
Tue Jul 17 17:32:36 UTC 2012
On 07/17/2012 03:23 AM, Gervase Markham wrote:
> Are there any other schema-breaking changes on the horizon which you
> could roll in to the same update? I'd say this one is worth waiting up
> to 6 months for if we can eliminate a second change later.
Oops, forgot to cover this aspect. Unless a new contributor shows up,
no, there are no schema-breaking changes on the horizon. If there were,
the path would be clearer.
> Can we make the user more informed about what's happening - e.g. a
> "Database reindexing (X% complete)" status bar message?
Yes and no. Gloda reports its activity to the activity manager, but its
progress is usually not that precise; it generally knows how many
folders have been indexed out of how many, and then its progress within
a folder. We could count the number of messages gloda knows about before
blowing away the database, although that could cause bad jank because
running a COUNT(*) query actually requires counting the rows, and that
would currently happen synchronously at startup. One possibility would
be to synchronously move the old database aside by renaming it,
create our new database like normal, run the COUNT against the old
database asynchronously after some number of seconds (to give the rest
of Thunderbird time to start up), and then nuke the old database.
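
To make the rename-then-count idea concrete, here is a rough sketch in Python using the standard sqlite3 module. It is not gloda code: the function name move_aside_and_count, the ".old" suffix, the "messages" table name, and the on_count callback are all made up for illustration, and a background timer thread stands in for whatever asynchronous mechanism Thunderbird would really use.

```python
import os
import sqlite3
import threading


def move_aside_and_count(db_path, on_count, delay_seconds=5.0):
    """Rename the old database, create a fresh one, count old rows later.

    Hypothetical helper: the table name "messages" and the ".old"
    suffix are assumptions made for this sketch.
    """
    old_path = db_path + ".old"

    # Cheap synchronous step: a rename does not read any rows.
    os.replace(db_path, old_path)

    # Create the new, empty database "like normal".
    fresh = sqlite3.connect(db_path)
    fresh.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY)")
    fresh.commit()
    fresh.close()

    def count_then_delete():
        # Expensive step: COUNT(*) has to walk the rows, so it runs
        # off the startup path, after the delay.
        conn = sqlite3.connect(old_path)
        try:
            (total,) = conn.execute("SELECT COUNT(*) FROM messages").fetchone()
        finally:
            conn.close()
        # The old database is no longer needed once we have the total.
        os.remove(old_path)
        on_count(total)

    timer = threading.Timer(delay_seconds, count_then_delete)
    timer.daemon = True
    timer.start()
    return timer
```

The total handed to on_count could then serve as the denominator for an "X% complete" style message, with the caveat that the count only becomes available some seconds after startup.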
The activity manager currently reports all activity except for gloda on
the status bar. This is primarily because gloda's event-driven indexing
gets triggered by most user activity, so reporting it would be
distracting and would tend to hide the status bar report for the actions
the user actually took. A higher-level status could certainly be
whitelisted/logged to the status bar.