Gloda and large IMAP stores (was: Re: Worthwhile Thunderbird projects/addons?)

Andrew Sutherland asutherland at asutherland.org
Tue Dec 21 20:43:31 UTC 2010


(Starting a new thread in reply to message-id: 
<4D10AC85.6040407 at libertytrek.org>)

On 12/21/2010 05:32 AM, Tanstaafl wrote:
> I have multiple IMAP accounts, most with 10+GB of mail in each. This
> results in a HUGE performance hit when an account is first setup unless
> I disable GLODA completely, as well as a huge resultant GLODA index file
> that is stored in the roaming profile by default which is a non-starter
> for corporations that don't use redirected folders...

There are 3 issues here:

1) Gloda performance.  This can be addressed by trying to move more of 
the indexing process off of the main thread onto a de-prioritized 
worker, and by reducing the I/O burden on the main thread, for example 
by leveraging the newish ability to open .msf files asynchronously.
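Here is a minimal sketch of the "reduce the burden on whoever does the indexing" idea. It assumes a toy in-memory index rather than gloda's real SQLite storage. Indexing is split into resumable steps with a generator, so whatever drives it (the main thread or a de-prioritized worker) can stop after a small budget and let other work run. The names (indexJob, runSlice, Message, Index) are hypothetical and are not gloda's API.

```typescript
// Hypothetical sketch: indexing broken into resumable steps with a generator,
// so the driver (main thread or a de-prioritized worker) can stop after a
// fixed budget and let other work run. Names are illustrative, not gloda's.
interface Message {
  id: number;
  body: string;
}

// Toy inverted index: term -> ids of messages containing it.
type Index = Map<string, Set<number>>;

// Each resumption indexes exactly one message, then yields that message's id.
function* indexJob(messages: Message[], index: Index): Generator<number, void, void> {
  for (const msg of messages) {
    for (const term of msg.body.toLowerCase().split(/\W+/)) {
      if (term === "") continue; // split() leaves empty strings at the edges
      let postings = index.get(term);
      if (postings === undefined) {
        postings = new Set<number>();
        index.set(term, postings);
      }
      postings.add(msg.id);
    }
    yield msg.id;
  }
}

// Runs at most `budget` steps of the job; returns true once it has finished.
function runSlice(job: Generator<number, void, void>, budget: number): boolean {
  for (let i = 0; i < budget; i++) {
    if (job.next().done) return true;
  }
  return false;
}

// Usage: drive the job two messages at a time.
const index: Index = new Map();
const job = indexJob(
  [
    { id: 1, body: "Quarterly report attached" },
    { id: 2, body: "Re: report" },
    { id: 3, body: "Lunch?" },
  ],
  index,
);
let slices = 0;
while (!runSlice(job, 2)) {
  slices++; // a real scheduler would yield to the event loop here
}
```

The budget is a step count for simplicity; a real scheduler would more likely budget by elapsed time.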

2) Gloda disk space usage.  This can be improved without too much work 
by improving the fulltext search tokenizer to avoid indexing stopwords, 
and by making some column-ordering optimizations to reduce encoding 
overhead.  A fancier thing we could do, once we move to sharding and the 
like, is to avoid fulltext indexing messages that don't seem likely to be 
the kind of thing the user wants to search for, and instead rely on 
fallback searching for those cases.  However, that is more complex, so 
it is not a good first step.
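Here is a minimal sketch of the stopword idea. It assumes a tiny hand-picked stopword list and ASCII-only splitting; a production fulltext tokenizer has to handle far more than this. Words like "the" and "of" occur in nearly every message, so dropping them keeps their very long posting lists out of the index. The tokenize function and STOPWORDS set are hypothetical.

```typescript
// Hypothetical sketch of a tokenizer that drops stopwords before terms reach
// the fulltext index. The stopword list is a small illustrative sample, and
// splitting on \W+ is ASCII-only; a real tokenizer must handle far more.
const STOPWORDS: ReadonlySet<string> = new Set([
  "a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to",
]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\W+/) // treat any run of non-word characters as a separator
    .filter((term) => term.length > 0 && !STOPWORDS.has(term));
}

// Usage: four of the seven words never reach the index.
const terms = tokenize("The minutes of the meeting are attached");
// terms is ["minutes", "meeting", "attached"]
```

The usual trade-off is that a phrase search containing a stopword can then only be matched approximately.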

3) Gloda databases and roaming storage.  This is a flat-out difficult 
problem.  The most clear-cut ways to deal with this and maintain 
correctness are: a) to require IMAP and have per-machine profiles using 
something like Weave/Firefox Sync to keep the profiles vaguely in sync, 
or b) to have a server-based gloda-like solution.  The bottom line is 
that this is really hard, and since Thunderbird is not focused on the 
enterprise, this one is unlikely to get solved beyond making sure 
enterprises can disable gloda en masse (which I think is sorta addressed?).

Note: I believe there are bugs filed on all of these if people are 
interested in working on any of them.

> What I'd like to see is the IMAP stores moved to the Local Folders by
> default, and I'd like for GLODA to have a separate index file for each
> account... which could then be 'linked' to provide seamless searching
> across all accounts.

Although I am hoping/planning to shard gloda's data into separate 
databases, I don't think we would break things along account lines 
(although sharding would make doing so more tractable).  The rationale 
is that to improve performance by leveraging locality, we want to put 
things that are likely to be accessed together in the same database.  
If someone has completely segregated home/work setups, I could see 
per-account databases being useful.  Of course, that would also be an 
argument for the user just using separate profiles, most likely by using 
separate user accounts as provided by their OS.
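To illustrate what sharding by locality rather than by account might mean, here is a sketch that assumes, purely for illustration, that the locality key is the month a message was sent. The file naming scheme and shardNameFor are invented and do not describe gloda's actual plan.

```typescript
// Hypothetical sketch: choose a shard database by a locality key instead of by
// account. The key here, the UTC month a message was sent, is purely an
// assumption for illustration; the file naming scheme is invented too.
interface MessageMeta {
  accountId: string;
  date: Date;
}

function shardNameFor(msg: MessageMeta): string {
  // msg.accountId is deliberately ignored: placement follows access locality.
  const year = msg.date.getUTCFullYear();
  const month = msg.date.getUTCMonth() + 1; // getUTCMonth() is 0-based
  return `shard-${year}-${month < 10 ? "0" : ""}${month}.sqlite`;
}

// Usage: work and home mail from the same month land in the same shard.
const workShard = shardNameFor({ accountId: "work", date: new Date(Date.UTC(2010, 11, 21)) });
const homeShard = shardNameFor({ accountId: "home", date: new Date(Date.UTC(2010, 11, 3)) });
```

Because the account plays no part in placement, a query over recent mail touches one database instead of one per account.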


> I would also like to see some kind of GLODA option for indexing only the
> *full* (including custom) headers, since I rarely do body searches.

What's the use-case for this?  I ask because one of the goals of gloda 
has been to facilitate extensions providing a deeper understanding of 
messages at more than just a fulltext search level.  For example, though 
it's never gotten much exposure and may have bit-rotted by now, there's 
an extension that understands bugzilla daemon e-mails.  It associates 
bugmail explicitly with the bugs they are from or reference, and it 
transforms the author of the message from the bugzilla daemon into the 
bugzilla user who caused the message to be sent.  If you wanted to 
search for bugmail triggered by a specific user, you could obviously 
search on headers to accomplish this, but arguably being able to write 
an extension and have that picked up is pretty powerful and a better way 
to do things, especially because it's more easily shared with other users.
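The logic such an extension contributes might look roughly like the sketch below. It relies on two Bugzilla mail conventions: a "[Bug NNN]" subject prefix, and an X-Bugzilla-Who header naming the user who made the change. It also assumes header names have already been lowercased. analyzeBugmail and BugmailInfo are hypothetical and are not gloda's attribute-provider interface.

```typescript
// Hypothetical sketch of what a bugmail-aware extension might derive from one
// message. Assumes lowercased header names, a "[Bug NNN]" subject prefix, and
// an X-Bugzilla-Who header. Not gloda's attribute-provider API.
interface BugmailInfo {
  bugId: number;
  effectiveAuthor: string; // who caused the mail, not the daemon address
  referencedBugIds: number[];
}

function analyzeBugmail(headers: Record<string, string>, body: string): BugmailInfo | null {
  const subjectMatch = /^\[Bug (\d+)\]/.exec(headers["subject"] ?? "");
  if (subjectMatch === null) return null; // not bugmail
  const bugId = Number(subjectMatch[1]);

  // Collect other bugs mentioned in the body, e.g. "see bug 123".
  const referenced = new Set<number>();
  const bugRef = /\bbug\s+(\d+)/gi;
  let m: RegExpExecArray | null;
  while ((m = bugRef.exec(body)) !== null) {
    const id = Number(m[1]);
    if (id !== bugId) referenced.add(id);
  }

  return {
    bugId,
    // Fall back to From: (the daemon) if X-Bugzilla-Who is missing.
    effectiveAuthor: headers["x-bugzilla-who"] ?? headers["from"] ?? "",
    referencedBugIds: Array.from(referenced),
  };
}

// Usage with made-up addresses and bug numbers.
const info = analyzeBugmail(
  {
    subject: "[Bug 500001] New: Search is slow",
    from: "bugzilla-daemon@example.org",
    "x-bugzilla-who": "reporter@example.com",
  },
  "Possibly related to bug 499999; see also bug 500001.",
);
```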


> Then, I'd love to see some discussion between TB devs and Timo (the
> dovecot developer) on how to best work together to provide fast body
> search capabilities without having to have a full local mirror of the
> entire IMAP store (could work with the above GLODA headers only option)
> - like, maybe, creation of a new 'special' folder on the IMAP server for
> storing things like sqlite databases (GLODA indexes, Address
> Books/Contacts, etc). I do realize this would be specific to one IMAP
> server, but I honestly believe that server-side indexing - like server
> side filtering - is the wave of the future, seeing as how huge IMAP
> based mail stores are getting more and more prevalent these days (as a
> result of how cheap storage is I guess)...

In short: I agree.

This is one of those unpleasant situations where many platforms are at 
odds with each other because they only partially overlap, which results 
in redundant, overlapping features.  Thunderbird is cross-platform and
cross-server so it can't leverage operating system/environment-specific 
or server-specific mechanisms if it wants to benefit all its users.  
Likewise, it makes sense for GNOME to build their own thing that is not 
Thunderbird specific, and for Dovecot to do things that are not 
Thunderbird specific.

This is one of the reasons I have been pushing for, and trying, to port 
as much of our logic as possible to reasonably well-abstracted 
JavaScript.  The less code there is that is Thunderbird- and 
client-specific, the greater the potential for us to share and reuse 
code across multiple platforms, especially as massive performance 
improvements in JavaScript engines make it tractable for use on all 
tiers.  For example, if we were able to run the gloda faceting logic on 
the server/cloud tier as well as on the client side, we could operate in 
both the 1-computer-all-local-storage Thunderbird case and a thin-client 
web-browser type mode of operation (likely augmented by some local 
caching using IndexedDB) for both mobile devices and full-size computer 
browsers.
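Here is a rough sketch of what "well-abstracted" means in this context. It assumes a simplified faceting step that just counts how many messages carry each value of some attribute. Because the function is pure and touches no XPCOM, DOM, or storage APIs, the same code could in principle run in Thunderbird, on a server, or in a web page. facet and FacetCount are hypothetical names, not gloda's faceting API.

```typescript
// Hypothetical sketch of faceting written as a pure function: no XPCOM, DOM,
// or storage dependencies, so it is not tied to any one tier.
interface FacetCount {
  value: string;
  count: number;
}

// Counts how many items carry each value; an item counts once per value.
function facet<T>(items: T[], valuesOf: (item: T) => string[]): FacetCount[] {
  const counts = new Map<string, number>();
  for (const item of items) {
    for (const value of Array.from(new Set(valuesOf(item)))) {
      counts.set(value, (counts.get(value) ?? 0) + 1);
    }
  }
  // Most common values first; ties broken alphabetically for stable output.
  return Array.from(counts.entries())
    .map(([value, count]) => ({ value, count }))
    .sort((a, b) => b.count - a.count || (a.value < b.value ? -1 : a.value > b.value ? 1 : 0));
}

// Usage: the same function facets by author or by tag.
const messages = [
  { from: "alice", tags: ["work"] },
  { from: "bob", tags: ["work", "urgent"] },
  { from: "alice", tags: [] as string[] },
];
const byAuthor = facet(messages, (m) => [m.from]);
const byTag = facet(messages, (m) => m.tags);
```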

So I guess my most specific response here is that I do not think it 
makes sense to do something completely Dovecot/IMAP specific, but I 
completely agree that the future of messaging is not full-weight 
Thunderbird clients on all devices with all messages locally stored 
backed by a traditional IMAP server.  Mobile devices and other tablet-y 
devices that have power and storage limitations are real things whose 
popularity is not going to diminish.  And even as their storage 
capabilities increase, power is going to be a significant concern for a 
very long time, which means that mobile devices should not be
performing heavyweight indexing of all of your messages.  I do think we 
definitely want to work with Dovecot and others who are interested in 
benefiting users to get to a situation where we can deal with this 
reality and do so without having a whole bunch of independent platforms 
that can only speak old-school IMAP at each other.

The major stumbling block is that this is not something we can do in one 
single step, especially given the resources we have on hand.  And not 
all of the steps can be palatable to all of the use-cases that
Thunderbird has traditionally addressed.  One outcome of this is the 
potential unsuitability of gloda to roaming profiles at the current time 
(at least without very careful configuration of Thunderbird).  This is 
why a lot of work is happening as extensions; they allow us to operate 
within the Thunderbird framework while leaving most of the existing UI 
intact.

Having said that, there are obvious compromises that are frustrating to 
the non-gloda usage cases.  While I believe that the quick filter bar is 
a significant improvement on the previous quick-search implementation, 
splitting it out into its own bar that is not on the toolbar is 
obviously a net loss for non-gloda users in stock Thunderbird.  The 
trade-off is always between being able to reach the next step in the 
long-term plan more quickly versus trying to make sure there is no 
reduction in usability for the use-cases that are not our focus.  In 
this case, the choice for the quick filter bar was to leave the screen 
real-estate problem, and all the complexity of toolbar customization 
that solving it involves, to an extension.  While the netbook use-case
(which has serious screen real-estate needs) was also somewhat 
regressed, we have been explicitly focusing on improving that use-case 
with the HomeTab/Thunderbird Air experiment.

Andrew



More information about the tb-planning mailing list