Thunderbird Conversations : feedback wanted!

Andrew Sutherland asutherland at asutherland.org
Tue Nov 30 03:54:42 UTC 2010


I saw some discussion on #maildev about slow gloda queries.  If you could 
augment your extension to accumulate some basic statistics, that 
information could be very useful.  Specifically: the total request 
latency, the number of returned items/messages, and the number of items 
in each of the sub-collections.  (Each collection instance also 
creates collections to hold the related noun instances and stores them 
in a subCollections dictionary keyed by the noun id.  This is done for 
object identity reasons.)
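
To make the request concrete, here is a minimal sketch of the kind of accumulator meant above. It is written in Python purely for illustration (the extension itself would be JavaScript), and every name in it (QueryStats, record, summary) is hypothetical rather than part of gloda. It assumes the caller takes a timestamp when the query is issued and another when the completion callback fires, and that the sub-collections are available as a dictionary keyed by noun id.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QuerySample:
    latency_ms: float            # total request latency for one query
    item_count: int              # number of returned items/messages
    sub_collection_counts: dict  # noun id -> number of items in that sub-collection


@dataclass
class QueryStats:
    """Hypothetical accumulator; not a gloda API."""
    samples: list = field(default_factory=list)

    def record(self, started_s, finished_s, items, sub_collections):
        # started_s / finished_s are timestamps in seconds taken by the caller.
        self.samples.append(QuerySample(
            latency_ms=(finished_s - started_s) * 1000.0,
            item_count=len(items),
            sub_collection_counts={
                noun_id: len(members)
                for noun_id, members in sub_collections.items()
            },
        ))

    def summary(self):
        if not self.samples:
            return {}
        latencies = [s.latency_ms for s in self.samples]
        return {
            "queries": len(latencies),
            "mean_ms": mean(latencies),
            "max_ms": max(latencies),
            "mean_items": mean(s.item_count for s in self.samples),
        }
```

Keeping the per-query samples, rather than only running totals, makes it possible to relate latency to result size afterwards.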

In general, we can expect that every gloda noun instance retrieval is 
going to require the retrieval of a page that is unlikely to be 
prefetched.  In the case of messages, retrieval of the fulltext body 
requires row retrieval using the primary key which means at least one 
page.  Contacts and identities have a good chance of already being in 
memory.  Queries for related noun instances (identities/contacts) must 
be issued from the main thread after the asynchronous mozStorage 
activities complete, which also brings main thread latency into the 
overall latency picture.

The point being that a conversation with 10 messages and 6 involved 
parties will tally out to an uncached lower bound of at least 2 seeks 
per message, 1 seek per identity, 1 seek per contact, plus btree 
traversal overhead at a page size of 512 (which should thankfully 
benefit from some locality related to key ordering) which we will call 8 
seeks with absolutely no justification whatsoever.  That is 20 + 6 + 6 = 
32 substantiated seeks, so (32 substantiated seeks + 8 wild guesstimate 
seeks) * 10 ms per seek = 400ms.  JavaScript-handled dependent queries 
should probably number 3 in the worst case, so also add 3 * (event loop 
latency) into that.
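
The estimate above can be written out as a small calculation. This is only a sketch of the arithmetic in the preceding paragraph; the function name and parameters are made up for illustration, the 8 btree seeks remain a guess, and event loop latency is left as an input because no figure is given for it.

```python
def estimate_uncached_latency_ms(messages, parties, btree_seeks=8,
                                 seek_ms=10, dependent_queries=3,
                                 event_loop_ms=0):
    """Lower-bound estimate following the reasoning in the text above."""
    message_seeks = 2 * messages   # at least 2 seeks per message
    identity_seeks = parties       # 1 seek per identity
    contact_seeks = parties        # 1 seek per contact
    total_seeks = message_seeks + identity_seeks + contact_seeks + btree_seeks
    # Dependent queries each pay one trip through the main thread event loop.
    return total_seeks * seek_ms + dependent_queries * event_loop_ms


# 10 messages, 6 involved parties: (20 + 6 + 6 + 8) * 10 ms
print(estimate_uncached_latency_ms(10, 6))  # 400
```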

Obviously, if gloda indexing is active during the process of display, 
there could be latency on the asynchronous database thread as well as 
interference on the main thread.  Likely major sources of latency 
(easily on the order of 100s of milliseconds) include database commit 
operations and fulltext index segment merges.  Any other I/O by 
Thunderbird, including your extension streaming messages, could also 
introduce additional I/O latency and what not.

Which is to say, it would not surprise me that these things could get 
really slow, but it would be great to know what this translates to in 
the real world.

The best mitigation strategy I can think of would be for you to try some 
of the following; hopefully you're not already doing any of this! :)

- Aggressively pre-issue queries against messages close by in the thread 
pane that the user might click on next.  I would still expect most 
worst-case gloda query latencies to be shorter than the several seconds 
it takes a human being to read an e-mail message.

- Maintain a limited query cache of your own to fast-path users 
re-displaying a recently displayed conversation.  Gloda will not 
automatically re-use existing queries.  If you ask for information on a 
conversation you recently asked for, gloda will still dispatch the query 
to the asynchronous thread.  This will likely return quickly because the 
SQLite or OS caches should still be hot, and gloda may still have all of 
the message objects available in memory, but you still have to deal with 
the asynchronous database thread latency as well as the main thread 
event backlog/latency.
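
Both suggestions can share one small structure. The following is a minimal sketch in Python (the real thing would be extension JavaScript), assuming a conversation can be keyed by some id and that run_query stands in for whatever actually issues the gloda query; ConversationCache, lookup and prefetch are hypothetical names, not gloda APIs. The sketch is synchronous for brevity, whereas the real query is asynchronous, and it does not deal with cached results going stale when indexing changes a conversation.

```python
from collections import OrderedDict


class ConversationCache:
    """Small least-recently-used cache of query results (hypothetical)."""

    def __init__(self, max_entries=20):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, conversation_id):
        if conversation_id not in self._entries:
            return None
        # Mark as most recently used.
        self._entries.move_to_end(conversation_id)
        return self._entries[conversation_id]

    def put(self, conversation_id, result):
        self._entries[conversation_id] = result
        self._entries.move_to_end(conversation_id)
        # Evict the least recently used entries beyond the limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)


def lookup(cache, conversation_id, run_query):
    """Return a cached result, or run the query and remember its result."""
    cached = cache.get(conversation_id)
    if cached is not None:
        return cached
    result = run_query(conversation_id)
    cache.put(conversation_id, result)
    return result


def prefetch(cache, nearby_conversation_ids, run_query):
    """Pre-issue queries for conversations the user might open next."""
    for conversation_id in nearby_conversation_ids:
        lookup(cache, conversation_id, run_query)
```

A bounded size matters here because a cached collection keeps its message objects alive in memory.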

It is worth noting that thanks to SQLite 3.7.x and its WAL (Write-Ahead 
Log) we actually could avoid the worst-case asynchronous thread latency 
stuff.  SQLite 3.7.x is available on both mozilla 1.9.2 and trunk.

Andrew



More information about the tb-planning mailing list