post TB 3.1 mailnews backend plans

Joshua Cranmer Pidgeot18 at gmail.com
Sun Jul 18 00:58:38 UTC 2010


  This one is in response to the Mork->SQLite conversation:

> Asuth has suggested using a very simple schema, where each row 
> consists of a key and a json-like blob representation of a msg hdr.
>
A new database would need the following things:
1. Easy to add message metadata.
2. Different kinds of keys for account types. Message keys may be 
defined as uint32s, but not all account types can map to a uint32. For 
example, Giganews has moved to 64-bit keys for binary newsgroups (!), 
and my webforum work may need to eventually key off of thread URLs 
instead of being able to use an integer. For new account types, then, 
being able to look up by the "real" keys may be useful. This need not be 
in the main message table though; an auxiliary table is probably fine. 
In any case, what exactly constitutes a message key may need to be 
reexamined.
3. Likewise, messages are not going to be searched for only by message 
keys. Subjects and message-IDs are also pseudo-canonical mappings for 
messaging, and searching by the From header is probably useful for 
aggregation. Finally, a date column may be useful for limiting queries 
(e.g., delete all messages older than 90 days, get the last 10 days of 
messages). Views also rely on some flags (particularly, unread, killed, 
and watched).
4. For many account types, I think threads or conversations (a 
conversation is essentially a thread without hierarchy) are more 
important than individual messages. That is to say, in my web forums work, I need to 
key some stuff off of threads instead of messages. Any new database 
schema needs to be able to attach metadata to threads.
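
To make the four requirements above concrete, here is a minimal sketch 
of one possible layout, using Python's sqlite3 module. Every table, 
column, and function name in it is a hypothetical placeholder, not a 
proposal for the actual schema: a JSON blob holds free-form metadata, a 
few searchable fields are pulled out into indexed columns, an auxiliary 
table maps "real" keys (64-bit numbers, URLs) to the integer key, and 
threads get their own row to hang metadata on.

```python
import json
import sqlite3

# Hypothetical schema; all names here are illustrative assumptions.
SCHEMA = """
CREATE TABLE threads (
    thread_id INTEGER PRIMARY KEY,
    metadata  TEXT NOT NULL DEFAULT '{}'    -- JSON blob of thread metadata
);
CREATE TABLE messages (
    msg_key    INTEGER PRIMARY KEY,         -- internal integer key
    thread_id  INTEGER REFERENCES threads(thread_id),
    message_id TEXT,
    subject    TEXT,
    sender     TEXT,
    date       INTEGER,                     -- seconds since the epoch
    flags      INTEGER NOT NULL DEFAULT 0,  -- unread, killed, watched, ...
    header     TEXT NOT NULL                -- JSON blob of everything else
);
CREATE TABLE native_keys (
    native_key TEXT PRIMARY KEY,            -- 64-bit number or URL, as text
    msg_key    INTEGER NOT NULL REFERENCES messages(msg_key)
);
CREATE INDEX messages_by_message_id ON messages(message_id);
CREATE INDEX messages_by_date ON messages(date);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketch schema in it."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_message(db, native_key, thread_id, message_id, subject, sender,
                date, **extra):
    """Insert one message; unknown metadata goes into the JSON blob."""
    db.execute("INSERT OR IGNORE INTO threads (thread_id) VALUES (?)",
               (thread_id,))
    cur = db.execute(
        "INSERT INTO messages "
        "(thread_id, message_id, subject, sender, date, header) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (thread_id, message_id, subject, sender, date, json.dumps(extra)))
    db.execute("INSERT INTO native_keys (native_key, msg_key) VALUES (?, ?)",
               (str(native_key), cur.lastrowid))
    return cur.lastrowid


def lookup_native(db, native_key):
    """Map an account-specific key to the internal key, or None."""
    row = db.execute("SELECT msg_key FROM native_keys WHERE native_key = ?",
                     (str(native_key),)).fetchone()
    return row[0] if row else None


def keys_older_than(db, cutoff):
    """Internal keys of all messages dated before cutoff."""
    rows = db.execute(
        "SELECT msg_key FROM messages WHERE date < ? ORDER BY msg_key",
        (cutoff,))
    return [r[0] for r in rows]
```

Adding a new kind of metadata here means adding a key to the JSON blob 
rather than altering the table; only fields that need to be queried are 
promoted to real columns.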

> We'd probably have one sqlite db per account, though gloda seems to be 
> able to get everything into one db.
>
Per-account DBs open up the following questions:
1. Would it be possible to get rid of the message folder cache? Opening 
100 .msf files just to get folder info is untenable, but I don't think 
even the heaviest power users have more than two dozen or so accounts. 
How expensive is opening a large SQLite database to read a single table? 
Alternatively, the reference data for the folder info could be moved 
into a single file.
2. If metadata is moved to an account level, it is possible to 
deemphasize folder structure for those accounts which lack strong 
concepts of folders.
3. Furthermore, this makes it possible to create messaging equivalents 
of symbolic or hard links. Crossposting desperately wants this kind of 
linking functionality, at the very least. Similarly, this would make it 
easier to create "tag folders" while reusing most of the folder 
hierarchy. It also seems possible to implement intra-account virtual 
folders as symlinked folders, with a clever enough listening structure.
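
As a rough illustration of the linking idea in point 3, the sketch below 
keeps messages and folders in separate tables of one per-account 
database and joins them through a link table, so a crossposted message 
is stored once and appears in several folders. The table and function 
names are assumptions made up for this example. It also shows folder 
summaries coming from a single query against the account database 
rather than from one file per folder.

```python
import sqlite3

# Hypothetical per-account layout; names are illustrative assumptions.
ACCOUNT_SCHEMA = """
CREATE TABLE folders (
    folder_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE messages (
    msg_key INTEGER PRIMARY KEY,
    subject TEXT NOT NULL
);
CREATE TABLE folder_links (
    folder_id INTEGER NOT NULL REFERENCES folders(folder_id),
    msg_key   INTEGER NOT NULL REFERENCES messages(msg_key),
    PRIMARY KEY (folder_id, msg_key)
);
"""


def open_account_db(path=":memory:"):
    """Open one database for a whole account."""
    db = sqlite3.connect(path)
    db.executescript(ACCOUNT_SCHEMA)
    return db


def store_message(db, subject):
    """Store a message once, independent of any folder."""
    cur = db.execute("INSERT INTO messages (subject) VALUES (?)", (subject,))
    return cur.lastrowid


def link(db, folder_name, msg_key):
    """Make a message visible in a folder (like a hard link)."""
    db.execute("INSERT OR IGNORE INTO folders (name) VALUES (?)",
               (folder_name,))
    (folder_id,) = db.execute(
        "SELECT folder_id FROM folders WHERE name = ?",
        (folder_name,)).fetchone()
    db.execute(
        "INSERT OR IGNORE INTO folder_links (folder_id, msg_key) "
        "VALUES (?, ?)", (folder_id, msg_key))


def folders_of(db, msg_key):
    """Names of every folder a message is linked into."""
    rows = db.execute(
        "SELECT f.name FROM folders f "
        "JOIN folder_links l ON l.folder_id = f.folder_id "
        "WHERE l.msg_key = ? ORDER BY f.name", (msg_key,))
    return [r[0] for r in rows]


def folder_summary(db):
    """Message count per folder, from one query on one database."""
    rows = db.execute(
        "SELECT f.name, COUNT(l.msg_key) FROM folders f "
        "LEFT JOIN folder_links l ON l.folder_id = f.folder_id "
        "GROUP BY f.folder_id ORDER BY f.name")
    return dict(rows)
```

A "tag folder" in this model would just be another row in the folders 
table whose links are maintained by tagging rather than by delivery.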

Other comments:
1. If we're modifying the database, one thing I would love to propose is 
that we move threading to before the filtering step, thereby allowing 
filters to modify thread information without causing the database to 
spaz. I would also love to see changes to filter action models, but 
that's another story.
2. Database interactions would also need to be specified more clearly in 
general. Going along with the synchronous versus asynchronous concerns, 
some methods on auxiliary interfaces (like nsIMsgDBHdr) are immediately 
reflected in the database and others are not. If we want to support 
asynchronous access better, it may make sense to make nsIMsgDBHdr a 
snapshot of the database.
3. Again, some methods have the option of not informing listeners of 
database changes. Some account types need to synchronize database 
changes externally (arguably, so does something like Thunderbird Sync), 
so either non-informing needs to go away or some sort of superlistener 
functionality needs to be added. There is one edge case which kind of 
throws a wrench in things, and that is filtering (including Junk mail 
filtering), or any new message processing step. Combined with point 1, I 
think there is a need to redesign how new messages get handled, 
especially when we want to start enticing extension authors to create 
new account types.
4. As important as asynchronous versus synchronous concerns is being 
able to sanely use the database from multiple threads. Import and biff 
probably want to access it from a different thread (especially if people 
set up body filters); offline message download also probably wants it, 
but I don't know how much metadata access it needs.
5. Offline actions (for things which need to synchronize state to online 
servers) and undo/redo actions probably want to be more generic and easy 
for new account types to hook into. Seeing as a large portion of these 
are essentially database actions, it may make sense to key these into 
the database.
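
Points 2 and 3 can be sketched together. The following is only an 
illustration in Python of the general idea, not of how nsIMsgDBHdr or 
the existing listener interfaces work; HeaderSnapshot, MessageStore, and 
FLAG_READ are hypothetical names. A header is an immutable snapshot, 
editing it produces a new snapshot without touching the store, and 
writing it back is the only way to change the store, so listeners are 
always informed.

```python
from dataclasses import dataclass, replace

FLAG_READ = 0x1  # hypothetical flag bit for this example


@dataclass(frozen=True)
class HeaderSnapshot:
    """Immutable copy of a message header at one point in time."""
    key: int
    subject: str
    flags: int = 0


class MessageStore:
    """In-memory stand-in for the database, with mandatory notification."""

    def __init__(self):
        self._headers = {}
        self._listeners = []

    def add_listener(self, callback):
        """Register callback(old_snapshot_or_None, new_snapshot)."""
        self._listeners.append(callback)

    def get(self, key):
        """Return the current snapshot for a key."""
        return self._headers[key]

    def put(self, snapshot):
        """Commit a snapshot; every listener is told, with no opt-out."""
        old = self._headers.get(snapshot.key)
        self._headers[snapshot.key] = snapshot
        for callback in list(self._listeners):
            callback(old, snapshot)


store = MessageStore()
changes = []
store.add_listener(lambda old, new: changes.append((old, new)))

store.put(HeaderSnapshot(key=1, subject="Hello"))
hdr = store.get(1)

# Editing yields a new snapshot; the store is untouched until put().
edited = replace(hdr, flags=hdr.flags | FLAG_READ)
store.put(edited)
```

Because every change passes through put(), the same hook could in 
principle feed external synchronization or an offline/undo log, which is 
the kind of keying into the database suggested in point 5.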

-- 
Joshua Cranmer
News submodule owner
JSHydra author



