Gloda questions/suggestions.

Andrew Sutherland asutherland at asutherland.org
Mon Jun 14 18:30:32 UTC 2010


  On 06/14/2010 10:15 AM, Blake Winton wrote:
> DavidA let me know that we aren't indexing on date, so this query will 
> run slower than it might otherwise, and hopefully that can be fixed 
> fairly easily.

For clarity, there is an index on the date column.  But we would 
need an index over the folder id and date together for your query to 
be cheap.  The query probably spends a lot of time copying message 
rows into an ephemeral result table sorted by date to satisfy the 
ordering constraint and then perform the limit.  The one upside is 
that it only ever needs to keep 50 rows in the table at a time, but 
there is still a lot of busywork.
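
To make that concrete, here is a rough sketch of the two index 
shapes and of the query in question; the table and column names are 
illustrative, not the actual gloda schema:

    -- Hypothetical names; the real gloda schema differs.
    -- What exists today: an index on date alone.
    CREATE INDEX messages_date ON messages (date);

    -- What the folder-scoped "newest 50" query wants: folder id and
    -- date together, so SQLite can walk the index in date order within
    -- the one folder and stop after 50 rows instead of sorting the
    -- whole folder into an ephemeral table first.
    CREATE INDEX messages_folder_date ON messages (folderID, date);

    SELECT *
      FROM messages
     WHERE folderID = ?
     ORDER BY date DESC
     LIMIT 50;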


> The other idea I had (and I don't know if it's technically feasible), 
> was for partial updates to be sent through onItemsAdded, before the 
> first query is finished. Currently, I get a long pause, followed by a 
> single call to onItemsAdded, and then immediately afterwards, a call 
> to onQueryCompleted.  If I could get the data fed to me sooner, piece 
> by piece, I believe that would give the user the impression of us 
> being faster than we are.

That's unlikely to make a major difference.  SQLite can't start 
generating message result rows until it has seen literally every 
message row in the folder, and then it will zip through them quickly.  
The potential savings of chewing through the messages in smaller 
bites is that the follow-on conversation and identity/contact queries 
we issue can be pipelined.  I'd be interested in seeing how long 
those queries take to service before doing too much more work on 
that.  Assuming you have gloda debug logging enabled, that should be 
pretty easy to figure out.

(Note that most of the logic is in place to take smaller bites and it 
might even be a matter of constants, if you want to try your hand at 
that.  I still would like numbers though :)
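
Purely as an illustration of the pipelining idea (made-up table 
names, not the real gloda SQL), the follow-on lookups could be 
serviced per batch of message hits rather than once for the whole 
result set:

    -- Illustrative only: issue the follow-on lookups per batch of
    -- message hits so the earlier batches can reach the listener while
    -- the later ones are still being fetched.
    SELECT * FROM conversations WHERE id IN (?, ?, ?, ?, ?);  -- batch 1
    SELECT * FROM identities    WHERE id IN (?, ?, ?, ?, ?);  -- batch 1
    -- ...and the same pair again for batch 2, batch 3, and so on.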


> And in a related suggestion, some way of paging through the results 
> would let me load fewer initially, and then add more later.

Yeah, this is required for the virtual widget list mechanism too.  We 
could do this within the current schema by putting the output of the 
query into a temporary table and then slicing that instead of using 
limits.  Unfortunately, the up-front query would still be pretty 
expensive, as you're currently seeing.  We'll need a schema change 
(with a full reindex) to make sure we bake dates into all of our 
indices before performance really improves.  I was originally hoping 
to get that in for 3.1, but that didn't work out at all.
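
A rough sketch of that temp-table approach under the current schema 
(again with illustrative names, and a literal folder id standing in 
for whatever folder is being displayed):

    -- Run the expensive, ordered query once and keep its output around...
    CREATE TEMP TABLE page_cache AS
      SELECT id, date
        FROM messages
       WHERE folderID = 42
       ORDER BY date DESC;

    -- ...then each "page" is a cheap slice over the temp table's rowids,
    -- which follow the insertion (i.e. date-descending) order.
    SELECT id FROM page_cache WHERE rowid BETWEEN 1 AND 50;
    SELECT id FROM page_cache WHERE rowid BETWEEN 51 AND 100;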

I suspect I'm going to end up creating a (p)branch for changes to gloda 
in the near future, as there's a lot of stuff that needs to happen.  We 
should be able to improve the treatment of conversations in the process too.

One very useful thing would be if you could figure out what other 
queries your UI is going to want to be efficient for always-available 
instant access, and what can perhaps suffer some on-demand 
processing.  With dates on the queryable indices we should be able to 
convince SQLite to do some pretty efficient intersections even when 
one of the query fields has a pretty large return set, but there are 
cases where we're going to need to generate synthetic/aggregate 
attributes to be indexed, or opt for filtering.  For example, unread 
status is never going to be fast enough on straight set intersection; 
we need to specialize on that one.
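
To make "specialize" a little less hand-wavy, one illustrative shape 
for it (not a design; all names made up) would be a small synthetic 
table that only ever holds the unread messages:

    -- Illustrative only: keep a small side table of just the unread
    -- messages, maintained at index time, instead of intersecting the
    -- unread flag against a large result set.
    CREATE TABLE unreadMessages (
      messageID INTEGER PRIMARY KEY,
      folderID  INTEGER NOT NULL,
      date      INTEGER NOT NULL
    );
    CREATE INDEX unreadMessages_folder_date
        ON unreadMessages (folderID, date);

    -- "Unread in this folder, newest first" then stays cheap no matter
    -- how many read messages the folder contains.
    SELECT messageID
      FROM unreadMessages
     WHERE folderID = 42
     ORDER BY date DESC
     LIMIT 50;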

Andrew


