Paging through a collection

Richard Newman rnewman at mozilla.com
Mon Sep 29 08:55:50 PDT 2014


On 29 Sep 2014, at 8:12 AM, Stefan Arentz <sarentz at mozilla.com> wrote:

> I don’t understand the logic here. Why doesn’t the client use the limit and offset parameters to grab all history in three requests?

You’re exercising an old code path that was intended for use by mobile.

Nobody has stripped this out, because the risk of introducing bugs is higher than the benefit of simplification.

processIncoming does this:

* It fetches records up to the downloadLimit (no limit by default, a very low limit on mobile).
* If we fetched that many items, there are presumably more. Switch into batching mode. Fetch downloadLimit IDs. On mobile, this bumps from a very low limit up to 5000, so we expect to get mostly new records.
* Fetch batches of those items by ID (mobileGUIDFetchBatchSize, guidFetchBatchSize).

Obviously that looks weird if you hit downloadLimit in the first request, and you’re not on mobile. There’s no good way to say “give me a large number of relevant IDs, but not the records I just pulled”. One could modify the system to figure out how many records would be fetched, and then either fetch them or grab IDs, but I doubt anyone is motivated to do so.
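
In rough pseudocode, the shape of that flow is something like this. It’s only a sketch; fetchRecords, fetchNewGUIDs, apply, and the overall structure are stand-ins for what engines.js actually does, not its real API:

    // Sketch of the processIncoming flow described above. `engine` is
    // assumed to expose fetchRecords / fetchNewGUIDs / apply and the limit
    // values; these are illustrative stand-ins, not the real engines.js API.
    async function processIncomingSketch(engine) {
      // 1. Fetch records modified since the last sync, up to downloadLimit
      //    (no limit by default on desktop, a very low limit on mobile).
      let records = await engine.fetchRecords({
        newer: engine.lastSync,
        limit: engine.downloadLimit,
      });
      await engine.apply(records);

      // 2. If we got exactly downloadLimit records, there are presumably
      //    more. Switch into batching mode: fetch up to downloadLimit IDs of
      //    the remaining records (on mobile this bumps the limit from a very
      //    low value up to ~5000, so most of those IDs are new to us).
      if (records.length === engine.downloadLimit) {
        let guids = await engine.fetchNewGUIDs({
          newer: engine.lastSync,
          limit: engine.downloadLimit,
        });

        // 3. Fetch those records in batches of guidFetchBatchSize
        //    (mobileGUIDFetchBatchSize on mobile), because long ID lists run
        //    into URL length limits.
        for (let i = 0; i < guids.length; i += engine.guidFetchBatchSize) {
          let batch = guids.slice(i, i + engine.guidFetchBatchSize);
          await engine.apply(await engine.fetchRecords({ ids: batch }));
        }
      }
    }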


Sync doesn’t use limit and offset for several reasons:

* Limit and offset don’t really make sense on the server. This is not a transactional system; it’s stateless, with an arbitrary amount of time between successive page fetches. The client can’t trust that it’ll get all the records by paging, so it doesn’t use paging at all.

* Server (and client!) writes could be occurring during this paging behavior. We don’t want to make Sync’s lack of safety worse by deliberately fast-forwarding past new records. You can see in engines.js that we set the timestamp before we start fetching batches (sketched just after this list).

* This batching persists across restarts. Paging would not (and it wouldn’t make sense for it to, for the aforementioned reasons). This is/was important for XUL mobile.
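
To make the second point concrete: the timestamp we eventually treat as “synced up to here” is captured before batch fetching starts, so anything written to the server while we page through batches is still newer than it and will be picked up on the next sync. A simplified sketch of that ordering — not the real engines.js code; server.get and the response shape here are assumptions:

    // Illustrative ordering only; the `engine` and `server` shapes are assumed.
    async function incrementalFetchSketch(engine, server) {
      // Initial fetch of changed records, up to downloadLimit.
      let response = await server.get({
        newer: engine.lastSync,
        limit: engine.downloadLimit,
      });

      // Set the timestamp *before* we start fetching batches. Because it
      // reflects the moment of this first request rather than the end of
      // batching, server writes that land while we are still paging remain
      // newer than lastSync and will be fetched on the next sync, instead of
      // being deliberately fast-forwarded past.
      engine.lastSync = response.lastModified;

      for (let batch of engine.planBatches(response)) {
        await engine.applyBatch(batch);
      }
    }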


Sync uses lots of requests for that last step (fetching records by ID) because we run into URL length limitations.
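
To put rough numbers on that: the batch IDs travel in the query string, so the batch size is bounded by how many GUIDs fit in one URL. A back-of-the-envelope illustration — the 2000-character budget and the exact URL shape are assumptions for the example, not constants from the Sync code:

    // Illustrative only: shows why fetching thousands of records by ID takes
    // many requests. The URL shape and the 2000-character budget are assumed.
    function idsURL(base, guids) {
      return `${base}?full=1&ids=${guids.join(",")}`;
    }

    const base = "https://server/1.1/user/storage/history";
    // Sync GUIDs are 12 characters; fake ones of the same length will do.
    const guids = Array.from({ length: 5000 }, (_, i) =>
      String(i).padStart(12, "0"));

    const maxURLLength = 2000;           // assumed budget
    let perRequest = 0;
    while (perRequest < guids.length &&
           idsURL(base, guids.slice(0, perRequest + 1)).length <= maxURLLength) {
      perRequest++;
    }

    console.log(`${perRequest} GUIDs per request; ` +
                `${Math.ceil(guids.length / perRequest)} requests for ${guids.length} IDs`);
    // => 150 GUIDs per request, 34 requests for 5000 IDs under these assumptions.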

Most of this logic is thoroughly commented in _processIncoming in engines.js, if you want to dig a little deeper.


In a much better sync system, a client would do this (aping Git), or something equivalent to it:

* Pick an identified server state (HEAD -> hash)
* Incrementally fetch chunks of state — individually named — to be able to recreate the server state.
* Merge locally to create a new head.
* Push data for that head to the server.
* Fast-forward the server to that head.
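
A very rough sketch of what that client flow could look like; every call here (fetchHead, listChunks, fastForward, and so on) is invented for illustration, since no such API exists in Sync:

    // Hypothetical client flow for a Git-like sync protocol. None of these
    // calls exist in Sync today; they only illustrate the shape of the idea.
    async function gitStyleSync(server, local) {
      // 1. Pick an identified server state (HEAD -> hash).
      let remoteHead = await server.fetchHead();

      // 2. Incrementally fetch the individually named chunks needed to
      //    recreate that state. Because each chunk has a stable name,
      //    fetching can be interrupted and resumed without losing ground.
      let remoteState = local.emptyState();
      for (let chunkID of await server.listChunks(remoteHead)) {
        remoteState.add(await server.fetchChunk(chunkID));
      }

      // 3. Merge locally to create a new head.
      let newHead = local.merge(remoteState);

      // 4. Push the data for that head to the server, then fast-forward the
      //    server to it. The fast-forward can be conditional on the server
      //    still being at remoteHead, so a concurrent writer is detected
      //    rather than silently clobbered.
      await server.pushChunks(local.chunksFor(newHead));
      await server.fastForward({ from: remoteHead, to: newHead });
    }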

We don’t have that system, alas, so instead it blindly grabs records and hopes for the best.



