Syncing Bookmark Tombstones

Kit Cambridge kit at mozilla.com
Wed Jan 31 03:00:16 UTC 2018


Thank you for the thorough write-up, Thom, and sorry for the delay in replying! Also, thanks Richard and Mark for responding. I'll also summarize and expand on what you wrote to make sure I understand, and suggest an idea.

The crux of the issue is that treating tombstones exactly like bookmarks means we'll do the wrong thing if the server is wiped, either due to a node reassignment or some other reason, and another client that didn't have the tombstones yet beats us to uploading its bookmarks.

That leads to the approach you took in your patch, where you always prefer a tombstone to a modification on either side. If we have a local tombstone for a remotely changed bookmark, we'll ignore the remote, and upload a tombstone for the remote "zombie". If we have a remote tombstone for a local bookmark, we'll take and store the tombstone, and delete the local bookmark. Is that accurate?
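
To pin that down, here's a minimal TypeScript-style sketch of the rule as I understand it. The names are hypothetical, and the real logic of course operates on full tree nodes rather than a pair of booleans:

```typescript
// Hypothetical sketch: a deletion on either side always wins over an edit.
type Resolution =
  | { action: "delete"; uploadTombstone: boolean }
  | { action: "merge" };

function resolveDeletionConflict(localIsTombstone: boolean,
                                 remoteIsTombstone: boolean): Resolution {
  if (localIsTombstone && !remoteIsTombstone) {
    // Remote "zombie": ignore the remote item and reupload our tombstone.
    return { action: "delete", uploadTombstone: true };
  }
  if (!localIsTombstone && remoteIsTombstone) {
    // Take the remote tombstone: store it and delete the local bookmark.
    return { action: "delete", uploadTombstone: false };
  }
  // Neither side (or both sides) deleted: fall back to the usual
  // timestamp-based reconciliation.
  return { action: "merge" };
}
```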

Stepping back some more, there's a lot of complexity in how tombstones interact with remote and local wipes. Here's my understanding of the three important cases, and how we handle them today (with a rough sketch of the state changes after the list).

1. Involuntary (not user-initiated) server wipe, like a node reassignment. This currently resets the sync ID and the last sync time, and sets `syncChangeCounter = 1` and `syncStatus = NEW` for all bookmarks via `PSU.bookmarks.reset`. The first device to sync after that wins, and uploads its complete tree with a new sync ID. (This also means that all server timestamps will be newer, even if the bookmarks didn't meaningfully change since before the wipe). Other devices will see the sync ID mismatch, reset the same state, and *merge* the tree that the first device uploaded.

2. Voluntary restore. As with (1), this resets the sync ID and the last sync time, and sets `syncChangeCounter = 1` and `syncStatus = NEW` for all restored bookmarks. Unlike (1), this also wipes the server and all clients before uploading the local tree. The effect is that receiving devices *follow*, not merge: they erase everything in Places and replace their local trees with the restored tree on the server.

3. Involuntary Places restore on startup. This is mostly held together with staples and duct tape. A restore sets `syncChangeCounter >= 1` and `syncStatus = UNKNOWN` for all restored items, flagging them all for upload, but does *not* reset the sync ID or last sync time. We do back up modification times, so at least we won't think everything in Places is newer. However, on the first sync after an involuntary restore, we'll only download new records since the last sync, because Sync still thinks that Places matches the server. If there *are* new records, we'll apply them to the restored tree, which might work just fine with a fresh enough backup, or make a mess otherwise. Let's ignore this one for now.
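
To keep the state changes straight, here's a rough sketch (TypeScript-flavored, with illustrative names and values rather than the actual Places constants) of what each case does to per-bookmark sync state:

```typescript
// Illustrative only: per-bookmark sync state after each scenario above.
enum SyncStatus { UNKNOWN, NEW, NORMAL }

interface BookmarkSyncState {
  syncChangeCounter: number;
  syncStatus: SyncStatus;
}

// (1) involuntary server wipe and (2) voluntary restore: the sync ID and
// last sync time are reset, and every bookmark is flagged for upload.
function resetAfterSyncIdChange(item: BookmarkSyncState): void {
  item.syncChangeCounter = 1;
  item.syncStatus = SyncStatus.NEW;
}

// (3) involuntary restore on startup: restored items are flagged for upload
// and marked UNKNOWN, but the sync ID and last sync time are left alone.
function flagAfterAutomaticRestore(item: BookmarkSyncState): void {
  item.syncChangeCounter = Math.max(item.syncChangeCounter, 1);
  item.syncStatus = SyncStatus.UNKNOWN;
}
```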

So:

* (1) and (2) both trigger the same "sync ID mismatch, reset sync flags and start over" logic on the receiving client. The only way a receiving client can distinguish between (1) and (2) is if it processes a `wipeEngine` command first. If the sender didn't see the restore, or failed to send the command, the receiver will assume (1), and merge instead of following. (Alternatively: a follow is just a special case of a merge with an empty local tree).

* But, durable restores are something we can solve in parallel to your work. Backing up, restoring, and uploading tombstones won't hurt; they just won't be meaningful after wiping Places. For (1), we don't want to resurrect; for (2), we don't care. Both (1) and (2) change the sync ID, so we can make a tweak to the same logic.

* In addition to `NORMAL`, `NEW`, and `UNKNOWN`, we could introduce a fourth sync status flag, like `PREVIOUSLY_SYNCED` ("was on the server in the past, but might not be now"). We then override the sync ID change logic in the bookmarks engine to set `syncChangeCounter = 1` and `syncStatus = PREVIOUSLY_SYNCED` for all `NORMAL` bookmarks and tombstones. We'll want to override this logic if we want to store the sync ID in Places, anyway, so I don't think we're introducing complexity for just this case. This is also cheap to add to the mirror, since it already fetches the sync status for every node.

* `PREVIOUSLY_SYNCED` items are functionally identical to `NEW` (for instance, deleting a `PREVIOUSLY_SYNCED` item does not write a `NEW` tombstone for it), except they use your approach for reconciliation (sketched after this list). If we're merging a `PREVIOUSLY_SYNCED` tombstone and a remote item, we prefer the local tombstone and delete the zombie, even if the remote is "newer". If we're merging a `PREVIOUSLY_SYNCED` item and a remote tombstone, we prefer the remote tombstone and delete the item locally.

* After the first sync, `pushChanges` flips the sync status from `PREVIOUSLY_SYNCED` to `NORMAL`.

* `NORMAL` items use the same conflict resolution logic as now, allowing for resurrections. This might let us sidestep most data loss concerns with a normally syncing tree, while doing the right thing in edge cases.
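
Putting the proposal in one place, here's a rough sketch of the new status and the reconciliation tweak. Again, the names are illustrative, and the real merger works on whole local and remote trees rather than individual nodes:

```typescript
// Illustrative only: a fourth status and how it changes reconciliation.
enum SyncStatus { UNKNOWN, NEW, NORMAL, PREVIOUSLY_SYNCED }

interface LocalNode {
  syncChangeCounter: number;
  syncStatus: SyncStatus;
  isTombstone: boolean;
}

interface RemoteNode {
  isTombstone: boolean;
}

// On a sync ID mismatch, downgrade NORMAL items and tombstones to
// PREVIOUSLY_SYNCED instead of NEW, so we remember they were once on the
// server.
function onSyncIdChange(node: LocalNode): void {
  if (node.syncStatus === SyncStatus.NORMAL) {
    node.syncChangeCounter = 1;
    node.syncStatus = SyncStatus.PREVIOUSLY_SYNCED;
  }
}

// While merging, a PREVIOUSLY_SYNCED node prefers a deletion on either side;
// NORMAL nodes keep today's timestamp-based resolution (not shown).
function prefersDeletion(local: LocalNode, remote: RemoteNode): boolean {
  return local.syncStatus === SyncStatus.PREVIOUSLY_SYNCED &&
         (local.isTombstone || remote.isTombstone);
}

// After a successful upload, pushChanges-style logic flips everything that
// made it to the server back to NORMAL.
function afterUpload(node: LocalNode): void {
  node.syncStatus = SyncStatus.NORMAL;
}
```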

I guess this is closer to Mark's suggestion to "make each client aware of the fact that the server was wiped and that some other device has uploaded *its* view of the world".

What happens if the first post-restore sync is interrupted? The next sync won't rerun the "sync ID changed" logic, because we'll have stored the new sync ID by then, but local items will remain in `PREVIOUSLY_SYNCED` until we actually merge them.

What happens if we missed a restore, and treat (2) as (1)? A device that didn't see the tombstone might resurrect the item, but, as long as at least one other device saw the tombstone, it'll be removed again when that device syncs and reconciles the resurrected item with its `PREVIOUSLY_SYNCED` tombstone. There's potential for data loss here if the user decides, "oh, actually, I didn't want to delete this bookmark", and moves it to another folder, but that's an edge case within an edge case. If we handle (2) correctly, all clients will wipe themselves before syncing bookmarks, and no one will be around to resurrect.

Sorry for the screed; you've probably thought about most of this, too, and a lot of it is just me thinking out loud and writing down how all this works for the record. Do you think a new sync status could work?

Cheers,
- kit

> On Jan 29, 2018, at 12:44, Thom Chiovoloni <tchiovoloni at mozilla.com> wrote:
> 
> 
> 
> On Mon, Jan 29, 2018 at 3:53 AM, Mark Hammond <mhammond at mozilla.com> wrote:
> Thanks for writing this up Thom (and thanks for replying already Richard - I briefly respond to you near the end of this mail)
> 
> First, for my own benefit, let me try and restate the issue we have here:
> 
> * Currently no devices persist tombstones - they exist only on the server. If the server data is lost, any clients which have not synced between an item being deleted and the server wipe will resurrect deleted items - because no tombstone exists on any client nor on the server.
> 
> * While we talk mostly about bookmarks, this is true for all collections with tombstones. It's also true no matter how we might lose the server data (and there are many ways we might).
> 
> * For bookmarks in particular, bookmark restoring may introduce other issues that will require thought.
> 
> And that's pretty-much the full extent of our resurrection issue, right?
> 
> The solution to this is, roughly, "have all clients remember all tombstones". For the sake of this discussion, we assume the cost of locally persisting tombstones is reasonable.
> 
> The edge-case here is that:
> 
> * client-1 syncs.
> 
> * client-2 deletes a bookmark, creates a persistent local tombstone and uploads it to the server.
> 
> * server data is lost
> 
> * client-1 does the next initial sync and uploads the item (as it has no tombstone)
> 
> * client-2 does the next sync - sees the item it knows for sure was previously deleted *somewhere* and must decide what to do.
> 
> The only reason to not honor the deletion (ie, resurrect it) is the possibility that the bookmark might have been edited on one device but deleted on the other, so deleting it would be data-loss.
> 
> The possible solutions to *this* are:
> 
> * Always resurrect the item. This is what our current reconciliation logic will do as the last-modified timestamp on the server will be later than the time we recorded the deletion. IIUC, this is (1) below.
> 
> * Always delete the item. This is (2) below.
> 
> * Track more state so we can try and be confident we made the correct choice between resurrection and deletion. This is (3)
> 
> * Give up in disgust, deciding that this is "edge-cases all the way down", which is (4).
> 
> * Make each client aware of the fact that the server was wiped and that some other device has uploaded *its* view of the world and attempt to make decisions in a more holistic manner. I think this is what Richard was implying in part of his response.
> 
> Am I missing something here? Are there other edge-cases not related to bookmark restore that I've omitted?
> 
> If not, then I believe Thom is correct in that (2) below is the right thing to do. Closely followed by (4). (1) is clearly wrong, (3) is not worth the cost, and (holistic) is for some distant/alternate future.
> 
> More comments inline:
> 
> On Thu, Jan 25, 2018 at 2:45 PM, Thom Chiovoloni wrote:
> 
>      1. Treat tombstones more-or-less like we treat bookmarks now, that
>         is, keep them around locally, store remote ones locally when
>         they come in, and resolve conflicts based on the modified
>         timestamp on the item.
> 
> To make sure I'm not missing some subtlety, is this "more-or-less like we treat bookmarks" or "exactly like we treat bookmarks"?
> 
> Exactly like we treat bookmarks. 
>  
> 
>      2. Store tombstones from the server locally in places (as in #1),
>         but always reconcile in favor of deleted items over new items.
>         Once an item is deleted it's gone. (There are some nuances, e.g.
>         If an incoming item has a local tombstone, we reupload that
>         tombstone to the server (to delete it), etc.)
>         This has issues around bookmark restoring, which aren't handled
>         by my patch (probably its biggest TODO).
> 
> Restoring is a problem, particularly in edge cases such as when the restore is from a completely different set of bookmarks (where the normal case is typically a restore from an older version of the same bookmarks). Further, it's not clear to me that we should honor tombstones that exist on the server on restore - the user may be restoring precisely to get back bookmarks they accidentally deleted - so it may be that storing tombstones doesn't make much sense.
> 
> Storing tombstones with the restore would bring the state of our local tombstones back to the state they were at when the restore point was created.
>  
> 
> We certainly should do a better job on restore (and particularly automatic restore that happens before we are loaded, and better handle transient failures wiping the server), but otherwise I'm not sure we should conflate the 2 issues too much (which isn't to say we shouldn't consider how they interact)
> 
> Yeah, the concern is mostly with how they interact. Note that with solution #2 as implemented (i.e. ignoring restore), when a bookmark is resurrected via a restore to bring it back after an accidental deletion (as you suggested), a remote client that happens to have seen the tombstone for the record will re-delete it, unless it gets a wipe command (which it will, unless we failed to notice the restore).
>  
> 
>         It also seems dangerous to never allow items to be undeleted --
>         once a guid is discarded, it's effectively gone forever (except
>         for cases where all clients with the guid wipeClient).
> 
> In what way is it dangerous? Is there any risk other than the "user deletes on device-1, edits on device-2, and gets upset we took the delete"?
> 
> The main other risk I can think of is around restores, but in general it prevents us from ever adding other cases where we decide an undeletion is the correct approach -- other clients simply won't respect it.
>  
> 
>      3. The "most correct" approach is probably to store the (best guess
>         for) the server timestamp when the bookmark was deleted in the
>         places database, which would then be used for reconciliation
>         (Possibly using a scheme similar to dateAdded for ratcheting
>         time backwards).
> 
> IIUC we already do that
> 
> We store a local timestamp currently, and don't do any dateAdded style ratcheting for it for tombstones.
>  
> 
> 
>         We'd also need to add the 'real' modification date to each
>         bookmark record, e.g. the server timestamp upon editing the
>         record. This would allow us to continue working as we do now
>         (sometimes undeleting bookmarks).
> 
> Server timestamps have their own set of problems too - all we know is the time of the last or next sync - we clearly can't always know the server timestamp for a local deletion or modification - which itself leads to edge-cases.
> 
> I think we'd want to store the local timestamp until upload, and on upload, convert it to a remote timestamp, using how far in the past it is according to the local clock (since this is roughly the logic we use for reconciliation). Then, when you update the sync state to NORMAL, we'd change the timestamp from a local timestamp to the remote timestamp of the upload we just did. Yes, this is complicated, which is why I abandoned this option.
>  
> 
> A local timestamp would work (poorly) too - but yeah, this seems tricky and more complex than is worthwhile given it's an edge-case^2 (or are we up to ^3? :)
> 
>      4. I guess we could also do nothing. It's an edge case, there
>         doesn't seem to be a perfect solution (after writing this up I
>         still think #2 is probably the best). I also suspect that for most
>         people who have seen resurrected items, it was really bug
>         1177516 <https://bugzil.la/1177516>, which this won't fix at all.
> 
> I'm skeptical that users who reported resurrected items were talking only about default bookmarks - these defaults will not be resurrected by a node reassignment and users reporting issues explicitly mentioned they did indeed sync an "old" device.
> 
> Yeah, you're probably right.
> 
> 
> IOW, I believe bug 1177516 is "a very limited set of bookmarks may get 'resurrected' when a new device is added", whereas we are talking about "potentially unlimited set of bookmarks may get resurrected on node reassignment" - so 2 quite different things.
> 
>         I don't think #4 is the right choice, but I'm including it since
>         Mark mentioned it in our one-on-one. I think we should still do
>         this, since the code complexity isn't *that* bad (well, for #3
>         it's a good amount worse), and it does solve a known issue -- we
>         also knew it wouldn't be trivial when we decided to do it
>         initially, and I don't think much has changed around this since
>         then.
> 
> Another thing I mentioned was the possibility of "solving" this via UX - eg, when we notice an old device is about to Sync, we could check what they want to do (there's a bug on file for this, but I can't find it) - there will be devil in the detail there too though (and even then, if the user selects "merge them", we are stuck in the same position we are now.)
> 
> I'm not convinced there's a solution that would work well enough just via UX. Some of the cases probably would hit it, but not all of these involve disconnected accounts or substantial amounts of time between syncs (these make it more likely, but only as far as it's more likely that you've deleted items in this timeframe).
>  
> 
> (Admittedly, I'm biased though, I'd rather not have another OKR I'm assigned to where we end up doing nothing; as with the sync-repair system addon)
> 
> Yeah, that would suck, but I'm sure you agree that alone isn't a good enough reason to do something here :)
> 
> But all in all, and as mentioned above, I'm leaning towards (2) too, simply because I feel that a user deleting on one device and editing on another is (a) unlikely, (b) may do the wrong thing today even when we don't lose server data, and (c) having these items be deleted when we guess wrong seems less surprising to the user than having them resurrected when they didn't actually edit the item.
> 
> On 27/01/2018 7:46 am, Richard Newman wrote:
> 
> > A quick reply to this:
> >
> > - First, to get it out of the way, I don't think we need to be
> > concerned about cost of keeping tombstones. However, the more
> > we store — timestamps, states, etc. — the more we have to
> > worry about this. Imagine
> > that poor user with multi-duplicated bookmarks thanks to old bugs,
> > creating thousands of tombstones as they clean up.
> 
> I agree in general, although I think the cost of a tombstone without an extra timestamp vs one with an extra timestamp is noise. We could also use telemetry to help us understand how many tombstones we have and expire them if necessary - I'm really not too bothered by a device coming back after 2 years having somewhat strange things happen.
> 
> IOW, local tombstones *do* have a cost that we will need to manage forever, both in places and sync. This is another argument for (4) :)
> 
> I suspect the cost for local tombstones is rather small. Certainly the `moz_bookmarks_deleted` table is much more lightweight than the `moz_bookmarks` table, even with my changes (which add a single INTEGER column, which will only ever be 0, 1, or 2 -- my understanding is that SQLite will optimize this under the hood in some manner, but even if it doesn't, it's a small amount of data).
>  
> 
> > - Secondly, I think it's worth taking one further analytical step beyond
> > timestamps. Your node reassignment concern in #1 is an ordering concern,
> > and your approach of keeping and ratcheting server timestamps is a
> > heuristic for specifying happened-before relationships that aren't
> > currently modeled in Sync. Reconciling conflicts and handling empty
> > servers is, in part, a question of deciding which changes to discard
> > because they happened in a previous 'epoch'.
> 
> I don't understand what specifically you are getting at here? Is my "holistic" note above roughly what you mean?
> 
> TBH though, IMO the further analytical step should really be around bookmark restore - there will certainly be additional trade-offs here.
> 
> > - When considering whether it's worth doing this work: how often do we
> > have users who: (a) have a device that's out of sync, (b) have
> > tombstones on the server, and (c) are node reassigned? That is, how
> > often do users' devices drift out of sync and undelete items because
> > of  a failure to propagate deletions? And what else would this work
> > solve? I'm in favor of systems that behave predictably, but I'm not
> > usually in  favor of trying to achieve predictability by piling on
> > layers of additional behavior onto a system that simply isn't
> > expressive enough.
> 
> I agree with this in general, but it's tricky because node reassignments aren't a regular occurrence (let alone reassigning *every* user like we did last year) - but they are expected.
> 
> IIUC Thom's (2), there's not a whole lot of extra work or state to carry around here - it really is just "reconcile always prefers deletions" - that doesn't seem like it's adding much extra complexity, and the cases in which it fails don't seem worth worrying about, so it's arguably justifiable. Without a more compelling reason, I'm against (3) for exactly the reason you imply above - the bang isn't worth the buck.
> 
> In practice it gets a bit more complicated than that, but yes, conceptually that's all there is. 
>  
> 
> I do agree that we should be careful to understand what tradeoffs we are proposing here and make sure we understand and accept when (2) would do worse than the status quo in the face of these strange events, particularly with bookmark restore.
> 
> But all of this has a cost - there's still space for someone to advocate for accepting (4) - the status-quo and doing nothing :)
> 
> Thanks!
> 
> Mark
> 
> _______________________________________________
> Sync-dev mailing list
> Sync-dev at mozilla.org
> https://mail.mozilla.org/listinfo/sync-dev


