Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix up diverging structure in bookmark trees #19

Merged
merged 15 commits into from
Jan 30, 2019
Merged

Conversation

linabutler
Copy link
Contributor

@linabutler linabutler commented Jan 9, 2019

In a well-formed tree:

  • Each item exists in exactly one folder. Two different folder's
    children should never reference the same item.
  • Each folder contains existing children. A folder's children should
    never reference tombstones, or items that don't exist in the tree at all.
  • Each item has a parentid that agrees with its parent's children. In
    other words, if item B's parentid is A, then A's children should
    contain B.

Because of Reasons, things are (a lot) messier in practice.

Structure inconsistencies

Sync stores structure in two places: a parentid property on each item,
which points to its parent's GUID, and a list of ordered children on the
item's parent. They're duplicated because, historically, Sync clients didn't
stage incoming records. Instead, they applied records one at a time,
directly to the live local tree. This meant that, if a client saw a child
before its parent, it would first use the parentid to decide where to keep
the child, then fix up parents and positions using the parent's children.

This is also why moving an item into a different folder uploads records for
the item, old folder, and new folder. The item has a new parentid, and the
folders have new children. Similarly, deleting an item uploads a tombstone
for the item, and a record for the item's old parent.

Unfortunately, bugs (bug 1258127) and missing features (bug 1253051) in
older clients sometimes caused them to upload invalid or incomplete changes.
For example, a client might have:

  • Uploaded a moved child, but not its parents. This means the child now
    appears in multiple parents. In the most extreme case, an item might be
    referenced in two different sets of children, and have a third,
    completely unrelated parentid.
  • Deleted a child, and tracked the deletion, but didn't flag the parent for
    reupload. The parent folder now has a tombstone child.
  • Tracked and uploaded items that shouldn't exist on the server at all,
    like the left pane or reading list roots (bug 1309255).
  • Missed new folders created during a sync, creating holes in the tree.

Newer clients shouldn't do this, but we might still have inconsistent
records on the server that will confuse older clients. Additionally, Firefox
for iOS includes a much stricter bookmarks engine that refuses to sync if
it detects inconsistencies.

Divergences

To work around this, our tree lets the structure diverge. This allows:

  • Items with multiple parents.
  • Items with missing parentids.
  • Folders with children whose parentids don't match the folder.
  • Items whose parentids don't mention the item in their children.
  • Items with parentids that point to nonexistent or deleted folders.
  • Folders with nonexistent children.
  • Non-syncable items, like custom roots.
  • Any combination of these.

Each item in the tree has an EntryParentFrom tag that indicates where
its structure comes from. Structure from children is validated and
resolved at insert time, so trying to add an item to a parent that
doesn't exist or isn't a folder returns a MissingParent or
InvalidParent error.

Structure from parentid, if it diverges from children, is resolved at
merge time, when the merger walks the complete tree. You can think of this
distinction as similar to early vs. late binding. The parentid, if
different from the parent's children, might not exist in the tree at
insert time, either because the parent hasn't been added yet, or because
it doesn't exist in the tree at all.

Resolving divergences

Walking the tree, using Tree::node_for_guid, Node::parent, and
Node::children, resolves divergences using these rules:

  1. Items that appear in multiple children, and items with mismatched
    parentids, use the chronologically newer parent, based on the parent's
    last modified time. We always prefer structure from children over
    parentid, because children also gives us the item's position.
  2. Items that aren't mentioned in any parent's children, but have a
    parentid that references an existing folder in the tree, are reparented
    to the end of that folder, after the folder's children.
  3. Items that reference a nonexistent or non-folder parentid, or don't
    have a parentid at all, are reparented to the default folder, after any
    children and items from rule 2.
  4. If the default folder isn't set, or doesn't exist, items from rule 3 are
    reparented to the root instead.

The result is a well-formed tree structure that can be merged. The merger
detects if the structure diverged, and flags affected items for reupload.

Closes #18.

@codecov-io
Copy link

codecov-io commented Jan 18, 2019

Codecov Report

Merging #19 into master will increase coverage by 0.25%.
The diff coverage is 92.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
+ Coverage   96.16%   96.41%   +0.25%     
==========================================
  Files           6        6              
  Lines        1797     3709    +1912     
==========================================
+ Hits         1728     3576    +1848     
- Misses         69      133      +64
Impacted Files Coverage Δ
src/lib.rs 0% <ø> (ø) ⬆️
src/tests.rs 99.92% <100%> (+0.1%) ⬆️
src/error.rs 33.33% <20%> (-9.53%) ⬇️
src/tree.rs 91.35% <91.9%> (+6.01%) ⬆️
src/guid.rs 93.54% <94.73%> (+2.24%) ⬆️
src/merge.rs 98.28% <95.8%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ea28a3...07fd161. Read the comment docs.

@linabutler linabutler changed the title [WIP] Fix up structure inconsistencies Fix up diverging structure in bookmark trees Jan 18, 2019
@linabutler linabutler requested review from thomcc and mhammond January 18, 2019 07:07
@linabutler
Copy link
Contributor Author

Whew! 😅 I think this is finally ready for review, sorry it took so long! The merger changes are pretty minimal, but the tree got a complete overhaul. The diff view for tree.rs isn't likely to be helpful; it's probably easier to read through it from scratch.

This needs a lot more tests, and there are still likely bugs (and TODOs), but I'm feeling comfortable with the algorithm itself. Rust makes this kind of refactoring far less error-prone. I edited the PR description with an explanation of how this all works now, which is also in the rustdoc comment for Tree.

The TL;DR is, the tree structure can diverge now, which is a fancy way of saying parents can have missing children, and children can have multiple parents, no parents, and mismatched parents. We manage this by storing two different sets of structure: from children, where we already have the parent in the tree, and from parentid, where we might not. When the merger walks the tree, the diverging structures will be automatically resolved. The merger notes that the structure diverged, and flags the nodes for reupload.

In the best (and, hopefully, common 🤞) case, the tree won't diverge at all, and parent and child lookups will be just as simple and fast as before. In the worst case, we should only need to do all this once...unless another client (👀 Android) scrambles the server again.

I've flagged the usual suspects for review. @bbangert and @rnewman, you two might be interested as well. 📓

Originally, I'd intended for the tree to expose diverging structure to
the merger, and for the merger to fix it. That was part of why I split
`Item` into `Existing` and `Missing` variants.

However, it turns out it's simpler to resolve divergences in the tree,
and expose a well-formed structure to the merger instead. Changing
`Item` to an enum means we now need a pattern match for _every_ item,
when all we really want is a flag on the parent that says "this folder
has diverged because it has a missing child".

`Entry::divergence` already flags diverging structure for multiple
parents, so let's use it to also flag parents with missing children,
instead of forcing the merger to handle invalid structure for this one
case.
* Add missing doc comments.
* Rename reparenting methods for clarity.
* Move `Tree::{children, parent}_for_entry` to
  `Node::{children, parent}`.
* Remove optional return value from `Node::level`.
* Add `Node::is_{root, default_parent_for_orphans}`.
* Iterate over `Tree::entries` instead of
  `Tree::entry_index_by_guid`. The two should point to the same
  entries, but `entries` is more direct.
* Don't mark trees with orphans as equal.
This reverts commit b5e38d4.
@linabutler
Copy link
Contributor Author

I was noodling 🍝 on this some more...

The tricky part of fixing up divergences is handling missing children and missing parentids. For other cases, we can find the right folder for the item. Reuploading the item, its new folder, and any other parents that mention the item, should be enough to fix up the structure on other devices.

However, for missing parentids, we move the item to unfiled, and, for missing children, we reupload the parent with a new list. That risks orphaning those items if they exist on other devices.

In theory, this won't confuse older clients, even if they have the missing items locally. Since they parent based on the parentid, and only use children for ordering (I think Android does this, too, but I'm not sure), they'll move the items to match the server's structure, and keep the missing item around. We can also try something like always preferring the local state if the remote structure diverged, since we know it's consistent, even if it's out of date.

This is the problem that Firefox for iOS faces: it doesn't know if the missing records won't be uploaded at all, and so fixing the server is the right thing to do...or just haven't been uploaded yet, in which case it should definitely not fix the server. When the iOS implementation was built, the server didn't support batch uploads, and Desktop and Android didn't track or merge items properly, so opting to wait for the missing records was a perfectly prudent thing to do.

Four years later, we've fixed many of those consistency issues. We also know that not syncing bookmarks at all is worse than syncing and getting them wrong. And we know that Android can get confused and scramble bookmarks on other devices. I think that all leaves us in a better position to fix the server, instead of refusing to sync or applying questionable changes and letting the local tree diverge.

The other concern with fixing up the server is getting into Sync battles with other clients, where each thinks its view is right, and reuploads the same records over and over again. As @thomcc suggested, we can track this in the mirror (and we know that, if merged_node.merge_state is LocalWithNewStructure(node) or RemoteWithNewStructure(node), but node.needs_merge == false, the node's structure was changed by the merger, not by the user), and emit telemetry.

Anything we do is going to be Hella Hacky and Probably Wrong™, so let's do the thing that has a chance of being right.

Copy link
Contributor

@thomcc thomcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very impressive. These are my first comments on the implementation, mostly written to help familiarize myself with the pieces and the tree.rs changes. I'll try to get another pass in tomorrow for the overall algorithm, but it might take some discussion.

src/tree.rs Outdated
enum Divergence {
/// The node's structure is already correct, and doesn't need to be
/// reuploaded.
Ok,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love reusing Ok (which usually refers to a Result) here. Eh, its probably fine though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistent? 😄

src/tree.rs Outdated
impl ParentGuidFrom {
/// Notes that the parent `guid` comes from an item's parent's `children`.
pub fn children(self, guid: &Guid) -> ParentGuidFrom {
ParentGuidFrom(Some(guid.to_owned()), self.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(In this whole file)

I find the to_owned use here confusing. Isn't this just a clone? Using to_owned implies a conversion is happening (it's usually used for "the conversion we want to do is too fancy for clone"), but AFAICT this is just Clone... That said, maybe I'm missing something?

If it is just a clone(), I think that would be a lot clearer.

src/tree.rs Outdated
}
}

fn indices(&self) -> Vec<Index> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write this without allocating (e.g. returning an Iterator) as follows

    fn indices<'a>(&'a self) -> impl Iterator<Item = Index> + 'a {
        let entry_parents = match self {
            EntryParents::Root => &[],
            EntryParents::One(parent_from) => std::slice::from_ref(parent_from),
            EntryParents::Many(parents_from) => parents_from,
        };
        entry_parents.iter().filter_map(move |parent_from| match parent_from {
            EntryParentFrom::Children(index) => Some(*index),
            EntryParentFrom::Item(_) => None,
        })
    }

Not sure if it's worth it though (you need to do a slight change to part of insert to avoid issues with borrowck), but it seems like a shame to allocate for the common case of one parent.

guids below can be done using the same technique too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good idea, thanks!

src/tree.rs Outdated
}

fn is(&self, other: &Entry) -> bool {
self as *const _ == other as *const _
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::ptr::eq(self, other) should work here. Also, probably worth an #[inline]

src/tree.rs Outdated
}

pub fn diverged(&self) -> bool {
match &self.2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: self.2 == Divergence::Diverged seems simpler to me but I don't really care.


impl<'t> Node<'t> {
/// Returns an iterator for all resolved children of this node, including
/// reparented orphans.
pub fn children<'n>(&'n self) -> impl Iterator<Item = Node<'t>> + 'n {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can avoid the allocations in this function, but I guess they should only happen in the case where there are orphans, which should be rare enough that it doesnt matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I move self into the closure, and handle orphans and child_indices slightly differently...I guess I could zip the orphans and child_indices with an iter::repeat that passes along a tag (Option<Divergence>?).

Or, since everything in orphan_indices_by_parent_guid should already be Diverged, I can probably just chain orphans before filter_map.

src/merge.rs Outdated
pub struct Merger<'t> {
driver: &'t Driver,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use a trait object here and not something like struct Merger<'t, D: Driver> or similar?

If so, nit: &'t dyn Driver to make it more obvious it's a trait object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using the generic form with a default parameter (D = DefaultDriver), but ran into rust-lang/rust#50822, and had to replace all uses of let merger = Merger::new with let merger = <Merger>::new for type inference to work. I didn't try it without the default type parameter, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default params should still work. Possibly worth noting that you use them frequently, HashMap<K, V> is actually HashMap<K, V, S = RandomState>. You'll probably have to add new (e.g. the constructor that doesn't take a Driver arg) to impl<'a> Merger<'a, DefaultDriver> instead of impl<'a, D: Driver> Merger<'a, D>, to avoid the issue you described though.

@@ -64,6 +64,21 @@ enum ConflictResolution {
Unchanged,
}

/// Controls merging behavior.
pub trait Driver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit weird, I'd almost think it should be something like enum InvalidGuidHandling { Forbid, Allow, Replace }... (And even then, I'm skeptical on Allow...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense; MergeDriver sure is a roundabout way to implement GUID regeneration. I did it this way to avoid pulling in dependencies on base64 and rand, and leaving it up to the caller. (For example, on Desktop, we'd use nsINavHistoryService::MakeGuid). I was thinking we could also extend MergeDriver to provide sinks for logging and telemetry events.

But maybe this is all premature abstraction, and we should just take the dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, thats a decent point. It's probably not worth the dep. (I mean, really do we actually see these in the wild anymore? I don't think so, but could be wrong)

Copy link
Member

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great - now I just need to understand it better ;)

src/error.rs Outdated
@@ -83,6 +84,9 @@ impl fmt::Display for ErrorKind {
},
ErrorKind::UnmergedRemoteItems => {
write!(f, "Merged tree doesn't mention all items from remote tree")
},
ErrorKind::GenerateGuid => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this because I can't really see how it will be used in real code, but I note that the places lib will panic if it can't generate a GUID, so it doesn't have any error similar to this. OTOH though, at least in the default trait below, this error is actually used as "I decline to create a new GUID" - so I'm not sure if this is intended for "can't" or "won't".

I guess that I'm suggesting we add a comment below...

/// Controls merging behavior.
pub trait Driver {
/// Generates a new GUID for the given invalid GUID.
fn generate_new_guid(&self, invalid_guid: &Guid) -> Result<Guid>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... here :) Expanding the comment for this function might make sense, to indicate in what conditions the error is returned.

src/merge.rs Outdated
let merged_child_node = self.two_way_merge(local_child_node, remote_child_node)?;
let mut merged_child_node = self.two_way_merge(local_child_node,
remote_child_node)?;
if remote_child_node.diverged() || merged_node.guid != remote_parent_node.guid {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.diverged() || guids_different seems a very common pattern here and used more often then a .diverged() alone is. It seems that it might be easy to get this wrong - I wonder if there's scope to capture this using the diverged mechanism somehow?

(This is more a rhetorical question than a suggestion as I'm still getting my head around this code)

@@ -1808,6 +1811,14 @@ fn mismatched_incompatible_bookmark_kinds() {
fn invalid_guids() {
before_each();

struct AllowInvalidGuids;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it's "just" test code I think a comment here might be useful to help people get their head around the intent behind generate_new_guid()

src/tree.rs Outdated
}

fn is(&self, other: &Entry) -> bool {
self as *const _ == other as *const _
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why this is a pointer comparison vs the fields deriving PartialEq (can you have dupe guids but don't want them to match?), can you clarify w/ a comment?

src/tree.rs Outdated
// check if we have any remaining orphans that reference
// nonexistent or non-folder parents (rule 3).
let needs_reparenting = |guid| {
match self.entry_index_by_guid.get(guid) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: map_or's maybe an improvement

Suggested change
match self.entry_index_by_guid.get(guid) {
self.entry_index_by_guid
.get(guid)
.map_or(true, |&index| !self.entries[index].item.is_folder())

* Clarify why `merge::Driver` exists.
* Make `Merger` generic over `Driver`, instead of using a trait object.
* Use `clone` instead of `to_owned`.
* Replace `Entry#is` with `std::ptr::eq`.
* Add `MergedNode::remote_guid_changed`.
In most cases, `node.2 == node.entry().divergence`, except for
orphans, default orphans, and diverging `parentid`s. This is a
surprising inconsistency that means `node_for_guid` and
`children` need to do more work to figure out if the node has
actually diverged.

This commit:

* Changes the tree to flag divergent `parentid`s at `insert` time.
* Cleans up `Tree::structure_for_insert`, to clarify what happens when
  a `parentid` is or isn't provided.
* Moves the logic for checking default folder divergences into
  `Node::diverged`.
* Replaces `EntryParents::{indices, guids}` with `EntryParents::iter()`,
  which doesn't allocate (thanks, @thomcc!).
* Clean up optionals with `.map_or(...)` and `.filter(...).map(...)`.
* Explain why we use `ptr::eq` to compare entries.
* Shorten import paths.
* Rename `Child::Existing` to `Child::Exists`.
* Rename `Divergence::Ok` to `Divergence::Consistent`, since `Ok`
  might be confused with `Result`.
};

self.entry().child_indices.iter()
.chain(orphans.into_iter())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomcc I thought about it some more, and I'm unsure how to avoid the allocation here. self.tree().orphan_indices_by_parent_guid borrow self (and default_orphans also borrows self in the closure passed to filter), so the returned iterator outlives the temporary...and we also move self into filter_map's closure.

But, as you said, orphans should be rare...and we already do much more work for diverged nodes, anyway. An extra heap allocation is likely the least of that.

We need this to apply the merged tree, when we join to the local and
remote trees.
* Fix infinite recursion in `fmt::Display::fmt()` for `Error`.
* Make `Store` generic over the error type. This allows callers to
  provide their own error types that unify with Dogear errors,
  `nsresult`s, and others, instead of requiring them to wrap their
  errors into `ErrorKind::Storage(...)`.
* Forward decoding errors from `Guid::from{utf8, uft16}()`.
* Rename `ErrorKind::GenerateGuid` to `ErrorKind::InvalidGuid`.
* Move `dogear::merge` into `Store::merge`.
@linabutler linabutler merged commit 07fd161 into master Jan 30, 2019
@linabutler linabutler deleted the fixup-structure branch January 30, 2019 04:09
linabutler added a commit that referenced this pull request Feb 10, 2019
This started out as a fix for structure corruption corner cases, grew
into simplifying tree construction for callers, and turned into a
full-blown rewrite. In a way, we've come full circle: the new tree
stores a fully consistent structure, and relies on the new builder
to resolve inconsistencies and flag diverging structure.

PR #19 added lots of complexity to the original tree. This came from
storing two different sets of structure, one that's resolved at
`insert` time, and the other when we actually walk the tree. This
separation exists because we `insert` items one at a time, in the
order based on the `children`, so the tree doesn't have a complete
picture of all items in it.

However, we _do_ have that picture, in the mirror database. By the
time we build the tree, we know exactly which `children` and
`parentid`s exist, if an item has multiple parents, or no parents.
Those entries might not be in the tree yet, but that's because our
implementation requires the tree to always be valid.

This requirement also forces callers to go through unnecessary
contortions. On Desktop, `Store::fetch_remote_tree` first buffers all
items into a pseudo-tree, then walks it recursively to inflate the
Dogear tree. That's a lot of busy-work to query and normalize the
data, and assumes we have a complete structure, which might not be
the case.

Instead, what if we built the tree in two passes? One to add all
items, and one to set their parent-child relationships. The queries
for these are simpler, more correct, and let us defer resolving
inconsistencies until we're ready to build the tree. We can add
optimizations for valid trees, and still handle every kind of
diverging structure.

Closes #22.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix up structure inconsistencies at merge time
5 participants