Data loss due to a race condition in merge #118

Open
madadam opened this issue May 4, 2023 · 0 comments

madadam commented May 4, 2023

A directory's content and its version vector can get out of sync, which can lead to two conflicting versions of a directory being merged incorrectly, losing the changes from one of the versions. This is because the directory content is a snapshot: when a directory is opened, the content is loaded from the db and then stored in a member variable. Its version vector, on the other hand, is always freshly loaded from the db. This can cause the two to get out of sync, which can result in an incorrect merge. Consider this scenario:

  1. There are two branches, A (local) and B (remote), and a directory with a file in it. A's version of the file happens-after B's.
  2. We load both versions of the directory in order to merge them.
  3. Then a new snapshot of B gets completed. This snapshot has changes to the file, so the versions of the file are now concurrent.
  4. Then we load the version vectors of the directories and calculate their merge.
  5. We then proceed with merging the entries.
  6. Because the directories were loaded before the new B snapshot was completed, we don't see the concurrent versions. We see that A's version happens-after B's, so we keep A's and discard B's.
  7. We then finalize the merge, applying the merged version vector from step 4.
  8. A's branch now happens-after B's, so we discard B.
  9. The changes in B are now lost.
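
To make the scenario concrete, here is a minimal Python sketch of the nine steps. It is an illustration only, not the project's code: the in-memory `db` dict, the `compare` and `merge_vv` helpers, and the replica ids and counters are all assumptions chosen to reproduce the ordering described above.

```python
def compare(a, b):
    """Partial order of two version vectors (dicts: replica id -> counter)."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge_vv(a, b):
    """Entry-wise maximum of two version vectors."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def merge_entries(a, b):
    """Keep the newer file version, or both if they are concurrent."""
    order = compare(a[0], b[0])
    if order == "concurrent":
        return [a, b]
    return [a] if order in ("after", "equal") else [b]


def simulate_race():
    # Step 1: hypothetical db; each branch has a directory vv and one file
    # stored as (file vv, content). A's file happens-after B's.
    db = {
        "A": {"vv": {"A": 1, "B": 1}, "files": [({"A": 1, "B": 1}, "a1")]},
        "B": {"vv": {"B": 1}, "files": [({"B": 1}, "b1")]},
    }
    # Step 2: directory content is loaded once and cached (the snapshot).
    content_a = list(db["A"]["files"])
    content_b = list(db["B"]["files"])
    # Step 3: a new snapshot of B completes; the file is now concurrent with A's.
    db["B"] = {"vv": {"B": 2}, "files": [({"B": 2}, "b2")]}
    # Step 4: version vectors are loaded fresh, so B's is the NEW one.
    merged = merge_vv(db["A"]["vv"], db["B"]["vv"])
    # Steps 5-6: entries are merged from the stale content; B's file is dropped.
    files = merge_entries(content_a[0], content_b[0])
    # Step 7: finalize the merge with the merged version vector.
    db["A"] = {"vv": merged, "files": files}
    # Step 8: A now happens-after B, so B is pruned.
    if compare(db["A"]["vv"], db["B"]["vv"]) == "after":
        del db["B"]
    # Step 9: "b2" is no longer stored anywhere.
    return db
```

In this sketch, `simulate_race()` returns a db that contains only branch A with the file content `"a1"`, so the `"b2"` change has disappeared.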

A possible fix is to load the directory version vector in the same read transaction in which the directory content is loaded, and then store it in a member variable as well. This would keep the two from getting out of sync and so prevent the above scenario: in step 4, when the merged version vector is calculated, we would use the old version vectors instead of the new ones, so the resulting version vector after the merge would be concurrent with the new version vector of B, and B would not be pruned.
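
A rough Python sketch of that idea, under the same kind of assumptions: the `Directory` class, the in-memory `db` dict, and the helper names are hypothetical, and reading both fields from one record in the constructor merely stands in for a single read transaction.

```python
def compare(a, b):
    """Partial order of two version vectors (dicts: replica id -> counter)."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge_vv(a, b):
    """Entry-wise maximum of two version vectors."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


class Directory:
    """Hypothetical directory handle that caches content AND version vector."""

    def __init__(self, db, branch):
        # Both values come from the same record at the same moment, standing
        # in for "loaded in the same read transaction".
        record = db[branch]
        self.files = list(record["files"])
        self.vv = dict(record["vv"])


def simulate_fixed():
    db = {
        "A": {"vv": {"A": 1, "B": 1}, "files": [({"A": 1, "B": 1}, "a1")]},
        "B": {"vv": {"B": 1}, "files": [({"B": 1}, "b1")]},
    }
    # Content and version vectors are captured together.
    dir_a = Directory(db, "A")
    dir_b = Directory(db, "B")
    # A new snapshot of B completes after the directories were opened.
    db["B"] = {"vv": {"B": 2}, "files": [({"B": 2}, "b2")]}
    # The merged version vector is computed from the cached (old) vectors.
    merged = merge_vv(dir_a.vv, dir_b.vv)
    # Entries are merged from the cached content; A's file still wins here.
    if compare(dir_a.files[0][0], dir_b.files[0][0]) in ("after", "equal"):
        files = [dir_a.files[0]]
    else:
        files = [dir_b.files[0]]
    db["A"] = {"vv": merged, "files": files}
    # The result is concurrent with the new B, so B is not pruned.
    if compare(db["A"]["vv"], db["B"]["vv"]) == "after":
        del db["B"]
    return db
```

Here `simulate_fixed()` leaves branch B in place with `"b2"` intact, because A's merged version vector is concurrent with B's new one, so the new snapshot can be picked up by a later merge.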
