Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(CommitLog): Do not truncate the log on out of order errors #588

Merged
merged 2 commits into from
Nov 29, 2023

Conversation

joshua-spacetime
Copy link
Collaborator

Description of Changes

See the main code comment.
I think until we have a better log format, or at the very least, better diagnostic tooling, we shouldn't ignore OutOfOrder errors.
It's not apparent that we'll see such errors in practice due to the the more lenient fsync policy.

API and ABI breaking changes

If this is an API or ABI breaking change, please apply the
corresponding GitHub label.

Expected complexity level and risk

How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.

This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.

If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.

@joshua-spacetime joshua-spacetime requested a review from kim November 21, 2023 16:11
kim
kim previously requested changes Nov 21, 2023
Copy link
Contributor

@kim kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think you misread the code. We cannot actually distinguish between an out-of-order commit and corrupted data -- given the log is not appended to via other means than append_tx (which is properly synchronized), all occurrences of this error case will actually be a corrupt segment.

By not truncating, we will render the database unusable for those unfortunate enough.

@kim kim dismissed their stale review November 21, 2023 17:30

Let me think about this some more.

@kim
Copy link
Contributor

kim commented Nov 21, 2023

I think we can do the following:

  1. Check that Commit::decode completely consumed its input. If it didn’t, it’s a “false positive”, and we can treat it as data corruption.
  2. If the commit offset is not contiguous, also verify the parent hash. This does not prove the integrity of the current commit, but if verification fails it does prove that the current commit is faulty (i.e. treat as data corruption).
  3. Check that hash keys are in the object db. If an object is missing, treat as corruption.
  4. The remaining case is a non-contiguous commit, while everything else looks fine. Treat as application-level bug.

@kim
Copy link
Contributor

kim commented Nov 22, 2023

Well, 2. Is still ambiguous if we assume data can be corrupted for reasons other than a partial write. We should maybe consider introducing record-level checksums for 0.8, which is considered breaking.

@kim
Copy link
Contributor

kim commented Nov 22, 2023

@joshua-spacetime joshua-spacetime force-pushed the joshua/fix/fail-loud-on-out-of-order branch from 4f17218 to 46e06d5 Compare November 29, 2023 22:08
@joshua-spacetime joshua-spacetime enabled auto-merge (squash) November 29, 2023 22:11
@joshua-spacetime joshua-spacetime merged commit 7a69495 into master Nov 29, 2023
5 checks passed
@joshua-spacetime joshua-spacetime deleted the joshua/fix/fail-loud-on-out-of-order branch November 29, 2023 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants