
Improve guarantees of experimental flag to disable flush #11423

Closed
deepthidevaki opened this issue Jan 17, 2023 · 2 comments · Fixed by #11487
Labels

kind/toil: Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
version:8.1.7: Marks an issue as being completely or in parts released in 8.1.7
version:8.2.0-alpha4: Marks an issue as being completely or in parts released in 8.2.0-alpha4
version:8.2.0: Marks an issue as being completely or in parts released in 8.2.0

Comments

@deepthidevaki
Contributor

Users can use an experimental flag to disable raft flush to get better performance at the cost of weaker safety guarantees. When flush is disabled, there is no guarantee that a committed record is persisted after a node crash. However, users sometimes want to choose performance over consistency and are fine with partial data loss. But currently, with flush disabled, there are scenarios where the system ends up in a non-recoverable state even if users accept partial data loss since the last flush. A partition's snapshot is recoverable only if the follow-up events up to a particular position are available in the logs. If we disable flush, we cannot guarantee that those events are persisted to disk at the time the snapshot is taken. In that case, Zeebe cannot recover from the snapshot and cannot process anything on that partition. It might require manual action to clean up that partition's data, and the partition can suffer a full data loss.

To allow users to use this experimental flag while accepting a chance of partial data loss across restarts, we should ensure that the journal is flushed at least when taking a snapshot. This way, even if data written after the last flush is lost, the partition will be able to restart from the last snapshot.
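A minimal sketch of that ordering, assuming hypothetical types and names (`Journal`, `SnapshotStore`, `explicitFlushEnabled`); this is illustrative only and does not mirror Zeebe's actual classes:

```java
// Illustrative sketch only; all type and member names here are hypothetical
// and do not mirror Zeebe's actual classes.
interface Journal {
  void flush(); // force pending appends to disk
}

interface SnapshotStore {
  void commit(Object transientSnapshot); // make the snapshot the new recovery point
}

final class SnapshotCommitter {

  private final Journal journal;
  private final SnapshotStore snapshotStore;
  private final boolean explicitFlushEnabled; // false when the experimental "disable flush" flag is set

  SnapshotCommitter(Journal journal, SnapshotStore snapshotStore, boolean explicitFlushEnabled) {
    this.journal = journal;
    this.snapshotStore = snapshotStore;
    this.explicitFlushEnabled = explicitFlushEnabled;
  }

  void commit(Object transientSnapshot) {
    // With explicit flush disabled, appends may not be durable yet; flush now so every
    // event up to the snapshot position is on disk before the snapshot is committed.
    if (!explicitFlushEnabled) {
      journal.flush();
    }
    snapshotStore.commit(transientSnapshot);
  }
}
```

The point is only the ordering: the log must be durable up to the snapshot position before the snapshot becomes the new recovery point.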

@deepthidevaki added the kind/toil label Jan 17, 2023
@megglos
Contributor

megglos commented Jan 17, 2023

Additional thoughts from @npepinpe:
we could still think about this idea of providing varying levels of guarantees (sketched as hypothetical enums after the second list below), e.g.

  • at least flush on every snapshot
  • the above + flush when creating a new segment (aka flush a segment only after it's filled up)
  • the above + flush open segments every X seconds
  • the above + flush on every append before acknowledging (i.e. current approach)
  • the above + also maintain the setLastWrittenIndex for better crash resilience

Similarly, we can look into restore consistency levels:

  • Strict, aka distinguish between partial writes and corruption, and do not start on corruption detected
  • Lenient, aka do not distinguish between partial writes and corruption, and discard data on corruption of the last segment
  • None, aka discard data on corruption detection (not sure if that's even feasible considering the snapshots)
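Purely to illustrate how these tiers could be expressed, a hypothetical sketch (none of these enum or constant names exist in Zeebe):

```java
// Hypothetical sketch of the tiers listed above; none of these names exist in Zeebe.
enum FlushGuarantee {
  ON_SNAPSHOT,           // flush at least when taking a snapshot
  ON_SEGMENT_ROLL,       // the above + flush a segment once it is filled up
  PERIODIC,              // the above + flush open segments every X seconds
  ON_EVERY_APPEND,       // the above + flush before acknowledging each append (current approach)
  ON_APPEND_WITH_INDEX   // the above + also maintain setLastWrittenIndex for better crash resilience
}

enum RestoreConsistency {
  STRICT,   // distinguish partial writes from corruption; do not start if corruption is detected
  LENIENT,  // treat partial writes and corruption alike; discard data on corruption of the last segment
  NONE      // discard data whenever corruption is detected
}
```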

@npepinpe
Member

As I said, maybe it's a bad idea at the moment to provide different recovery guarantees 😄

ghost pushed a commit that referenced this issue Jan 25, 2023
11466: refactor(broker): refactor snapshot director for readability r=deepthidevaki a=deepthidevaki

## Description

The chain of actions required to take a snapshot is difficult to understand because of the inability to compose actor futures. In this PR, we refactored it with a rudimentary way of composing actor futures. I hope it is more readable this way. We have applied a similar technique in `BackupService`.

There is no behavior change. With this change, it is easy to allow multiple ongoing snapshots, but to keep the behavior as it is, we still do not allow them.

For #11423, we have to flush the journal before the snapshot is committed. With this PR, it is easy to add one step before committing the snapshot (see the sketch after this commit message).


Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
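As an aside on the composition technique described in #11466 above: the sketch below uses `java.util.concurrent.CompletableFuture` instead of Zeebe's actor futures, and every step name is hypothetical, but it shows the kind of step-by-step chaining the PR describes and where a journal flush step can be slotted in before the snapshot commit:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical step names; Zeebe's ActorFuture API is deliberately not shown here.
final class SnapshotChain {

  CompletableFuture<Void> takeSnapshot(final long processedPosition) {
    return prepareTransientSnapshot(processedPosition)          // step 1: build the transient snapshot
        .thenCompose(snapshot -> flushJournal()                 // step 2: the extra step slots in here
            .thenCompose(ignored -> commitSnapshot(snapshot))); // step 3: commit the snapshot
  }

  private CompletableFuture<Object> prepareTransientSnapshot(final long position) {
    return CompletableFuture.completedFuture(new Object());
  }

  private CompletableFuture<Void> flushJournal() {
    return CompletableFuture.completedFuture(null);
  }

  private CompletableFuture<Void> commitSnapshot(final Object snapshot) {
    return CompletableFuture.completedFuture(null);
  }
}
```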
ghost pushed a commit that referenced this issue Jan 25, 2023
11473: [Backport stable/8.1] refactor(broker): refactor snapshot director for readability r=deepthidevaki a=backport-action

# Description
Backport of #11466 to `stable/8.1`.

relates to #11423

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
@ghost ghost closed this as completed in 262042f Jan 30, 2023
ghost pushed a commit that referenced this issue Jan 30, 2023
11499: [Backport stable/8.1] Flush log before committing a snapshot if raft flush is disabled r=deepthidevaki a=backport-action

# Description
Backport of #11487 to `stable/8.1`.

relates to #11423

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
@koevskinikola added the version:8.2.0-alpha4 and version:8.1.7 labels Feb 8, 2023
@npepinpe added the version:8.2.0 label Apr 5, 2023
This issue was closed.