
Improve guarantees of experimental flag to disable flush #11423

Closed
deepthidevaki opened this issue Jan 17, 2023 · 2 comments · Fixed by #11487
Labels

kind/toil: Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
version:8.1.7: Marks an issue as being completely or in parts released in 8.1.7
version:8.2.0-alpha4: Marks an issue as being completely or in parts released in 8.2.0-alpha4
version:8.2.0: Marks an issue as being completely or in parts released in 8.2.0

Comments

@deepthidevaki
Contributor

Users can use an experimental flag to disable raft flush to get better performance at the cost of weaker safety guarantees. When flush is disabled, there is no guarantee that a committed record is persisted after a node crash. However, users sometimes want to choose performance over consistency and are fine with partial data loss. But currently, with flush disabled, there are scenarios where the system ends up in a non-recoverable state even if users accept partial data loss since the last flush. A partition's snapshot is recoverable only if the follow-up events up to a particular position are available in the logs. If we disable flush, we cannot guarantee that those events are persisted to disk at the time the snapshot is taken. In that case, Zeebe cannot recover from the snapshot and cannot process anything on that partition. It might require manual action to clean up that partition's data, and the partition can suffer a full data loss.

To allow users to use this experimental flag while accepting a chance of partial data loss across restarts, we should ensure that the journal is flushed at least when taking a snapshot. This way, even if data written after the last flush is lost, the partition will be able to restart from the last snapshot.
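A minimal sketch of that ordering, assuming hypothetical types and names (`Journal`, `SnapshotStore`, `explicitFlushEnabled`); this is illustrative only and does not mirror Zeebe's actual classes:

```java
// Illustrative sketch only; all type and member names here are hypothetical
// and do not mirror Zeebe's actual classes.
interface Journal {
  void flush(); // force pending appends to disk
}

interface SnapshotStore {
  void commit(Object transientSnapshot); // make the snapshot the new recovery point
}

final class SnapshotCommitter {

  private final Journal journal;
  private final SnapshotStore snapshotStore;
  private final boolean explicitFlushEnabled; // false when the experimental "disable flush" flag is set

  SnapshotCommitter(Journal journal, SnapshotStore snapshotStore, boolean explicitFlushEnabled) {
    this.journal = journal;
    this.snapshotStore = snapshotStore;
    this.explicitFlushEnabled = explicitFlushEnabled;
  }

  void commit(Object transientSnapshot) {
    // With explicit flush disabled, appends may not be durable yet; flush now so every
    // event up to the snapshot position is on disk before the snapshot is committed.
    if (!explicitFlushEnabled) {
      journal.flush();
    }
    snapshotStore.commit(transientSnapshot);
  }
}
```

The point is only the ordering: the log must be durable up to the snapshot position before the snapshot becomes the new recovery point.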

@deepthidevaki added the kind/toil label Jan 17, 2023
@megglos
Contributor

megglos commented Jan 17, 2023

Additional thoughts from @npepinpe:
we could still think about this idea of providing varying levels of guarantees (sketched as hypothetical enums after the second list below), e.g.

  • at least flush on every snapshot
  • the above + flush when creating a new segment (aka flush a segment only after it's filled up)
  • the above + flush open segments every X seconds
  • the above + flush on every append before acknowledging (i.e. current approach)
  • the above + also maintain the setLastWrittenIndex for better crash resilience

Similarly, we can look into restore consistency levels:

  • Strict, aka distinguish between partial writes and corruption, and do not start on corruption detected
  • Lenient, aka do not distinguish between partial writes and corruption, and discard data on corruption of the last segment
  • None, aka discard data on corruption detection (not sure if that's even feasible considering the snapshots)
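Purely to illustrate how these tiers could be expressed, a hypothetical sketch (none of these enum or constant names exist in Zeebe):

```java
// Hypothetical sketch of the tiers listed above; none of these names exist in Zeebe.
enum FlushGuarantee {
  ON_SNAPSHOT,           // flush at least when taking a snapshot
  ON_SEGMENT_ROLL,       // the above + flush a segment once it is filled up
  PERIODIC,              // the above + flush open segments every X seconds
  ON_EVERY_APPEND,       // the above + flush before acknowledging each append (current approach)
  ON_APPEND_WITH_INDEX   // the above + also maintain setLastWrittenIndex for better crash resilience
}

enum RestoreConsistency {
  STRICT,   // distinguish partial writes from corruption; do not start if corruption is detected
  LENIENT,  // treat partial writes and corruption alike; discard data on corruption of the last segment
  NONE      // discard data whenever corruption is detected
}
```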

@npepinpe
Member

As I said, maybe it's a bad idea at the moment to provide different recovery guarantees 😄

ghost pushed a commit that referenced this issue Jan 25, 2023
11466: refactor(broker): refactor snapshot director for readability r=deepthidevaki a=deepthidevaki

## Description

The chain of actions required to take a snapshot is difficult to understand because of the inability to compose actor futures. In this PR, we refactored it with a rudimentary way of composing actor futures. I hope it is more readable this way. We have applied a similar technique in `BackupService`.

There is no behavior change. With this change, it is easy to allow multiple ongoing snapshots, but to keep the behavior as it is, we still do not allow them.

For #11423, we have to flush the journal before the snapshot is committed. With this PR, it is easy to add one step before committing the snapshot (see the sketch after this commit message).


Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
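As an aside on the composition technique described in #11466 above: the sketch below uses `java.util.concurrent.CompletableFuture` instead of Zeebe's actor futures, and every step name is hypothetical, but it shows the kind of step-by-step chaining the PR describes and where a journal flush step can be slotted in before the snapshot commit:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical step names; Zeebe's ActorFuture API is deliberately not shown here.
final class SnapshotChain {

  CompletableFuture<Void> takeSnapshot(final long processedPosition) {
    return prepareTransientSnapshot(processedPosition)          // step 1: build the transient snapshot
        .thenCompose(snapshot -> flushJournal()                 // step 2: the extra step slots in here
            .thenCompose(ignored -> commitSnapshot(snapshot))); // step 3: commit the snapshot
  }

  private CompletableFuture<Object> prepareTransientSnapshot(final long position) {
    return CompletableFuture.completedFuture(new Object());
  }

  private CompletableFuture<Void> flushJournal() {
    return CompletableFuture.completedFuture(null);
  }

  private CompletableFuture<Void> commitSnapshot(final Object snapshot) {
    return CompletableFuture.completedFuture(null);
  }
}
```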
ghost pushed a commit that referenced this issue Jan 25, 2023
11473: [Backport stable/8.1] refactor(broker): refactor snapshot director for readability r=deepthidevaki a=backport-action

# Description
Backport of #11466 to `stable/8.1`.

relates to #11423

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
@ghost ghost closed this as completed in 262042f Jan 30, 2023
ghost pushed a commit that referenced this issue Jan 30, 2023
11499: [Backport stable/8.1] Flush log before committing a snapshot if raft flush is disabled r=deepthidevaki a=backport-action

# Description
Backport of #11487 to `stable/8.1`.

relates to #11423

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@users.noreply.github.com>
@koevskinikola added the version:8.2.0-alpha4 and version:8.1.7 labels Feb 8, 2023
@npepinpe added the version:8.2.0 label Apr 5, 2023
This issue was closed.