Support.WriteBlock commits blocks to the ledger asynchronously and can have up to one block in flight, so a node may crash before that block is persisted. Normally, when the node restarts, Raft loads entries from the WAL and attempts to re-apply them. However, if a snapshot was taken at this block, only the entries after the snapshot (if any) are loaded, and in a single-node deployment we end up hanging forever, waiting for the missing block to be pulled from nowhere.

A straightforward solution would be to peek at the ledger tip first and decide whether to load some "old" entries from the WAL, instead of blindly loading data after the latest snapshot. However, it is trickier than it sounds:

- today, we don't strictly respect the contract between Raft and the state machine, under which applied data must not be lost, so that it is safe to prune data in the WAL after snapshots. For example, in the extreme case where we lose the entire ledger, we should not expect it to be recoverable from the WAL
- the etcd/raft persistence library does not provide friendly interfaces for fine-grained control over what data to load. For example, snap.Load() simply looks for the latest snapshot available, and loads entries after that. If we'd like to load older data prior to that snapshot, we'll need to come up with our own utilities

This commit aims to provide a quick fix for the bug described in FAB-18244, leveraging the fact that we can have only one async block in flight, and leaves the "correct" solution to future work.

Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
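
For illustration only, the reasoning behind the quick fix can be sketched as a small decision function. This is not the code in this commit: the names decideRecovery and recoveryAction and the three actions are hypothetical. The sketch also assumes that the snapshot itself carries the block it was taken at, so that a single missing block can be re-committed locally.

```go
package main

// recoveryAction is a hypothetical type describing what a restarting node
// should do about the gap between its ledger tip and the snapshot block.
type recoveryAction int

const (
	// ledgerUpToDate: the ledger already holds the snapshot block, so
	// replaying WAL entries after the snapshot is sufficient.
	ledgerUpToDate recoveryAction = iota
	// recommitInFlightBlock: exactly one block is missing, which matches
	// the "at most one async block in flight" guarantee. Assumption: the
	// block can be recovered from the snapshot data and written again.
	recommitInFlightBlock
	// catchUpFromPeers: more than one block is missing, so the gap cannot
	// be explained by a single in-flight write and must be filled by
	// pulling blocks from other nodes.
	catchUpFromPeers
)

// decideRecovery compares the number of the last block in the ledger with
// the number of the block captured by the latest snapshot.
func decideRecovery(ledgerTip, snapshotBlock uint64) recoveryAction {
	switch {
	case ledgerTip >= snapshotBlock:
		return ledgerUpToDate
	case snapshotBlock-ledgerTip == 1:
		return recommitInFlightBlock
	default:
		return catchUpFromPeers
	}
}
```

With a ledger tip of 9 and a snapshot taken at block 10, decideRecovery returns recommitInFlightBlock, which is the single-node case described above. A larger gap still falls back to pulling blocks from other nodes.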