This repository has been archived by the owner on Jan 6, 2023. It is now read-only.

Failed async BookKeeper writes should cause peer to restart #390

Closed
lbradstreet opened this issue Nov 15, 2015 · 3 comments

@lbradstreet
Member

See https://github.com/onyx-platform/onyx/blob/0.8.x/src/onyx/state/log/bookkeeper.clj#L67

If a given write has failed, the task's local state will no longer be in sync with the played-back log, and now new log entries will be written. The peer should either roll back to the old state and create a new ledger, or kill itself, causing a new peer to replay the state and start writing to a new ledger. The unacked messages will then be replayed.

I suggest we do the second first, then create a new issue to implement the first at some point. I believe implementing the first is worthwhile because, in case of a partition, we may not want all the grouping peers to restart at the same time and would rather they attempt to recover. It may be tricky to do so, however.
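
A rough sketch of the second option (fail fast on a failed async write), written in Python for brevity rather than in Onyx's Clojure. Everything named here (`WriteFailureFlag`, `PeerRestartRequired`, `add_complete`, `check`) is hypothetical and not part of Onyx or BookKeeper; the only assumption about the client is that its async write callback reports a return code, with `0` meaning success. The callback runs on another thread, so it only records the failure, and the task thread raises on its next check.

```python
import threading

OK = 0  # assumed success return code for this sketch


class PeerRestartRequired(Exception):
    """Hypothetical signal that the peer must stop so a new peer can replay."""


class WriteFailureFlag:
    """Records the first failed async write so the task thread can see it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._failure = None

    def add_complete(self, rc, entry_id):
        # Called from the client's callback thread when a write finishes.
        if rc != OK:
            with self._lock:
                if self._failure is None:
                    self._failure = (rc, entry_id)

    def check(self):
        # Called by the task thread before it processes the next batch.
        with self._lock:
            failure = self._failure
        if failure is not None:
            rc, entry_id = failure
            raise PeerRestartRequired(
                f"write of entry {entry_id} failed with rc={rc}"
            )
```

The task loop would call `check()` once per batch and let the exception propagate, so the peer dies and the unacked messages are replayed by its replacement.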

@lbradstreet
Member Author

Given #410, maybe we should only restart if the write failed and we're still writing to the same ledger as the original write.
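
As a sketch of that condition, assuming each async write remembers the id of the ledger it was issued against (the function and parameter names below are hypothetical, and `0` is assumed to be the success code):

```python
def should_restart(rc, write_ledger_id, current_ledger_id, ok=0):
    """Return True only for a failed write against the ledger still in use."""
    failed = rc != ok
    same_ledger = write_ledger_id == current_ledger_id
    return failed and same_ledger
```

A failed write against a ledger the peer has already moved away from would then be ignored rather than triggering a restart.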

@lbradstreet lbradstreet modified the milestones: 0.8.4, 0.8.3 Dec 7, 2015
@lbradstreet lbradstreet removed this from the 0.8.4 milestone Jan 14, 2016
@lbradstreet
Member Author

Confirmed to be an issue by Jepsen.

lbradstreet added a commit that referenced this issue Jan 25, 2016
Also closes #500 by improving performance of write-take-batch
@lbradstreet
Member Author

Fixed in 4d3684e.

lbradstreet added a commit that referenced this issue Jan 26, 2016
Also closes #500 by improving performance of write-take-batch