-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Safe node removal #4008
Safe node removal #4008
Conversation
safe_node_removal@48093 aka 20220728.7 vs main ewma over 20 builds from 47853 to 48083 Click to see tablemain
safe_node_removal
|
…into safe_node_removal
Some notes for Monday: Interestingly, the LTS fails with:
Occurring in a 2.0.4 node, which suggests that a new node is sending messages the old nodes are unable to cope with. I was initially suspicious that this may be because the nodes are removed from the table too early, and signature verification fails, but removing the node removal call reveals that is not the case! |
The LTS break is caused by the enum extension causing a deserialisation break on the older nodes. The fix, discussed with @eddyashton is to remove the additional state and use a boolean flag instead. |
💔 All backports failed
Manual backportTo create the backport manually run:
Questions ?Please refer to the Backport tool documentation and see the Github Action logs for details |
Fix for #1713
retired_committed
node flag in the KV. Nodes are transitioned to this flag being set by the primary observing that theirRETIRED
state for them has been committed.GET /node/network/removable_nodes
endpoint to queryRETIRED && retired_committed
nodes - this allows the operator to find out what nodes can be safely shut downRETIRED && retired_committed
nodes from the KV to avoid KV growthWe may choose to go with a "fresh read" for step 2 instead, removing the need for step 3 and allowing us to elide the
retired_committed
step and go straight to removal. This involves turningGET /node/network/removable_nodes
into a write, for which operators would need to not forget to checktx/status
. This seems unergonomic right now.Once we have blocking transactions though, we probably want to revisit this.