-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R4R circuit breaker high level explanation #3898
Conversation
# Concepts | ||
|
||
The intention of the circuit breaker is to have a contingency plan for a | ||
running network which maintains network liveness. This can be achieved through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which desires to maintain network liveness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want a contingency plan for handling possible state machine bugs which maintains the ability of governance to vote on what to do next, right? I don't think this has much to do with liveness "in general".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think it does! - There could be an extreme case of the circuit breaker where all messages besides governance messages are switched off... but I don't think all circuits need to lead to this state... why not switch off a particular message type which you know is the troublemaker? no need to shut down everything all the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need much more detail prior to implementation I think, but thanks for starting!
# Concepts | ||
|
||
The intention of the circuit breaker is to have a contingency plan for a | ||
running network which maintains network liveness. This can be achieved through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want a contingency plan for handling possible state machine bugs which maintains the ability of governance to vote on what to do next, right? I don't think this has much to do with liveness "in general".
Co-Authored-By: rigelrozanski <rigel.rozanski@gmail.com>
The circuit breaker is intended to be enabled through either: | ||
|
||
- governance, | ||
- the bonded validator group (for emergencies), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cwgoes (responding in a different conversation)
how is this different than the bonded validator set just stopping the consensus process?
This will have to be detailed in the actual spec, however, I imagine in makes sense for there to be consensus among the validator set through either config or some alternative special validator only consensus process which shuts off a message type for all validators (so now if you're a validator who had not been aware of the circuit breaker, but you received the information that the circuit breaker had been switched, you would nolonger accept those messages in blocks or by CheckTx)
OK; thanks - do you think we should write a detailed spec prior to implementation? I think so... @mossid were you previously working on a circuit breaker - what spec or design doc was that based on? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM just one comment!
The circuit breaker is intended to be enabled through either: | ||
|
||
- governance, | ||
- the bonded validator group (for emergencies), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest any set of accounts with varying power distribution which is selected by the state machine, instead of a partial validator set. For an emergency situation, there will not be enough time to make a full consensus from the entire validator set, but if the state machine selects some validators to give this permission, it will make the power distribution between the validators not proportional to the voting power and possibly create some vulnerability.
I'm not sure which is the best way to handle these emergency situations but denoting that the emergency permission is not restricted on the validators will leave more possibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes a lot of sense, I'll integrate into the documentation
The circuit breaker I was working on was specialized on governance proposals, it should be redesigned to cover emergency & invariant broke case |
…s/cosmos-sdk into rigel/circuit-breaker-docs
@alexanderbez @mossid let's merge unless new comments? this is just a starter doc |
Codecov Report
@@ Coverage Diff @@
## develop #3898 +/- ##
===========================================
+ Coverage 60.92% 60.95% +0.03%
===========================================
Files 192 192
Lines 14359 14360 +1
===========================================
+ Hits 8748 8753 +5
+ Misses 5037 5033 -4
Partials 574 574 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK as a starting point but this will need further discussion, especially if slated for 0.34.
just added a concepts documentation
Ref #926
Targeted PR against correct branch (see CONTRIBUTING.md)
Linked to github-issue with discussion and accepted design OR link to spec that describes this work.
Wrote tests
Updated relevant documentation (
docs/
)Added a relevant changelog entry:
sdkch add [section] [stanza] [message]
rereviewed
Files changed
in the github PR explorerFor Admin Use: