R4R circuit breaker high level explanation #3898

Merged
merged 4 commits into from
Mar 25, 2019
Changes from 2 commits
1 change: 1 addition & 0 deletions .pending/improvements/sdk/926-circuit-breaker-
@@ -0,0 +1 @@
\#926 circuit breaker high level explanation
17 changes: 17 additions & 0 deletions docs/spec/circuit-breaker/01_concepts.md
@@ -0,0 +1,17 @@
# Concepts

The intention of the circuit breaker is to have a contingency plan for a
running network which maintains network liveness. This can be achieved through
Contributor

which desires to maintain network liveness?

Contributor

We want a contingency plan for handling possible state machine bugs which maintains the ability of governance to vote on what to do next, right? I don't think this has much to do with liveness "in general".

Contributor Author

@rigelrozanski rigelrozanski Mar 15, 2019

Oh I think it does! - There could be an extreme case of the circuit breaker where all messages besides governance messages are switched off... but I don't think all circuits need to lead to this state... why not switch off a particular message type which you know is the troublemaker? no need to shut down everything all the time

selectively "pausing" functionality of specific modules on a running network.
rigelrozanski marked this conversation as resolved.
The circuit breaker is intended to be enabled through either:

- governance,
rigelrozanski marked this conversation as resolved.
- the bonded validator group (for emergencies),
alexanderbez marked this conversation as resolved.
Contributor Author

@cwgoes (responding in a different conversation)

> how is this different than the bonded validator set just stopping the consensus process?

This will have to be detailed in the actual spec. However, I imagine it makes sense for there to be consensus among the validator set, through either config or some alternative special validator-only consensus process, which shuts off a message type for all validators (so if you're a validator who had not been aware of the circuit breaker, but you received the information that the circuit breaker had been switched, you would no longer accept those messages in blocks or by CheckTx).

Contributor

I suggest any set of accounts with varying power distribution, selected by the state machine, instead of a partial validator set. In an emergency there will not be enough time to reach full consensus among the entire validator set, but if the state machine selects some validators to hold this permission, the power distribution among validators will no longer be proportional to voting power, which could create a vulnerability.

I'm not sure what the best way to handle these emergency situations is, but stating that the emergency permission is not restricted to validators leaves more possibilities open.

Contributor Author

that makes a lot of sense, I'll integrate into the documentation

- special transaction (which proves how expected behaviour is broken).

## Pause state

The basic pause state of any module simply disables all message routes to
that module. Beyond that, it may be appropriate for different modules to
process begin-block/end-block in an altered "safe" way.