Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R4R circuit breaker high level explanation #3898

Merged
merged 4 commits into from
Mar 25, 2019
Merged

Conversation

rigelrozanski
Copy link
Contributor

just added a concepts documentation
Ref #926

  • Targeted PR against correct branch (see CONTRIBUTING.md)

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.

  • Wrote tests

  • Updated relevant documentation (docs/)

  • Added a relevant changelog entry: sdkch add [section] [stanza] [message]

  • rereviewed Files changed in the github PR explorer


For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

# Concepts

The intention of the circuit breaker is to have a contingency plan for a
running network which maintains network liveness. This can be achieved through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which desires to maintain network liveness?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want a contingency plan for handling possible state machine bugs which maintains the ability of governance to vote on what to do next, right? I don't think this has much to do with liveness "in general".

Copy link
Contributor Author

@rigelrozanski rigelrozanski Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think it does! - There could be an extreme case of the circuit breaker where all messages besides governance messages are switched off... but I don't think all circuits need to lead to this state... why not switch off a particular message type which you know is the troublemaker? no need to shut down everything all the time

docs/spec/circuit-breaker/01_concepts.md Show resolved Hide resolved
docs/spec/circuit-breaker/01_concepts.md Outdated Show resolved Hide resolved
docs/spec/circuit-breaker/01_concepts.md Outdated Show resolved Hide resolved
docs/spec/circuit-breaker/01_concepts.md Outdated Show resolved Hide resolved
docs/spec/circuit-breaker/01_concepts.md Outdated Show resolved Hide resolved
Copy link
Contributor

@cwgoes cwgoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need much more detail prior to implementation I think, but thanks for starting!

# Concepts

The intention of the circuit breaker is to have a contingency plan for a
running network which maintains network liveness. This can be achieved through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want a contingency plan for handling possible state machine bugs which maintains the ability of governance to vote on what to do next, right? I don't think this has much to do with liveness "in general".

docs/spec/circuit-breaker/01_concepts.md Show resolved Hide resolved
docs/spec/circuit-breaker/01_concepts.md Outdated Show resolved Hide resolved
docs/spec/circuit-breaker/01_concepts.md Outdated Show resolved Hide resolved
Co-Authored-By: rigelrozanski <rigel.rozanski@gmail.com>
The circuit breaker is intended to be enabled through either:

- governance,
- the bonded validator group (for emergencies),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwgoes (responding in a different conversation)

how is this different than the bonded validator set just stopping the consensus process?

This will have to be detailed in the actual spec, however, I imagine in makes sense for there to be consensus among the validator set through either config or some alternative special validator only consensus process which shuts off a message type for all validators (so now if you're a validator who had not been aware of the circuit breaker, but you received the information that the circuit breaker had been switched, you would nolonger accept those messages in blocks or by CheckTx)

@rigelrozanski
Copy link
Contributor Author

@cwgoes this is not the spec! it's only a conceptual explanation based on our previous standup... I'd also gladly write a spec, Hasn't @mossid already started implementation? has that been put on hold?

@cwgoes
Copy link
Contributor

cwgoes commented Mar 15, 2019

@cwgoes this is not the spec! it's only a conceptual explanation based on our previous standup... I'd also gladly write a spec, Hasn't @mossid already started implementation? has that been put on hold?

OK; thanks - do you think we should write a detailed spec prior to implementation? I think so... @mossid were you previously working on a circuit breaker - what spec or design doc was that based on?

Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mostly. Warrants a review from @jaekwon and @mossid

Copy link
Contributor

@mossid mossid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just one comment!

The circuit breaker is intended to be enabled through either:

- governance,
- the bonded validator group (for emergencies),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest any set of accounts with varying power distribution which is selected by the state machine, instead of a partial validator set. For an emergency situation, there will not be enough time to make a full consensus from the entire validator set, but if the state machine selects some validators to give this permission, it will make the power distribution between the validators not proportional to the voting power and possibly create some vulnerability.

I'm not sure which is the best way to handle these emergency situations but denoting that the emergency permission is not restricted on the validators will leave more possibilities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes a lot of sense, I'll integrate into the documentation

@mossid
Copy link
Contributor

mossid commented Mar 20, 2019

The circuit breaker I was working on was specialized on governance proposals, it should be redesigned to cover emergency & invariant broke case

@rigelrozanski
Copy link
Contributor Author

@alexanderbez @mossid let's merge unless new comments? this is just a starter doc

@codecov
Copy link

codecov bot commented Mar 21, 2019

Codecov Report

Merging #3898 into develop will increase coverage by 0.03%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #3898      +/-   ##
===========================================
+ Coverage    60.92%   60.95%   +0.03%     
===========================================
  Files          192      192              
  Lines        14359    14360       +1     
===========================================
+ Hits          8748     8753       +5     
+ Misses        5037     5033       -4     
  Partials       574      574

Copy link
Contributor

@cwgoes cwgoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK as a starting point but this will need further discussion, especially if slated for 0.34.

@cwgoes cwgoes merged commit 2ca86c8 into develop Mar 25, 2019
@cwgoes cwgoes deleted the rigel/circuit-breaker-docs branch March 25, 2019 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants