CIP-0047? | Hardfork safety mechanism #318
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,218 @@ | ||
--- | ||
CIP: ? | ||
Title: Hardfork Safeguard via Stake Representation | ||
Authors: Jared Corduan <jared.corduan@iohk.io> | ||
Status: Draft | ||
Type: Standards Track | ||
Created: 2022-06-30 | ||
License: CC-BY-4.0 | ||
--- | ||
|
||
## Simple Summary / Abstract | ||
|
||
This CIP proposes lifting a specific, manual safety check to the protocol. | ||
In particular, a new on-chain mechanism will replace the step in the off-chain | ||
hardfork procedure where the governance body gathers information about which | ||
stake pool operators have upgraded to the version of the software which | ||
supports an upcoming hardfork. | ||
|
||
Ever since the Shelley ledger era, block headers have included a protocol | ||
version indicating the maximum protocol version that the block | ||
producer is capable of supporting (see section 13, Software Updates, of the | ||
[Shelley ledger specification](https://hydra.iohk.io/job/Cardano/cardano-ledger/shelleyLedgerSpec/latest/download-by-type/doc-pdf/ledger-spec)). | ||
|
||
This (semantically meaningless) field provides a helpful metric for determining | ||
how many blocks will be produced after a hardfork, | ||
since nodes that have not upgraded will no longer produce blocks. | ||
(Nodes that have not upgraded will fail the `chainChecks` check from Figure 74 | ||
of the Shelley ledger specification, since the major protocol version in the | ||
ledger state will exceed the node's max major protocol version value, | ||
and hence can no longer make blocks.) | ||
|
||
If most of the blocks in the recent past (e.g. the last epoch) are | ||
broadcasting their readiness for a hardfork, | ||
we know that it is safe to propose an update to the major protocol version | ||
which triggers a hardfork. | ||
|
||
This CIP proposes automating this specific check, | ||
making the protocol version in the header semantically meaningful. | ||
The ledger state will determine the stake | ||
(represented as the proportion of the active stake) of all the block producers | ||
whose last block contained the next major protocol version. | ||
Moreover, a new protocol parameter `hardforkThreshold` will be used to reject | ||
any protocol parameter update that proposes to change the major protocol | ||
version but does not have enough backing stake. | ||
|
||
## Motivation / History | ||
|
||
Currently, the governance key holders collectively agree to increase the major | ||
protocol version. This allows them to make a human judgment as to the | ||
readiness of the network for a hardfork (using the mechanism described above). | ||
Since only a few, well-aligned parties are involved, this is currently easy to | ||
coordinate. As we move into the Voltaire phase, where the governance of the | ||
network is decentralized, | ||
it is imperative that we codify this human judgment in the protocol itself. | ||
|
||
## Specification | ||
|
||
### New Protocol Parameter | ||
|
||
There will be a new protocol parameter named `hardforkThreshold`, | ||
containing a rational number. | ||
|
||
The bounds of `hardforkThreshold` need to be considered with care | ||
so that unsafe values are not possible and to place checks and balances on the | ||
governance mechanism. | ||
The minimum value should be greater than a half, and the maximum value should be | ||
less than one. | ||
The exact bounds need further, careful consideration. | ||
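
As a rough illustration of the bounds check described above, the following Python sketch validates a candidate `hardforkThreshold`. The function name and the two constants are hypothetical; the bounds used here (strictly greater than one half, strictly less than one) are placeholders, since the CIP leaves the exact bounds open.

```python
from fractions import Fraction

# Placeholder bounds (assumption): the CIP only requires "greater than a half"
# and "less than one"; the exact bounds are still to be decided.
LOWER_EXCLUSIVE = Fraction(1, 2)
UPPER_EXCLUSIVE = Fraction(1)


def is_valid_hardfork_threshold(value: Fraction) -> bool:
    """Return True if value lies strictly between the placeholder bounds."""
    return LOWER_EXCLUSIVE < value < UPPER_EXCLUSIVE
```

Using exact rationals rather than floating point mirrors the statement that the parameter contains a rational number, and avoids rounding questions at the bounds.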
|
||
### Tracking Hardfork endorsements | ||
|
||
The ledger state will maintain a set of stake pool IDs corresponding to the | ||
block producers whose last block endorsed the next major protocol version. | ||
Endorsing here means that the major protocol version in the block header is | ||
exactly one more than the current major protocol version in the ledger state. | ||
Note that the protocol version in the block header is set by the particular | ||
cardano-node release being used by the block producer. | ||
When no hardfork is anticipated, the node will be configured to place the | ||
current major protocol version in the block header, indicating that the node | ||
is not ready for any hardfork. | ||
When a new release is introduced which can handle an upcoming hardfork, | ||
the node will be configured to use the next major protocol version in the | ||
block header. | ||
|
||
Note that there is no ambiguity regarding what the endorsement in the | ||
block header is referring to, since the major protocol version is only allowed | ||
to increase by one. | ||
Moreover, regardless of what update proposals the governance keys have | ||
proposed, each block header indicates that the corresponding block producer is | ||
either prepared for the major protocol version to increase, or that it is not | ||
prepared for it to increase. | ||
This CIP does not address the problems that arise from multiple versions of | ||
the software (potentially with different semantics) broadcasting the same | ||
major protocol version. These problems will have to be addressed as progress | ||
is made towards fully decentralized governance. | ||
|
||
Whenever the major protocol version is updated, the set of endorsements is | ||
reset to the empty set. | ||
|
||
In order to track the endorsements, the `TICK` ledger rule will need two items | ||
added to the environment (since the Shelley era, the `TICK` rule has had an | ||
empty environment). | ||
In particular, it will need the following from the block header: | ||
* The pool ID of the block producer | ||
* The major protocol version | ||
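
The bookkeeping in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ledger implementation: `EndorsementState`, `observe_header`, and `set_major_version` are hypothetical names, pool IDs are modelled as plain strings, and the sketch assumes that a pool whose latest block does not endorse the next version is dropped from the set (one reading of "whose last block endorsed").

```python
from dataclasses import dataclass, field


@dataclass
class EndorsementState:
    """Hypothetical slice of the ledger state relevant to endorsements."""
    current_major: int
    endorsements: set = field(default_factory=set)


def observe_header(state: EndorsementState, pool_id: str, header_major: int) -> None:
    """Process the two header items that the TICK environment would carry."""
    if header_major == state.current_major + 1:
        # The pool's latest block endorses the next major protocol version.
        state.endorsements.add(pool_id)
    else:
        # Assumption: only the pool's most recent block counts.
        state.endorsements.discard(pool_id)


def set_major_version(state: EndorsementState, new_major: int) -> None:
    """Reset the endorsement set whenever the major version is updated."""
    if new_major != state.current_major:
        state.current_major = new_major
        state.endorsements.clear()
```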
|
||
### Rejecting Updates | ||
|
||
The main point of the safeguard introduced in this CIP is the ability to reject | ||
protocol parameter updates which propose to increase the major protocol version | ||
when not enough block producers are prepared. | ||
The rejection will happen in both the consensus layer and the ledger layer. | ||
|
||
The timing of the rejection is critical, and requires understanding a bit about | ||
the timing of the hardfork combinator (see the diagram below). | ||
Ouroboros (both Praos and Genesis) has a notion of a stability window, | ||
corresponding to the duration of slots after which the consensus mechanism will | ||
no longer roll back a block. | ||
The stability window is currently three tenths of the epoch length | ||
(36 hours on mainnet). | ||
The hardfork combinator requires that the changes to the ledger state which | ||
enact a hardfork (confirmed proposals to increase the major protocol version) | ||
be stable two stability windows before the end of the epoch. | ||
See section 17.4, Ledger restrictions, of the | ||
[consensus report](https://github.com/input-output-hk/ouroboros-network/tree/314845c4087bc6e662d7df0d376ab1910a5b5476/ouroboros-consensus/docs/report). | ||
Therefore protocol parameter updates for the next epoch boundary must be | ||
submitted during the first four tenths of the epoch. | ||
Call this first four tenths of each epoch the "proposal window" for the | ||
purposes of this document. | ||
The consensus layer | ||
[analyzes](https://github.com/input-output-hk/ouroboros-network/blob/314845c4087bc6e662d7df0d376ab1910a5b5476/ouroboros-consensus-shelley/src/Ouroboros/Consensus/Shelley/Ledger/Inspect.hs#L77-L96) | ||
the ledger state one stability window after the proposal window has ended to | ||
determine if the major protocol version will be increased at the next epoch | ||
boundary. | ||
The ledger itself does not apply the protocol parameter update until the | ||
epoch boundary. | ||
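
The arithmetic behind these windows can be checked with a short Python sketch. The function and key names are hypothetical; the only inputs taken from the text are the stability window of three tenths of an epoch and the requirement that changes be stable two stability windows before the end of the epoch.

```python
from fractions import Fraction


def hardfork_timing(epoch_hours: int) -> dict:
    """Compute the window boundaries, in hours from the start of the epoch."""
    stability_window = Fraction(3, 10) * epoch_hours
    # Proposals must land two stability windows before the epoch ends.
    proposal_window_end = epoch_hours - 2 * stability_window
    # Consensus inspects the ledger state one stability window later.
    consensus_check = proposal_window_end + stability_window
    return {
        "stability_window": stability_window,
        "proposal_window_end": proposal_window_end,
        "consensus_check": consensus_check,
    }


timing = hardfork_timing(120)  # five-day epoch on mainnet
```

For a 120-hour epoch this gives a 36-hour stability window, a proposal window ending at 48 hours (four tenths of the epoch), and a consensus check at 84 hours.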
|
||
To apply the new safeguard, the consensus layer will now use new logic for | ||
determining if the major protocol version will be increased, and the ledger will | ||
need to use the exact same logic on the epoch boundary. | ||
The new logic will take the same parameters that are currently being | ||
used to make the determination. | ||
See [protocolUpdates](https://github.com/input-output-hk/ouroboros-network/blob/314845c4087bc6e662d7df0d376ab1910a5b5476/ouroboros-consensus-shelley/src/Ouroboros/Consensus/Shelley/Ledger/Inspect.hs#L106-L110), | ||
and note that the set of endorsements, the stake distribution, and the | ||
protocol parameters will all be included in what the consensus layer calls the | ||
`LedgerState`, and what the ledger layer calls the `NewEpochState` | ||
(the endorsements will be added to `LedgerState`, but the pool stake | ||
distribution and the protocol parameters are already included). | ||
|
||
The new logic for determining if the major protocol version will change is: | ||
* Has quorum been met on the proposed protocol parameter updates? | ||
* If not, there is nothing else to do. | ||
* If so, proceed. | ||
* Does the update modify the major protocol version? | ||
* If not, the update will be applied on the epoch boundary, and there is | ||
nothing else to do. | ||
* If so, proceed. | ||
* What is the sum of the relative, active stake of the block producers listed | ||
in the endorsement set defined in the | ||
[previous section](#tracking-hardfork-endorsements)? | ||
Note that the stake distribution used here is the same as the stake | ||
distribution currently being used for block production. | ||
* Is the sum computed above at least as large as the value of the | ||
`hardforkThreshold` protocol parameter? | ||
* If not, the entire update is rejected. | ||
* If so, the update will be applied on the epoch boundary. | ||
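
The decision steps above can be sketched as a single Python function. This is a minimal illustration, not the consensus or ledger code: the function name and all parameter names are hypothetical, quorum is modelled as a simple vote count, and the stake distribution is a mapping from pool ID to relative active stake.

```python
from fractions import Fraction


def update_will_apply(
    votes_for: int,
    quorum: int,
    current_major: int,
    proposed_major: int,
    endorsements: set,
    relative_stake: dict,
    hardfork_threshold: Fraction,
) -> bool:
    """Return True if the update would be applied on the epoch boundary."""
    # Step 1: without quorum there is nothing else to do.
    if votes_for < quorum:
        return False
    # Step 2: updates that leave the major version alone are unaffected.
    if proposed_major == current_major:
        return True
    # Step 3: sum the relative active stake of the endorsing pools.
    endorsed = sum(
        (relative_stake.get(pool, Fraction(0)) for pool in endorsements),
        Fraction(0),
    )
    # Step 4: reject the entire update unless the threshold is met.
    return endorsed >= hardfork_threshold
```

Because both the consensus layer and the ledger must reach the same answer, a pure function of the ledger state like this one is a natural shape for the shared logic.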
|
||
#### Timing diagram | ||
|
||
The following diagram illustrates the timing described above, | ||
using the durations on mainnet (five-day epochs). | ||
|
||
```mermaid | ||
sequenceDiagram | ||
participant s0 as Epoch Start | ||
participant s1 as 12 hrs | ||
participant s2 as 24 hrs | ||
participant s3 as 36 hrs | ||
participant s4 as 48 hrs | ||
participant s5 as 60 hrs | ||
participant s6 as 72 hrs | ||
participant s7 as 84 hrs | ||
participant s8 as 96 hrs | ||
participant s9 as 108 hrs | ||
participant sA as Epoch End | ||
|
||
s0->s4: Proposal Window | ||
s4->s7: Ledger state stabilization Window | ||
|
||
Note over s7: Consensus to<br>determine if<br>hardfork will occur | ||
Note over sA: Non-rejected<br>updates are applied<br>to ledger state | ||
``` | ||
|
||
## Rationale | ||
|
||
The safeguard presented in this CIP aligns very closely with the manual check | ||
performed today before any hardfork. | ||
Moreover, we have strived to make the minimal changes needed to automate | ||
This is actually only part of the criteria currently being used: SPO block creation ratio, DeFi TVL criterion, exchange adoption criterion. I'm not saying that the other two should be encoded here, but it would be reasonable to mention them in the Rationale.
Maybe I can make it more clear that this CIP is only aiming to automate one very specific check? I myself do not know the whole process.
Thanks for that clarification. Yes, that would be very appreciated. I clearly read way too much into what you were intending to change.
||
the check. | ||
|
||
## Backwards compatibility | ||
|
||
This change is not backwards compatible; it requires a hardfork. | ||
Since it only adds a new safeguard to the ledger rules, however, | ||
no changes are needed to the serialization or to any downstream components. | ||
|
||
## Path to Active | ||
|
||
A hardfork is required for these changes. | ||
A new ledger era is needed, containing the changes described. | ||
The consensus layer will require minimal changes, namely | ||
support for the new ledger era and adopting the new logic for determining if a | ||
hardfork is imminent. | ||
|
||
## Copyright | ||
|
||
This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode) |
One problem this doesn't address is what happens if you have competing proposals. The scheme described in this CIP only allows for sequential votes. The way this CIP is structured, I don't think it even allows you to propose two competing upgrades one after the other, because the current structure doesn't have an expiry on upgrade proposals (other than maybe skipping version numbers if a version is deemed to have failed).
There are strict rules on how the protocol version can be increased:
https://github.com/input-output-hk/cardano-ledger/blob/7e2f674d2a2d14752d4c2d5abf60b26ae015b9e2/eras/shelley/impl/src/Cardano/Ledger/Shelley/PParams.hs#L519-L520
Either the major is increased by exactly one (and the minor reset to zero), or the minor is increased by exactly one (and the major remains unchanged).
Moreover, this proposal is only putting in a safeguard for hardforks (major number increase). So it is always clear what the broadcasted protocol version in the block header is referring to.
Maybe this is only clear if I also explain how the existing protocol parameter update system works? During the voting window, each governance key can propose a change (they can submit multiple proposals, but the latest one overrides the previous) for the end of the current epoch. If quorum is met, the change happens, otherwise nothing happens and the voting state resets. After the voting window, each governance key can stage a vote for the next epoch, which behaves exactly as though they waited until the next epoch and placed a vote during the next voting window.
I think I've explained this above as well. The current structure has harsh expirations. Or am I misunderstanding what you meant?
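
To make the version rule quoted above concrete, here is a small Python sketch. It is only an illustration of the rule as stated in this comment (major plus one with minor reset to zero, or minor plus one with major unchanged); the function name is hypothetical and this is not the ledger's own code.

```python
def is_legal_version_successor(current: tuple, proposed: tuple) -> bool:
    """Check the (major, minor) successor rule described in the comment."""
    major, minor = current
    return proposed == (major + 1, 0) or proposed == (major, minor + 1)
```

For example, from `(6, 0)` the only legal successors under this rule are `(7, 0)` and `(6, 1)`.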
I've added a sentence to the end of this paragraph, let me know if it's clear.
I don't think you actually addressed the real question - and this is related to my comment above.
Who gets to make such proposals? Let's say entity A wants to make a change to parameter X and therefore submits a proposal to change the protocol version to (m+1, 0), and entity B wants to leave parameter X unchanged, but wants to change parameter Y instead. They also submit a proposal. Which protocol version would be assigned to that? How would a determination be made as to which of the changes to proceed with? How would the SPOs indicate which of the proposals they endorse?
I guess there are two issues here:
The answer to the first question is:
Whether or not the governance system will move to change the protocol version to `(m+1, 0)` depends not just on A and B, but also on the other five governance entities (the quorum is 5 of 7 on mainnet). The current system is very basic: at least five of the keys must agree on the entire set of changes. So if entities A - G all want to change Y to 42, but only four of the entities want to change the protocol version to `(m+1, 0)`, nothing is changed at all, not even Y.
The answer to the second question is:
Regardless of what the governance body is doing, if you are a stake pool operator and you are aware of a software update that prepares a hard fork, let's say introducing protocol version `(m+1, 0)`, you can:
* post `m+1` in your block headers (the new software will actually do this for you)
* post `m` in your block headers (the old software will actually do this for you)
If not enough stake is backed by the block producers posting `m+1`, no update proposal can occur which changes the major version to `m+1`, even if quorum is met and even if other protocol parameters were also slated to change.
I can try to summarize this in the CIP, since clearly it's still not clear.
I've tried again to make this point more clear. Let me know if it is still murky.
What happens if 20% of SPOs upgrade, then a bug is found? 50% of the originally upgraded SPOs then install the new fixed major release, and enough other SPOs do as well, so the hard fork is successful. But 10% of the SPOs still have the buggy prefix running, and it has the same version number?
That's an excellent point @WarriorField , and I'm embarrassed I did not think to address it in this CIP. It's far from just a theoretical concern, this came up quite recently, and this is what we did:
https://github.com/input-output-hk/cardano-node/blame/8832f86728ef6a425452b44f2f269acde149448c/cardano-node/src/Cardano/Node/Protocol/Cardano.hs#L201-L206
We fiddled with the minor version. And of course this only worked since the process is still manual. This CIP should address this, thank you!