Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardfork Initiation into a new era #4253

Merged
merged 12 commits into from
May 7, 2024
Merged

Conversation

lehins
Copy link
Collaborator

@lehins lehins commented Apr 10, 2024

Fixes in this PR:

Resolves #4006

  • We have a unified approach to update PParams throughout all eras starting with Shelley,
    thus making HFC combinator much more robust and correct.
  • We can remove duplicate logic from consensus that used to count up the genesis key holder votes: Utilize new PParams prediction functionality for HFC ouroboros-consensus#1078
  • We solve a problem where HFC is triggered in previous era, while TICK happens in the new
    era after translation. Which previously caused protocol version not being updated
    correctly in Conway, since protocol parameter update mechanism was vastly different from
    the one in Babbage.
  • We finish implementation of HardForkInitiation into a new era.
  • DRep pulser chunks are much more accurate now.

Description

Outline of the new approach.

Shelley through Babbage:

  1. Genesis key holders submit votes with proposals for new PParams. Depending on the
    timeing within the epoch those votes will either go into the current or the future
    proposals bucket.
  2. During the first 4k/f slots in the the PPUP rule PParamUpdate proposals added to
    current proposals. Besides keeping the proposals we also now keep potential protocol
    parameters in the future pparms, but only quorum of genesis votes is reached. Potential
    values for new PParams can change if genesis key holders change their votes during
    this period, therefore they cannot yet be considered stable.
  3. The very first TICK that happens during the last two stability windows before the end
    of the epoch we solidify the proposed PParams, thus ensuring they will be applied at
    the next epoch boundary. (See solidifyNextEpochPParams)
  4. At the epoch boundary during the NEWPP rule, instead of counting the votes on the
    proposals, we just looked up in the next PParams that were decided earlier and apply
    them. All of the future votes that where potentially submitted before during the last
    two stability windows of past epoch are converted to current votes, which will be
    treated in the same way as if they were submitted in the 2nd step. This also resets the
    future PParams for the next epoch.

Conway era forward

  1. Either ParameterChange or HardForkInitiation proposal is submitted and votes are
    collected.
  2. At the epoch boundary all of the votes and proposals are snapshotted.
  3. Pulser starts the work on every TICK figuring out the stake distribution for DReps,
    calculating the votes and ratifying the proposals. Whenever ParameterChange or
    HardForkInitiation gets ratified in the RATIFY rule, the new values are immediately
    applied to the future enact state by the ENACT rule. Therefore we have future
    PParams at the latest two stability windows before the end of the epoch.
  4. During the first 4k/f slots on every tick we also lazily update future PParams.
  5. Just as in Shelley era the very first TICK that happens during the last two stability
    windows (6k/f) before the end of the epoch we solidify the proposed PParams. (See
    solidifyNextEpochPParams). Unlike previous eras, in Conway this step is safe to do at
    any point during the initial part of the epoch, because they are considered stable as
    soon as we enter new epoch, however they are expensive to compute during that period,
    that is why we solidify them only when we are pretty confident that the DRep pulser is
    done and RATIFY with ENACT rules got a chance to be executed.
  6. At the epoch boundary we apply the new PParams that where solidified in the previous
    step and reset the future pparams, thus making it ready for the next epoch. The
    important part here is that we do not use the values from the Enact state directly, but
    we take the futurePParams as the source of truth. This allows us to correctly update
    the PParams not only using the voting process of Conway era, but allows us to apply
    the PParams update from Babbage era.

Forecast

It is very important that the TICKF rule does the same steps as the TICK and EPOCH (for
Conway) or NEWPP (for pre Conway) rules. In particular same solidification and rotation of
pparams process as in the stepsabove should happen for forcasting to work correctly.

Checklist

  • Commit sequence broadly makes sense and commits have useful messages
  • New tests are added if needed and existing tests are updated
  • When applicable, versions are updated in .cabal and CHANGELOG.md files according to the
    versioning process.
  • The version bounds in .cabal files for all affected packages are updated. If you change the bounds in a cabal file, that package itself must have a version increase. (See RELEASING.md)
  • All visible changes are prepended to the latest section of a CHANGELOG.md for the affected packages. New section is never added with the code changes. (See RELEASING.md)
  • Code is formatted with fourmolu (use scripts/fourmolize.sh)
  • Cabal files are formatted (use scripts/cabal-format.sh)
  • hie.yaml has been updated (use scripts/gen-hie.sh)
  • Self-reviewed the diff

@lehins lehins force-pushed the lehins/hardfork-initiation branch 2 times, most recently from 622a94e to cf5d63c Compare April 18, 2024 22:05
@lehins lehins force-pushed the lehins/hardfork-initiation branch 4 times, most recently from 3133e8c to cd5dd3d Compare April 26, 2024 04:24
@lehins lehins force-pushed the lehins/hardfork-initiation branch 4 times, most recently from 680be6d to 686c8fd Compare May 3, 2024 05:14
@lehins lehins marked this pull request as ready for review May 3, 2024 05:15
@lehins lehins requested review from aniketd, teodanciu and TimSheard May 3, 2024 05:18
Copy link
Contributor

@Lucsanszky Lucsanszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a first read through but I will definitely have to go through it again to fully understand it. It looks good to me so far but I will wait for the others to approve.

Copy link
Contributor

@teodanciu teodanciu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions from me.
This PR is so dense 🤯
Thank you for the comprehensive explanations in the haddocks and in the ADR!

eras/shelley/impl/src/Cardano/Ledger/Shelley/Governance.hs Outdated Show resolved Hide resolved
docs/adr/2024-04-30_008-pparams-update.md Show resolved Hide resolved
docs/adr/2024-04-30_008-pparams-update.md Show resolved Hide resolved
lehins added 6 commits May 7, 2024 09:32
And `futurePParamsShelleyGovStateL` and `sgsFuturePParams`

For now none of these are functional they are just placeholders
This commit makes UPEC rule obsolete, because it brings into the ledger
state the future value of the PParams, which no longer need to be
computed and can be just looked up in the state.

The future PParams are being updated on each vote and whenever future
proposals are being rotated to the current proposals.
There is a bug where PPUP rule would not update the PParams,
while consensus Hard Fork Combinator (HFC) would still think that the
update was successful.

This degenerate case could have only happened whenever there was a
ProposedPPUpdates that contained an update to major protocol version
that is expected to be for the new era as well as contained an invalid
update to at least one of the `ppMaxTxSizeL`, `ppMaxBHSizeL` or `ppMaxBBSizeL`
parameters.
One of the most important tasks of the `TICKF` rule is to accurately
forecast the PParams that will be in the future.
@lehins lehins force-pushed the lehins/hardfork-initiation branch 2 times, most recently from 492150a to 68996fb Compare May 7, 2024 15:55
lehins added 3 commits May 7, 2024 11:26
* Golden test for ShelleyGovState had to change due to addition of a new
  field

* Fix Shelley unit test by making sure the new future pparams field is
  updated correctly
@lehins lehins force-pushed the lehins/hardfork-initiation branch from 68996fb to 54dc232 Compare May 7, 2024 17:26
@lehins lehins enabled auto-merge May 7, 2024 19:55
@lehins lehins merged commit 71b8cb1 into master May 7, 2024
125 checks passed
@lehins lehins deleted the lehins/hardfork-initiation branch May 7, 2024 20:15
github-merge-queue bot pushed a commit to IntersectMBO/ouroboros-consensus that referenced this pull request Nov 4, 2024
Reverting the hacky approach of #366.

Closes #1239 by superseding it.

Addresses
IntersectMBO/cardano-ledger#4635 (comment).

# Justifying backwards-compatibility

This PR touches the Cardano ledger rules, concretely the logic for
translating a Babbage ledger state to a Conway ledger state. As the
Conway HF already happend on mainnet, it is crucial to argue why this
change retains backwards-compatibility with the historical chain.

## TL;DR

- The original reason for #366 was resolved by the refactoring in
IntersectMBO/cardano-ledger#4253, making the
hack here in Consensus unnecessary.
- The accidental side effects of #366 around pointer addresses were made
"official" in IntersectMBO/cardano-ledger#4647.

Therefore, it is fine to revert #366 without replacement.

## Detailed overview

### Context on HFC ledger ticking

When the HFC ticks a ledger state across an era boundary from A to B, it
does so via the "translate-then-tick" scheme:

1. First, the ledger state in A is translated into a ledger state in B.
2. Second, the ledger state is ticked to the target slot across the
epoch/era boundary, using the logic of era B.

For Cardano, the logic for these two operations lives in Ledger, or
rather, it *should* live in Ledger. However, in #366, we introduced a
temporary workaround/hack by modifying the translation logic from
Babbage to Conway to resolve
IntersectMBO/cardano-ledger#3491. This PR
reverts the hack, such that we now again directly/transparently call
Ledger logic.

### Chronology of changes to Babbage→Conway ticking

1. Mainnet era transitions are triggered by on-chain updates to the
major ledger protocol version. The logic for updating the ledger
protocol version lives, unsurprisingly, in the Ledger, and takes place
while ticking across an epoch boundary.

For `cardano-ledger-conway < 1.14` (that's significantly before any
version used in a node that was mainnet-ready for Conway), this logic
was broken on the era transition from Babbage to Conway, resulting in
IntersectMBO/cardano-ledger#3491, ie the
protocol version was *not* updated. Briefly[^1], the reason was that the
governance schemes of Babbage and Conway are completely different, which
caused issues because, as mentioned above, ticking across the
Babbage→Conway era/epoch boundary uses the logic of Conway, which
doesn't understand Babbage governance proposals, which were hence
discarded during the translation step.

2. The Consensus team decided[^2] to fix this issue via #366, which
updates the protocol version during the Babbage→Conway translation step
in an ad-hoc fashion, by temporarily ticking the *Babbage* ledger step
across the epoch/era boundary (yielding another Babbage ledger state),
and then setting the `GovState` (an era-specific ledger concept deep in
the ledger state, which in particular contains the current protocol
parameters, and hence the protocol version) of the unticked Babbage
ledger state to the one of the ticked Babbage ledger state, and then
proceeding as before.

Concretely, Babbage→Conway ticking now worked like this, starting with a
Babbage ledger state `l0` and a target slot `s`.

1. Tick `l0` just across the era/epoch boundary to get `l1` (a Babbage
ledger state).
2. Set the governance state of `l0` the the one of `l1` and get `l2` (a
Babbage ledger state).
     3. Translate `l2` into a Conway ledger state `l3`.
     4. Tick `l4` to `s` to get the final result.

3. A few months later, for `cardano-ledger-conway-1.14`, @lehins changed
in IntersectMBO/cardano-ledger#4253 how the way
how protocol parameters are updated in Ledger in a way that is nicely
compatible with the "translate-then-tick" scheme, see the
[ADR](https://github.com/IntersectMBO/cardano-ledger/blob/a02dc6eae44287e8a1ac67ffafb8a1ecc492128f/docs/adr/2024-04-30_008-pparams-update.md)
added in that PR for details[^3]. In particular, this would have allowed
us to revert #366 immediately, but we didn't do so, probably because we
saw now immediate motivation. (In retrospect, we should have done that
immediately.)

4. A few months later, the Conway HF happened on mainnet. Due to
investigating an unrelated serialization bug around pointer addresses
(IntersectMBO/cardano-ledger#4589), I realized
that not reverting #366 actually caused a slight difference in the
ledger rules, namely regarding stake delegations from pointer addresses
(also see
IntersectMBO/cardano-ledger#4635 (comment)).

Concretely, Ledger wants to get rid of pointer addresses as they are
considered to be a misfeature and a potential liability for future
projects like Leios (also see [this
ADR](https://github.com/IntersectMBO/cardano-ledger/blob/master/docs/adr/2022-12-05_005-remove-ptr-addresses.md)).
In Conway, stake delegations from pointer addresses are intentionally no
longer considered. In particular, this happens during the SNAP rule
while ticking, by invoking the
[`forgoPointerAddressResolution`](https://github.com/IntersectMBO/cardano-ledger/blob/a02dc6eae44287e8a1ac67ffafb8a1ecc492128f/eras/shelley/impl/src/Cardano/Ledger/Shelley/HardForks.hs#L67)
predicate on the current protocol version, branching on whether the
current major protocol version is larger than `8` (the last Babbage
major protocol version).

- Using cardano-node 9.1 (i.e. the node that everyone was on to go to
Conway), so with #366:

When ticking the translated Conway ledger state into Conway, the current
protocol version is `9` (the first Conway major protocol version), due
to the previous ad-hoc patching of the `GovState` previously as part of
the workaround from #366. Therefore, pointer addresses are *not*
resolved while updating the stake distribution.

     - If we had reverted #366 for cardano-node 9.1:

Because we directly translate the Babbage ledger state to Conway without
doing the `GovState` patching before, the current protocol version while
ticking is `8`, so pointer addresses *are* resolved.

Altogether, the stake distribution used for the leader schedule starting
in the second Conway epoch would have differed slightly (only very
little stake, exactly 100 ADA, has been delegated via pointer
addresses).

Crucially, this difference had a chance to occur only because Ledger did
*not* blank e.g. the `ptrMap` field in `IncrementalStake` during the
Babbage→Conway translation. (This is actually what caused the
serialization bug mentioned above.)

There would have been another, less relevant difference: Because the
current protocol parameters are updated *twice* with #366 (first during
the Babbage tick, and then again during the Conway tick), the *previous*
protocol parameters during the first Conway epoch are incorrectly equal
to the *current* protocol parameters. However, the previous protocol
parameters are only used for reward calculation, and reward calculation
doesn't care whether the major protocol version is `8` or `9`. So this
difference doesn't matter.

8. In a recent Ledger PR
IntersectMBO/cardano-ledger#4647, @lehins
modified the Babbage→Conway translation logic to blank out the pointer
addresses, e.g. `ptrMap` in `IncrementalStake`. This change landed in
Node 10.0.

Therefore, the difference described in 4. does not matter anymore, as
there no longer are any pointer addresses to resolve in Conway when
ticking (which happens *after* translating). Crucially, this enables us
to now revert #366 without replacement, because both before and after,
no pointer addresses are resolved for the stake distribution while
ticking from Babbage to Conway.

### Testing

I tested this on mainnet by starting from a Babbage ledger state and
evolving it via `db-analyser` to the first ledger state (slot
`134092810`) in the second Conway epoch using full block validation,
both with and without this PR. The resulting ledger states are
identical.

In the first Conway epoch, the ledger states differ, but only trivially
in the previous protocol parameters which has no effect as explained
above.

We *could* also write a component-level test for the pointer address
aspect, but that does not necessarily seem worth the cost/subtlety, as
this is a legacy feature already.

### Concluding thoughts

Generally, I think what we should take away from this is that we
*really* need proper specification and testing of what exactly should
happen at era boundaries, see
#418 and
IntersectMBO/cardano-ledger#4635, especially
because certain esoteric parts of the ledger state (like pointer
addresses) might not exist on any testnet.

[^1]: See "Why the status quo is problematic" in #339 for the details
(but ignore the rest of the issue).
[^2]: After a long process that considered/prototyped various
alternatives, but the details are not that relevant for this PR and the
PR description is already very long.
[^3]: Briefly, the logic that updates the protocol parameters on
cross-epoch ticking is no longer era-dependent; rather, it just sets the
protocol parameters to "future" ones that were decided on earlier by
era-specific logic. The insight is that this set of future protocol
parameters can be easily/cleanly translated from Babbage to Conway, and
the Conway ticking logic can apply them despite having no idea *how*
Babbage decided that these should be the next protocol parameters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HardFork into a new era
5 participants