# Pass continuation for handling ledger events #402

base: main
@@ -0,0 +1,114 @@
# How to provide ledger events to client apps

## Introduction

On Thursday Oct 12, we discussed [PR #402](https://github.com/input-output-hk/ouroboros-consensus/pull/402) during the Consensus Office Hours.
@KtorZ @koslambrou @kderme @nfrisby @amesgen did most of the talking.
UTxO HD came up, so @jasagredo likely also would have been in that list if he hadn't been on a bank holiday.

This document captures the key points of the discussion.

## Executive summary

- The Consensus Team considers this PR itself innocuous, because we don't anticipate it being difficult to review, to maintain, or to design around (famous last words?).
- However, there is one axis along which we think it is under-specified, and so the judgement above would be revisited once _the exact necessary semantics_ of `rnHandleLedgerEvent` are better identified.
- During the call we discussed some possible semantics of `rnHandleLedgerEvent`.
  The "simple" one (which _we think_ this PR currently satisfies) would be easy for Consensus to maintain, but offloads a significant amount of complexity to the ultimate client.
  The "complex" one relieves the clients of that burden, but would require a significantly larger PR here.
- We are concerned that the "simple" semantics would let us merge this PR, but that _eventually_ no one would actually use the `rnHandleLedgerEvent` interface because the required client complexity is too hard.
  We are hesitant to expand the Consensus interface (even with the "simple" semantics) if there's a clearly-plausible path to it becoming dead code in the near future (there's only so much dev power available; it should ideally be allocated to long-lived aka "long-term" solutions).
## Requirements

Any complete solution for the needs of the relevant client applications _must_ handle all of the following tasks.

- staying synced with a node when its selection gains some blocks
- staying synced with a node when its selection loses some blocks, aka rollbacks
- becoming synced with a node
- block-producing nodes must be able to completely disable this feature, with no residual cost

The primary desiderata are as follows.

- The client can be implemented in any language (ie without having to re-implement the ledger rules, which are currently only available in Haskell).
- The client has access to the per-block ledger events, or at least the ones known to be desirable (eg calculated rewards and calculated stake distributions).
- The client need not hold its own copy of the ledger state in memory.
  (UTxO HD is related, but does not necessarily completely remove this concern.)
## Solution Space

Today's primary options are as follows.

- Use ChainSync to follow the chain, and maintain a second copy of the ledger state.
  - REQUIREMENT VIOLATION: This can only be implemented in Haskell.
  - REQUIREMENT VIOLATION: This violates the desideratum regarding holding a second copy of the ledger state in memory.
    As of UTxO HD, the major portions of those two ledger states would be on disk instead of in memory, but there'd still need to be two copies (later development could _maybe_ allow sharing the disk part).
  - To handle rollbacks, the client would also need to maintain its own copy of Consensus's `LedgerDB` abstraction.
  - (The [`foldBlocks`](https://input-output-hk.github.io/cardano-node/cardano-api/lib/Cardano-Api-LedgerState.html#v:foldBlocks) approach does this, but only for the immutable chain.)

- Piggy-back on a `db-sync` instance.
  - REQUIREMENT VIOLATION: This violates the desideratum regarding holding a second copy of the ledger state in memory.
    - @kderme remarked that he and @jasagredo charted a path towards using UTxO HD to ameliorate the memory cost, as sketched in the ChainSync option above.
  - @KtorZ remarked that `db-sync` currently requires hundreds of gigabytes of disk, so that is also a client-app burden for this approach.

- Rely on PR #402's `rnHandleLedgerEvent` et al.
  - REQUIREMENT VIOLATION: This violates the desideratum about becoming synced with a node.
    The only workaround at the moment is to restart the node and force it to re-apply whichever historical blocks the client needs to process.
    Moreover, currently, the only way to force the node to do so is to delete certain parts of its on-disk state!
  - To handle rollbacks, the client would also need to maintain something akin to Consensus's `LedgerDB` abstraction, but more lightweight since its elements need not be entire Cardano ledger states (see the sketch after this list).
  - @kderme also remarked that `db-sync` only uses the ledger events for so much, and so still relies on the ledger state itself for the rest; this suggests that the blocks and their ledger events alone might not suffice for clients.
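The "lightweight `LedgerDB`-esque" structure mentioned in the last option can be made concrete. Below is a minimal sketch (all names hypothetical; nothing here is part of PR #402) of a client-side buffer that detects and unwinds rollbacks using only the metadata the handler emits alongside each block's events: the predecessor's header hash, the block's own hash, and its slot. No ledger state is held.

```haskell
-- Hypothetical client-side state for the rnHandleLedgerEvent option: it
-- stores only (hash, slot, events) per recent block, never a ledger state.
module ClientBuffer where

-- Placeholder types; a real client would use its own hash and event types.
type Hash   = String
type SlotNo = Word

data BlockEvents evt = BlockEvents
  { beHash   :: Hash
  , beSlot   :: SlotNo
  , beEvents :: [evt]
  } deriving Show

-- Newest block first. In practice the client would also trim entries once
-- they are deeper than the security parameter k, since those are immutable.
type Buffer evt = [BlockEvents evt]

-- | Process one emission from the node's handler. If the announced
-- predecessor is our current tip, this is a plain extension. If it is an
-- older entry, the node switched chains and every newer entry is rolled
-- back. If it is unknown (empty buffer, or a rollback deeper than our
-- horizon), the whole buffer is discarded and the client must resync.
onLedgerEvent
  :: Hash   -- ^ previous header hash, as passed to the handler
  -> Hash   -- ^ header hash of the applied block
  -> SlotNo -- ^ slot of the applied block
  -> [evt]  -- ^ the block's ledger events
  -> Buffer evt
  -> (Buffer evt, [BlockEvents evt])  -- ^ (new buffer, rolled-back entries)
onLedgerEvent prevHash hash slot evts buf =
  let (rolledBack, kept) = break ((== prevHash) . beHash) buf
  in (BlockEvents hash slot evts : kept, rolledBack)
```

In the steady state the announced predecessor is the buffer's tip and `rolledBack` is empty; after a chain switch, the entries the node abandoned come back out, so the client can revert whatever it derived from them.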
The envisioned longer-term solutions are as follows.

- Enrich the node to permanently store ledger events alongside blocks.
  - Then the ChainSync approach above could simply serve ledger events alongside the blocks.
  - This solution would very explicitly violate the design principles of the node, since this is non-trivial logic in the node that is absolutely not required for the core functionality of the blockchain itself.
  - Also: what happens if the specification of ledger events for some block changes (eg a new ledger event is added)?

- If a client is already maintaining the UTxO map (as `db-sync` does), then the release of UTxO HD might allow that app to avoid the UTxO-related heap costs of the ledger state---it could implement its own UTxO HD `BackingStore`.
  - REQUIREMENT VIOLATION: This can only be implemented in Haskell.

- Release a minimal proxy that uses the existing ChainSync approach above in order to maintain a database mapping blocks (ie slot and hash) to ledger events; a sketch of that mapping follows this list.
  - This technically violates the requirement about a second copy of the ledger state in memory.
    However, the database should be reused among clients, which would amortize that RAM cost.
  - This proxy and its database would be another point of failure in the client's architecture.
    However, the tool's code and its deployment scripts should be _very_ simple.
    And the overall community could maintain redundant instances for the sake of resiliency.
    Still, the running client app would need to trust the database.
  - Also: what happens if the specification of ledger events for some block changes (eg a new ledger event is added)?
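As a strawman for the proxy's database, the essential structure is a finite map from block points to that block's events, so a ChainSync follower can correlate what it receives with the stored events and can discard entries past a rollback point. A hedged in-memory sketch (hypothetical names; a real proxy would persist this):

```haskell
-- A hypothetical shape for the proxy's database: events keyed by block
-- point (slot, hash), so a ChainSync follower can look up the events of
-- each block it receives and delete entries past a rollback point.
module EventStore where

import qualified Data.Map.Strict as Map

type Hash   = String
type SlotNo = Word
type Point  = (SlotNo, Hash)

newtype EventStore evt = EventStore (Map.Map Point [evt])

empty :: EventStore evt
empty = EventStore Map.empty

-- | Record the events of a newly applied block.
insertBlock :: Point -> [evt] -> EventStore evt -> EventStore evt
insertBlock p evts (EventStore m) = EventStore (Map.insert p evts m)

-- | What a client asks after receiving a block via ChainSync.
lookupBlock :: Point -> EventStore evt -> Maybe [evt]
lookupBlock p (EventStore m) = Map.lookup p m

-- | On a rollback to a given slot, drop every entry in a strictly later
-- slot. Keys are ordered by slot first, so splitting at (slot + 1, "")
-- keeps exactly the entries at or before the slot (assuming no real hash
-- is the empty string).
rollbackTo :: SlotNo -> EventStore evt -> EventStore evt
rollbackTo slot (EventStore m) =
  EventStore (fst (Map.split (slot + 1, "") m))
```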
## Action Items

All of these relate to the solution based on PR #402.

- @KtorZ will write up documentation for client-app developers to read that explains the system's guarantees, which are the requirements for the Cardano code.
  - @nfrisby asked that it be roughly divided into four parts: burdens on the Ledger Team (eg they will likely need to handle requests to extend/refine ledger events), burdens on the Consensus Team (eg maintaining sufficient behavior of `rnHandleLedgerEvent`), burdens on the Node Team (eg the new TCP endpoint), and everything else---to whatever extent would not interfere with that document remaining useful to the client-app developer.
  - @nfrisby didn't ask this during the call but should have: requirements on the user code (eg don't DoS that TCP endpoint!).
  - One point of emphasis in that document would be guidance for how the client should recognize and handle rollbacks.
    Such logic in the client is required unless the only ledger event parts the client needs are "stable", in which case rollbacks can't change them.
  - @kderme also suggested that this should explain how we foresee the ledger events themselves evolving (eg do client developers open PRs against `cardano-ledger`?).
    (So far, all ledger events have arisen from the `db-sync` Team requesting them.)

- The Ledger Team should be consulted regarding how the ledger events will evolve once they become so much more accessible to the community.
  (See the content @kderme suggested for the document in the first action item.)

- If all anticipated clients only require stable ledger events, then perhaps there's an even simpler point in the solution space; see the sketch below.
  (See also the rollback-insensitive/stable content for the document in the first action item.)
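For what it's worth, here is a hedged sketch (hypothetical names) of how small that simpler point could be: the client unwinds its pending buffer on chain switches just as before, but only ever acts on an entry once k blocks sit on top of it, at which point no rollback can touch it.

```haskell
-- Hypothetical helper for clients that only consume stable events: unwind
-- the pending buffer on chain switches (using the announced predecessor
-- hash), then release entries once k blocks sit on top of them; released
-- entries can no longer be affected by rollbacks.
module StableOnly where

type Hash = String

-- Newest first: (hash, events) for each block not yet known to be stable.
type Pending evt = [(Hash, [evt])]

securityParam :: Int
securityParam = 2160  -- k on Cardano mainnet

-- | Apply one emission; return the new pending buffer and the entries
-- that just became stable (oldest first).
onEvent
  :: Hash -> Hash -> [evt]  -- predecessor hash, block hash, events
  -> Pending evt
  -> (Pending evt, [(Hash, [evt])])
onEvent prevHash hash evts pending =
  let kept     = dropWhile ((/= prevHash) . fst) pending  -- unwind switch
      pending' = (hash, evts) : kept
      -- Entries with at least k newer blocks on top are immutable.
      (volatile, nowStable) = splitAt securityParam pending'
  in (volatile, reverse nowStable)
```

The price is latency: as the review discussion below notes for immutable-only designs, stability lags the tip by roughly 18 hours.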
- We posited providing an alternative tool (roughly akin to a `db-analyser` pass) that could allow the client to obtain the ledger events of historical blocks without forcing the node to re-apply them.
  Instead of terminating the node, hacking its disk-state, and restarting the node, the client would instead "terminate the node, run this tool, restart the node".
  Note that the tool only needs to process blocks from the ImmutableDB, since the node always re-applies VolatileDB blocks when it restarts.
  It therefore should be relatively simple, but a prototype would be wise; a sketch of the tool's control flow follows this item.

  The tool would also maintain a ledger state, but crucially the node and the tool would never run at the same time.
  If the historical block of interest is older than all of the node's ledger state snapshot files, then the tool would have to re-apply all blocks since Genesis, which can take hours.
  That should be tenable, as long as the client therefore avoids falling out of sync often.
  (TODO maybe the client could somehow keep alive the node's latest ledger state snapshot files from when it was in sync, for later use when invoking the tool?)

  It would be preferable to provide this tool instead of making it easier to abuse the node for the sake of re-calculating the ledger events of historical blocks.

  In this variation of PR #402 et al, the specification of `rnHandleLedgerEvent` is something like "If the TCP stream of events is idle, then the events for at least every block on the current chain have been previously emitted by this node process."
  That should be easy enough for the Consensus design to ensure moving forward, since the Consensus Layer itself currently must reapply blocks when it switches to a different chain, even if it had previously selected-and-switched-away-from those blocks---it doesn't cache anything about them except the crypto checks (ie re-apply versus apply).
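To make the posited tool's simplicity inspectable, here is a hedged sketch of its control flow, in which every node-specific operation is a parameter (none of these names is a real Consensus API):

```haskell
-- Hypothetical driver for the historical-events tool: start from the most
-- recent ledger state snapshot, then reapply ImmutableDB blocks one at a
-- time, emitting each block's events. All concrete operations are
-- parameters, since the real snapshot and ImmutableDB APIs are not pinned
-- down here.
replayEvents
  :: Monad m
  => m st                           -- ^ load the newest usable snapshot
  -> (st -> m (Maybe (st, [evt])))  -- ^ reapply the next immutable block
  -> ([evt] -> m ())                -- ^ hand the block's events to the client
  -> m ()
replayEvents loadSnapshot stepBlock emit = loadSnapshot >>= go
  where
    go st = do
      next <- stepBlock st
      case next of
        Nothing          -> pure ()            -- reached the immutable tip
        Just (st', evts) -> emit evts >> go st'
```

If no snapshot predates the block of interest, `loadSnapshot` degenerates to loading the Genesis ledger state, which is the hours-long replay case described above.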
- Esgen noted that perhaps the same out-of-sync recovery tool could be used to handle an unexpectedly-deep rollback.
  The client would still have to handle rollbacks of any common depth (eg up to 10 blocks), but perhaps that's significantly easier to do than handling rollbacks up to k=2160, since deep rollbacks have so far been rare on Cardano mainnet.
  Especially in languages other than Haskell, it might be burdensome for the LedgerDB-esque abstraction to rely on sharing to avoid multiplying the client's state's heap size by the depth of handleable rollback.
  (A sketch of such a two-tier policy follows.)
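Esgen's suggestion amounts to a two-tier policy, sketched below with hypothetical names and an illustrative cutoff: shallow rollbacks are unwound in memory, and anything deeper is escalated to the recovery tool.

```haskell
-- Hypothetical two-tier rollback policy: unwind in memory up to
-- 'maxInMemoryDepth' blocks; beyond that, fall back to the same
-- out-of-sync recovery tool used for initial synchronisation.
data RollbackAction
  = UnwindInMemory Int  -- ^ drop this many buffered entries
  | RunRecoveryTool     -- ^ rollback too deep for the in-memory buffer

maxInMemoryDepth :: Int
maxInMemoryDepth = 10  -- an illustrative "common depth"; k=2160 is the worst case

classifyRollback
  :: Int  -- ^ observed rollback depth, from the prev-hash linkage
  -> RollbackAction
classifyRollback depth
  | depth <= maxInMemoryDepth = UnwindInMemory depth
  | otherwise                 = RunRecoveryTool
```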
@@ -71,6 +71,7 @@ import Ouroboros.Consensus.Config.SupportsNode
 import Ouroboros.Consensus.Fragment.InFuture (CheckInFuture,
                      ClockSkew)
 import qualified Ouroboros.Consensus.Fragment.InFuture as InFuture
+import Ouroboros.Consensus.Ledger.Basics (LedgerEventHandler)
 import Ouroboros.Consensus.Ledger.Extended (ExtLedgerState (..))
 import qualified Ouroboros.Consensus.Network.NodeToClient as NTC
 import qualified Ouroboros.Consensus.Network.NodeToNode as NTN

@@ -180,6 +181,9 @@ data RunNodeArgs m addrNTN addrNTC blk (p2p :: Diffusion.P2P) = RunNodeArgs {

       -- | Network PeerSharing miniprotocol willingness flag
     , rnPeerSharing :: PeerSharing
+
+      -- | An event handler to trigger custom action when ledger events are emitted.
+    , rnHandleLedgerEvent :: LedgerEventHandler m (ExtLedgerState blk) blk
     }

 -- | Arguments that usually only tests /directly/ specify.

Review thread on `rnHandleLedgerEvent`:

> Regarding the long-term plan: I think this data flow would be unnecessary if the ChainDB instead stored the ledger events alongside each block. Then those events could simply be served alongside the (potentially immutable) blocks (and rollback messages!) by the local ChainSync client. But that would of course require the additional on-disk database that the PR description posited.

> So I wonder if the effort should be spent there instead of on what currently seems to me to be a stop-gap measure.

> What do you mean?

> While a dedicated database alongside the …

> And having this solution in place does not preclude working on improvements, eg. storing the events and serving them through a custom protocol or alongside the …
@@ -342,8 +346,13 @@ runWith RunNodeArgs{..} encAddrNtN decAddrNtN LowLevelRunNodeArgs{..} =
           , ChainDB.cdbVolatileDbValidation = ValidateAll
           }

-      chainDB <- openChainDB registry inFuture cfg initLedger
-        llrnChainDbArgsDefaults customiseChainDbArgs'
+      chainDB <- openChainDB rnHandleLedgerEvent
+                   registry
+                   inFuture
+                   cfg
+                   initLedger
+                   llrnChainDbArgsDefaults
+                   customiseChainDbArgs'

       continueWithCleanChainDB chainDB $ do
         btime <-
@@ -567,7 +576,8 @@ stdWithCheckedDB pb databasePath networkMagic body = do

 openChainDB
   :: forall m blk. (RunNode blk, IOLike m)
-  => ResourceRegistry m
+  => LedgerEventHandler m (ExtLedgerState blk) blk
+  -> ResourceRegistry m
   -> CheckInFuture m blk
   -> TopLevelConfig blk
   -> ExtLedgerState blk

@@ -576,8 +586,8 @@ openChainDB
   -> (ChainDbArgs Identity m blk -> ChainDbArgs Identity m blk)
      -- ^ Customise the 'ChainDbArgs'
   -> m (ChainDB m blk)
-openChainDB registry inFuture cfg initLedger defArgs customiseArgs =
-    ChainDB.openDB args
+openChainDB handleLedgerEvent registry inFuture cfg initLedger defArgs customiseArgs =
+    ChainDB.openDB handleLedgerEvent args
   where
     args :: ChainDbArgs Identity m blk
     args = customiseArgs $
@@ -2,6 +2,7 @@
 {-# LANGUAGE DeriveFunctor #-}
 {-# LANGUAGE DeriveTraversable #-}
 {-# LANGUAGE FlexibleContexts #-}
+{-# LANGUAGE NamedFieldPuns #-}
 {-# LANGUAGE TypeFamilies #-}

 -- | Definition is 'IsLedger'

@@ -19,6 +20,9 @@ module Ouroboros.Consensus.Ledger.Basics (
     , castLedgerResult
     , embedLedgerResult
     , pureLedgerResult
+    , LedgerEventHandler(..)
+    , discardEvent
+    , natHandler
       -- * Definition of a ledger independent of a choice of block
     , IsLedger (..)
     , LedgerCfg

@@ -165,6 +169,24 @@ class ( -- Requirements on the ledger state itself
 applyChainTick :: IsLedger l => LedgerCfg l -> SlotNo -> l -> Ticked l
 applyChainTick = lrResult ..: applyChainTickLedgerResult

+-- | Handler for ledger events
+newtype LedgerEventHandler m l blk =
+  LedgerEventHandler
+    { handleLedgerEvent
+        :: ChainHash blk       -- Previous block header hash
+        -> HeaderHash l        -- Block header hash of the applied block
+        -> SlotNo              -- Slot number of the applied block
+        -> BlockNo             -- Applied block number
+        -> [AuxLedgerEvent l]  -- Resulting 'AuxLedgerEvent's after applying 'applyBlock'
+        -> m ()
+    }
+
+natHandler :: (m () -> n ()) -> LedgerEventHandler m l blk -> LedgerEventHandler n l blk
+natHandler nat LedgerEventHandler{handleLedgerEvent} = LedgerEventHandler (\ph h s bn -> nat . handleLedgerEvent ph h s bn)
+
+discardEvent :: Applicative m => LedgerEventHandler m l blk
+discardEvent = LedgerEventHandler { handleLedgerEvent = \_ _ _ _ _ -> pure () }
+
 {-------------------------------------------------------------------------------
   Link block to its ledger
 -------------------------------------------------------------------------------}

Review thread on the `newtype`:

> @KtorZ Had to add the additional …

> That's odd. Why can't the previous header hash simply be a …

> There is a structural difference: the previous hash might be …

> Ah. Correct. Bummer.

On the `BlockNo` field:

> This is becoming crowded here. I believe that we can replace …

On the `[AuxLedgerEvent l]` field:

> @KtorZ Changed it to a list so that clients can know if they missed an event or if a rollback occurred.
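As a usage illustration for the handler type added above: a hedged sketch of a handler that merely enqueues each emission, so that a separate consumer thread (eg one feeding the PR's TCP endpoint) can forward it. The `LedgerEventHandler` record and its field order come from this diff; the import paths and the `stm`-based plumbing are assumptions.

```haskell
import Control.Concurrent.STM (TBQueue, atomically, writeTBQueue)

-- Import paths are approximate; these names live in ouroboros-consensus
-- and its dependencies.
import Ouroboros.Consensus.Block (BlockNo, ChainHash, HeaderHash, SlotNo)
import Ouroboros.Consensus.Ledger.Basics
                     (AuxLedgerEvent, LedgerEventHandler (..))

-- | One queued entry per applied block, mirroring the handler's arguments.
data Emission l blk = Emission
  { emPrevHash :: ChainHash blk
  , emHash     :: HeaderHash l
  , emSlotNo   :: SlotNo
  , emBlockNo  :: BlockNo
  , emEvents   :: [AuxLedgerEvent l]
  }

-- | A handler that only enqueues (blocking briefly if the bounded queue
-- is full); draining the queue is the consumer thread's job, which keeps
-- event consumption decoupled from block application.
mkQueueingHandler :: TBQueue (Emission l blk) -> LedgerEventHandler IO l blk
mkQueueingHandler q = LedgerEventHandler
  { handleLedgerEvent = \prevHash hash slotNo blockNo events ->
      atomically $ writeTBQueue q (Emission prevHash hash slotNo blockNo events)
  }
```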
Review discussion on when the handler is invoked:

> This current diff is invoking this whenever chain selection validates a block. It's unclear to me that that is the most useful thing to do.
>
> As such, would you reword and expand this comment to specify (in plain terms that your anticipated users of this feature would be likely to correctly interpret---not just eg Consensus team members) exactly which ledger events will be passed to this handler?

> For example, I would want to double-check whether the events for blocks on A would be emitted twice if a node switches from A to B and then back to an extension of A. And is that the desired behavior?
>
> Does the client need to be informed of blocks whose events were previously emitted being rolled back? Or are they supposed to track that on their own, based on the emitted hashes and slots?

> Do they definitely need events for the volatile blocks? Or could you instead side-step these questions by only ever emitting events for a block when it becomes part of the immutable chain?

> Correct, although the validation is what's driving the addition of blocks to the volatile db. The intended behavior here really is to emit an event when a block is added to our local db.
>
> Not sure I understand the question. This will pass ALL ledger events as emitted by the ledger. Which events those are depends on the era and the relative time within the epoch. Are you asking to list all those in the comments here? (at the risk of creating discrepancy as soon as the ledger adds / removes events?) -- We could perhaps link to the relevant section of the ledger?
>
> That is the desired behavior. In this model, we let clients cope with this, and that's why events are emitted alongside a block header hash and a slot number. With this added information, clients should be able to figure out by themselves that a chain switch has occurred and that they need to roll back some of their internal state.
>
> That's a good question which can only be answered by use-cases, I believe. Relying only on the immutable chain means that we are always lagging ~18h in the past. This may or may not be sufficient depending on the use case (and I need to give it some thought for my own use case 🤔 ...). I believe that emitting events for volatile blocks is the most flexible. If clients need the immutability, they can wait for k blocks. The opposite isn't possible if we only emit events for immutable blocks.

> I meant the ledger events from "exactly which" blocks will be passed to the handler. EG you could say "every block the Consensus Layer chooses to validate". However, that's probably too opaque for the end-user. So how to describe it more clearly than that?

> I think dispatching events to the consumer at the point where chain selection occurs is the behaviour we want. In this current design, we want something that's minimally intrusive and therefore lays the burden of storing, indexing, and managing events on the client, providing them "all" the relevant information, eg. the block/slot and the event itself.
>
> In any case, clients of a blockchain need to be able to understand and correctly handle rollbacks and forks. The block/slot index makes it easy to correlate the data of a block as provided by `ChainSync` and the events this block generated. Not sure if this answers your questions @nfrisby :)

> Yeah, I think so. I can imagine the intended spec is something along the lines:
>
> At the moment, though, it will currently (with this diff) also include events from blocks that were never (yet) selected, since currently the node does validate blocks from the near-future even before they are selectable 😬 (that's something we're working to stop).

> @nfrisby we've been quite careful to not wire in the event handler in functions like `chainSelectionForFutureBlocks` and `addBlockAsync` to avoid precisely what you're describing.
>
> Is this "pointless" in the sense that the selection of future blocks will still happen through other execution paths?