
Pass continuation for handling ledger events #402

Open · wants to merge 15 commits into base: main

Conversation

@KtorZ (Contributor) commented Oct 4, 2023

Note

This work is complementary to the work done in cardanosolutions/cardano-node. See the commit history for details of the relevant pieces.

Problem

The Cardano ledger rules are complex and aren't getting easier. This complexity stems from the rules themselves and the various patches that occurred during the development lifecycle (e.g. intra-era hard forks). Some of the information computed by the ledger is made readily available to client applications through the local-state-query protocol.

However, this comes with a few limitations:

  1. The local-state query can only query information that the ledger still has. Yet, the ledger can only rewind up to $\frac{3 \times k}{f}$ slots into the past (on mainnet, with $k = 2160$ and $f = 0.05$, that is $\frac{3 \times 2160}{0.05} = 129600$ slots, i.e. 36 hours). This pertains to the security of the consensus protocol and the need to roll back to some old state -- but not too old. Accessing historical data from the ledger is, therefore, not possible.

  2. The information exposed via the local-state-query protocol is already aggregated in a way suitable for the ledger but not necessarily for client applications.

To cope with (1) and (2), the ledger eventually introduced Ledger Events into its small-steps semantics. For the last couple of eras, the ledger has emitted them as it validates blocks. Events contain various kinds of useful information. At the moment, however, ledger events are only available to clients who have access to the entire ledger state, through a fold-like Haskell API: foldBlocks. This is, in particular, used by cardano-db-sync.

Yet this method is inconvenient, for it requires client applications to hold a copy of the ledger state in memory and redo all the calculations. Even for Haskell applications, this dramatically increases the resources needed (as of today, the ledger state(s) on mainnet require 13-15GB of RAM!). Moreover, it is simply unusable outside the Haskell landscape, for it is a Haskell-only interface.

Solution

Command-sourcing vs. Event-sourcing

From the point of view of software architecture, the Cardano ledger (and node) are designed as a command-sourced system: what is persisted is a sequence of blocks, which act as commands; applying each block to the ledger state yields a new version of that state. This works because the rules governing block application are deterministic, guaranteeing that applying the same sequence of blocks to the same starting point yields the same final state on any system running the same node version.

This is great, but it has one drawback: to compute the state-transition function, fully or partially, one has to know the exact set of rules to implement. In other words, one has to run a ledger, which is not a trivial thing to do, as it requires a lot of resources (see above) and is not easily portable.

Events, on the other hand, do not require such knowledge because they are the rules' results: there is nothing more to compute. They are the direct result of the consensus and do not need to be verified. They are therefore intrinsically easier to understand, more portable, and can be interpreted partially, in any way the client sees fit.

The proposed change combines the benefits of both approaches, making Cardano even more flexible and open.
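To make the contrast concrete, here is a minimal sketch with hypothetical stand-in types (not the actual ledger API): command-sourcing forces the client to know and run the state-transition function, whereas event-sourcing only requires folding over already-computed results.

```haskell
import Data.List (foldl')

-- Hypothetical stand-ins for the real ledger types.
data Block = Block
data LedgerState = LedgerState
data LedgerEvent = LedgerEvent

-- Command-sourcing: the client must know (and run) the full set of rules.
replayBlocks :: (LedgerState -> Block -> LedgerState) -> LedgerState -> [Block] -> LedgerState
replayBlocks applyBlock = foldl' applyBlock

-- Event-sourcing: events are already the result; clients fold over them with
-- whatever (possibly partial) interpretation suits them.
interpretEvents :: (s -> LedgerEvent -> s) -> s -> [LedgerEvent] -> s
interpretEvents step = foldl' step
```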

State of affairs

Since the ledger/node already calculates and emits those events -- it has, by nature, access to the ledger state -- our idea is to bubble them up into a language-agnostic interface usable by local clients. We observed that events are currently discarded by the consensus layer, which remains the main driver of block application. From the abstract API of the consensus, we see the following:

https://github.com/input-output-hk/ouroboros-consensus/blob/49c7f76175b431ba4e9d16aa959db234cc6772bd/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Ledger/Basics.hs#L71-L75

https://github.com/input-output-hk/ouroboros-consensus/blob/49c7f76175b431ba4e9d16aa959db234cc6772bd/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Ledger/Abstract.hs#L80-L89
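For convenience, the linked definition has approximately the following shape (copied here as a sketch; the linked revision of Basics.hs is authoritative):

```haskell
-- The result of applying a block: the new state plus the events that were
-- emitted while computing it.
data LedgerResult l a = LedgerResult
  { lrEvents :: [AuxLedgerEvent l]  -- events arising from the application
  , lrResult :: !a                  -- the new state, instantiated later
  }
```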

LedgerResult contains both the new ledger state (a, instantiated later) and a list of all events resulting from this block application. However, those events are, in fact, discarded by the actual implementation of the abstract interface:

https://github.com/input-output-hk/ouroboros-consensus/blob/49c7f76175b431ba4e9d16aa959db234cc6772bd/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Update.hs#L114-L127

https://github.com/input-output-hk/ouroboros-consensus/blob/49c7f76175b431ba4e9d16aa959db234cc6772bd/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Ledger/Abstract.hs#L160-L166

Ledger events continuation

Using a continuation-passing style approach, we want to provide a handler which can be executed for each event. With a continuation, we can easily turn off event handling in situations where it does not matter or could even be harmful to performance (e.g. on block producers). It also strikes the right balance between flexibility and intrusiveness, as it only requires threading a simple function from the top level of the call stack down to applyBlock.

So, we introduce the following continuation:

Monad m => AuxLedgerEvent l -> m ()

where m is eventually IO, and AuxLedgerEvent l is a multi-era ledger event GADT from cardano-ledger. From the consensus standpoint, this bubbles up to the NodeArgs record in a new field:

rnHandleLedgerEvent :: AuxLedgerEvent (ExtLedgerState blk) -> m ()

NodeArgs is ultimately created by the cardano-node wrapper, where the consensus, ledger, networking, and Plutus layers get stitched together.
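For illustration, here are two handlers a wrapper could plug into rnHandleLedgerEvent -- a sketch with the event type left abstract; the discarding variant mirrors the discardEvent helper this PR passes where events should be ignored:

```haskell
import Data.IORef (IORef, modifyIORef')

-- Count events into an IORef, e.g. for quick monitoring. The event type is
-- kept abstract here; in the PR it would be AuxLedgerEvent (ExtLedgerState blk).
countingHandler :: IORef Int -> event -> IO ()
countingHandler ref _ = modifyIORef' ref (+ 1)

-- Switch event handling off entirely, e.g. on block producers or when
-- merely replaying blocks on startup:
discardEvent :: Applicative m => event -> m ()
discardEvent = const (pure ())
```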

Cardano-node event hook

From there, it becomes straightforward to expose the ledger events in the cardano-node directly by hooking into handleNodeWithTracers and runNode.

At the moment, our proof of concept mounts a bare-bones server listening on a TCP socket and streaming all events back to a client application. All events are indexed by the slot number and the header hash of the block which "produced" them. This index is necessary to allow clients to detect and deal with chain switches (a.k.a. rollbacks). With such an interface, filtering and aggregating events is the client's responsibility.

To make the event handler optional, we've integrated it into the cardano-node command line behind a new optional flag, --ledger-event-handler TCP/PORT. When present, the option points to a port that cardano-node will bind to in order to stream events. If omitted, we fall back to the original behaviour of the node, which is to ignore all ledger events.
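A client could then drain the stream as sketched below. This is only an illustration: the host, port, and framing are assumptions, and the received bytes would be fed to a CBOR decoder matching the serialization format described in the next section.

```haskell
module Main where

import qualified Data.ByteString as BS
import           Network.Socket
import           Network.Socket.ByteString (recv)

main :: IO ()
main = do
  -- Assumes the node was started with '--ledger-event-handler' on port 9999.
  let hints = defaultHints { addrSocketType = Stream }
  addr : _ <- getAddrInfo (Just hints) (Just "127.0.0.1") (Just "9999")
  sock <- socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)
  connect sock (addrAddress addr)
  let loop = do
        chunk <- recv sock 4096
        if BS.null chunk
          then putStrLn "node closed the stream"
          else do
            -- feed 'chunk' into an incremental CBOR decoder here
            putStrLn ("received " <> show (BS.length chunk) <> " bytes")
            loop
  loop
```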

Ledger event serialization

We had to settle on a transport format to write events to a socket/file and stream them to a client. We've opted for CBOR, which is already pervasive across the Cardano interfaces and well suited to network serialization. We have therefore added CBOR encoders and decoders for most ledger events. These currently live in a fork of the cardano-node, with the intention of eventually living in cardano-ledger itself.

Note

To ease the manipulation of ledger events, we map them into a custom, era-independent LedgerEvent data type. However, we preserve the era during serialization, since it is known from the BlockType GADTs that come with each block.

https://github.com/CardanoSolutions/cardano-node/blob/a01ee644a8eb25ac027ca3a7b625ea9584fcd2f2/cardano-node/src/Cardano/Node/LedgerEvent.hs#L733-L742

The codec Version is determined from the block's era and serialized with the event to allow clients to deserialize events accordingly.
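As a sketch of what this implies for the envelope (not the actual wire format -- the CDDL specification mentioned below is authoritative; Version, LedgerEvent, and encodeLedgerEvent are stand-ins for the fork's definitions):

```haskell
import Codec.CBOR.Encoding (Encoding, encodeListLen, encodeWord)

-- Hypothetical stand-ins for the fork's types and event encoder.
newtype Version = Version Word
data LedgerEvent = LedgerEvent

encodeLedgerEvent :: LedgerEvent -> Encoding
encodeLedgerEvent _ = encodeWord 0  -- placeholder body

-- Prefix each event with the codec version derived from the block's era,
-- so clients can select the matching decoder:
encodeVersionedEvent :: Version -> LedgerEvent -> Encoding
encodeVersionedEvent (Version v) ev =
     encodeListLen 2
  <> encodeWord v
  <> encodeLedgerEvent ev
```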

We have documented the on-the-wire format of all the mapped events in a CDDL specification. We've added descriptions and details from the (outdated) LedgerEvents.md document in cardano-ledger, as well as from our own understanding, gained by reading the ledger rules and asking around. We invite the ledger team to review that document.

Use-cases

Explorers: historical rewards

Many tax jurisdictions require Ada holders to keep track of their rewards. As we explained earlier, the tooling options for this are currently limited to cardano-db-sync, which introduces a massive resource cost on any machine. Most commercial laptops cannot afford to run both cardano-node and cardano-db-sync, due to their combined resource usage. For third-party services which provide such access, the cost of running this infrastructure is also significant. It also adds complexity and increases the probability of faults: there have been several reports of bugs and inaccuracies in the past due to the logic duplication between cardano-ledger and cardano-db-sync.

By streaming events directly from the node, we reduce complexity, resource usage, and the risk of 'getting it wrong'. The Cardano Foundation is particularly interested in this feature to power its new explorer and a number of similar use cases to help onboard financial institutions.

Mithril: Stake distribution & more

The Mithril signature depends on knowledge of the stake distribution, which is currently acquired through the relevant cardano-cli command. However, since a mithril-signer or aggregator needs access to a cardano-node anyway to know what data to sign, it would make sense for it to interact with the node more directly, e.g. using the mini-protocols.

Being notified of the stake distribution changes immediately, ideally through a mini-protocol delivering those events, would streamline the integration between mithril and Cardano.

Moreover, while mithril-signers currently sign only the full node DB, providing other data "certified" by a Mithril certificate, such as rewards, is certainly planned.

Long-term vision

With this first proof of concept, we've been able to stream events from the ledger to a file and re-compute the entire reward history of some stake credentials using only ledger events. The proof of concept has a few limitations, which we acknowledge:

  1. It only allows a single client connection.
  2. It doesn't allow clients to resume synchronization from an arbitrary point; it continues from wherever the ledger is.
  3. As a consequence of (2), if the client loses the connection, it may miss events.

While it is sufficient to demonstrate the capabilities, we envision this becoming a more robust mini-protocol akin to the local-chain-sync protocol. Implementing such a protocol requires the node to store all ledger events and be able to rapidly search for them using chain points (block header hash + slot number). A chunk-based approach with indexes, similar to the one used for the immutable and volatile DBs, could work nicely.

This idea is similar to the one proposed in CIP-0078, though we propose an entirely new protocol, for the following reasons:

  1. It's decoupled from the local-chain-sync protocol, meaning we don't need to make any existing interface more complex. The added complexity can be segregated out.
  2. It's nicely complementary to the local-chain-sync. Indeed, the latter is akin to a command-based interface (pushing blocks to clients as commands to process), whereas the local-event-sync (or whatever it ends up being called) would offer an event-based interface, letting clients decide how they intend to process those events.

Note

The long-term vision is up for discussion. The current PR does not intend to introduce any of it; for now, we are solely interested in the continuation described earlier.


cc @abailly-iohk @koslambrou

KtorZ and others added 9 commits September 20, 2023 10:16
  Now, needs to be threaded down to 'addBlockSync' and up back to
  'runWith' to be usable from the node.

  The goal for now is simply to print ledger events on the console on a
  running node.
…tack

  - I've skipped 'chainSelectionForFutureBlocks' and 'addBlockAsync', which seem not relevant to the use case of streaming events down to clients. Those functions are used in anticipation when preparing blocks to apply from the mempool, but should likely not lead to any event notification.
  - Similarly, the handler is set to 'const $ pure ()' on initialization functions which are simply replaying the database.
  So that we can use the newest 'local' ouroboros-consensus-diffusion for building the node, at the same time as our newly patched consensus.
This is to prepare adding arguments to index events by block hash/slot number
@KtorZ KtorZ requested a review from a team as a code owner October 4, 2023 16:01
@@ -165,6 +169,17 @@ class ( -- Requirements on the ledger state itself
applyChainTick :: IsLedger l => LedgerCfg l -> SlotNo -> l -> Ticked l
applyChainTick = lrResult ..: applyChainTickLedgerResult


-- | Handler for ledger events
newtype LedgerEventHandler m l =
Contributor:

This very much seems like a Tracer. Would you object to using the Tracer interface for it?

Contributor:

Is l somehow more convenient in your use case than the more primary blk? EG you could have blk -> AuxLedgerEvent (LedgerState blk) -> m () (or maybe it's ExtLedgerState).

Contributor Author:

Tracer conveys the wrong semantics, doesn't it? I agree that the interface is similar, but they are used for vastly different things (although you could simply trace events as well, true).

Regarding the l parameter, I recall we had to stick to l because some functions in the consensus are really abstract and don't even have blk in context.


Is l somehow more convenient in your use case than the more primary blk?

If we replace l with blk, then the line (lrEvents result) won't compile because it needs the type parameter l and not blk.

@@ -85,10 +87,11 @@ withDB
, ConvertRawHash blk
, SerialiseDiskConstraints blk
)
=> ChainDbArgs Identity m blk
=> LedgerEventHandler m (ExtLedgerState blk)
-> ChainDbArgs Identity m blk
Contributor:

Can you add the LedgerEventHandler to the ChainDbArgs instead of passing it alongside them?

ApplyVal b -> do
result <- either (throwLedgerError db (blockRealPoint b)) return $ runExcept $
tickThenApplyLedgerResult cfg b l
forM_ (lrEvents result) $
Contributor:

Am I correct that this is the only place you're actually invoking the given event handler?

Contributor Author:

Correct, though not all calls to applyBlock will have a handler. In some situations -- when blocks are re-applied after a restart, or when blocks are applied to our own local chain -- we pass down discardEvent to avoid yielding unnecessary events.

And we'd love your insights on that. The idea is to yield events only once for each block -- modulo rollbacks.

@@ -180,6 +181,9 @@ data RunNodeArgs m addrNTN addrNTC blk (p2p :: Diffusion.P2P) = RunNodeArgs {

-- | Network PeerSharing miniprotocol willingness flag
, rnPeerSharing :: PeerSharing

-- | An event handler to trigger custom action when ledger events are emitted.
Contributor:

This current diff is invoking this whenever chain selection validates a block. It's unclear to me that that is the most useful thing to do.

As such, would you reword and expand this comment to specify (in plain terms that your anticipated users of this feature would be likely to correctly interpret---not just eg Consensus team members) exactly which ledger events will be passed to this handler?

Contributor:

For example, I would want to double-check whether the events for blocks on A would be emitted twice if a node switches from A to B and then back to an extension of A. And is that the desired behavior?

Does the client need to be informed when blocks whose events were previously emitted are rolled back? Or are they supposed to track that on their own, based on the emitted hashes and slots?

Contributor:

Do they definitely need events for the volatile blocks? Or could you instead side-step these questions by only ever emitting events for a block once it becomes part of the immutable chain?

Contributor Author:

@nfrisby This current diff is invoking this whenever chain selection validates a block.

Correct, although validation is what drives the addition of blocks to the volatile DB. The intended behavior here really is to emit an event when a block is added to our local DB.

@nfrisby exactly which ledger events will be passed to this handler?

I'm not sure I understand the question. This will pass ALL ledger events as emitted by the ledger. Which events those are depends on the era and the relative time within the epoch. Are you asking us to list them all in the comments here (at the risk of creating a discrepancy as soon as the ledger adds or removes events)? We could perhaps link to the relevant section of the ledger?

@nfrisby For example, I would want to double-check whether the events for blocks on A would be emitted twice if a node switches from A to B and then back to an extension of A. And is that the desired behavior?

That is the desired behavior. In this model, we let clients cope with this; that's why events are emitted alongside a block header hash and a slot number. With this added information, clients should be able to figure out by themselves that a chain switch has occurred and that they need to roll back some of their internal state.

@nfrisby Do they definitely need events for the volatile blocks? Or could you instead side-step these questions by only ever emitting events for a block once it becomes part of the immutable chain?

That's a good question which can only be answered by use cases, I believe. Relying only on the immutable chain means that we are always lagging ~18h in the past. This may or may not be sufficient depending on the use case (and I need to give it some thought for my own use case 🤔 ...). I believe that emitting events for volatile blocks is the most flexible option: if clients need immutability, they can wait for k blocks. The opposite isn't possible if we only emit events for immutable blocks.

Contributor:

exactly which ledger events will be passed to this handler?

I'm not sure I understand the question.

I meant: the ledger events from exactly which blocks will be passed to the handler? E.g. you could say "every block the Consensus Layer chooses to validate". However, that's probably too opaque for the end user. So how do we describe it more clearly than that?


I think dispatching events to the consumer at the point where chain selection occurs is the behaviour we want. This design aims to be minimally intrusive and therefore lays the burden of storing, indexing, and managing events on the client, providing them with "all" the relevant information, e.g. the block/slot and the event itself.
In any case, clients of a blockchain need to be able to understand and correctly handle rollbacks and forks. The block/slot index makes it easy to correlate the data of a block as provided by ChainSync with the events this block generated.
Not sure if this answers your questions, @nfrisby :)

@nfrisby (Contributor) commented Oct 4, 2023:

Yeah, I think so. I can imagine the intended spec is something along these lines:

An event handler that will be called with the ledger events arising from the validation of any block that was ever one of the k latest blocks on the node's selection. Note that this means the stream of ledger events will be affected by rollbacks, and that it will exclude events from blocks that were already older than k (e.g. the entire historical chain) when the node started.

At the moment, though, it will also (with this diff) include events from blocks that were never (yet) selected, since the node currently validates blocks from the near future even before they are selectable 😬 (that's something we're working to stop).

Contributor Author:

@nfrisby we've been quite careful not to wire the event handler into functions like chainSelectionForFutureBlocks and addBlockAsync, precisely to avoid what you're describing.

Is this "pointless" in the sense that the selection of future blocks will still happen through other execution paths?

@nfrisby (Contributor) left a comment:

Sorry for the unbundled comments above; I wasn't expecting to do a full pass, but I did.

Just to clarify: I have not yet considered whether this seems like an appropriate overall architecture. On the other hand, maybe that ship has already sailed, given the existence of your PRs in this repo and others. I will still need to discuss it with the Consensus team and other architects, etc.

Perhaps we can discuss it in a Consensus office hours? cc @KtorZ @abailly-iohk @dcoutts @dnadales

@@ -165,6 +169,24 @@ class ( -- Requirements on the ledger state itself
applyChainTick :: IsLedger l => LedgerCfg l -> SlotNo -> l -> Ticked l
applyChainTick = lrResult ..: applyChainTickLedgerResult

-- | Handler for ledger events
newtype LedgerEventHandler m l blk =
@koslambrou commented Oct 4, 2023:

@KtorZ Had to add the additional blk in order to get the previous block header hash. Doesn't seem like we can achieve it with just l.

Contributor Author:

That's odd. Why can't the previous header hash simply be a HeaderHash l? There's no structural or semantic difference between the current header hash and the previous one 🤨

Contributor:

There's no structural or semantic difference between the current header hash and the previous one

There is a structural difference: the previous hash might be Genesis instead of actually a block's hash. (Though you could collapse that case down to the genesis block's hash as a block's hash, but that's not a built-in behavior we do ubiquitously/automatically.)

Contributor Author:

Ah. Correct. Bummer.

…clients can now know if a rollback occurred, or if they missed an event.
-> HeaderHash l -- Block header hash of the applied block
-> SlotNo -- Slot number of the applied block
-> BlockNo -- Applied block number
-> [AuxLedgerEvent l] -- Resulting 'AuxLedgerEvent's after applying `applyBlock`.


@KtorZ Changed it to a list so that clients can know if they missed an event or if a rollback occurred.

@ghost commented Oct 4, 2023:

@nfrisby Thanks a lot for such a quick review! The ship hasn't actually sailed; this is really preliminary work to gather feedback and insights from consensus and node experts. As pointed out by Matthias, how to actually shovel that information towards clients through the node is an open space: we chose a very simple solution that works for a simple use case, but there might be a lot of other options to consider and experiment with (an EventsDB, for example).

:: ChainHash blk -- Previous block header hash
-> HeaderHash l -- Block header hash of the applied block
-> SlotNo -- Slot number of the applied block
-> BlockNo -- Applied block number
Contributor Author:

This is becoming crowded. I believe we can replace SlotNo -> BlockNo -> HeaderHash l with a Tip blk.
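A sketch of that suggested compression, assuming Tip and ChainHash come from Ouroboros.Network.Block and AuxLedgerEvent from the consensus Basics module (whether it fits the call sites is untested):

```haskell
import Ouroboros.Consensus.Ledger.Basics (AuxLedgerEvent)
import Ouroboros.Network.Block (ChainHash, Tip)

-- 'Tip blk' bundles the slot number, header hash, and block number that the
-- handler currently takes as three separate arguments.
newtype LedgerEventHandler m l blk = LedgerEventHandler
  { handleLedgerEvent :: ChainHash blk -> Tip blk -> [AuxLedgerEvent l] -> m ()
  }
```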

@@ -180,6 +181,9 @@ data RunNodeArgs m addrNTN addrNTC blk (p2p :: Diffusion.P2P) = RunNodeArgs {

-- | Network PeerSharing miniprotocol willingness flag
, rnPeerSharing :: PeerSharing

-- | An event handler to trigger custom action when ledger events are emitted.
Contributor:

Regarding the long-term plan: I think this data flow would be unnecessary if the ChainDB instead stored the ledger events alongside each block. Then those events could simply be served alongside the (potentially immutable) blocks (and rollback messages!) by the local ChainSync client.

But that would of course require the additional on-disk database that the PR description posited.

Contributor:

So I wonder whether the effort should be spent there instead of on what currently seems to me to be a stop-gap measure.

Contributor Author:

what currently seems to me to be a stop-gap measure.

What do you mean?


While a dedicated database alongside the ImmutableDB and friends would be desirable, as it would remove the need for clients to track the data themselves, the proposed solution has the advantage of simplicity, and of not adding a responsibility on top of an already pretty busy consensus layer.
In practice, we thought a client that crashes or wants to catch up could simply reset the node to before the desired point and let the synchronisation magic happen. Doing this manually is error-prone but relatively straightforward; of course, one would want a tool for that, but this is orthogonal to streaming the events.


And having this solution in place does not preclude working on improvements, e.g. storing the events and serving them through a custom protocol or alongside the ChainSync.

@dcoutts (Contributor) commented Oct 11, 2023:

While it is sufficient to demonstrate the capabilities, we envision this becoming a more robust mini-protocol akin to the local-chain-sync protocol. Implementing such a protocol requires the node to store all ledger events[...]

Right, events must be stored if we want a design that does not involve clients computing them.

There are then two approaches to that:

  1. make the node do it
  2. have another client do it

The advantage of 1 is that it avoids ever having two copies of the ledger state (currently in memory, but in future on disk). The disadvantage of 1 is that it goes against one of the general design principles of the node: have the node only compute and store what it needs.

The advantage of 2 is that it goes with the existing node design principles, whereby the node provides the raw data and other clients provide that data in ways that are more useful to applications. That covers things like db-sync, other indexers, Ogmios, and other event APIs.

@KtorZ (Contributor Author) commented Oct 11, 2023:

@dcoutts I'd honestly be okay with that responsibility falling on a client if -- and only if -- there were easy ways for a client to roll the node back to a specific state. This becomes necessary in rare situations where connections are lost and one needs to recover some lost events.

At the moment, this is only achievable by manually removing .chunk files from the immutable/volatile databases (😶 ...), which isn't the most convenient. I think it's reasonable to assume that a client storing events would know where it needs to start over (as a chain point). All that's missing is for the node to make it easy to restart from any given point.

@dcoutts (Contributor) commented Oct 12, 2023:

@KtorZ If a dedicated client is doing it, i.e. storing all the events and making them available to other applications (e.g. something like Ogmios), then there's no need for any changes in the node at all. It is already possible to follow the chain with a ledger state, as cardano-db-sync demonstrates. (Yes, it should be easier to write such applications using the Cardano API, but that's another story.)

So imagine a client with a combo of features from db-sync and Ogmios:

  • it follows the chain itself, with its own copy of the ledger state (like db-sync)
  • it can start after the node, and catch up from being offline (like db-sync)
  • it serves other clients a chain and ledger events in a suitable format (like Ogmios)

This is possible today, without any node changes (though it could be easier with some Cardano API improvements).

The key thing is: don't insist that there is only ever one copy of the ledger state. Allow this dedicated client to have its own copy. Then there are no tricky synchronisation problems, no extra features needed from the node, no need to wind back, or anything of the sort.

@KtorZ (Contributor Author) commented Oct 12, 2023:

@dcoutts It is already possible to follow the chain with a ledger state, as cardano-db-sync demonstrates. (Yes, it should be easier to write such applications using the Cardano API, but that's another story.)

It is possible but impractical (hence the current PR), as clients must hold on to the ledger state and keep a multi-GB copy of it in memory (surely UTxO on-disk storage will alleviate that resource constraint, but it will come at the cost of much more implementation complexity).

@dcoutts This is possible today, without any node changes

It is, but the developer experience is atrocious. What you're describing is the current state of affairs, which we deem unsatisfactory. The current situation (having a client follow the chain and replicate the ledger state) is precisely why we seek alternative solutions.

@dcoutts There key thing is don't insist that there is only ever one copy of the ledger state. Allow this dedicated client to have its own copy.

If it were easy to build such a copy without the cardano-ledger libraries, I could hear the argument. As it stands, however, the specs are insufficient to build another valid implementation. And even with complete and precise specs, this is a costly and cumbersome endeavour which should not be undertaken lightly.

So what you're describing is only possible for clients that would be written in Haskell, which excludes a large part of the ecosystem.

Rather than re-doing the ledger calculations (be it in Haskell or not), working out an API to make ledger events accessible seems to me a relatively low-hanging fruit. As demonstrated by this PR, a couple of days of work suffices. And allowing the node to roll back its volatile/immutable DB should be quite straightforward as well.

Surely bundling that as a full mini-protocol is more work. But this isn't what we're asking here.

@ghost commented Oct 12, 2023:

what you're describing is only possible for clients that would be written in Haskell, which excludes a large part of the ecosystem.

I think that's a key point here, thanks for emphasizing it @KtorZ! And all the other points of course :)

This proposed change is a first step, the beginning of a journey towards making Cardano data more open. It's a very minimal change to the consensus, which might or might not lead to changes in the mini-protocols, in the node, in specialised versions of the node, in clients...

@dcoutts (Contributor) commented Oct 12, 2023:

So what you're describing is only possible for clients that would be written in Haskell, which excludes a large part of the ecosystem.

You're missing my point. I, too, agree that applications should be writable in any language their author chooses.

The architecture is intended to look like this:

node -> api adaptor -> application

API adaptors here include things like db-sync and Ogmios. Yes, it is intended that these adaptors be written in Haskell (using the appropriate Cardano libraries), and that they provide language-neutral APIs to other applications.

So I'm not proposing you write all your applications in Haskell. I am proposing that this adaptor be written in Haskell (just as it would be if the feature were somehow integrated in the node).

@dcoutts (Contributor) commented Oct 12, 2023:

This proposed change is a first step

Note that it does not get one any closer to the desired design, including the design proposed in this ticket.

While it is sufficient to demonstrate the capabilities, we envision this becoming a more robust mini-protocol akin to the local-chain-sync protocol. Implementing such a protocol requires the node to store all ledger events and be able to rapidly search for them using chain points (block header hash + slot number).

This design idea does not need this continuation PR; it needs something different and rather larger. Alternatively, the client-based approach I propose needs no change in the node at all (but would benefit from improvements to the Cardano API instead).

@KtorZ (Contributor Author) commented Oct 12, 2023:

@dcoutts API adaptors here include things like db-sync and Ogmios.

This echoes what I wrote in the introduction of this PR, and why I think the current status quo isn't satisfactory:

@KtorZ: As we explained earlier, the tooling options for this are currently limited to cardano-db-sync, which introduces a massive resource cost on any machine. Most commercial laptops cannot afford to run both cardano-node and cardano-db-sync, due to their combined resource usage. For third-party services which provide such access, the cost of running this infrastructure is also significant. It also adds complexity and increases the probability of faults.

Surely we can write another adaptor trying to solve some of those problems, but I'd rather not. Even cardano-db-sync is moving away from that with the recent introduction of a "no-ledger-state" mode.

@dcoutts: The architecture is intended to look like this: node -> api adaptor -> application

Perhaps then I am simply challenging that vision. I do see a world where we also have many node -> application setups. This is actually happening: the node-to-node and node-to-client network interfaces have already been (partially) re-implemented elsewhere.

The point being that people do not want to work with middleware. It usually adds cost and complexity (more things can go wrong). Since the community of builders is already walking down that path, why not make it easier for them to harness what's currently available -- especially when these are relatively low-hanging fruits?

@dcoutts: Note that it does not get one any closer to the desired design, including the design proposed in this ticket.

I concede that this doesn't get us closer to the "long-term vision" (which we should probably call an "alternative vision") from a design perspective. But it does one important thing: it makes ledger events available so that client applications can start relying on them.

Whether they get the events via an event stream from a socket or through a more sophisticated mini-protocol doesn't invalidate that. Clients can start thinking in terms of what's now possible to build with these instead of being forced into one particular design (i.e. cardano-db-sync).

@dcoutts: This design idea does not need this continuation PR

Correct. This PR is an intermediate solution that is far easier to implement than the mini-protocol approach. It has the main benefit of being non-intrusive and easily turned off in the case of block producers. The reason this PR exists is that far fewer discussions are needed to approve it than it would take to design a complete mini-protocol. So its chances of being accepted are high(er).

Now, if we ever get to make a move on the mini-protocol idea, I am more than happy to come remove and clean up whatever this PR introduces. Pinky promise.

@koslambrou commented Oct 12, 2023:

Or the client based approach I propose needs no change in the node at all (but would benefit from improvements to the Cardano API instead).

But that requires the client to run a process which uses more than ~10GB of RAM, which is exactly what we're trying to avoid. If the counter-argument is that we'll eventually have the LedgerState stored on disk, then the question is: when? My guess is that it's going to take a while; we need a solution now, and this one provides a quick way for clients to get the information.

Like @KtorZ said, once LedgerState doesn't use so much memory, we can always come back and delete what this PR introduced.

The disadvantage of 1 is that it goes against one of the general design principles of the node: have the node only compute and store what it needs.

To clarify, are you referring only to cardano-node, or are you also including ouroboros-consensus? One can imagine users building their own nodes with different capabilities from what ouroboros-consensus provides; then no changes would actually be required in cardano-node.

@dcoutts (Contributor) commented Oct 13, 2023:

Let's try to classify the designs here. There are three:

  1. the "intermediate" one proposed in this PR where ledger events are streamed live to one client
  2. the "node storage" one, where the node stores all ledger events and can provide them to any client
  3. the "client storage" one where a dedicated node client stores ledger events and can provide them to any other client

Now, I argue that design 1 is not something that people will actually want to use, nor is it something the node team would want to support. It is not nearly as useful as it looks: there's no way to get events for old blocks, so a client cannot start late or reconnect. There's no sensible way to synchronise the client with the node. It also means the node is limited by the speed at which the slowest client can consume the events. This differs from local chain sync, which does not block the node from growing its chain, irrespective of how slowly or how late clients consume blocks. Trying to run a node in this mode means it is really being run as a client, and it cannot follow Ouroboros in a timely manner.

More generally, these problems are not specific to the proposed intermediate design. As far as I can see, any design that tries to have just a single copy of the ledger state will have these problems: it means the ledger state (and thus the progress of the chain) has to be synchronised between the node and a client.

Yes it would be nice to avoid needing two copies of the ledger state, but I think that's inevitable for any kind of robust design.

But that requires the client to run a process which uses more than ~10GB of RAM, which is exactly what we're trying to avoid. If the counter-argument is that we'll eventually have the LedgerState stored on disk, then the question is: when? My guess is that it's going to take a while; we need a solution now, and this one provides a quick way for clients to get the information.

The on-disk storage project has a branch ready for the node that stores the UTxO on disk, and it passes benchmarks for the in-memory backend. We're waiting for system-level benchmarks for the on-disk backend.

@dcoutts (Contributor) commented Oct 13, 2023:

The point being that people do not want to work with middleware. It usually adds cost and complexity (more things can go wrong). Since the community of builders is already walking down that path, why not make it easier for them to harness what's currently available -- especially when these are relatively low-hanging fruits?

@KtorZ As I'm sure you know, the node is complex and has high maintenance costs. Without middleware, we would have to integrate all those middleware features into the node itself. That's not a sustainable architecture; the complexity and resource costs would be unacceptable.

It's fine not to use middleware where what the node can sensibly provide natively is sufficient for applications (i.e. just block streaming), but for extra things like indexing, or indeed storing and providing ledger events, the sensible thing is to use middleware.

The real problem, IMHO, is that we've not put enough effort into making it easy to develop such middleware. It should be easy to write such things using the Cardano API, but it's currently much harder than it needs to be. We've got a proof of concept with foldBlocks, but we need support for snapshots and for re-synchronising when a client starts up (i.e. what cardano-db-sync does, but with far too much extra code).

@dcoutts (Contributor) commented Oct 13, 2023:

BTW, how does the intermediate design support switching forks? With local chain sync, that's built in, but with live streaming? 😬

@nfrisby (Contributor) commented Oct 13, 2023:

@dcoutts Thanks for engaging in the above discussion.

Regarding your latest question:

BTW, how does the intermediate design support switching forks? With local chain sync, that's built in, but with live streaming? 😬

This is something we did discuss during the Consensus Office Hours yesterday.

I have an action item to write the summary as a comment on this PR, but I've been preparing for my flight to Paris that departs in a few hours, so that will hopefully happen before Monday.

@nfrisby mentioned this pull request Oct 15, 2023
@nfrisby (Contributor) commented Oct 15, 2023:

My notes summarizing our Consensus Office Hours call on Oct 12 were far too big for a GitHub comment, and they deserve review.

So I opened a dedicated Draft PR for them: #440

Edit: we combined that PR into this one for simplicity; see the pr-402.md file.
