CIP-0078? | Extended Local Chain Sync Protocol #375
---
CIP: 78
Title: Extended Local Chain Sync Protocol
Authors: Erik de Castro Lopo <erikd@mega-nerd.com>
Discussions-To: Erik de Castro Lopo <erikd@mega-nerd.com>
Comments-Summary: Extend Local Chain Sync Protocol
Comments-URI:
Status: Draft
Type: ?
Created: 2022-11-15
License: CC-BY-4.0
---

## Abstract

Modify the `cardano-node` (and underlying code) to provide an extended version of the existing
local chain sync protocol.

## Motivation

Applications that provide insight into the Cardano block chain (like db-sync, exporters, Kupo, and
smart contracts reacting to events) often need access to the current state of the Cardano ledger
(stake distribution, reward and wallet balances, etc.). This information is known to the node, partly
on the block chain and partly in what is referred to as ledger state (described more fully below).
Extracting block chain data is relatively easy, but extracting ledger state data is not. Currently these
applications have to recreate and maintain ledger state themselves from the blocks
they stream from the node over the local chain sync protocol. Recreating and maintaining the
ledger state is not only complex and a source of bugs but, more importantly, requires significant
resources, specifically RAM. The in-memory ledger state currently consumes about 10 GB of RAM, and that
figure is growing. When the node and such an application run on the same machine, the machine
ends up with twice the resource usage. This proposal aims to reduce resource usage and
complexity for chain-following applications like db-sync.

### Current Situation

Currently there is a local chain sync protocol, which is really just the peer-to-peer protocol
served over a local domain socket rather than the TCP/IP socket normally used for P2P transport.

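The sketch below is a minimal, hypothetical Python model of what a consumer of the existing local chain sync protocol does: it extends its copy of the chain on a roll-forward message and truncates it on a roll-backward message. The class names are illustrative only; the real protocol identifies chain points by slot *and* block hash and carries more data per message, both of which are simplified away here. Note that the consumer only ever receives blocks, so any ledger state it needs has to be rebuilt by replaying those blocks itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical, simplified messages modelled on chain sync's
# roll-forward / roll-backward replies (not the real wire types).
@dataclass(frozen=True)
class RollForward:
    slot: int
    block_hash: str


@dataclass(frozen=True)
class RollBackward:
    slot: int  # roll back so that this slot becomes the new tip


Message = Union[RollForward, RollBackward]


class ChainFollower:
    """Keeps a local list of (slot, hash) points received from the node."""

    def __init__(self) -> None:
        self.chain: List[RollForward] = []

    def handle(self, msg: Message) -> None:
        if isinstance(msg, RollForward):
            # A new block extends the local chain.
            self.chain.append(msg)
        else:
            # Discard every block after the rollback point.
            self.chain = [b for b in self.chain if b.slot <= msg.slot]

    def tip(self) -> Optional[RollForward]:
        return self.chain[-1] if self.chain else None


follower = ChainFollower()
for m in [RollForward(1, "a"), RollForward(2, "b"), RollForward(3, "c"),
          RollBackward(2), RollForward(4, "d")]:
    follower.handle(m)
print([b.block_hash for b in follower.chain])  # ['a', 'b', 'd']
```

Everything beyond this list of points (balances, stake, rewards) is exactly what a consumer like `db-sync` currently has to derive on its own.
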
The data transported over this local chain sync protocol is limited to block chain data. However, a
Cardano node also maintains ledger state, which includes:

* The current UTxO state.
* The current amount of ADA delegated to each stake pool.
* Which stake addresses are currently delegated to each pool.
* Reward account balances for each stake address.
* The current protocol parameters.

The first of these ledger state components is by far the largest and is probably not
needed outside the node (and definitely not needed by db-sync). However, the others are needed and
stored by `cardano-db-sync`, which obtains these data sets by maintaining its own copy of ledger state
and periodically extracting the parts it requires.

This means that when `node` and `db-sync` are run on the same machine (which is the recommended
configuration), that machine holds two essentially identical copies of ledger state. Ledger state is a
*HUGE* data structure and the mainnet version currently consumes over 10 GB of memory.
Furthermore, maintaining ledger state duplicates functionality that already exists in `ouroboros-consensus`,
and the maintenance of ledger state has been the cause of about 90% of the bugs in `db-sync` over
the last two years. It also makes updating, running and maintaining `db-sync` more difficult for
operators than it should be. Finally, if `db-sync` did not have to maintain ledger state, the
`db-sync` code base would probably shrink by about 50%, and the parts removed are some of the
most complicated.

> Review comment: An alternative solution that won't solve the duplicate use of RAM, but would solve the duplicate use of code, would be to pack consensus + ledger into a "slim node" that works in read-only mode with an existing ChainDB and is packaged and built alongside the node.
>
> Review comment: Then db-sync or any client could use that "ghost node" to replay blocks from any point in time, generate events, query ledger state and so on. I am even surprised such a tool does not already exist: is this not somewhat akin to what db-analyser does?
>
> Review comment: Can't this idea be extended even further to solve the duplicate RAM problem? What if db-sync maintained its own ChainDB, connected to peers itself, ran Ouroboros, etc.? That way it would not need a separate trusted node process: there is no replication, no need for a protocol extension, deployment is greatly simplified, etc. At the same time this adds no complexity to the node, which @dcoutts understandably wants to avoid, as that complexity would live in the db-sync repository.
>
> Review comment: Actually, this is an idea that popped into my head recently: make the full node's logic available as a library instead of only as an executable. Then db-sync could just embed the node's logic and, provided there is some way to plug "decorators" in here and there, could tap into the node's processing, or poke at the current ledger state when it needs to, etc.
>
> Review comment: Another alternative, and a straightforward one, would be to simply "dump" the information required for the calculation of implicit things (such as rewards) into a file as the node processes them, and either document the format of that file or provide a thin API layer to query its content. Having just that would solve many problems down the line: client applications will almost always be connected to a local node, and that node has to do this work anyway. So instead of duplicating that work, simply record the information when it is computed and provide a way to access it later on. It could be a mere .csv or .json file recording the stake distribution at each epoch transition.
>
> Review comment: Oh, ok, so you would not need to run a node on a machine running
>
> Review comment: It's probably (much) more work, so I would not consider this as an alternative for this particular CIP.
>
> Review comment: Funny, in the last company I worked for, everything was written as a library first (it makes testing much easier) and then a thin wrapper was added to make it into an executable.
>
> Review comment: Is this similar to what Santiago from TxPipe proposed a while back, called Dolos, a "Data Node"?
>
> Review comment: I don't know, I wasn't there a while back ;) But I guess this idea is pretty obvious and not novel, so I am pretty sure other people have thought of it.

## Specification

The proposed solution is an enhanced local chain sync protocol that would only be served over a
local domain socket. The enhanced chain sync protocol would include the existing block chain
data as well as events containing the ledger data that db-sync needs. This enhanced local chain
sync protocol is useful for many applications other than just db-sync.

> Review comment: This has always been said to be "impossible" because the ledger does not record past events and would not be able to replay any events that are too far in the past. Wouldn't it make sense to start by recording the events somewhere?
>
> Review comment: That is effectively part of this proposal.

Smart contract developers would like an application that turns block chain and ledger state changes
into an event stream. With this enhanced local chain sync protocol, generating an easily consumable
event stream simply requires converting the binary enhanced local chain sync protocol messages into
JSON.

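As a rough illustration of that conversion step, the Python sketch below renders one already-decoded event as a single line of JSON. The CIP does not define the binary encoding or the event fields, so the `StakeDistributionChunk` type and its fields are hypothetical, and decoding the binary protocol messages is assumed to have happened already.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical decoded event; the field names are illustrative only.
@dataclass(frozen=True)
class StakeDistributionChunk:
    epoch: int
    chunk_index: int
    entries: dict  # pool id -> stake (lovelace)


def to_json_line(event) -> str:
    """Render one decoded event as a single JSON line (JSON Lines style)."""
    payload = {"type": type(event).__name__, **asdict(event)}
    # sort_keys keeps the output byte-for-byte stable for identical events.
    return json.dumps(payload, sort_keys=True)


ev = StakeDistributionChunk(epoch=400, chunk_index=0, entries={"pool1abc": 1000})
print(to_json_line(ev))
# {"chunk_index": 0, "entries": {"pool1abc": 1000}, "epoch": 400, "type": "StakeDistributionChunk"}
```

Sorting keys is a small design choice that keeps identical events serialised identically, which matches the determinism requirement stated in the specification.
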
This enhanced local chain sync protocol is essentially the data that would have been provided by the
proposed Scientia program (which, as far as I am aware, has been dropped).

> Review comment: What is this "Scientia program" about?
>
> Review comment: It was a proposal put forward by John Woods during his temporary tenure at IOG.
>
> Review comment: @KtorZ The output document isn't public, but it was a user research report on which chain indexers exist, which ones people use and why, their pros and cons, etc.
>
> Review comment: If it isn't public, I don't see why it is mentioned here 👍

The ledger state data that would be provided over the extended chain sync protocol is limited to:

* Per epoch stake distribution (trickle fed in a deterministic manner).
* Per epoch rewards (trickle fed in a deterministic manner).
* Per epoch protocol parameters (tiny, so provided in a single event).
* Per epoch reaped pool list (single event).
* Per epoch MIR distribution (single event).
* Per epoch pool deposit refunds (single event).

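The list above can be modelled as a small event taxonomy. In the hypothetical Python sketch below, block messages and ledger events share one stream, the two large data sets are marked as trickle fed, and a consumer groups the events it receives by epoch and kind. None of these type names come from the CIP or from `cardano-node`; the payloads are placeholders for whatever encoding the protocol eventually defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Tuple, Union


class LedgerEventKind(Enum):
    # One member per data set listed in the specification.
    STAKE_DISTRIBUTION = "stake_distribution"
    REWARDS = "rewards"
    PROTOCOL_PARAMETERS = "protocol_parameters"
    REAPED_POOLS = "reaped_pools"
    MIR_DISTRIBUTION = "mir_distribution"
    POOL_DEPOSIT_REFUNDS = "pool_deposit_refunds"


# Large data sets arrive in many chunks; all others arrive as one event.
TRICKLE_FED = {LedgerEventKind.STAKE_DISTRIBUTION, LedgerEventKind.REWARDS}


def is_single_event(kind: LedgerEventKind) -> bool:
    return kind not in TRICKLE_FED


@dataclass(frozen=True)
class Block:
    slot: int
    block_hash: str


@dataclass(frozen=True)
class LedgerEvent:
    kind: LedgerEventKind
    epoch: int
    payload: object  # placeholder: a chunk of a map, or a whole small data set


Message = Union[Block, LedgerEvent]


def collect_epoch_data(
    stream: Iterable[Message],
) -> Dict[Tuple[int, LedgerEventKind], List[object]]:
    """Group ledger event payloads by (epoch, kind), skipping block messages."""
    collected: Dict[Tuple[int, LedgerEventKind], List[object]] = {}
    for msg in stream:
        if isinstance(msg, LedgerEvent):
            collected.setdefault((msg.epoch, msg.kind), []).append(msg.payload)
    return collected


stream = [
    Block(100, "h1"),
    LedgerEvent(LedgerEventKind.STAKE_DISTRIBUTION, 401, {"poolA": 10}),
    Block(101, "h2"),
    LedgerEvent(LedgerEventKind.STAKE_DISTRIBUTION, 401, {"poolB": 20}),
    LedgerEvent(LedgerEventKind.REAPED_POOLS, 401, ["poolC"]),
]
data = collect_epoch_data(stream)
print(data[(401, LedgerEventKind.STAKE_DISTRIBUTION)])  # [{'poolA': 10}, {'poolB': 20}]
```

Interleaving events with blocks in a single stream means a rollback of the block stream can discard the associated events at the same point, although how the protocol would actually frame this is not specified by the CIP.
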
The small data sets above are each sent as a single per epoch event. Large data sets, like the epoch
stake distribution map (which can have millions of entries), are trickle fed as they are calculated
by the ledger. This is deterministic in that, given the same ledger state LS and the same block B, the
generated events will always be identical. This ensures that if the consumer is stopped and restarted
and needs to roll back a block or two, the replay will be identical.

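One way to meet this determinism requirement, sketched below in Python purely as an assumption (the CIP does not specify a chunking scheme), is to order the stake distribution canonically and make the chunk emitted alongside a block depend only on that ordering and on the block's position within the epoch. A replay of the same blocks against the same ledger state then yields exactly the same chunks, regardless of how the map happens to be stored in memory. The function name and the fixed `chunk_size` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def stake_chunk_for_block(
    stake: Dict[str, int], block_index_in_epoch: int, chunk_size: int
) -> List[Tuple[str, int]]:
    """Return the slice of the stake map emitted alongside one block.

    Sorting by pool id makes the result depend only on the map's contents
    and the block's position, never on dict insertion order.
    """
    ordered = sorted(stake.items())
    start = block_index_in_epoch * chunk_size
    return ordered[start:start + chunk_size]  # empty once everything is sent


stake = {"poolC": 30, "poolA": 10, "poolB": 20}
first_pass = [stake_chunk_for_block(stake, i, 2) for i in range(3)]
print(first_pass)  # [[('poolA', 10), ('poolB', 20)], [('poolC', 30)], []]

# After a restart and a rollback of two blocks, the consumer replays blocks 1
# and 2; the same map built in a different order yields the same chunks.
rebuilt = {"poolB": 20, "poolC": 30, "poolA": 10}
replayed = [stake_chunk_for_block(rebuilt, i, 2) for i in (1, 2)]
print(replayed == first_pass[1:])  # True
```

In this sketch the emitted chunk is a pure function of the map and the block position, which is the property the paragraph above asks for; how the node would actually schedule chunks as the ledger computes them is left open by the CIP.
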
## Rationale

The recommended configuration for `db-sync` is to run it on the same machine as the `node`.
Currently this means that there are two copies of the *HUGE* ledger state data structure (each being
at least 10 GB in size) on the machine. In addition, `db-sync` and other applications only need about
1% of that data. The rest is

## Test Cases

## Implementations

> Review comment: A possible first implementation step would be to write a prototype funnelling existing events (or only one of them, for simplicity's sake, because I assume serialisation would need to be implemented) through the consensus code down to the network stack, in order to properly understand the impact of this change.

## Copyright

This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode).

> Review comment: I think there's little use case for the UTxO state or the protocol parameters, because these are straightforward to obtain from the chain sync protocol itself.
> However, anything regarding rewards is indeed a pain in the *** to work with, because rewards are implicit in the protocol. I'd love to see a quick justification here (instead of "basically" 😅 ...) stating just that. There's no way for a client application to provide reliable information on rewards without maintaining the entire ledger state, since one needs to know what everyone owns in order to calculate the rewards snapshots in each era.
>
> Review comment: I am pretty sure that the protocol parameters are not available on chain, only the votes for changes. Yes, the ledger rules can be applied to whatever voting is seen on chain, but that is far more fragile than getting the parameters from the source.
> Agreed that nobody in their right mind would want the UTxO state.
>
> Review comment: The protocol parameters are available on chain: `NewEpochState -> EpochState -> PParams`.
>
> Review comment: It would be really nice to have a function to pull the protocol parameters out of ledger state so that I did not have to go digging around inside it.
>
> Review comment: You're in luck, I have one right here: `nesEs . esPp`
>
> Review comment: I would call that "digging into the internals". I am asking for an officially maintained function that is part of an official API.
>
> Review comment: The lack of a quality API is indeed an issue, but a much bigger issue, imo, is the need to replicate resources outside the node, especially for ledger state data. An API cannot help there; this requires improvements at the protocol level. API and protocol are two different topics.