ADR-040: Storage and SMT State Commitments #8430
Changes from 11 commits
# ADR 040: Storage and SMT State Commitments

## Changelog

- 2020-01-15: Draft

## Status

DRAFT Not Implemented

## Abstract

A Sparse Merkle Tree (SMT) is a version of a Merkle tree with various storage and performance optimizations. This ADR defines the separation of state commitments from data storage and the SDK's transition from IAVL to SMT.

## Context

Currently, Cosmos SDK uses IAVL for both state commitments and data storage.

Review comment: I would define what state commitments are and how they differ from data storage. It can be concise.

Review comment: Isn't it self-explanatory? A state commitment is a commitment to a state. I can add a link to explain more general commitment schemes.

IAVL has effectively become an orphaned project within the Cosmos ecosystem, and it has proven to be an inefficient state commitment.

In the current design, IAVL is used both for data storage and as a Merkle tree for state commitments. IAVL is meant to be a standalone Merkleized key/value database; however, it uses a KV DB engine to store all tree nodes, so each node is stored in a separate record in the KV DB. This causes many inefficiencies and problems:

+ Each object lookup requires a tree traversal from the root.
+ Each edge traversal requires a DB query (nodes are not stored in memory).

  Review comment: Are you sure about this?

  Review comment: When traversing a tree, we always do a DB query. However, subsequent queries are cached at the SDK level, not at the IAVL level. I can add that clarification.

+ Creating snapshots is [expensive](https://github.com/cosmos/cosmos-sdk/issues/7215#issuecomment-684804950). It takes about 30 seconds to export less than 100 MB of state (as of March 2020).
+ Updates in IAVL may trigger tree reorganization and possibly O(log(n)) hash re-computations, which can become a CPU bottleneck.
+ The leaf structure is expensive: it contains the `(key, value)` pair plus additional metadata such as height and version. The entire node is hashed, and that hash is used as the key in the underlying database ([ref](https://github.com/cosmos/iavl/blob/master/docs/node/node.md)).

  Review comment: Can you please elaborate on why it's "expensive"?

  Review comment: It contains a lot of data that is not needed in the new structure; in particular, we don't need the metadata.

Moreover, the IAVL project lacks support and a maintainer, and better, well-established alternatives already exist. Instead of optimizing IAVL, we are looking into other solutions for both storage and state commitments.

## Decision

We propose to separate the concerns of state commitment (**SC**), needed for consensus, and state storage (**SS**), needed for the state machine. Finally, we replace IAVL with [LazyLedger SMT](https://github.com/lazyledger/smt). LazyLedger SMT is based on the Diem design (called Jellyfish) [*]: it uses a compute-optimised SMT that replaces subtrees containing only default values with a single node (Ethereum2 uses the same approach).

### Decouple state commitment from storage

Separating storage from commitment (by the SMT) will allow us to optimize the different components according to their usage and access patterns.

The SMT will use its own storage, separate from the state machine store (though it could use the same database underneath). For every `(key, value)` pair, the SMT will store `hash(key)` as the path and `hash(key, value)` in a leaf.

Review comment: Is it possible for us to apply these changes to the IAVL implementation, which would remove the state duplication from the implementation?

Review comment: Out of scope of the refactor. It can be done after this upgrade has been completed and if someone asks for it; otherwise, we would look at archiving IAVL.

Review comment: Agree with @marbar3778. IAVL has other drawbacks, and there is no point in updating it.

Review comment: It would be really great to understand better why we're storing

Review comment: Also, in the design I put forth in #9158 and #9156, I was thinking we might store

Review comment: This is what we are doing. Modules don't even know if there is a Merkle tree and what goes into it. Modules only use a generic KVStore interface, as is done today (with caching and key prefixing).

Review comment: We want to bind a

Review comment: I'm more or less expressing my desire for two methods on  The reason I mention proto JSON is because for  Rather than specifying this at the framework level, my solution would be for

Review comment: This ADR is not about introducing storage for data that is not part of the state commitment. The reason we have 2 data stores (SS and SC) under the hood is efficiency; the design was inspired by Turbo-Geth. In other words, in this design we have only one external store, which commits and queries committed data. Under the hood, it uses 2 DBs for efficiency. Support storage (e.g., a module's off-chain store) and indexers are out of scope and are not part of the committed state. We could implement an extension store that uses the state commitment store (this ADR) in some way (e.g., a kind of subtree, or a polynomial commitment).

Review comment: It's not clear to me why storage should deal with additional logic (e.g., what the data type is) rather than bytes. If a client wants to save data using

Review comment: I will add notes about the off-chain store to the Further Discussions section.

For data access, we propose 2 additional KV buckets:

Review comment: What is a KV bucket here? This may be nomenclature I am not familiar with.

Review comment: Some KV databases use buckets for creating different databases under the same server / engine. PostgreSQL calls them databases (you can have multiple databases in a single PostgreSQL instance). RocksDB calls them column families.

Review comment: Could you post this link with a small explainer? The current explainer doesn't explain; it just throws a sentence into the mix.

1. B1: `key → value`: the principal object storage, used by the state machine behind the SDK `KVStore` interface. It provides direct access by key and allows prefix iteration (the KV DB backend must support it).
2. B2: `hash(key, value) → key`: an index needed to extract a value (through B2 → B1) given only a Merkle path. Recall that the SMT stores `hash(key, value)` in its leaves.

   Review comment: What's the need for the reverse index? I'm just wondering what the use case is. I'm imagining mostly we will have

   Review comment: If we don't have it then you will always need to know a

3. We could use more buckets to optimize app usage if needed.

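A minimal in-memory sketch of the two buckets, assuming plain Go maps in place of DB buckets (column families) and a length-prefixed SHA-256 encoding for `hash(key, value)`; `StateStorage`, `Set`, and `ValueByLeaf` are illustrative names, not SDK types. It shows how a leaf hash taken from a Merkle path is resolved to a value through B2 and then B1, and why B2 must drop stale entries when a key is overwritten.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// leafHash stands in for the SMT leaf value hash(key, value); the
// length-prefixed SHA-256 encoding is an assumption.
func leafHash(key, value []byte) [32]byte {
	buf := make([]byte, 8, 8+len(key)+len(value))
	binary.BigEndian.PutUint64(buf, uint64(len(key)))
	buf = append(buf, key...)
	buf = append(buf, value...)
	return sha256.Sum256(buf)
}

// StateStorage models the two buckets with in-memory maps:
// b1 is B1 (key -> value) and b2 is B2 (hash(key, value) -> key).
type StateStorage struct {
	b1 map[string][]byte
	b2 map[[32]byte][]byte
}

func NewStateStorage() *StateStorage {
	return &StateStorage{b1: map[string][]byte{}, b2: map[[32]byte][]byte{}}
}

// Set writes to B1, indexes the new leaf in B2, and removes the B2 entry
// of the value being overwritten. It returns the leaf the SMT would store.
func (s *StateStorage) Set(key, value []byte) [32]byte {
	if old, ok := s.b1[string(key)]; ok {
		delete(s.b2, leafHash(key, old))
	}
	s.b1[string(key)] = append([]byte(nil), value...)
	leaf := leafHash(key, value)
	s.b2[leaf] = append([]byte(nil), key...)
	return leaf
}

// Get is the direct B1 lookup used by the state machine.
func (s *StateStorage) Get(key []byte) ([]byte, bool) {
	v, ok := s.b1[string(key)]
	return v, ok
}

// ValueByLeaf resolves a leaf hash from a Merkle path: B2 -> key, then B1 -> value.
func (s *StateStorage) ValueByLeaf(leaf [32]byte) ([]byte, bool) {
	key, ok := s.b2[leaf]
	if !ok {
		return nil, false
	}
	return s.Get(key)
}
```
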
Above, we propose using a KV DB. However, for the state machine, we could use an RDBMS, which we discuss below.

### Requirements

State Storage requirements:
+ range queries
+ quick (key, value) access
+ creating a snapshot
+ pruning (garbage collection)

State Commitment requirements:
+ fast updates
+ short path length
+ creating a snapshot
+ pruning (garbage collection)

### LazyLedger SMT for State Commitment

A sparse Merkle tree is based on the idea of a complete Merkle tree of intractable size (for example, one leaf position for every possible 256-bit key hash). Because the tree is so large, only a few leaves hold actual data relative to the tree size, which makes the tree sparse.

### Snapshots

One of the Stargate core features is snapshots and fast sync, delivered in the `/snapshot` package. Currently, this feature is implemented through IAVL.
Many underlying DB engines support snapshotting. Hence, we propose to reuse that functionality and limit the supported DB engines to ones that support snapshots (Badger, RocksDB, ...) using a _copy-on-write_ mechanism (we can't create a full copy; it would be too big).

A new snapshot will be created in every `EndBlocker`. The number of snapshots should be configurable by the user (e.g., keep the last 100 blocks, plus one snapshot every 100 blocks for the past 2000 blocks).

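The retention example above can be expressed as a small policy check. This is a sketch assuming a hypothetical `SnapshotPolicy` node setting; the type and field names are illustrative, not an existing SDK config.

```go
package main

// SnapshotPolicy is a hypothetical node-level (not chain-level) setting.
type SnapshotPolicy struct {
	KeepRecent  int64 // keep every snapshot this many blocks back
	KeepEvery   int64 // beyond that, keep only heights divisible by this
	KeepHistory int64 // and only within this many blocks
}

// Keep reports whether the snapshot taken at height h should be retained
// when the chain is at height current.
func (p SnapshotPolicy) Keep(h, current int64) bool {
	age := current - h
	if age < p.KeepRecent {
		return true
	}
	return age < p.KeepHistory && p.KeepEvery > 0 && h%p.KeepEvery == 0
}
```

With `SnapshotPolicy{KeepRecent: 100, KeepEvery: 100, KeepHistory: 2000}` at height 5000, the snapshot at 4850 is dropped while those at 4900 and 3100 are kept.
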
Pruning old snapshots is effectively done by the DB. If the DB allows configuring the maximum number of snapshots, then we are done. Otherwise, we need to hook this mechanism into `EndBlocker`.

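When the DB engine cannot cap the number of snapshots itself, the `EndBlocker` hook could look roughly like this. `SnapshotDB`, `memDB`, and `endBlockSnapshot` are hypothetical stand-ins, not an existing DB or SDK interface; the sketch keeps a fixed number of the newest snapshots.

```go
package main

import "sort"

// SnapshotDB is a hypothetical minimal view of a DB engine with native snapshots.
type SnapshotDB interface {
	CreateSnapshot(height int64) error
	DeleteSnapshot(height int64) error
	Snapshots() []int64
}

// memDB is an in-memory stand-in used only to exercise the hook.
type memDB struct{ heights map[int64]bool }

func (m *memDB) CreateSnapshot(h int64) error {
	m.heights[h] = true
	return nil
}

func (m *memDB) DeleteSnapshot(h int64) error {
	delete(m.heights, h)
	return nil
}

func (m *memDB) Snapshots() []int64 {
	out := make([]int64, 0, len(m.heights))
	for h := range m.heights {
		out = append(out, h)
	}
	return out
}

// endBlockSnapshot creates the snapshot for this block, then deletes the
// oldest snapshots so that at most maxSnapshots remain.
func endBlockSnapshot(db SnapshotDB, height int64, maxSnapshots int) error {
	if err := db.CreateSnapshot(height); err != nil {
		return err
	}
	hs := db.Snapshots()
	sort.Slice(hs, func(i, j int) bool { return hs[i] < hs[j] })
	for len(hs) > maxSnapshots {
		if err := db.DeleteSnapshot(hs[0]); err != nil {
			return err
		}
		hs = hs[1:]
	}
	return nil
}
```
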
### Versioning

At minimum, SC doesn't need to keep old versions. However, we need to be able to process transactions and roll back state updates if a transaction fails. This can be done in the following way: during transaction processing, we keep all state change requests (writes) in a `CacheWrapper` abstraction (as is done today). Only when we commit on a root store are all changes written to the SMT.

We can use the same approach for SM Storage.

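The `CacheWrapper` behaviour described above (buffer writes during a transaction, flush them only on commit) can be sketched with an in-memory cache layer. `KV`, `MapKV`, and `CacheKV` are simplified, hypothetical types, not the SDK's own cache store.

```go
package main

// KV is a minimal key/value store interface used for this sketch.
type KV interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
	Delete(key string)
}

// MapKV is an in-memory KV standing in for the committed store.
type MapKV map[string][]byte

func (m MapKV) Get(k string) ([]byte, bool) {
	v, ok := m[k]
	return v, ok
}

func (m MapKV) Set(k string, v []byte) { m[k] = v }

func (m MapKV) Delete(k string) { delete(m, k) }

// CacheKV buffers writes and deletes on top of a parent store. Reads see
// buffered changes first; nothing reaches the parent until Write is called.
type CacheKV struct {
	parent KV
	writes map[string][]byte // a nil value marks a deletion
}

func NewCacheKV(parent KV) *CacheKV {
	return &CacheKV{parent: parent, writes: map[string][]byte{}}
}

func (c *CacheKV) Get(k string) ([]byte, bool) {
	if v, ok := c.writes[k]; ok {
		return v, v != nil
	}
	return c.parent.Get(k)
}

// Set copies into a non-nil slice so it is never mistaken for a deletion.
func (c *CacheKV) Set(k string, v []byte) { c.writes[k] = append([]byte{}, v...) }

func (c *CacheKV) Delete(k string) { c.writes[k] = nil }

// Write flushes the buffered changes to the parent, e.g. on Commit.
func (c *CacheKV) Write() {
	for k, v := range c.writes {
		if v == nil {
			c.parent.Delete(k)
		} else {
			c.parent.Set(k, v)
		}
	}
	c.writes = map[string][]byte{}
}
```

Rolling back a failed transaction is then just discarding its `CacheKV`; only `Write` touches the parent store.
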
#### Accessing old, committed state versions

One of the functional requirements is to access old state. This is done with the `abci.Query` structure. The version is specified by a block height (so we query for an object by key `K` at the version committed in block height `H`). The number of old versions supported for `abci.Query` is configurable. Moreover, the SDK could provide a way to directly access the state. However, a state machine shouldn't do that: since the number of snapshots is configurable, it would lead to non-deterministic execution.

We validated the snapshot mechanism for querying old state versions.

Review comment: How was this validated? Any further reading/links?

Review comment: We tested it. You can read more about it here: #8297 (comment)

Pruning custom versions could be done using a garbage collector: once per defined period, a GC will start and remove old snapshots. This will require encoding a version mechanism in a KV store.

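One possible way to encode versions in a KV store is to suffix each key with a big-endian version, so that byte-ordered iteration visits all versions of a key in ascending order and a GC pass can drop superseded entries. This is a sketch under stated assumptions: fixed-length keys (for example, hashes), a Go map plus a sort standing in for an ordered KV iterator, and hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"sort"
)

// versionedKey appends a big-endian version to a fixed-length key.
// With variable-length keys a length prefix would be needed so that
// one key cannot be a prefix of another.
func versionedKey(key []byte, version uint64) []byte {
	out := make([]byte, len(key)+8)
	copy(out, key)
	binary.BigEndian.PutUint64(out[len(key):], version)
	return out
}

// splitVersionedKey reverses versionedKey (vk must be at least 8 bytes).
func splitVersionedKey(vk []byte) ([]byte, uint64) {
	n := len(vk) - 8
	return vk[:n], binary.BigEndian.Uint64(vk[n:])
}

// gc deletes entries older than cutoff that are superseded by a newer
// version at or below cutoff; the newest version <= cutoff of each key is
// kept because it is still the value visible at version cutoff.
func gc(db map[string][]byte, cutoff uint64) {
	keys := make([]string, 0, len(db))
	for k := range db {
		keys = append(keys, k)
	}
	sort.Strings(keys) // emulate the store's ordered iteration
	for i, k := range keys {
		base, ver := splitVersionedKey([]byte(k))
		if ver >= cutoff || i+1 == len(keys) {
			continue
		}
		nextBase, nextVer := splitVersionedKey([]byte(keys[i+1]))
		if string(nextBase) == string(base) && nextVer <= cutoff {
			delete(db, k)
		}
	}
}
```
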
### Managing versions and pruning

The number of historical versions for `abci.Query` and snapshots for fast sync is part of a node configuration, not a chain configuration.

As outlined above, the snapshot and versioning features are fully offloaded to the underlying DB engine. However, we still need a process to instruct the DB engine to create or remove a version.
The `rootmulti.Store` keeps track of the version number. The `Store.Commit` function increments the version on each call and checks if it needs to remove old versions. We need to add support for non-`IAVL` store types there.

NOTE: `Commit` must be called exactly once per block. Otherwise, we risk the version number and the block height going out of sync.

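The version bookkeeping in `Store.Commit` and the once-per-block rule can be sketched as follows. `rootStore` is a hypothetical simplification of `rootmulti.Store` that tracks only version numbers; it assumes the chain starts at height 1 and that `keepRecent` comes from node configuration.

```go
package main

import "fmt"

// rootStore tracks versions only; real state handling is omitted.
type rootStore struct {
	version    int64
	keepRecent int64   // 0 keeps every version
	versions   []int64 // versions still held by the DB engine
}

// Commit is called once per block with that block's height. It bumps the
// version, records it, and prunes versions outside the keep-recent window.
func (s *rootStore) Commit(blockHeight int64) error {
	next := s.version + 1
	if next != blockHeight {
		return fmt.Errorf("commit for height %d, but next version is %d", blockHeight, next)
	}
	s.version = next
	s.versions = append(s.versions, next)
	if s.keepRecent > 0 {
		for len(s.versions) > 0 && s.versions[0] <= next-s.keepRecent {
			// here the DB engine would be instructed to drop this version
			s.versions = s.versions[1:]
		}
	}
	return nil
}
```

Checking the height on every call turns a double or skipped `Commit` into an immediate error instead of a silent drift between version and height.
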
TODO: It seems we don't need to update the `MultiStore` interface: it encapsulates a `Committer` interface, which has the `Commit`, `SetPruning`, and `GetPruning` functions. However, we may consider splitting that interface into `Committer` and `PruningCommitter`; only the multiroot should implement `PruningCommitter`.

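The interface split suggested in the TODO might look like this. `CommitID`, `PruningOptions`, and the two store types are simplified placeholders, not the SDK's actual definitions; the point is that sub-stores satisfy only `Committer`, while the multiroot also satisfies `PruningCommitter`.

```go
package main

// CommitID and PruningOptions are simplified placeholders.
type CommitID struct {
	Version int64
	Hash    []byte
}

type PruningOptions struct {
	KeepRecent uint64
	Interval   uint64
}

// Committer is what every store needs.
type Committer interface {
	Commit() CommitID
}

// PruningCommitter adds pruning control; only the multiroot implements it.
type PruningCommitter interface {
	Committer
	SetPruning(PruningOptions)
	GetPruning() PruningOptions
}

type substore struct{ last CommitID }

func (s *substore) Commit() CommitID {
	s.last.Version++
	return s.last
}

// multiRoot reuses substore's Commit and adds pruning configuration.
type multiRoot struct {
	substore
	pruning PruningOptions
}

func (m *multiRoot) SetPruning(p PruningOptions) { m.pruning = p }
func (m *multiRoot) GetPruning() PruningOptions  { return m.pruning }

// Compile-time checks of which type implements which interface.
var (
	_ Committer        = (*substore)(nil)
	_ PruningCommitter = (*multiRoot)(nil)
)
```
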
## Consequences

### Backwards Compatibility

This ADR doesn't introduce any SDK-level API changes.

We change the storage layout, so a storage migration and a blockchain reboot are required.

### Positive

+ Decoupling state from state commitment introduces better engineering opportunities for further optimizations and better storage patterns.
+ Performance improvements.
+ Joining the SMT-based camp, which has wider and proven adoption than IAVL. Example projects that decided on SMT: Ethereum2, Diem (Libra), Trillian, Tezos, LazyLedger.

### Negative

+ Storage migration.
+ LL SMT doesn't support pruning; we will need to add and test that functionality.

### Neutral

+ Deprecating IAVL, which is one of the core proposals of the Cosmos Whitepaper.

## Further Discussions

### RDBMS

Use of an RDBMS instead of a simple KV store for state. Using an RDBMS will require an SDK API breaking change (`KVStore` interface), but it will allow better data extraction and indexing solutions. Instead of saving an object as a single blob of bytes, we could save it as a record in a table in the state storage layer, and as `hash(key, protobuf(object))` in the SMT as outlined above. To verify that an object registered in the RDBMS is the same as the one committed to the SMT, one will need to load it from the RDBMS, marshal it using protobuf, hash it, and do an SMT search.

## References

+ [IAVL What's Next?](https://github.com/cosmos/cosmos-sdk/issues/7100)
+ [IAVL overview](https://docs.google.com/document/d/16Z_hW2rSAmoyMENO-RlAhQjAG3mSNKsQueMnKpmcBv0/edit#heading=h.yd2th7x3o1iv) of its state in v0.15
+ [State commitments and storage report](https://paper.dropbox.com/published/State-commitments-and-storage-review--BDvA1MLwRtOx55KRihJ5xxLbBw-KeEB7eOd11pNrZvVtqUgL3h)
+ [LazyLedger SMT](https://github.com/lazyledger/smt)
+ Facebook Diem (Libra) SMT [design](https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf)
+ [Trillian Revocation Transparency](https://github.com/google/trillian/blob/master/docs/papers/RevocationTransparency.pdf), [Trillian Verifiable Data Structures](https://github.com/google/trillian/blob/master/docs/papers/VerifiableDataStructures.pdf).

Review comment: I'm wondering if this shouldn't in fact be two ADRs instead? One for separating storage and commitments and one about the SMT.

Review comment: I was also thinking about it. But they are highly related: one cannot be done without the other. Hence, I'm proposing a general design here and leaving space for a future ADR for RDBMS, which will introduce SDK breaking changes.

Review comment: Well, we could separate the two with IAVL, right? We don't need SMT for that AFAIK...

Review comment: @aaronc, we could describe only the SMT here, but it would only be a half-baked idea without a working solution:
Do you have something else in mind?