ADR-040: Storage and SMT State Commitments #8430

Merged
merged 24 commits into from
May 11, 2021
Changes from 16 commits
Commits
11728cf
ADR-040: Storage and SMT State Commitments
robert-zaremba Jan 25, 2021
662ec91
Update docs/architecture/adr-040-storage-and-smt-state-commitments.md
robert-zaremba Jan 26, 2021
5fdbe5d
Update docs/architecture/adr-040-storage-and-smt-state-commitments.md
robert-zaremba Jan 26, 2021
fa8e9e3
Added more details for snapshotting and pruning.
robert-zaremba Jan 26, 2021
864927e
updated links and references
robert-zaremba Jan 26, 2021
78215b2
add blockchains which already use SMT
robert-zaremba Jan 26, 2021
6dd0323
reorganize versioning and pruning
robert-zaremba Jan 27, 2021
250b5ff
Update docs/architecture/adr-040-storage-and-smt-state-commitments.md
robert-zaremba Jan 29, 2021
374916f
Update docs/architecture/adr-040-storage-and-smt-state-commitments.md
robert-zaremba Jan 29, 2021
e90bf8a
adding a paragraph about state management
robert-zaremba Jan 29, 2021
8602b3e
adr-40: update 'accessing old state' section
robert-zaremba Feb 25, 2021
ca39df5
Merge branch 'master' into robert/adr-040
robert-zaremba Apr 23, 2021
aedce21
update based on all recent discussions and validations
robert-zaremba Apr 23, 2021
f704279
adding more explanation about KV interface
robert-zaremba Apr 27, 2021
06d1952
Merge branch 'master' into robert/adr-040
robert-zaremba Apr 27, 2021
7537c84
Apply suggestions from code review
robert-zaremba Apr 28, 2021
1cc123e
Apply suggestions from code review
robert-zaremba Apr 28, 2021
d321dac
review comments
robert-zaremba Apr 28, 2021
80d0122
adding paragraph about commiting to an object without storying it
robert-zaremba Apr 28, 2021
962a28b
review updates
robert-zaremba Apr 30, 2021
bb89798
Apply suggestions from code review
robert-zaremba May 5, 2021
19d2126
review udpates
robert-zaremba May 5, 2021
356f987
adding clarification
robert-zaremba May 7, 2021
42e7f08
Merge branch 'master' into robert/adr-040
robert-zaremba May 11, 2021
1 change: 1 addition & 0 deletions docs/architecture/README.md
@@ -73,3 +73,4 @@ Read about the [PROCESS](./PROCESS.md).
- [ADR 037: Governance Split Votes](./adr-037-gov-split-vote.md)
- [ADR 038: State Listening](./adr-038-state-listening.md)
- [ADR 039: Epoched Staking](./adr-039-epoched-staking.md)
- [ADR 040: Storage and SMT State Commitments](./adr-040-storage-and-smt-state-commitments.md)
157 changes: 157 additions & 0 deletions docs/architecture/adr-040-storage-and-smt-state-commitments.md
@@ -0,0 +1,157 @@
# ADR 040: Storage and SMT State Commitments

## Changelog

- 2020-01-15: Draft

## Status

DRAFT Not Implemented


## Abstract

Sparse Merkle Tree (SMT) is a version of a Merkle Tree with various storage and performance optimizations. This ADR defines a separation of state commitments from data storage and the SDK transition from IAVL to SMT.
Contributor

I'm wondering if this shouldn't in fact be two ADRs instead? One for separating storage and commitments and one about the SMT.

Collaborator Author

I was also thinking about it. But they are highly related: one cannot be done without the other. Hence, I'm proposing a general design here and leaving room for a future ADR on RDBMS, which will introduce SDK breaking changes.

Member

Well we could separate the two with IAVL right? We don't need SMT for that AFAIK...

Collaborator Author

@aaronc, we could describe only SMT here, but it would be only a half-baked idea without a working solution:

  • keeping IAVL (in its current implementation) alongside anything else doesn't make sense because we would double the data.
  • the main value proposition here is to not store objects in SMT (we store only hashes).

Do you have something else in mind?



## Context

Currently, Cosmos SDK uses IAVL for both state commitments and data storage.
Contributor

I would define what state commitments are and how it differs from data storage. It can be concise.

Collaborator Author

Isn't it self explaining? State commitment is a commitment to a state. I can add a link to explain more general commitment schemes.


IAVL has effectively become an orphaned project within the Cosmos ecosystem and it's proven to be an inefficient state commitment.
In the current design, IAVL is used for both data storage and as a Merkle Tree for state commitments. IAVL is meant to be a standalone Merkelized key/value database, however it's using a KV DB engine to store all tree nodes. So, each node is stored in a separate record in the KV DB. This causes many inefficiencies and problems:

+ Each object select requires a tree traversal from the root
+ Each edge traversal requires a DB query (nodes are not stored in memory)
Contributor

Are you sure about this?

Collaborator Author

When traversing a tree we are always doing a DB query. However, subsequent queries are cached at the SDK level, not the IAVL level. I can add that clarification.

+ Creating snapshots is [expensive](https://github.com/cosmos/cosmos-sdk/issues/7215#issuecomment-684804950). It takes about 30 seconds to export less than 100 MB of state (as of March 2020).
+ Updates in IAVL may trigger tree reorganization and possible O(log(n)) hashes re-computation, which can become a CPU bottleneck.
+ The leaf structure is pretty expensive: it contains the `(key, value)` pair and additional metadata such as height and version. The entire node is hashed, and that hash is used as the key in the underlying database, [ref](https://github.com/cosmos/iavl/blob/master/docs/node/node.md).
Contributor

Can you please elaborate on why it's "expensive".

Collaborator Author

It contains a lot of data, which is not needed in the new structure. We don't really need the metadata in the new structure.

Moreover, the IAVL project lacks support and a maintainer and we already see better and well-established alternatives. Instead of optimizing the IAVL, we are looking into other solutions for both storage and state commitments.


## Decision

We propose to separate the concerns of state commitment (**SC**), needed for consensus, and state storage (**SS**), needed for the state machine. Finally, we replace IAVL with [LazyLedger SMT](https://github.com/lazyledger/smt). LazyLedger SMT is based on the Diem (called jellyfish) design [*] - it uses a compute-optimised SMT by replacing subtrees that contain only default values with a single node (the same approach is used by Ethereum2 as well).

The storage model presented here doesn't deal with data structure nor serialization. It's a Key-Value database, where both key and value are binaries. The storage user is responsible for data serialization.

### Decouple state commitment from storage


Separating storage from commitment (by the SMT) will allow us to optimize the different components according to their usage and access patterns.

The SMT will use its own storage, separate from the state machine store (it could use the same database underneath). For every `(key, value)` pair, the SMT will store `hash(key)` in a path (needed to evenly distribute keys in the tree) and `hash(key, value)` in a leaf (to bind the `(key, value)` pair stored in the `SS`). Since we don't know the structure of a value (in particular, whether it contains the key), we hash both the key and the value in the `SC` leaf.
Member

nit: I don't follow this first sentence. Is it using its own storage or the same?

Collaborator Author

In the previous paragraphs I'm writing that we will separate state storage from state commitment. So the state commitment will have its own storage (it won't share the same namespace as the state storage). I will try to reword the paragraph to make it clearer.

Contributor

Yeah I don't follow this. It's hard to understand and there's a few punctuation errors. I would suggest re-wording this to make the points more clear and easy to follow.


For data access we propose 2 additional KV buckets:
Member

What is a KV bucket here? this may be nomenclature I am not familiar with

Collaborator Author

Some KV databases use buckets for creating different databases under the same server / engine. PostgreSQL calls them databases (you can have multiple databases in a single PostgreSQL instance). RocksDB calls them column families.
I will add a few words to explain it.

Collaborator Author

Member

could you post this link with a small explainer. The current explainer doesn't explain, it just throws a sentence into the mix

1. B1: `key → value`: the principal object storage, used by a state machine, behind the SDK `KVStore` interface: provides direct access by key and allows prefix iteration (KV DB backend must support it).
2. B2: `hash(key, value) → key`: an index needed to extract a value (through: SMT → B2 → B1) having only a Merkle Path. Recall that SMT will store `hash(key, value)` in its leaves.
Contributor

I don't follow this.

Collaborator Author

@robert-zaremba Apr 30, 2021

You can't get the data out of the SMT alone. The SMT only stores hashes.
So, if you read a value from the SMT and you want to get the data out, you need to recover the key.

Contributor

So sort of like an inverted index then. Can you rewrite this sentence like you just explained to make it clearer please?

Collaborator Author

updated

3. we could use more buckets to optimize the app usage if needed.

Above, we propose to use KV DB. However, for the state machine, we could use an RDBMS, which we discuss below.


### Requirements

State Storage requirements:
+ range queries
+ quick (key, value) access
+ creating a snapshot
+ pruning (garbage collection)

State Commitment requirements:
+ fast updates
+ tree path should be short
+ creating a snapshot
+ pruning (garbage collection)


### LazyLedger SMT for State Commitment

A Sparse Merkle tree is based on the idea of a complete Merkle tree of an intractable size. The assumption here is that as the size of the tree is intractable, there would only be a few leaf nodes with valid data blocks relative to the tree size, rendering the tree as sparse.


### Snapshots for storage sync and versioning
One of the Stargate core features is snapshots and fast sync, delivered in the `/snapshot` package. This feature is implemented in the SDK and requires storage support. Currently IAVL is the only supported backend.

A database snapshot is a view of the DB state at a certain time or transaction. It's not a full copy of the database (that would be too big). Usually a snapshot mechanism is based on _copy on write_, which allows it to efficiently deliver the DB state at a certain stage.
Some DB engines support snapshotting. Hence, we propose to reuse that functionality for state sync and versioning (described below). It will limit the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss the evaluated DBs.
Comment on lines +81 to +82
Contributor

Suggested change
Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.
Database versioning provides a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a versioning mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
Some DB engines support viewing past versions. Hence, we propose to reuse that functionality for the state sync snapshots and versioning (described below). It will limit the supported DB engines to ones which efficiently implement versioning. In a final section we will discuss DBs evaluated for this feature.

Collaborator Author

I'm defining a database snapshot here and referring to the snapshot mechanism, so I prefer to keep the original language.

Contributor

Okay I may have misunderstood. This is what some DBs call snapshots, and distinct from state sync snapshots as used in the ABCI, right? (although it can be used to implement ABCI snapshots)

Collaborator Author

Yes, here we are talking about a database engine mechanism.


A new snapshot will be created in every `EndBlocker`. The `rootmulti.Store` keeps track of the version number and implements the `MultiStore` interface. `MultiStore` encapsulates a `Committer` interface, which has the `Commit`, `SetPruning` and `GetPruning` functions, which will be used for creating and removing snapshots. The `Store.Commit` function increments the version on each call and checks if it needs to remove old versions. We will need to update the SMT interface to implement the `Committer` interface.
Contributor

Why is snapshot creation part of the state-machine process? Also, if you just take a direct DB snapshot, how do you perform verification?

Collaborator Author

Because the app knows when to create a snapshot. Storage doesn't have that knowledge. We could assume that it can create a snapshot on each commit, but that would make the design more constrained and the library less robust.

Contributor

What about verification and the time it takes to create a snapshot?

Collaborator Author

It's very efficient - the DB is using copy-on-write.

Contributor

For clarification, copy-on-write is used to maintain historical versions, but the state sync snapshot still involves copying the entire state store at the time of creation (at least, that is how it's currently implemented).

NOTE: `Commit` must be called exactly once per block. Otherwise the version number and the block height risk going out of sync.
NOTE: For the SDK storage, we may consider splitting that interface into `Committer` and `PruningCommitter` - only the multiroot should implement `PruningCommitter` (cache and prefix store don't need pruning).

The number of historical versions (snapshots) for `abci.Query` and fast sync is part of a node configuration, not a chain configuration. The configuration should allow specifying the number of past blocks and the number of past blocks modulo some number (e.g. 100 past blocks and one snapshot every 100 blocks for the past 2000 blocks). Archival nodes can keep all snapshots.
Member

@tac0turtle Apr 28, 2021

nit: this section reads weird. Could it be reworded.

What is a node configuration and what is a chain configuration?

Also historical versions are not only needed for abci.Query, they are also generally needed in the sdk

Collaborator Author

Node configuration = app instance configuration (your app config).
Chain configuration = configuration implied by the blockchain consensus.
Do you have a suggestion how to make it more clear? I can add:
"chain configuration (configuration implied by the blockchain consensus)."

Collaborator Author

> Also historical versions are not only needed for abci.Query, they are also generally needed in the sdk

You mean SDK State Machine? State Machine doesn't access an old state. I'm not aware about any interface in the SDK to do it.

Member

When a user wants to query previous state.. Many users query through the sdk currently. The abci.query is just there if someone wants to use it or for the app to query through it.
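
The node-level retention rule from the paragraph under discussion (for example 100 past blocks plus one snapshot every 100 blocks for the past 2000 blocks) could be expressed as a small predicate. The Go sketch below is hypothetical: the `Retention` type, its field names and the exact boundary handling are assumptions, not an existing SDK configuration.

```go
package main

import "fmt"

// Retention is a hypothetical node-level (not chain-level) setting.
type Retention struct {
	KeepRecent uint64 // keep every version younger than this many blocks
	Interval   uint64 // additionally keep heights that are multiples of Interval...
	Window     uint64 // ...as long as they are younger than this many blocks
}

// Keep reports whether the snapshot at height should be retained once the
// chain has reached height latest.
func (r Retention) Keep(height, latest uint64) bool {
	if height > latest {
		return false // a version from the future cannot exist
	}
	age := latest - height
	if age < r.KeepRecent {
		return true
	}
	return r.Interval > 0 && age < r.Window && height%r.Interval == 0
}

func main() {
	r := Retention{KeepRecent: 100, Interval: 100, Window: 2000}
	latest := uint64(5000)

	kept := 0
	for h := uint64(1); h <= latest; h++ {
		if r.Keep(h, latest) {
			kept++
		}
	}
	fmt.Printf("snapshots kept at height %d: %d\n", latest, kept)
}
```

Because the predicate depends only on local settings, two nodes may answer historical queries for different heights, which is why such a rule cannot be part of consensus. An archival node would simply keep everything.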


Pruning old snapshots is effectively done by the DB. Whenever we update a record in `SC`, the SMT will create a new one without removing the old one. Since we are using a snapshot for each block, we must update the mechanism and immediately remove orphaned records from the storage. This is a safe operation - snapshots will keep track of the records which should be available for past versions.

To manage the active snapshots we will either use a DB _max number of snapshots_ option (if available), or remove snapshots in the `EndBlocker`. The latter option can be done efficiently by identifying snapshots by block height.
Member

This seems a bit confusing to me. Pruning of snapshots and pruning of application states are currently two separate configurable parameters. Are we merging these two? If so, can it be worded this way?

What is the impact to disk size with this design?

Collaborator Author

How do you define application state pruning? For me, it is removing not needed records by a module (eg removing zero balances).
In this document I don't refer to "application state pruning". We don't do app state pruning here.

Member

I am talking about how we currently prune application states or versions. You are talking about pruning versions or snapshots which are used for versions. This is application state pruning.

Collaborator Author

I'm not sure I understand your concern. ADR-40 is not about pruning application state. Old SS (state storage) versions (a version of the whole state) are covered by snapshots. If we want to remove an old version we remove a snapshot.


#### Accessing old state versions

One of the functional requirements is to access old state. This is done through `abci.Query` structure. The version is specified by a block height (so we query for an object by a key `K` at block height `H`). The number of old versions supported for `abci.Query` is configurable. Accessing an old state is done by using available snapshots.
Member

This isn't specific to abci.Query. Might make more sense to reword in the sense of querying.

Collaborator Author

Why it's not specific for abci.Query? I don't see any other use-case than using the ABCI to query old state.

Member

Users want to query old state as well.. Many dont want to go through abci.Query.

Collaborator Author

How they do it now? Are you talking about a new feature?

Contributor

Suggested change
One of the functional requirements is to access old state. This is done through `abci.Query` structure. The version is specified by a block height (so we query for an object by a key `K` at block height `H`). The number of old versions supported for `abci.Query` is configurable. Accessing an old state is done by using available snapshots.
One of the functional requirements is to access old state. This is done through `abci.Query` structure. The version is specified by a block height (so we query for an object by a key `K` at block height `H`). The number of old versions supported for `abci.Query` is configurable. Accessing an old state is done by using available historical versions.

Collaborator Author

same here - I prefer to be consistent and use snapshot.

`abci.Query` doesn't need old state of `SC`. So, for efficiency, we should keep `SC` and `SS` in different databases (however using the same DB engine). We will only create snapshots for
Contributor

Multiple database instances will compete for resources (CPU, memory, IO). IMHO without benchmarking, you can't say that this will be beneficial for efficiency.

Collaborator Author

My thinking was that SC doesn't need to be part of the transactions - we don't need to keep old versions, as we need with SS (State Store). If we don't separate this DB, then it will be hard to keep versioning for SS and not do it for SC. Does it make sense? If yes I will update the paragraph above.

Member

How will you generate the proof if you don't have the commitment?

Member

Oh, I think its answered, this may benefit from some rewording. Also touching on how proofs for old data will work would be useful

Collaborator Author

Proof is done by getting a branch from SMT. I will add a sentence about it.


Moreover, the SDK could provide a way to directly access the state. However, a state machine shouldn't do that - since the number of snapshots is configurable, it would lead to non-deterministic execution.

We positively validated the snapshot mechanism for querying old state with the databases we evaluated.
Member

@tac0turtle Apr 28, 2021

is there any way we could link to this validation?
Which database was evaluated?

Collaborator Author

> is there any way we could link to this validation?

All of it is in the SDK discussion thread, linked in the References section. I will link it here as well.

> Which database was evaluated?

This is written in the Evaluated KV Databases section.



### Rollbacks

We need to be able to process transactions and roll back state updates if a transaction fails. This can be done in the following way: during transaction processing, we keep all state change requests (writes) in a `CacheWrapper` abstraction (as it's done today). Once we finish the block processing, in the `EndBlocker`, we commit a root store - at that time, all changes are written to the SMT and to the `SS` and a snapshot is created.
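
The buffering idea can be sketched as a write cache layered over a parent store. This is a minimal illustration, not the SDK's `CacheWrapper` implementation: `KV`, `CacheKV` and their methods are hypothetical names, and the parent is a plain map standing in for the committed state.

```go
package main

import "fmt"

// KV is a stand-in for the committed state store.
type KV map[string][]byte

// CacheKV buffers writes on top of a parent store. A nil entry in dirty
// marks a pending delete.
type CacheKV struct {
	parent KV
	dirty  map[string][]byte
}

func NewCacheKV(parent KV) *CacheKV {
	return &CacheKV{parent: parent, dirty: map[string][]byte{}}
}

// Get reads through the buffer first, then falls back to the parent.
func (c *CacheKV) Get(key string) []byte {
	if v, ok := c.dirty[key]; ok {
		return v // nil when the key is pending deletion
	}
	return c.parent[key]
}

// Set buffers a write; the copy is always non-nil, so it never looks like a delete.
func (c *CacheKV) Set(key string, value []byte) {
	c.dirty[key] = append([]byte{}, value...)
}

// Delete buffers a removal.
func (c *CacheKV) Delete(key string) {
	c.dirty[key] = nil
}

// Write flushes all buffered changes to the parent (the successful path).
func (c *CacheKV) Write() {
	for k, v := range c.dirty {
		if v == nil {
			delete(c.parent, k)
		} else {
			c.parent[k] = v
		}
	}
	c.dirty = map[string][]byte{}
}

// Discard drops all buffered changes (the rollback path).
func (c *CacheKV) Discard() {
	c.dirty = map[string][]byte{}
}

func main() {
	state := KV{"balance/alice": []byte("100")}

	failed := NewCacheKV(state)
	failed.Set("balance/alice", []byte("0"))
	failed.Discard() // the transaction failed, nothing reaches the parent

	ok := NewCacheKV(state)
	ok.Set("balance/bob", []byte("5"))
	ok.Write() // the transaction succeeded

	fmt.Printf("alice=%s bob=%s\n", state["balance/alice"], state["balance/bob"])
}
```

In the design described above, the final flush at the end of the block would go to both the `SS` and the SMT before the snapshot is taken; the sketch only shows the buffering and rollback part.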


## Consequences


### Backwards Compatibility

This ADR doesn't introduce any SDK level API changes.

We change the storage layout, so a storage migration and a blockchain reboot are required.

### Positive

+ Decoupling state from state commitment introduces better engineering opportunities for further optimizations and better storage patterns.
+ Performance improvements.
+ Joining the SMT based camp, which has wider and proven adoption than IAVL. Example projects which decided on SMT: Ethereum2, Diem (Libra), Trillian, Tezos, LazyLedger.

### Negative

+ Storage migration
+ LL SMT doesn't support pruning - we will need to add and test that functionality.

### Neutral

+ Deprecating IAVL, which is one of the core proposals of Cosmos Whitepaper.


## Alternative designs.

Most of the alternative designs were evaluated in [state commitments and storage report](https://paper.dropbox.com/published/State-commitments-and-storage-review--BDvA1MLwRtOx55KRihJ5xxLbBw-KeEB7eOd11pNrZvVtqUgL3h).

Ethereum research published [Verkle Trie](https://notes.ethereum.org/_N1mutVERDKtqGIEYc-Flw#fnref1) - an idea of combining polynomial commitments with a Merkle tree in order to reduce the tree height. This concept has very good potential, but we think it's too early to implement it. The current SMT based design could be easily updated to the Verkle Trie once other researchers implement all the necessary libraries. The main advantage of the design described in this ADR is the separation of state commitments from the data storage and designing a more powerful interface.


## Further Discussions

### Evaluated KV Databases

We evaluated existing KV databases for snapshot support. The following DBs provide an efficient snapshot mechanism: Badger, RocksDB, [Pebble](https://github.com/cockroachdb/pebble). DBs which don't provide such support or are not production ready: boltdb, leveldb, goleveldb, memdb, lmdb.

### RDBMS

Use of an RDBMS instead of a simple KV store for state. Using an RDBMS will require an SDK API breaking change (`KVStore` interface), but will allow better data extraction and indexing solutions. Instead of saving an object as a single blob of bytes, we could save it as a record in a table in the state storage layer, and as a `hash(key, protobuf(object))` in the SMT as outlined above. To verify that an object registered in the RDBMS is the same as the one committed to the SMT, one will need to load it from the RDBMS, marshal it using protobuf, hash it and do an SMT search.


## References

+ [IAVL What's Next?](https://github.com/cosmos/cosmos-sdk/issues/7100)
+ [IAVL overview](https://docs.google.com/document/d/16Z_hW2rSAmoyMENO-RlAhQjAG3mSNKsQueMnKpmcBv0/edit#heading=h.yd2th7x3o1iv) of its state as of v0.15
+ [State commitments and storage report](https://paper.dropbox.com/published/State-commitments-and-storage-review--BDvA1MLwRtOx55KRihJ5xxLbBw-KeEB7eOd11pNrZvVtqUgL3h)
+ [LazyLedger SMT](https://github.com/lazyledger/smt)
+ Facebook Diem (Libra) SMT [design](https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf)
+ [Trillian Revocation Transparency](https://github.com/google/trillian/blob/master/docs/papers/RevocationTransparency.pdf), [Trillian Verifiable Data Structures](https://github.com/google/trillian/blob/master/docs/papers/VerifiableDataStructures.pdf).
+ Design and implementation [discussion](https://github.com/cosmos/cosmos-sdk/discussions/8297).