This repository has been archived by the owner on Feb 27, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 53
Implement removal of orphan nodes #37
Merged
adlerjohn
merged 12 commits into
celestiaorg:tzdybal/remove_orhpans
from
roysc:tzdybal/remove_orhpans
Jul 8, 2021
Merged
Changes from 6 commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
3b026d7
Implement orphan pruning
roysc c355649
Correct tests comments
roysc bb7c2b6
Remove redundant checks and copies
roysc a51be73
Refactor value storage
roysc 48e4ab9
Fix array copy
roysc a207801
Avoid extra key param
roysc 989c5e7
Fix idempotent Set
roysc 37010f3
Deprecate past root access, remove prune as option
roysc 92ecfa6
Index values by just path
roysc afcb9fe
Cleanup
roysc cef0b26
Update deepsubtree.go
roysc c73c878
verifyProofWithUpdates - Fix unused return value
roysc File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the first use, but conceptually is there a reason to map
hash(key, hash(value)) -> value
? Why not justhash(key) -> value
(i.e.path -> value
)? Mappingpath -> value
allows retrieval to the value knowing only the key, which is necessary for e.g. constant-time reads.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With just
key/path -> value
, there is no way to retrieve values for past versions. I considered usingroot+key
as @tzdybal suggested, but then lookup requires knowing the root at the value's last update. Seems we would need some sort of copy-on-write versioned backing store to make this efficient.note - this is just
key+hash(value) -> value
, no extra hash.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that comment is outdated. Snapshotting and retrieving past version is outside the scope of this library. It can be accomplished at the db level. See https://github.com/cosmos/cosmos-sdk/blob/6a5a2de798d4e61302783ae84ecfe2958c7088d5/docs/architecture/adr-040-storage-and-smt-state-commitments.md.
Potentially yes, and that's an interesting direction, but as above it's outside the scope of this library. The caller can accomplish this by versioning the mapstores they pass in on SMT initialization.
TL;DR: use
path -> value
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This work is actually in support of that ADR - we'll be doing exactly that, so I'm basically indifferent to the indexing used here. I just didn't want to introduce breakage.
If just
path
is used, supporting past value access will be dropped, but I assume proofs for non-pruned roots should still be supported? Otherwise pruning effectively becomes the only behavior.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Proofs against past versions of the tree isn't needed in this library, since that can be accomplished caller-side. So, pruning should be the only behavior (making the
prune
flag unnecessary).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proofs against old roots is a test case specified in the current version of this library. If this feature is to be removed because it's considered out of scope, we should be sure that this can easily be replicated caller-side with a KV store with snapshot support. What KV store supports this exactly? I checked RocksDB, but its snapshot feature is basically copying the entire DB directory, which clearly isn't efficient.
Or would the idea be that the caller can implement versioning themselves on top of any efficient KV store like this or this? If so, this doesn't seem to be suggested in the ADR, which seems to assume that the underlying KV store natively supports snapshotting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ADR states that the state commitment (i.e. the SMT, i.e. this library) isn't responsible for snapshots and historical queries, but rather the state storage is. Note that the state storage won't be a raw KV store, but will be a wrapper over one or more raw KV stores with additional logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While there has been some benchmarking done on Rocks DB etc snapshots for the ADR, this isn't a widely supported feature by KV stores, and it's yet to be seen how well this will work in practice. RocksDB snapshots seem to be designed with backups use case in mind; creating a new directory for every block could exhaust the inode limit on filesystems, given a chain with millions of blocks. BadgerDB seems to be more suitable as it's a key-value-version store, but this seems to be the only KV that supports this.
This library should act a general-purpose library. If someone wants to use this library for testing or research for example, they shouldn't have to setup complex extra scaffolding - or be tied to BadgerDB (the only KV store that seems to support efficient versioning) - to get the basic feature of accessing previous roots.
For flexibility, this library should ideally support the following two extremes:
Anything in between could be implemented by the caller-supplied KV store; in the simple case, this could be delaying Del operations until e.g. 100 blocks. This would allow us to keep the ProveForRoot etc operations.
This could be implement by using a path -> value KV for this PR, and making pruning optional. The path -> value store (for efficient read ops), would only be for the latest version of the tree. However, the SMT would still need to store valueHash -> value in the nodes MapStore, to allow accessing value for old roots when pruning is disabled. When GetForRoot is called, it would get the leaf by traversing the tree directly, rather than using the path -> value store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively: this library could support "prune everything" only, but keep storing valueHash -> value in the nodes MapStore (until pruned).
It would then be trivial for the caller to implement a "prune nothing" with MapStore scaffolding that simply does nothing when Del is called. GetForRoot and ProveForRoot can then still work. However, those functions would only work for callers that have used a MapStore with this hack.
This is probably the better / easiest option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, actually I just realised that valueHash -> value can't be stored in either proposals, due to the ambiguity issue when pruning. So we need to think of another way to implement pruning while ideally allow for faster Get operations, without forcing people to use BadgerDB or implement their own versioning scheme, if they want to disable pruning.