-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tree column #199
Comments
About plugin triedb to it, we currently don't always have this path: &[NodeKey]. For iterator, we got a stack of crumb containing option hash so not really doable to get &[NodeKey] memory wise, but could do something with a different api (iterator or some trait over sized array access). For triedbmut we should limit to the modifying operation, but in this case access are not always the child of previous access (when merging with another children for instance), and we don't have an explicit stack (we use recursive call instead). So in this case I feel like it would need some change (it just means keeping a Vec along the recursive call should be ok to do, especially as it will be following the same rules as the NibbleFullkey vec that is use to keep trace of the prefix for rocksdb). So in the end I think the easier might be to just extend the struct that is used to get current node prefix for rocksdb with a stack of node Hashes (at least for a quick and simple implementation). If it seems pretty clear to me how to access the db, updating the state is currently not as clear (generally all I am thinking about is ordered node payload to be able to write in order with filling address of a stack of parent (same for removal)). This would be easier implemented by not using triedbmut at all and just run a diff in a single interation (I did write this in the past and it was doable but a bit tricky with node splitting and merging but doable)). Or could just address node with their path added to the hash (as accessing with get_node), but this sounds like something heavy to keep trace of in the changeset sent from state-machine to client. |
Sounds like quite a few changes will be requred at the trie level. So it looks like it makes sense to have the node index at least initially, so the path won't be required: fn get_node(col: ColId, node_key: &NodeKey) -> Result<Option<(Vec<u8>, Children)>>; Would that work? We'd still need to rebuild the trie node from Regarding commits, the order of nodes in |
Not really involved I think, and could even reuse the buffer since it would be decoded just after (I mean encoding would allow triedb to read expected data, but triedb should ultimately just instantiate OwnedNode from Vec and Children directly). One small thing missing here is the ability to pass with the Operation the state version of the trie: need to be store on a per node basis, but would just be part of "the type of the node". Actually I think it is the only type information needed as 0 children is a leaf, others are branches (value presence should be cheap to check) and empty node is leaf without value at depth 0. Another question would be if the partial key should be stored aligned, eg in case we got branch with nibbles "abc" and leaf at "d" with no partial. Do we store "abc" as "abc0" or do we pad as in the codec "0abc". I would tend to think it is better to pad where it is in the trie "abc0" when depth is even and "0d" when depth is odd. but that is just a detail (and not really useful since query initially would pas by the encoded form of the node). |
This can be stored in
This is not really paritydb concern, as partial key encoding will be done at the |
Updated the description for a simpler design. Instead of |
@arkpar Are there any updates about this issue? |
Some progress in paritytech/polkadot-sdk#3386 (I got to clean and add test before moving out of draft, but the referenced paritydb patch is master (trie patch got also substantial unmerged changes)). |
Introduce a new column type:
MultiTree
intended for more efficient storage of blockchain state tries.Design goals:
Allow storing arbitrary trie structures in the database. Each tree is identified by a root key. There may be arbitrary number of roots. The column should optimize disk layout for tree traversal.
The API.
NodeAddress
is an opaque ID that allows for quick node retrieval. These IDs are defined by the database and will most likely be justTable::Address
Operation
enum is extended with two additional tree operations.Querying a root:
Querying non-root nodes:
Internal structure.
Hash index will be used for root nodes only. Nodes are stored in value tables. For Each node we store
N
pairs of(NodeKey, NodeAddress)
.N
is a compile time constant = 16.Removing trees.
There are two possible approches to implementing tree removal
NodeAddresses
. On deletion these are dereferenced and the journal record is deleted as well. These records may be stored in a separate value table.This should be faster overall, but potentially waste space for large tree sets.
Background optimization.
The database will periodically resettle nodes within value tables to optimize on-disk tree layout. The idea here that iterating a recent trie root should result in sequential IO.
Potential issues:
Data duplicaton. The design as proposed here has no node index. If the same value node is inserted into 2 different paths of the same tree, it will be stored twice. If this turns out to be an issue we'd need to add each node to the index and use reference counting.
We need to collect statistics for existing chains to estimate the impact for this.
The text was updated successfully, but these errors were encountered: