Improve performance of storage::append
#30
Comments
I've not checked any details, but I wonder whether the "tree-hashing" mode in blake2 gives efficient appends for this use case. In fact, you could typically "back out" the "finalization" call to make appends efficient for any hash function, and maybe blake2 also requires doing this. I suppose blake2's "tree-hashing" would make partial reads efficient, but not really help partial writes.
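(A minimal sketch of what avoiding re-finalization buys, approximated here with the `blake2` crate by finalizing a clone instead of the live state; this is not the project's code, just an illustration of the idea.)

```rust
use blake2::{Blake2b512, Digest};

fn main() {
    // Keep the un-finalized hasher state as the "live" value.
    let mut live = Blake2b512::new();
    live.update(b"element-1");

    // To read the current digest, finalize a *clone*; the live state
    // stays usable, so appending never re-hashes the existing data.
    let digest_now = live.clone().finalize();

    live.update(b"element-2");
    let digest_later = live.clone().finalize();

    assert_ne!(digest_now, digest_later);
}
```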
I would like to tackle it and get some guidance.
What guidance? If you want to work on this, you should start looking into the code and coming up with a way to implement this.
Okay.
I have laid out the basic idea in the initial issue description. As I said, I would recommend looking into the code first. I didn't spend that much more time thinking about the actual implementation; however, that is also part of the job when wanting to tackle this. Any questions you have, you can post here.
There is only one compress call in finalization, so an append proof could be an internal blake2b state, but this does not really jibe with the hash function interface, so technically such a use case requires re-analysis. Ugh.
I cannot really follow what you are talking about. However, this issue isn't about hashing.
Me too. I will raise the PR and iterate.
I'm saying that if all you want is a more efficient proof of appending to a hash function, to make PoV blobs smaller, then we could actually do that pretty nicely. It requires some further analysis of the hash function though, so it's not really solvable right here. If you do care a lot about this, then ask me to find someone to work it out. We can move this to the research channel.
So, after reading and familiarizing myself with the code: 1. Can I tackle this issue step by step and not wait until I finish it?
I don't get this question.
Not sure what you mean by the "retrieving side". In the end we will still output a `Vec<u8>`.
Maybe a random remark, but how are you sure that this is the case? Right now, if I understand correctly, you append the new item to the storage value and update the number of elements at the start. This means that, most of the time, the operation will be very inexpensive (memcpying the new element), except when:

- the backing buffer has to be reallocated, or
- the length prefix at the start has to grow,

...in which case the entire existing data plus the new element are memcpied. On the other hand, with what you suggest, the first read after any append has to re-materialize (memcpy) the entire value.

So, to me, the only situation where that new solution would be better is if either you never ever read the data (in which case why is there a storage item in the first place?), or if you add more than 16384 elements in a row before reading and then never ever append to it again.
For every append you would not need to relocate the existing data, because the vec will allocate more capacity than requested (assuming you don't use …). The most important improvement here would be around storage transactions. Currently storage transactions require that you clone the entire data, while here we could avoid cloning it. We would store only the old number of elements plus the old length; then, on discarding a transaction, it would be a simple `truncate`.
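(A minimal sketch of that transaction story under the proposed layout; the names are illustrative, not the actual Substrate types. Starting a transaction records only two integers, and discarding it is a truncate.)

```rust
/// Snapshot taken when a storage transaction starts.
struct AppendSnapshot {
    old_len: usize, // byte length of `data` before the transaction
    old_items: u32, // element count before the transaction
}

/// Append-only storage entry: raw concatenated elements plus a counter.
struct AppendEntry {
    data: Vec<u8>,
    num_items: u32,
}

impl AppendEntry {
    /// O(1): no clone of the payload, just two integers.
    fn snapshot(&self) -> AppendSnapshot {
        AppendSnapshot { old_len: self.data.len(), old_items: self.num_items }
    }

    /// Discarding a transaction is a simple truncate.
    fn rollback(&mut self, s: AppendSnapshot) {
        self.data.truncate(s.old_len);
        self.num_items = s.old_items;
    }
}
```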
This snippet's function is actually appending data, and yes, it is cloning the old data. What I was thinking is that because the … Please, any feedback on my thinking?
Sorry, but I don't get what you have written above.
I am a bit late to this conversation, but in one of my branches (https://github.com/cheme/substrate/tree/threads) I have a similar system:
Another thing that was needed (but is often convenient) was to switch read-only externalities to `&mut`, e.g. https://github.com/cheme/substrate/blob/b0f194bf319472dd4ffe1211028c02bd5cdf3a77/primitives/externalities/src/lib.rs#L76
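(Roughly, the signature change looks like this; a hypothetical trait, not the actual one in `sp-externalities`. Taking `&mut self` even for reads lets the implementation maintain internal caches or append buffers.)

```rust
// Hypothetical before/after; only the receiver changes.
trait ReadOnlyExternalities {
    // fn storage(&self, key: &[u8]) -> Option<Vec<u8>>;  // before
    fn storage(&mut self, key: &[u8]) -> Option<Vec<u8>>; // after
}
```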
This branch proposes to avoid clones in append by storing offset and size in the previous overlay depth. That way, on rollback, we can just truncate and change the size of the existing value. To avoid copies it also means that:

- append on a new overlay layer when there is an existing value: create a new `Append` entry with the previous offsets, and take the memory of the previous overlay value.
- rollback on append: restore the value by applying the offsets and put it back in the previous overlay value.
- commit on append: the appended value overwrites the previous value (which is an empty vec, as its memory was taken). Offsets of the committed layer are dropped; if there are offsets in the previous overlay layer, they are maintained.
- set value (or remove) when append offsets are present: the current appended value is moved back to the previous overlay value with the offsets applied, and the current empty entry is overwritten (no offsets kept).

The modify mechanism is not needed anymore. This branch lacks testing and breaks some existing genericity (a bit of duplicated code), but is good to have to check the direction. Generally, I am not sure if it is worth it, or whether we should just favor different directions (transient blob storage, for instance), as the current append mechanism is a bit tricky (having a variable length in first position means we sometimes need to insert at the front of a vector).

Fix #30.
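(A sketch of that bookkeeping under the same assumptions; illustrative names, not the branch's actual types. Each overlay layer records where its appends began, so rollback truncates and commit just drops the mark.)

```rust
struct OverlayAppend {
    /// Concatenated encoded elements.
    data: Vec<u8>,
    /// (byte offset, element count) captured when each layer opened.
    layer_marks: Vec<(usize, u32)>,
    num_items: u32,
}

impl OverlayAppend {
    fn start_layer(&mut self) {
        self.layer_marks.push((self.data.len(), self.num_items));
    }

    /// Rollback: restore the value by applying the recorded offsets.
    fn rollback_layer(&mut self) {
        if let Some((offset, count)) = self.layer_marks.pop() {
            self.data.truncate(offset);
            self.num_items = count;
        }
    }

    /// Commit: the appends become part of the enclosing layer,
    /// so this layer's offsets are simply dropped.
    fn commit_layer(&mut self) {
        self.layer_marks.pop();
    }
}
```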
`storage::append` currently works by appending the new data directly to the old data. Instead, it would be better to keep appending the raw data to some `Vec<u8>` and keep the number of items in a separate field. A `materialized` field would be used to cache the materialized "view" when the runtime requests the data; when we append new data, we would reset `materialized`. This new data structure would also be more optimized for storage transactions, as we only need to truncate the `data` vector on discarding a transaction.
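(A minimal sketch of the proposed structure, assuming SCALE encoding for the compact length prefix; `data` and `materialized` are named in the description, the rest is illustrative.)

```rust
use parity_scale_codec::{Compact, Encode};

struct StorageAppend {
    /// Concatenated SCALE-encoded elements, without a length prefix.
    data: Vec<u8>,
    /// Number of elements appended so far.
    number_of_items: u32,
    /// Cached full encoding (compact length prefix + `data`).
    materialized: Option<Vec<u8>>,
}

impl StorageAppend {
    /// Appending never touches a length prefix at the front; it only
    /// extends `data`, bumps the counter, and invalidates the cache.
    fn append(&mut self, encoded_item: &[u8]) {
        self.data.extend_from_slice(encoded_item);
        self.number_of_items += 1;
        self.materialized = None;
    }

    /// Built lazily, only when the runtime actually reads the value.
    fn materialize(&mut self) -> &[u8] {
        if self.materialized.is_none() {
            let mut out = Compact(self.number_of_items).encode();
            out.extend_from_slice(&self.data);
            self.materialized = Some(out);
        }
        self.materialized.as_deref().expect("set just above")
    }
}
```

Discarding a transaction then only needs `data.truncate(old_len)` plus restoring `number_of_items`, as discussed in the thread above.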