Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Storage chains: indexing, renewals and reference counting #8265

Merged
19 commits merged into from
Mar 18, 2021
Merged

Conversation

arkpar
Copy link
Member

@arkpar arkpar commented Mar 4, 2021

See #7962

This adds externalities for indexing and renewing transactions from within the runtime.
Runtime may now mark a specific region within the encoded extrinsic data to be indexed and addressable by hash with storage_index_transaction externality. Another function storage_renew_transaction_index renews a previously indexed transaction and prevents it from being pruned.
Reference counting for indexed data is implemented for parity-db natively and for rocksdb by storing a counter in a separate DB value.

Following PR will add a runtime module and a storage proof inherent.

polkadot companion: paritytech/polkadot#2570

@arkpar arkpar added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes labels Mar 4, 2021
@arkpar arkpar requested a review from gavofyork March 4, 2021 10:02
@@ -129,10 +129,10 @@ impl<S, Block> BlockchainBackend<Block> for Blockchain<S> where Block: BlockT, S
Err(ClientError::NotAvailableOnLightClient)
}

fn extrinsic(
fn transaction(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed this to transaction since it now returns an indexed portion, rather than all of the extrinsic data

@arkpar arkpar added the C1-low PR touches the given topic and has a low impact on builders. label Mar 4, 2021
Copy link
Contributor

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New externalities feels a lot like offchain indexing here (do trigger some optional behavior from runtime with no feedback).
So I wonder if this could be achievable through offchain worker consuming offchain indexing data containing those indexing operations (it will need to do re-indexing in a second asynchronous step which can be a bit concurrent with pruning but seems manageable).
Thinking twice attaching the block hash with the indexing operation in the offchain db may not be trivial, just mentioning the possibility.
May also be a bit tricky to decode 'extrinsic_headers' then (two different encoding to manage).

client/api/src/client.rs Show resolved Hide resolved
client/api/src/backend.rs Outdated Show resolved Hide resolved
DbHash::from_slice(hash.as_ref()),
Some(extrinsic[offset..].to_vec())
);
extrinsic_headers.push((DbHash::from_slice(&hash.as_ref()), extrinsic[..offset].to_vec()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extrinsic_headers is mixing extrinsic using applied offset with full extrinsic from renewed, is it ok?
Generally I am not sure how 'renewed' is supposed to use its 'size' field.
Also wondering how/if 'extrinsic[..offset]' can be decoded?
The runtime can certainly find the right offset to split transaction between two field, then when decoding one should use the split variant, yes probably work this way (but then 'renewed' also use this variant and should write the split version).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extrinsic_headers is mixing extrinsic using applied offset with full extrinsic from renewed, is it ok?

Yes, all extrinsics that don't index anything, including renewals, are just written as header.

Also wondering how/if 'extrinsic[..offset]' can be decoded?

It is never decoded on its own. Only full extrinsic (header + indexed portion) is decoded. The indexed region is not supposed to be scale-encoded. It's just some bytes that are interpreted by the user.

Generally I am not sure how 'renewed' is supposed to use its 'size' field.

size field is a bit of work in progress. The idea is that the renewal transaction should include fee that is proportional to the existing data size. Here, in commit we would check existing expected indexed data size and don't renew anything if it does not match. However it is not clear yet if actually checking the size in the runtime would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for whatever reason I was thinking renewed did contain the indexed content, and did need to be strip, but obviously it doesn't).
I do not mark the conversation as resolved because the reply looks useful to me, but it is resolved.

client/db/src/lib.rs Outdated Show resolved Hide resolved
client/db/src/lib.rs Outdated Show resolved Hide resolved
primitives/database/src/kvdb.rs Outdated Show resolved Hide resolved
client/db/src/lib.rs Outdated Show resolved Hide resolved
primitives/database/src/lib.rs Outdated Show resolved Hide resolved
primitives/database/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I was a bit hesitant about the new extrinsics, I think there may be something todo with offchain workers to avoid the new extrinsics, but I am also pretty sure it involves substantial additional work.
But this can always switch or change later (as long as they are not call in a production chain (IIRC those are not exposed in the wasm as long as it is not used by the wasm)).

@paritytech paritytech deleted a comment from smisty Mar 16, 2021
@arkpar
Copy link
Member Author

arkpar commented Mar 16, 2021

bot merge force

@ghost
Copy link

ghost commented Mar 16, 2021

Trying merge.

@ghost
Copy link

ghost commented Mar 16, 2021

Merge failed: "At least 2 approving reviews are required by reviewers with write access."

@arkpar arkpar requested a review from bkchr March 16, 2021 17:08
client/api/src/backend.rs Outdated Show resolved Hide resolved
client/api/src/client.rs Show resolved Hide resolved
client/api/src/client.rs Outdated Show resolved Hide resolved
client/network/src/bitswap.rs Show resolved Hide resolved
@@ -568,6 +568,43 @@ where
}
}

fn storage_index_transaction(&mut self, index: u32, offset: u32) -> Result<(), ()>{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we return a Result here, when this can not fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

client/api/src/client.rs Outdated Show resolved Hide resolved
if let Some(mut t) = self.indexed_transaction(&hash)? {
data.append(&mut t);
}
Block::Extrinsic::decode(&mut data.as_slice())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we would use a custom Input here, we could prevent calling append above and allocating a third time.
This custom input would just combine data and t

|h| self.extrinsic(&h).and_then(|maybe_ex| maybe_ex.ok_or_else(
|| sp_blockchain::Error::Backend(
format!("Missing transaction: {}", h))))
match Vec::<(Block::Hash, Vec<u8>)>::decode(&mut &body[..]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This body structure is really implicitly used here. Maybe we could use at least some typedef.

}
}
// Also discard all blocks from displaced branches
for h in displaced.leaves() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means before we always kept all blocks, even from forks, but now we will prune all fork blocks? Even when block pruning is not enabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

for h in displaced.leaves() {
let mut number = finalized;
let mut hash = h.clone();
// Follow displaced chains up to finality point
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we start at the finalized point and move downwards?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. We start with a displaced leaf and follow its parents until we reach a block that is canonical

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I understood it this way.

Shouldn't the comment say Follow displaced chains up to canonical point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since they are displaced due to finality, as soon as we meet a canonical parent it is guaranteed to be a part of a finalized chain. I'll update the comment to make it more obvious.

arkpar and others added 2 commits March 17, 2021 21:33
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
@arkpar arkpar added the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 17, 2021
@arkpar
Copy link
Member Author

arkpar commented Mar 18, 2021

@bkchr All the issues addressed now I believe. Companion build failure seems unrelated.

@arkpar arkpar added D6-newhostfunctions and removed A3-in_progress Pull request is in progress. No review needed at this stage. A7-needspolkadotpr labels Mar 18, 2021
self.0.read(&mut into[..read])?;
}
None => {
return self.0.read(into)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we will never read self.1. For our use case here it is okay, because we always should get Some(), but that should really not be used by someone else.

Maybe we should use unreachable here or make JoinInput directly use Vec<u8> and Option<Vec<u8>>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to operate on slices instead

@bkchr bkchr added the D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit label Mar 18, 2021
@arkpar
Copy link
Member Author

arkpar commented Mar 18, 2021

bot merge

@ghost
Copy link

ghost commented Mar 18, 2021

Trying merge.

@ghost ghost merged commit 462384b into master Mar 18, 2021
@ghost ghost deleted the a-tx-storage branch March 18, 2021 11:46
@@ -103,12 +103,35 @@ pub struct OverlayedChanges {
children: Map<StorageKey, (OverlayedChangeSet, ChildInfo)>,
/// Offchain related changes.
offchain: OffchainOverlayedChanges,
/// Transaction index changes,
transaction_index_ops: Vec<IndexOperation>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized while merging master in some PRs, but 'transaction_index_ops' could also support transaction (Same implementation as 'OffchainOverlayedChanges' for offchain indexing).
Without thinking too much about it, I would think it should (would allow conditional indexing).
cc\ @arkpar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not bother with that for now. As I understand, transactional semantics are used here to support rollback on failing extrinsics. Indexing is transparent to the runtime anyway. If an extrinsic fails, that fact that it still indexes something does not affect consensus.
Furthermore:

  1. Storage transactions should not be adding to the index until they are pass the failure point. I.e. all fees are applied.
  2. Unlike regular transactions, failing storage transactions should not be included in the block in the first place. Storing them for free would defeat the purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its a priority.
I agree consensus is not touched and client can even choose to not index anything at all.
That was the same thing with offchain indexing. Can be implemented at any time.
(I was talking about storage transaction, it is used for different purpose, mainly in contracts afaik)

hirschenberger pushed a commit to hirschenberger/substrate that referenced this pull request Apr 14, 2021
…#8265)

* Transaction indexing

* Tests and fixes

* Fixed a comment

* Style

* Build

* Style

* Apply suggestions from code review

Co-authored-by: cheme <emericchevalier.pro@gmail.com>

* Code review suggestions

* Add missing impl

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* impl JoinInput

* Don't store empty slices

* JoinInput operates on slices

Co-authored-by: cheme <emericchevalier.pro@gmail.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
jordy25519 pushed a commit to cennznet/substrate that referenced this pull request Sep 17, 2021
…#8265)

* Transaction indexing

* Tests and fixes

* Fixed a comment

* Style

* Build

* Style

* Apply suggestions from code review

Co-authored-by: cheme <emericchevalier.pro@gmail.com>

* Code review suggestions

* Add missing impl

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* impl JoinInput

* Don't store empty slices

* JoinInput operates on slices

Co-authored-by: cheme <emericchevalier.pro@gmail.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants