Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Light friendly storage tracking: changes trie + extending over ranges #628

Merged
merged 25 commits into from
Sep 18, 2018

Conversation

svyatonik
Copy link
Contributor

@svyatonik svyatonik commented Aug 29, 2018

Implementation of sections "Ordered, indexed, Merklised per-block change-trie" and "Extending over ranges" of #131

Changes trie is built at the end of each block and contains mapping of changed keys to the indices of extrinsics where they have been changed. E.g. if extrinsic#5 has changed FreeBalanceOfAlice from 500 to 400 and extrinsic#8 has changed free FreeBalanceOfBob from 300 to 200, the input for changes trie will be:

[
  FreeBalanceOfAlice => [5],
  FreeBalanceOfBob => [8],
]

Note that comparing to original description from #131, input if free of [ index => key ] pairs, because of this comment.

There are 2 configuration parameters affecting changes trie creation (right now they're configured at genesis and can't be changed later - see TODO for this):

  • digest_interval - every digest_interval blocks changes trie will contain level1-digest: mapping of changed keys to the numbers of blocks where they have been changed for previous digest_interval-1 blocks;
  • digest_levels - max level of hierarchical digests. Digest of level2 is created every digest_interval^2 blocks and contains all changed keys from previous digest_interval-1 level1 digests. And so on.

Digest build example:

let configuration = { digest_interval: 16, digest_levels: 3 };
let block = 4096;
let changes_trie = [
  map[ changed key of block#4096 => Set<ExtrinsicIndex> ],
  map[
    changed key of last 15 blocks#[4081, 4082, 4083, 4084, 4085, 4086, 4087, 4088, 4089, 4090, 4091, 4092, 4093, 4094, 4095]
      => Set<BlockNumber>
  ],
  map[
    changed keys of last 15 level1-digests at blocks#[3856, 3872, 3888, 3904, 3920, 3936, 3952, 3968, 3984, 4000, 4016, 4032, 4048, 4064, 4080]
      => Set<DigestBlockNumber>
  ],
  map[
    changed keys of last 15 level2-digests at blocks#[256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2304, 2560, 2816, 3072, 3328, 3584, 3840]
      => Set<DigestBlockNumber>
  ],
]

Changes tries are optional and are built upon runtime request. To announce that it wants to create changes trie, runtime provides configuration to the substrate. At the end of block, if config has been set, the root of changes trie is computed and stored in the block header. The nodes of changes trie are saved in the database on import operation commit.

Bu reading changes tries for a range of blocks, full node can find all ( block, extrinsic ) pairs where given key has been changed in by calling key_changes method. Full node can provide a proof of key_changes execution to light nodes.

TODOs:

  1. add key_changes proof request && responses at Fetcher/OnDemand level
  2. use (1) to clear tx pool on light nodes
  3. support changes trie pruning - we can prune change tries older than digest_interval^digest_level on non-archive full nodes
  4. support 'turning on' changes trie functionality for chains that were not using it previously
  5. cache mapping [ changed key => block number ] to speed up digests creation

@svyatonik svyatonik added the A0-please_review Pull request needs code review. label Aug 29, 2018
@gavofyork
Copy link
Member

4 probably won't be needed as we'll restart the testnet and any other chains should begin with this on.

@svyatonik
Copy link
Contributor Author

svyatonik commented Aug 30, 2018

@gavofyork I assumed that this PR is most useful for light nodes (though it could be utilized by full nodes as well). And I could remember that @rphmeier has said previously that not all substrate-based chains require a light clients => that's why I made a Header::changes_root an Option<Hash> && included this TODO.

So if the decision is to build changes tries for all chains, this TODO should be renamed to "make Header::changes_root a required field" && placed under number#1 in the list.

@gavofyork
Copy link
Member

gavofyork commented Aug 30, 2018

I would consider putting changes_root as a field of Digest to avoid polluting the header. I'd generally agree with keeping changes_root to be optional.

@@ -117,6 +115,9 @@ decl_storage! {
pub ExtrinsicIndex get(extrinsic_index): b"sys:xti" => u32;
ExtrinsicData get(extrinsic_data): b"sys:xtd" => required map [ u32 => Vec<u8> ];
RandomSeed get(random_seed): b"sys:rnd" => required T::Hash;
ChangesTrieCreationEnabled get(changes_trie_creation_enabled): b"sys:changes_trie_creation_enabled" => default bool;
Copy link
Member

@gavofyork gavofyork Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything in storage is already an option, so bools are generally superfluous. probably better done as:

ChangesTrie get(changes_trie) => ChangesTrieConfig;

and then define ChangesTrieConfig:

#[derive(Encode, Decode, PartialEq, Eq, Clone)]
struct ChangesTrieConfig {
    interval: u32,
    levels: u32,
}

then it's just if let Some(config) = Self::changes_trie() { /* do stuff */ }

use changes_trie::storage::InMemoryStorage;
use super::*;

fn prepare_for_drilldown() -> (Configuration, InMemoryStorage<KeccakHasher>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be using Blake2Hasher now.

@gavofyork
Copy link
Member

Looks good otherwise!

@svyatonik
Copy link
Contributor Author

svyatonik commented Aug 30, 2018

@gavofyork

  1. made the ChangesTrieConfiguration encodable + fixed how it is stored, thanks
  2. looks like KeccakHasher is used everywhere now (merged with master) - probably it is planned to go on with Blake2?
  3. moving from Header to Digest looks like a good idea to me, but right now there are no digest items at all. I'm currently working on this in Light-client Authority Set Handoffs #269 . We could onice this PR until Light-client Authority Set Handoffs #269 is resolved, or postpone this migration

@gavofyork
Copy link
Member

The rebasing-upkeep is no-doubt non-trivial, so I'm happy to figure this out sooner rather than later. Issue with using Digest is that the runtime will need to be aware of whether its tracking changes or not since it authors Digest. That said, it seems reasonable to expect this since there are storage items in system already which control it.

@svyatonik svyatonik mentioned this pull request Aug 31, 2018
@@ -0,0 +1,194 @@
[[package]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this checked in accidentally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops...Thanks! :)

Copy link
Contributor

@pepyakin pepyakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly looks reasonable! Few style nits and questions

pub(crate) changes_trie_config: Option<ChangesTrieConfig>,
pub(crate) block: Option<u64>,
pub(crate) extrinsic: Option<u32>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure has become unwieldy.

I'm proposing to make some changes:

  1. Explicitly name tuples that are used as values in HashMap for prospective and commited. It's is not immediatelly clear why it has such arguments. Also usages like entry.0 and entry.1 will become much cleaner.
  2. Leave a comment what exactly block and extrinsic mean and why they're optional, etc.


impl OverlayedChanges {
/// Sets the changes trie configuration.
pub(crate) fn set_changes_trie_config(&mut self, block: u64, config: ChangesTrieConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has some some pre-conditionals. I think they're worth documenting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So because they're ultimately used by the runtime ideally all functions in the chain should contain a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to require these pre-conditions? Why, for example, set_extrinsic_index doesn't panic if changes_trie_config is none but just no-op? Should we do the same here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting extrinsic index is noop in case if config hasn't been set.
Setting different config leads to the creation of invalid changes trie.
Updated comment.


//! Substrate changes trie configuration.

/// An identifier for an authority in the consensus algorithm. The same size as ed25519::Public.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the comment?

Ok(extrinsic_map.into_iter()
.map(move |(key, extrinsics)| InputPair::ExtrinsicIndex(ExtrinsicIndex {
block,
key: key.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need clone() here?

let mut is_set: u8 = 0;
let mut result: [u8; 32] = Default::default();
unsafe {
ext_storage_changes_root(&mut is_set, result.as_mut_ptr());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need is_set here? Wouldn't it be easier to just return it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option is not ConvertibleToWasm. It is easier to return Option than is_set + result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry should have state it more clearer: what i meant is that we can return is_set not via pointer but by the return value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks! :)


// check storage changes root
let storage_changes_root = System::Hashing::storage_changes_root();
header.changes_root().cloned().check_equal(&storage_changes_root);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need cloned here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because CheckEqual was implemented for Option<T>. Changed to for Option<&T>.


/// Changes trie storage. Provides access to trie roots and trie nodes.
pub trait Storage<H: Hasher>: Send + Sync {
//type ChangesTrieTransaction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to not check in commented code

@svyatonik svyatonik added A5-grumble and removed A0-please_review Pull request needs code review. labels Aug 31, 2018
@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A5-grumble labels Sep 3, 2018
@svyatonik svyatonik removed the A0-please_review Pull request needs code review. label Sep 13, 2018
@gavofyork
Copy link
Member

gavofyork commented Sep 13, 2018

Ok - I understand a bit better now.

So I'm happy that there needs to be the extrinsic ext_storage_changes_trie_root. However, I am very much against extending the Externalities trait, particularly so with any generics.

I'd suggest that we simply name a couple of storage items:

  • :changes_trie: If Some, then storage changes should be collated into the trie. Contains an Encode of the relevant configuration structure.
  • :extrinsic_index: If Some, then encodes a u32 expressing the currently executing extrinsic index. If None then relates to the initialisation or finalisation of the block (no distinction is made between the two).

We can then move the system ExtrinsicIndex item over to become a system-level storage item (like :code, :authorities and :heap_pages) and name it :extrinsic_index as above. There should be no need for any changes to Externalities under this model IIUC.

Generic arguments should be removed by making assumptions about the type of trie in the same way that is currently the case with ext_storage_trie_root.

@svyatonik
Copy link
Contributor Author

  1. replaced Ext::bind_to_extrinsic with direct read from native code
  2. replaced Ext::ext_set_changes_trie_config with direct read from native code
  3. added block argument to the Ext::storage_changes_trie_root (we need it to build changes trie)

TestExternalities still require generic NodeCodec argument. The difference with ext_storage_trie_root is that when computing changes trie root, we need to read previous change tries (remember 'extending over ranges') => codec is required. And when we're calling ext_storage_trie_root we do not need anything except changes itself. I thought that maybe I could replace changes_trie::Storage trait with higher-level trait (like read_changed_keys_at_block), but this is a dead-end, because then we'll be unable to generate proofs for changes (that's why we need low-level Storage here).

And once again - I'm not changing the trait - I'm adding missing generic parameter to the implementation we use in tests - Ext already have a dependency on NodeCodec.

@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A5-grumble labels Sep 14, 2018
@gavofyork
Copy link
Member

Fair enough - looks good.

@gavofyork gavofyork added A7-looksgoodcantmerge A0-please_review Pull request needs code review. and removed A0-please_review Pull request needs code review. A7-looksgoodcantmerge labels Sep 15, 2018
@gavofyork
Copy link
Member

Would get a few more eyeballs on this - @rphmeier @pepyakin @arkpar

Copy link
Contributor

@pepyakin pepyakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will perform more thorough review a bit later today!

@@ -769,6 +773,8 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
"checksum num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c51a3322e4bca9d212ad9a158a02abc6934d005490c054a2778df73a70aa0a30"
"checksum owning_ref 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "cdf84f41639e037b484f93433aa3897863b561ed65c6e59c7073d7c561710f37"
"checksum parity-bytes 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)" = "fa5168b4cf41f3835e4bc6ffb32f51bc9365dc50cb351904595b3931d917fd0c"
"checksum parity-codec 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "9260216bbb7acdbb72dfb6bcb63245ef5bde9a8fb7706fe641e1caf1e7ae460f"
"checksum parity-codec-derive 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "1eda64d782c342261aea4ca047a609f9bd92d5f9dafabe6b5a396caf5c7b8827"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this line/file:

There are some wasm files was checked. I think this was incidental, right?

}

fn construct_block(number: BlockNumber, parent_hash: Hash, state_root: Hash, extrinsics: Vec<CheckedExtrinsic>) -> (Vec<u8>, Hash) {
fn construct_block(number: BlockNumber, parent_hash: Hash, state_root: Hash, changes_root: Option<Hash>, extrinsics: Vec<CheckedExtrinsic>) -> (Vec<u8>, Hash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap this line?

b"aba".to_vec() => None,
b"abd".to_vec() => None,
b"abc".to_vec() => None.into(),
b"abb".to_vec() => None.into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this were changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously: OverlayedChanges.committed: HashMap<Vec<u8>, Option<Vec<u8>>> now OverlayedChanges.committed: HashMap<Vec<u8>, OverlayedChange>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry! I meant the order, not types : )

}

/// Sets the index of extrinsic that is currenty executing.
pub fn set_extrinsic_index(extrinsic_index: u32) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used exclusively for tests? If so, let's put #[cfg(any(feature = "std", test))] on it

Copy link
Contributor

@pepyakin pepyakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here! There is a lot of stuff going on there and I can't say that I looked into every corner. But where I looked it looks pretty reasonable.

There are a few places where I'd add some docs / do renamings / some refactor. I've noted a few of them.

That said, I don't want to block merging for this, on the contrary: I want to merge this ASAP and fix other issues in following PRs.

@svyatonik can you rebase this please?

///
/// Returns false if configuration has been set already and we now trying
/// to install different configuration. This isn't supported now.
pub(crate) fn set_changes_trie_config(&mut self, config: ChangesTrieConfig) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider marking #[must_use] until then.

if let Some(extrinsic) = extrinsic_index {
let mut extrinsics = entry.extrinsics.take().unwrap_or_default();
extrinsics.insert(extrinsic);
entry.extrinsics = Some(extrinsics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this way of achieving the same

entry.extrinsics.get_or_insert_with(Default::default).insert(extrinsic);

?

// is created after OverlayedChanges

let changes_trie_config = try_read_overlay_value(overlay, backend, b":changes_trie")?;
set_changes_trie_config(overlay, changes_trie_config)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_changes_trie_config has the same name as overlay.set_changes_trie_config but actually also performs a decoding and handling None. Consider renaming this method.

return DigestBuildIterator::empty();
}

// digest is buievery digest_multiplier blocks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: is built

begin: u64,
end: u64,
max: u64,
key: &[u8],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does max mean?

max: u64,
begin: u64,
end: u64,
) -> Result<(u64, u64, u64, u32), String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cool to extract struct for the return value.

@gavofyork gavofyork merged commit 95e017d into master Sep 18, 2018
@gavofyork gavofyork deleted the change_trie branch September 18, 2018 07:14
lamafab pushed a commit to lamafab/substrate that referenced this pull request Jun 16, 2020
liuchengxu pushed a commit to chainx-org/substrate that referenced this pull request Aug 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants