
update seek to seek_prefix #11

Closed
wants to merge 46 commits

Conversation

blasrodri

No description provided.

mina86 and others added 20 commits August 30, 2022 15:20
Add Temperature enum and NodeStorage struct which introduces a layer
between node’s storage and the Store interface.  As it stands, this
really just complicates the code; however, the eventual goal is to have
hot and cold storage with NodeStorage being the interface to access
them.  The Store struct will offer access to either of those databases.

The Temperature enum is a ZST so in the current form it should be
optimised out at compile time.
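As a sketch of why this works (the actual Temperature definition in the codebase may differ), a fieldless enum with a single variant needs no discriminant and is therefore zero-sized:

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical single-variant form of the Temperature enum: with only one
// possible value, the compiler needs no discriminant to represent it, so
// the type is zero-sized and passing it around costs nothing at runtime.
#[derive(Clone, Copy)]
enum Temperature {
    Hot,
}

fn main() {
    assert_eq!(size_of::<Temperature>(), 0);
    let t = Temperature::Hot;
    assert_eq!(size_of_val(&t), 0);
}
```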
The contract cache now lives in contract-runtime; this code is just an
elaborate indirection, so we can get rid of it.
Introduce ColdDatabase struct which wraps a real Database interface
and transparently translates reads and writes as expected by the users
of Database interface to reads and writes as they will happen in cold
storage.  This is needed because we want to optimise cold storage
slightly (e.g. by simplifying key in State column and removing
reference counts).
Standard rocksdb `flush()` only flushes the default column family.
See https://github.com/facebook/rocksdb/blob/95ef007adc9365fbefc0f957722a191c1fd7dcd3/include/rocksdb/db.h#L1398-L1400

To flush all column families as intended, iterate over them and flush
them individually.

This change only affects the runtime parameter estimator. In the
productive client, we never force a flush. (As of today.)
Firstly, 1.29 release cycle has started so start fuzzing 1.29.0.

Secondly, 1.28.1 has been released so switch to fuzzing that rather
than 1.28.0.

And lastly, with all that don’t waste time fuzzing 1.27 any longer.
Just minor cleanups:

* remove needless `&mut`
* flatten the entry point a bit
* collapse two unwraps into one

Add flat_state::maybe_new function which creates a FlatState object when
requested and the feature is enabled.  This removes a bunch of
annotations and noise from shard_tries.rs.

Furthermore, split the definition of FlatState into one for when the
feature is enabled and another for when it’s disabled.  This removes
a further bunch of annotations, this time at the definition site.

Furthermore, by making the latter impossible to instantiate (by using
an enum with no variants) this encodes in the type system that the
flat state won’t be created if the feature is not enabled and its
methods won’t be called.  This also makes `Option<FlatState>` a ZST.
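A minimal, self-contained illustration of the uninhabited-enum trick (this FlatState is a stand-in, not the real definition):

```rust
use std::mem::size_of;

// An enum with no variants cannot be instantiated, so any code path that
// would use it is statically unreachable when the feature is disabled.
enum FlatState {}

fn main() {
    // The uninhabited type is zero-sized...
    assert_eq!(size_of::<FlatState>(), 0);
    // ...and Option<FlatState> is too: None is the only possible value,
    // so no discriminant needs to be stored.
    assert_eq!(size_of::<Option<FlatState>>(), 0);
}
```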
Since PeerManager uses the database in a way noticeably different
from the rest of the code (that is, it just stores mutable data there
rather than appending new information to the growing chain), PeerStore
bypasses the Store abstraction.

Despite that, when creating a new PeerManagerActor a Store instance
needs to be passed as a constructor argument.  The Store abstraction is
then peeled away to get the inner Database object.

Change the code so that the Database is passed directly and peeling of
the abstraction happens at the call site.

This will also simplify future code which adds NodeStorage and
Temperature abstractions.
This PR introduces two changes:
1. Complete the logic that makes nodes panic if their protocol version is old. Previously, the check was only triggered at epoch boundaries: if validators restarted their nodes, the nodes would keep running.
2. Introduce a new protocol feature that forbids blocks produced with old protocol versions. This can help kick out validators that are not upgraded.
This PR simplifies the code around the simple nightshade shard layout upgrade. Instead of adding the hacky field `simple_nightshade_shard_layout` in genesis, the logic is now moved inside AllEpochConfig.
…cs (near#7523)

* network: introduce near_peer_message_sent_by_type_{bytes,total} metrics

* fmt
Currently, when data is read from RocksDB it’s converted from
RocksDB’s internal type (specifically DBPinnableSlice) to a vector.

The problem is that we often end up discarding this vector just as
soon as it’s created.  For example, when deserialising objects using
the get_ser method an owned vector is unnecessary.  And when accessing
DBCol::State the data is converted into an Arc, which forces another
allocation and memory copy.

Introduce a DBBytes type which is what the Database::get_raw_bytes and
Database::get_with_rc_stripped methods return.  For RocksDB the object
holds a RocksDB-specific DBPinnableSlice which the caller can then treat
as a slice or convert to a vector or an arc as desired.

Furthermore, change Store::get to return data in this form.
Furthermore, for convenience, introduce Store::get_vec which does
conversion to a vector.
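As a rough sketch of the idea (names and internals here are illustrative, not the actual nearcore types), the returned type wraps the backend’s native buffer and converts only on demand:

```rust
use std::ops::Deref;
use std::sync::Arc;

// Stand-in for DBBytes.  In the real code the variant would hold RocksDB's
// DBPinnableSlice; a Vec is used here so the sketch is self-contained.
enum DbBytes {
    Pinned(Vec<u8>),
}

impl Deref for DbBytes {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match self {
            DbBytes::Pinned(v) => v,
        }
    }
}

impl DbBytes {
    // Conversions only allocate or copy when the caller actually wants
    // an owned form.
    fn into_vec(self) -> Vec<u8> {
        match self {
            DbBytes::Pinned(v) => v,
        }
    }
    fn into_arc(self) -> Arc<[u8]> {
        self.into_vec().into()
    }
}

fn main() {
    let bytes = DbBytes::Pinned(vec![1, 2, 3]);
    // Deserialisation-style uses can borrow without copying.
    assert_eq!(&bytes[..], &[1u8, 2, 3][..]);
    // DBCol::State-style uses convert to Arc<[u8]> only when needed.
    let arc: Arc<[u8]> = bytes.into_arc();
    assert_eq!(arc.len(), 3);
}
```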
* direct routed message

* fixed tests

* compilation fix

* applied comments

* removing unnecessary clones

* compilation fix

Co-authored-by: near-bulldozer[bot] <73298989+near-bulldozer[bot]@users.noreply.github.com>
The test broke after recent changes to tracked shards configuration
options.  Fix it.
There was a compilation determinism issue in our instrumentation, due to iterating over a HashMap when generating the thunks. With this, I add:
- A fuzzer that tries to automatically detect non-determinism instances
- An output from this fuzzer that generates said non-determinism; auto-generated so that when running `cargo test` the regression would be caught
- The fix for the compilation determinism issue per se

I've been running the fuzzer locally for a bit to check there's no other obvious instance of compilation non-determinism. Coverage has already doubled since the fuzzer found the crash I checked in, so hopefully other instances of compilation non-determinism will be hard to find.

cc @akhi3030 @matklad
because the test relies on the string in a log.
Passing nayduck run: https://nayduck.near.org/#/test/358900
This change will need to be reverted before mainnet protocol version reaches 100.
Set the cost of IO read and write bytes in qemu based estimations to 0.

The only real purpose of having qemu based measurements today
is having a stable output that can be used to detect performance
regressions. Since near#7484 this no longer works because read IO proves
to be unstable.

Setting IO costs to 0 should stabilize gas costs at least partially.

Also setting default estimation to `time` to better reflect how we use it now.
mina86 and others added 5 commits September 2, 2022 11:30
This avoids an `expect` present in the second invocation.
See near#7287 for similar work.

This should be the final PR based on splitting out tests by `ProtocolFeature`.  I suspect that we could split some more tests from `process_blocks.rs` as well, but we will have to figure out how to organise them.
This is part of a preparation to support TIER1 peers.
I've moved as much logic as possible inside the constructor, so that the number of arguments is minimal. I've also added a PeerActor::Config which is passed to the constructor, because PeerActor::new cannot return an error (due to Actix API constraints). Also moved accepting new connections from PeerManagerActor to NetworkState.
Allow selecting only one metric instead of always running both
in `estimator-warehouse estimate`.

This is necessary because `icount` estimations have become very slow.
runtime/runtime/src/state_viewer/mod.rs (review thread, outdated)
@@ -224,7 +224,7 @@ impl<'a> TrieUpdateIterator<'a> {
}
None => None,
};
trie_iter.seek(&start_offset)?;
trie_iter.seek_prefix(&start_offset)?;

The !key.starts_with(prefix) check in stop_cond on line 264 is no longer necessary.

@@ -948,7 +948,7 @@ mod tests {
let root = test_populate_trie(&tries, &Trie::EMPTY_ROOT, ShardUId::single_shard(), changes);
let trie = tries.get_trie_for_shard(ShardUId::single_shard(), root.clone());
let mut iter = trie.iter().unwrap();
iter.seek(&vec![0, 116, 101, 115, 116, 44]).unwrap();
iter.seek_prefix(&vec![0, 116, 101, 115, 116, 44]).unwrap();

The test should now pass if we change pairs[..2] to just pairs on line 957.

Author:
It fails

Author:
will need to create a new element

Author:
added on 591a2e4

No, that’s wrong. It should pass without that element. It doesn’t because the seek_prefix implementation isn’t correct. Similarly, the assert!(key.starts_with(&query)) mentioned in another comment fails the test because the current implementation is incorrect.

core/store/src/trie/iterator.rs (two review threads, outdated)
core/store/src/trie/mod.rs (two review threads, outdated)
matklad and others added 3 commits September 2, 2022 16:07
When profiling memory usage, I've noticed a non-trivial amount of space is occupied by net IO buffers. So, let's add metrics to track their size!

We use SockAddr, rather than `PeerId`, as a label here, as we don't know peer ids for outgoing connections. 

I don't worry much about the perf hit here -- we more or less do a syscall here, and that should dominate the metric bump.

The implementation is a bit hacky, but the "proper" fix would probably require unforking the FramedRead/Write infrastructure.
The debug page now has a section for catchup information

![Debug page catchup section](https://user-images.githubusercontent.com/34969888/188193706-291ae020-888d-4215-ac75-bd43056d280b.png)

Catchup info will also be printed as part of stats in log. 
```
2022-09-02T15:58:30.730076Z  INFO stats: Catchups
Sync block `CZL3BCD15QpZ4FLqWu7vAJ97JsVHJ1ngtZQupfhiVaL6`@61
Shard sync status: Shard 0 done, Shard 1 split applying total parts 3 done 1, Shard 2 split applying not started, Shard 3 split applying not started
Next blocks to catch up: `CZL3BCD15QpZ4FLqWu7vAJ97JsVHJ1ngtZQupfhiVaL6`@61
```
…ease (near#7538)

* add feature RejectBlocksWithOutdatedProtocolVersions to the next shardnet release

* address comments
mina86 and others added 5 commits September 5, 2022 07:02
Deprecate near_peer_message_received_total Prometheus metric in favour
of aggregating near_peer_message_received_by_type_total metric instead.
This ... is a somewhat questionable API; let's remove it if we are not
using it.

Was added and removed in

* 061161b
* 1198ec7
Taking a slice as argument is more flexible than taking a reference to
a vector since the caller does not have to own the data.  In addition,
slices also work if the caller owns a boxed slice, which would need to
be converted into a vector if the function required a reference to
a vector.  Slices also make it easy to take sub-slices.
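The flexibility being described shows up at call sites; all of these work against a `&[u8]` parameter with no conversion:

```rust
// A function taking a slice accepts vectors, boxed slices and sub-slices
// alike; with a `&Vec<u8>` parameter only the first call would compile.
fn store_value(data: &[u8]) -> usize {
    data.len()
}

fn main() {
    let vec: Vec<u8> = vec![1, 2, 3, 4];
    let boxed: Box<[u8]> = vec![5, 6, 7].into_boxed_slice();

    assert_eq!(store_value(&vec), 4); // &Vec<u8> coerces to &[u8]
    assert_eq!(store_value(&boxed), 3); // no conversion to Vec needed
    assert_eq!(store_value(&vec[1..3]), 2); // sub-slices work too
}
```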

The only remaining place is Store::push_component in near-network crate.
This is needed because near-network’s schema’s StoreUpdate::set doesn’t
work with borrowed types.  More generic magic is needed to make it work.
Separate crate for test-only logging utils doesn't really pull its
weight, let's just add a `testonly` module to the general o11y crate.

Especially given that there already were some prod dependencies on this
crate.
…ear#7544)

With cold storage approaching, it’s going to be necessary to be able to
open a RocksDB instance with only a subset of defined columns.  Change how
RocksDB deals with column family handles.  Specifically, get_cf_handles
now accepts an iterator over DBCol values as argument and cf_handle will
return an error if trying to get a handle for a column that wasn’t set
up.
@vmarkushin

Hey @mina86, could you review the changes we've just pushed? It's not 100% correct, but we got stuck when trying to fix the seek_nibble_slice implementation. test_iterator_seek runs infinitely 1/5 times (randomly) and it's not clear how to fix this because it requires a deeper understanding of how the iterator works, for which we unfortunately don't have time. So, please help.

diff --git a/core/store/src/trie/mod.rs b/core/store/src/trie/mod.rs
--- a/core/store/src/trie/mod.rs	(revision 7f44858bf6061e41822d6eb99a786843a964749b)
+++ b/core/store/src/trie/mod.rs	(date 1662386318080)
@@ -896,8 +896,8 @@
 
         let mut other_iter = trie.iter().unwrap();
         other_iter.seek_prefix(b"r").unwrap();
-        assert_eq!(other_iter.next().unwrap().unwrap().0, b"x".to_vec());
-        assert_eq!(other_iter.next().unwrap().unwrap().0, b"y".to_vec());
+        // assert_eq!(other_iter.next().unwrap().unwrap().0, b"x".to_vec());
+        // assert_eq!(other_iter.next().unwrap().unwrap().0, b"y".to_vec());
         assert_eq!(other_iter.next(), None);
         other_iter.seek_prefix(b"x").unwrap();
         assert_eq!(other_iter.next().unwrap().unwrap().0, b"x".to_vec());
@@ -1059,7 +1059,8 @@
                 let mut iterator = trie.iter().unwrap();
                 iterator.seek_prefix(&query).unwrap();
                 if let Some(Ok((key, _))) = iterator.next() {
-                    assert!(key >= query);
+                    assert!(key.starts_with(&query), "‘{key:x?}’ does not start with ‘{query:x?}’");
+                    // assert!(key.starts_with(&query), "‘{key:x?}’ does not start with ‘{query:x?}’");
                 }
             }
         }
diff --git a/core/store/src/trie/iterator.rs b/core/store/src/trie/iterator.rs
--- a/core/store/src/trie/iterator.rs	(revision 7f44858bf6061e41822d6eb99a786843a964749b)
+++ b/core/store/src/trie/iterator.rs	(date 1662386844161)
@@ -107,13 +107,22 @@
             self.descend_into_node(&hash)?;
             let Crumb { status, node, prefix_boundary } = self.trail.last_mut().unwrap();
             prev_prefix_boundary = prefix_boundary;
-            match &node.node {
+            match dbg!(&node.node) {
                 TrieNode::Empty => break,
                 TrieNode::Leaf(leaf_key, _) => {
                     let existing_key = NibbleSlice::from_encoded(leaf_key).0;
-                    if existing_key < key {
-                        self.key_nibbles.extend(existing_key.iter());
-                        *status = CrumbStatus::Exiting;
+                    if is_prefix_seek {
+                        // if existing_key == key {
+                        *prefix_boundary = true;
+                        *status = CrumbStatus::Exiting;
+                        self.key_nibbles.extend(existing_key.iter());
+                        break;
+                        // }
+                    } else {
+                        if existing_key < key {
+                            self.key_nibbles.extend(existing_key.iter());
+                            *status = CrumbStatus::Exiting;
+                        }
                     }
                     break;
                 }
@@ -128,18 +137,29 @@
                             hash = *child.unwrap_hash();
                             key = key.mid(1);
                         } else {
+                            println!("b");
+                            *prefix_boundary = true;
+                            // *prev_prefix_boundary = is_prefix_seek;
                             break;
                         }
                     }
                 }
                 TrieNode::Extension(ext_key, child) => {
                     let existing_key = NibbleSlice::from_encoded(ext_key).0;
+
                     if key.starts_with(&existing_key) {
                         key = key.mid(existing_key.len());
                         hash = *child.unwrap_hash();
                         *status = CrumbStatus::At;
                         self.key_nibbles.extend(existing_key.iter());
                     } else {
+                        if is_prefix_seek {
+                            *prefix_boundary = true;
+                            *status = CrumbStatus::Exiting;
+                            // self.key_nibbles.extend(existing_key.iter());
+                            break;
+                        }
+
                         if existing_key < key {
                             *status = CrumbStatus::Exiting;
                             self.key_nibbles.extend(existing_key.iter());

mina86 and others added 5 commits September 5, 2022 16:31
The check_free_space_interval and free_space_threshold fields of the
RocksDB struct always have the same value.  Turn them into constants.
Extend test_iterator_seek by adding verification of known existing
keys.
Metrics *are* part of o11y package, it's much better organizationally to
not split this stuff into two crates.
The database version definitions make more sense inside of near-store
than near-primitives.  Especially since the majority of their usage
is in near-store and the definitions aren’t used for anything specific
to the chain or other operations not involving the storage.

While at it, change RocksDB::get_version such that it checks whether the
database exists (and returns None if it doesn’t) and opens the database
with only the DbVersion column configured (so that it’ll properly read
the version from old databases without modifying them).
Co-authored-by: Aleksey Kladov <aleksey.kladov@gmail.com>
@mina86

mina86 commented Sep 5, 2022

By the way, could you make this a PR against near/nearcore repository and allow edits from maintainers? It’s going to be easier for me to follow up with merges and quick fixes.

@mina86

mina86 commented Sep 5, 2022

test_iterator_seek is running infinitely 1/5 times (randomly)

I actually cannot reproduce it. I fetched this branch, merged upstream’s master and the test works perfectly fine.

Nevermind. I was fetching the wrong code.

@blasrodri
Author:

By the way, could you make this a PR against near/nearcore repository and allow edits from maintainers? It’s going to be easier for me to follow up with merges and quick fixes.

I cannot make this PR against near/nearcore because mpt-proof-rework isn't part of that repo. I will allow edit access from maintainers.

@mina86

mina86 commented Sep 5, 2022

Yes, it needs to be rebased with just the changes related to seek_prefix. Turns out this is the difficult part of the proof feature so it needs to be handled as a separate change.

@mina86

mina86 commented Sep 5, 2022

Turns out this is a bit more complicated than I’d initially imagined, but this should help:

@@ -62,26 +67,50 @@ impl<'a> TrieIterator<'a> {
     }

     /// Position the iterator on the first element with key => `key`.
-    pub fn seek<K: AsRef<[u8]>>(&mut self, key: K) -> Result<(), StorageError> {
-        self.seek_nibble_slice(NibbleSlice::new(key.as_ref())).map(drop)
+    pub fn seek_prefix<K: AsRef<[u8]>>(&mut self, key: K) -> Result<(), StorageError> {
+        self.seek_nibble_slice(NibbleSlice::new(key.as_ref()), true).map(drop)
     }

     /// Returns the hash of the last node
     pub(crate) fn seek_nibble_slice(
         &mut self,
         mut key: NibbleSlice<'_>,
+        is_prefix_seek: bool,
     ) -> Result<CryptoHash, StorageError> {
         self.trail.clear();
         self.key_nibbles.clear();
+
+        // Checks if a key in an extension or leaf matches our search query.
+        //
+        // When doing prefix seek, this checks whether `key` is a prefix of
+        // `ext_key`.  When doing regular range seek, this checks whether `key`
+        // is no greater than `ext_key`.  If those conditions aren’t met, the
+        // node with `ext_key` should not match our query.
+        let check_ext_key = |key: &NibbleSlice, ext_key: &NibbleSlice| {
+            if is_prefix_seek {
+                ext_key.starts_with(key)
+            } else {
+                ext_key >= key
+            }
+        };
+
         let mut hash = self.trie.root;
+        let mut prev_prefix_boundary = &mut false;
         loop {
+            *prev_prefix_boundary = is_prefix_seek;
             self.descend_into_node(&hash)?;
-            let Crumb { status, node } = self.trail.last_mut().unwrap();
+            let Crumb { status, node, prefix_boundary } = self.trail.last_mut().unwrap();
+            prev_prefix_boundary = prefix_boundary;
             match &node.node {
                 TrieNode::Empty => break,
                 TrieNode::Leaf(leaf_key, _) => {
                     let existing_key = NibbleSlice::from_encoded(leaf_key).0;
-                    if existing_key < key {
+                    if !check_ext_key(&key, &existing_key) {
                         self.key_nibbles.extend(existing_key.iter());
                         *status = CrumbStatus::Exiting;
                     }
@@ -98,6 +127,7 @@ impl<'a> TrieIterator<'a> {
                             hash = *child.unwrap_hash();
                             key = key.mid(1);
                         } else {
+                            *prefix_boundary = is_prefix_seek;
                             break;
                         }
                     }
@@ -110,9 +139,9 @@ impl<'a> TrieIterator<'a> {
                         *status = CrumbStatus::At;
                         self.key_nibbles.extend(existing_key.iter());
                     } else {
-                        if existing_key < key {
+                        if !check_ext_key(&key, &existing_key) {
                             self.key_nibbles.extend(existing_key.iter());
                             *status = CrumbStatus::Exiting;
                         }
                         break;
                     }

When doing prefix seek, we need to check whether !existing_key.starts_with(key) rather than if existing_key < key.

There are still some test failures in the PR but this appears to fix test_iterator.
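The condition change can be demonstrated in isolation (the helper names here are made up for the illustration, not nearcore functions):

```rust
// A range seek keeps any key that sorts at or after the query; a prefix
// seek must keep only keys that actually start with the query.
fn range_seek_matches(ext_key: &[u8], key: &[u8]) -> bool {
    ext_key >= key
}

fn prefix_seek_matches(ext_key: &[u8], key: &[u8]) -> bool {
    ext_key.starts_with(key)
}

fn main() {
    // b"x" sorts after b"r", so a plain range seek for b"r" stops on it...
    assert!(range_seek_matches(b"x", b"r"));
    // ...but it does not share the prefix, so seek_prefix must reject it.
    assert!(!prefix_seek_matches(b"x", b"r"));
    assert!(prefix_seek_matches(b"rust", b"r"));
}
```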

mina86 and others added 7 commits September 6, 2022 08:27
Also adjusted the internal API for sending routed messages a bit.
The serde::Serialize and serde::Deserialize implementations for
the VMContext type are never used.  Get rid of them.  This makes
base58_format unused so get rid of it as well.
Instead of introducing a new dependency, we can generate request id
using generate_random_string function.  The id doesn’t really matter
anyway (apart from JSON RPC specification requiring it to be present)
so any string really will work there.
Just a tiny bit of refactoring separating configuration part of the
storage and code responsible for opening the storage into separate
modules.
The default Debug derive for types is quite verbose with a lot of
vertical spacing.  For example, TrieNode branches are printed as
a list with each empty branch as a None on its own line.  This
isn’t very ergonomic when trying to debug trie nodes and printing
a lot of the nodes to output.

Replace the default Debug implementation for TrieNode, NodeHandle and
ValueHandle by custom ones which are more concise and don’t waste
space on empty nodes.
This will be needed in future commit where we will need access to the
raw bytes of the node.  See near#7509.
@blasrodri blasrodri self-assigned this Sep 6, 2022
@mina86
Copy link

mina86 commented Sep 7, 2022

Could you update the PR to be based on the master branch and remove proof-related changes from it? At this point it’s probably easiest to squash everything and clean it up:

```
git remote update
git checkout update-iterator-seek
git branch update-iterator-seek-backup
base=$(git merge-base @ upstream/master)
                      # ^^^^^^^^ if necessary, adjust ‘upstream’ to whatever
                      # the name of the near/nearcore remote is in the repository
git reset "$base"
git gui
```

With git gui (or git add -p if you prefer command line interface) you can pick only changes which relate to the seek_prefix feature. And once that’s done commit and force push:

```
git commit -m seek_prefix
git push origin +@:update-iterator-seek
```
