Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[smart_table] refine bucket_table to smart_table #6339

Merged
merged 3 commits into from
Feb 27, 2023
Merged

Conversation

lightmark
Copy link
Contributor

Description

Personally I researched again around linear hashing, spiral storage and extendible hashing schemes. And still admit the best option for us is linear hashing. So I made a similar change to bucket_table to intelligently set the configurations such as
bucket_size and split_threshold.

Also, add two public functions to change these two values at any time as it does not have to be fixed after creation. Those methods give the users more flexibility to adjust their needs on the fly.

Test Plan

cargo test

Copy link
Contributor

@davidiw davidiw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to be a pain, can we do this in two commits? a rename and a rewrite?

@lightmark lightmark force-pushed the smart_table branch 2 times, most recently from 1784912 to 76a5044 Compare February 1, 2023 09:28
};
split(&mut map, initial_buckets - 1);
// The default number of initial buckets is 2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for 2 specifically?

@@ -0,0 +1,358 @@
/// A smart table implementation based on linear hashing. (https://en.wikipedia.org/wiki/Linear_hashing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add documentation on how to create and use smart_table here? Specifically, I think a developer flow would be great

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by a developer flow? I suppose users will use it as a normal table except some configs that they can customize with new_with_config() which is commented above the new().

Copy link
Contributor

@movekevin movekevin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more test coverage? There are only 3 tests here.

let bucket = table_with_length::borrow(&table.buckets, index);
let i = 0;
let len = vector::length(bucket);
while (i < len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other places. You should use vector::any: https://github.com/aptos-labs/aptos-core/blob/main/aptos-move/framework/move-stdlib/sources/vector.move#L221. Here and in other places. Might also be better if this is done in a separate function since this code seems to be used in many functions here.

assert!(table.level == 3, 0);
let i = 0;
while (i < 4) {
split_one_bucket(&mut table);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good test. It's calling a private function for splitting the buckets instead of using public functions to insert elements to trigger splitting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I delete this one cuz it will be tested implicitly by others.

module aptos_std::smart_table {
use std::error;
use std::vector;
use aptos_std::aptos_hash::sip_hash_from_value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why sip hash specifically? What are the considerations here (collision, gas cost, etc.)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good perf and prevention of hash flooding attack.

@lightmark
Copy link
Contributor Author

Can you add more test coverage? There are only 3 tests here.

I already deleted one and replaced it with the other. Though they are three, but it covers almost all the cases since the edge cases are not as many as smart vectors. What else in you mind?

Copy link
Contributor

@davidiw davidiw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to approve this, but we don't document the algorithm for this bucket at all. It seems like we rotate around which bucket we split. I know you didn't really change anything with the underlying algorithm, but maybe we can do better in this PR to make it clearer?

let bucket = table_with_length::borrow_mut(&mut table.buckets, index);
let i = 0;
let len = vector::length(bucket);
while (i < len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use vector::for_each_ref here. Can you check all while loops and see if you can replace with inline functions?

Copy link
Contributor

@movekevin movekevin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few high-level comments:

  1. Why not move smart_table in the framework? We want it there so that many new modules can start relying on it (e.g. multisig account)
  2. Can we consider adding iterable functionalities (as inline functions) such as map, for_each, etc.? These are expensive operations but there are definitely use cases for them and they'd be very useful especially when the table is small. We can easily support this with the buckets.
  3. Can you consider whether we can add better support for reading data from smart table via API/CLI/SDK?

let len = vector::length(bucket);
while (i < len) {
let entry = vector::borrow(bucket, i);
if (&entry.key == &key) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entry has a hash value and you know the hash value at this point as well as it's computed on line 225. So why not first check hash == entry.hash which seems cheaper than checking the key, because you expect n/2 key checks. Where n is the # of items in a bucket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

225 is not the hash. But I will separate it out.

}

/// Returns true iff `table` contains an entry for `key`.
public fun contains<K: drop, V>(table: &SmartTable<K, V>, key: K): bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm missing here is a good way to do a lookup if it's found or return false if not. A lot of the time you'd like to do things like "lookup if key exists otherwise insert a new value". With this API you seem to be looking up keys spuriously. I think something like a signature of

Pseudo-code:
Lookup(table: &SmartTable, key: K): Optional<&V>

would be most useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree with you. But here we're trying to make API consistent with Table so ppl can use them interchangeably. Otherwise I would do the same as you suggested.

@lightmark
Copy link
Contributor Author

lightmark commented Feb 13, 2023

A few high-level comments:

  1. Why not move smart_table in the framework? We want it there so that many new modules can start relying on it (e.g. multisig account)
  2. Can we consider adding iterable functionalities (as inline functions) such as map, for_each, etc.? These are expensive operations but there are definitely use cases for them and they'd be very useful especially when the table is small. We can easily support this with the buckets.
  3. Can you consider whether we can add better support for reading data from smart table via API/CLI/SDK?
  1. Would this be another PR after AIP is approved?
  2. Sure.
  3. That may needs a lot of support from SDK and CLI.

@lightmark lightmark force-pushed the smart_table branch 2 times, most recently from f475ce8 to 7abaf00 Compare February 14, 2023 18:46
#[test]
fun test_any() {
let t = make();
let r = any(&t, |_k, v| *v >= 99);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line triggers:

thread 'test_data_structures' has overflowed its stack
fatal runtime error: stack overflow

Do you have any idea what's wrong with the implementation of any?
@movekevin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_map_ref, test_any and test_all do not work for the same reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must use compiler/cli build with --release. Its not related to inline functions, they are just hitting existing problems in the functional programming design of the Move compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I delete test_all for now cuz it always triggers a bug.

@lightmark lightmark force-pushed the smart_table branch 2 times, most recently from d4a4dc9 to fe85039 Compare February 14, 2023 22:34
Copy link
Contributor

@movekevin movekevin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run gas benchmarking for smart table + vector?

buckets: TableWithLength<u64, vector<Entry<K, V>>>,
num_buckets: u64,
// number of bits to represent num_buckets
level: u8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking - this doesn't allow more than 256 (2^8) buckets. What's the rationale behind this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is 2^{level <= 256}.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

✅ Forge suite land_blocking success on 4bae64cd32f71fb3018f275809f33ca57689ad9e

performance benchmark with full nodes : 6191 TPS, 6415 ms latency, 11600 ms p99 latency,(!) expired 100 out of 2643700 txns
Test Ok

@github-actions
Copy link
Contributor

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 4bae64cd32f71fb3018f275809f33ca57689ad9e

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 4bae64cd32f71fb3018f275809f33ca57689ad9e (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7771 TPS, 4904 ms latency, 6400 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 4bae64cd32f71fb3018f275809f33ca57689ad9e
compatibility::simple-validator-upgrade::single-validator-upgrade : 4945 TPS, 8028 ms latency, 10400 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 4bae64cd32f71fb3018f275809f33ca57689ad9e
compatibility::simple-validator-upgrade::half-validator-upgrade : 4659 TPS, 8599 ms latency, 10900 ms p99 latency,no expired txns
4. upgrading second batch to new version: 4bae64cd32f71fb3018f275809f33ca57689ad9e
compatibility::simple-validator-upgrade::rest-validator-upgrade : 7128 TPS, 5400 ms latency, 8500 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 4bae64cd32f71fb3018f275809f33ca57689ad9e passed
Test Ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants