[smart_table] refine bucket_table to smart_table #6339
Not to be a pain, but can we do this in two commits: a rename and a rewrite?
};
split(&mut map, initial_buckets - 1);
// The default number of initial buckets is 2.
Any reason for 2 specifically?
@@ -0,0 +1,358 @@
/// A smart table implementation based on linear hashing. (https://en.wikipedia.org/wiki/Linear_hashing)
Can you add documentation on how to create and use smart_table here? Specifically, I think a developer flow would be great.
What do you mean by a developer flow? I expect users to use it like a normal table, except for a few configs they can customize with new_with_config(), which is documented in the comment above new().
aptos-move/move-examples/data_structures/sources/smart_table.move (two outdated threads, resolved)
Can you add more test coverage? There are only 3 tests here.
let bucket = table_with_length::borrow(&table.buckets, index);
let i = 0;
let len = vector::length(bucket);
while (i < len) {
You should use vector::any here and in other places: https://github.com/aptos-labs/aptos-core/blob/main/aptos-move/framework/move-stdlib/sources/vector.move#L221. It might also be better to do this in a separate function, since this code seems to be used in many functions here.
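As a rough illustration of this suggestion, the sketch below factors the per-bucket scan into one helper built on vector::any. It assumes a bucket is a vector<Entry<K, V>>, matching the buckets field of SmartTable, and that Entry has hash, key, and value fields; the address 0x42, the module name, and bucket_contains are hypothetical.

```move
// Hypothetical sketch, not the PR's code: the module name, address, and
// Entry layout (hash, key, value) are assumptions for illustration.
module 0x42::bucket_sketch {
    use std::vector;

    struct Entry<K, V> has store, drop {
        hash: u64,
        key: K,
        value: V,
    }

    public fun new_entry<K, V>(hash: u64, key: K, value: V): Entry<K, V> {
        Entry { hash, key, value }
    }

    /// Returns true if some entry in `bucket` has key `key`.
    /// `vector::any` (an inline function) replaces the manual `while (i < len)`
    /// loop, and every lookup path can call this one helper.
    public fun bucket_contains<K, V>(bucket: &vector<Entry<K, V>>, key: &K): bool {
        vector::any(bucket, |entry| {
            // Give the lambda argument an explicit type before accessing fields.
            let e: &Entry<K, V> = entry;
            &e.key == key
        })
    }
}
```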
assert!(table.level == 3, 0);
let i = 0;
while (i < 4) {
split_one_bucket(&mut table);
This is not a good test. It's calling a private function for splitting the buckets instead of using public functions to insert elements to trigger splitting.
I deleted this one because it will be tested implicitly by the others.
module aptos_std::smart_table {
use std::error;
use std::vector;
use aptos_std::aptos_hash::sip_hash_from_value;
Why sip hash specifically? What are the considerations here (collision, gas cost, etc.)?
Good performance and protection against hash-flooding attacks.
I already deleted one test and replaced it with another. There are only three, but they cover almost all the cases, since there are not as many edge cases as in smart vectors. What else do you have in mind?
I want to approve this, but we don't document the algorithm for this bucket at all. It seems like we rotate around which bucket we split. I know you didn't really change anything with the underlying algorithm, but maybe we can do better in this PR to make it clearer?
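For reference, standard linear-hashing addressing can be sketched as three pure functions. This is a hypothetical sketch, not the PR's implementation; it assumes the invariant 2^level <= num_buckets < 2^(level + 1), and the module and function names are illustrative. Buckets are split in round-robin order (0, 1, 2, ...), independent of which bucket overflowed, which is why the split target appears to rotate.

```move
// Hypothetical sketch of linear-hashing addressing, not the PR's code.
// Invariant assumed throughout: 2^level <= num_buckets < 2^(level + 1).
module 0x42::linear_hash_sketch {
    /// Maps `hash` to one of the buckets 0..num_buckets.
    public fun bucket_index(level: u8, num_buckets: u64, hash: u64): u64 {
        // Use level + 1 bits first: this bucket may already have been split.
        let index = hash % (1 << (level + 1));
        if (index < num_buckets) {
            index
        } else {
            // Target bucket not created yet this round: use level bits instead.
            index % (1 << level)
        }
    }

    /// Bucket whose entries are redistributed when bucket `num_buckets` is appended.
    /// Clearing the top bit gives num_buckets - 2^level, so splits go 0, 1, 2, ...
    public fun bucket_to_split(level: u8, num_buckets: u64): u64 {
        num_buckets ^ (1 << level)
    }

    /// Returns (num_buckets, level) after appending one bucket.
    public fun after_split(level: u8, num_buckets: u64): (u64, u8) {
        let new_num = num_buckets + 1;
        // Every bucket of this round has been split: start the next round.
        let new_level = if (new_num == (1 << (level + 1))) { level + 1 } else { level };
        (new_num, new_level)
    }
}
```

With three buckets at level 1, hash 6 goes to bucket 2 (6 mod 4), while hash 7 falls back to bucket 1: 7 mod 4 = 3 does not exist yet, so 7 mod 2 is used.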
aptos-move/move-examples/data_structures/sources/bucket_table.move (two outdated threads, resolved)
let bucket = table_with_length::borrow_mut(&mut table.buckets, index);
let i = 0;
let len = vector::length(bucket);
while (i < len) {
You can use vector::for_each_ref here. Can you check all while loops and see if you can replace them with inline functions?
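A minimal sketch of the for_each_ref rewrite, assuming the same hypothetical Entry layout (hash, key, value): collecting a bucket's keys, the kind of building block an iteration helper could use. The address, module, and function names are illustrative, and K: copy is required because keys are copied out.

```move
// Hypothetical sketch: module name, address, and Entry layout are assumptions.
module 0x42::bucket_keys_sketch {
    use std::vector;

    struct Entry<K, V> has store, drop {
        hash: u64,
        key: K,
        value: V,
    }

    public fun new_entry<K, V>(hash: u64, key: K, value: V): Entry<K, V> {
        Entry { hash, key, value }
    }

    /// Returns a copy of every key in `bucket`, in storage order.
    /// `vector::for_each_ref` (an inline function) replaces the index-based loop.
    public fun bucket_keys<K: copy, V>(bucket: &vector<Entry<K, V>>): vector<K> {
        let keys = vector::empty<K>();
        vector::for_each_ref(bucket, |entry| {
            let e: &Entry<K, V> = entry;
            // Inline lambdas may mutate captured locals such as `keys`.
            vector::push_back(&mut keys, e.key);
        });
        keys
    }
}
```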
A few high-level comments:
- Why not move smart_table into the framework? We want it there so that many new modules can start relying on it (e.g. multisig account).
- Can we consider adding iterable functionalities (as inline functions) such as map, for_each, etc.? These are expensive operations but there are definitely use cases for them and they'd be very useful especially when the table is small. We can easily support this with the buckets.
- Can you consider whether we can add better support for reading data from smart table via API/CLI/SDK?
let len = vector::length(bucket);
while (i < len) {
let entry = vector::borrow(bucket, i);
if (&entry.key == &key) {
entry has a hash value, and you also know the hash at this point, since it's computed on line 225. So why not check hash == entry.hash first? That seems cheaper than comparing keys, since you expect n/2 key comparisons, where n is the number of items in a bucket.
Line 225 is not the hash, but I will separate it out.
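A sketch of the reviewer's suggestion, assuming each Entry stores the u64 hash of its key: compare the hash first and only compare keys when the hashes match. Because && short-circuits in Move, the key comparison is skipped for non-matching hashes. vector::find returns (found, index), so the caller also gets the position for a later borrow or remove. The address, module, and function names are hypothetical.

```move
// Hypothetical sketch: module name, address, and Entry layout are assumptions.
module 0x42::bucket_lookup_sketch {
    use std::vector;

    struct Entry<K, V> has store, drop {
        hash: u64,
        key: K,
        value: V,
    }

    public fun new_entry<K, V>(hash: u64, key: K, value: V): Entry<K, V> {
        Entry { hash, key, value }
    }

    /// Returns (true, index) of the entry with key `key`, or (false, 0).
    /// `hash` must be the hash of `key`, computed once by the caller.
    public fun find_in_bucket<K, V>(bucket: &vector<Entry<K, V>>, hash: u64, key: &K): (bool, u64) {
        vector::find(bucket, |entry| {
            let e: &Entry<K, V> = entry;
            // Cheap u64 comparison first; `&&` short-circuits, so keys are only
            // compared when the stored hash matches.
            e.hash == hash && &e.key == key
        })
    }
}
```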
}

/// Returns true iff `table` contains an entry for `key`.
public fun contains<K: drop, V>(table: &SmartTable<K, V>, key: K): bool {
What I'm missing here is a good way to get the value if the key is found, or a "not found" result otherwise. A lot of the time you'd like to do things like "look up the key if it exists, otherwise insert a new value". With this API you seem to look up keys redundantly. I think a signature like this (pseudo-code) would be most useful:
Lookup(table: &SmartTable, key: K): Optional<&V>
I don't disagree with you, but here we're trying to keep the API consistent with Table so people can use the two interchangeably. Otherwise I would do what you suggested.
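Move does not allow reference types as type arguments, so Option<&V> cannot be written literally. A common workaround inside a bucket is to find an index once and branch on it. The sketch below shows a single-pass "update if present, otherwise insert" under the same hypothetical Entry layout; the module and function names are illustrative, and this is not an API the PR adds.

```move
// Hypothetical sketch: module name, address, Entry layout, and function names are assumptions.
module 0x42::bucket_upsert_sketch {
    use std::vector;

    struct Entry<K, V> has store, drop {
        hash: u64,
        key: K,
        value: V,
    }

    /// Overwrites the value for `key` if present, otherwise appends a new entry.
    /// The bucket is scanned once, instead of `contains` followed by a second lookup.
    public fun upsert_in_bucket<K: drop, V: drop>(
        bucket: &mut vector<Entry<K, V>>, hash: u64, key: K, value: V
    ) {
        let (found, i) = vector::find(freeze(bucket), |entry| {
            let e: &Entry<K, V> = entry;
            e.hash == hash && &e.key == &key
        });
        if (found) {
            let e = vector::borrow_mut(bucket, i);
            // The old value is dropped here, hence V: drop.
            e.value = value;
        } else {
            vector::push_back(bucket, Entry { hash, key, value });
        }
    }

    /// Test helper: copies out the value stored at position `i`.
    public fun value_of<K, V: copy>(bucket: &vector<Entry<K, V>>, i: u64): V {
        let e = vector::borrow(bucket, i);
        e.value
    }
}
```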
#[test]
fun test_any() {
let t = make();
let r = any(&t, |_k, v| *v >= 99);
This line triggers:
thread 'test_data_structures' has overflowed its stack
fatal runtime error: stack overflow
Do you have any idea what's wrong with the implementation of any?
@movekevin
test_map_ref, test_any and test_all do not work for the same reason.
You must use a compiler/CLI build with --release. It's not related to inline functions; they are just hitting existing problems in the functional programming design of the Move compiler.
I deleted test_all for now because it always triggers a bug.
Can you run gas benchmarking for smart table + vector?
buckets: TableWithLength<u64, vector<Entry<K, V>>>,
num_buckets: u64,
// number of bits to represent num_buckets
level: u8,
Just double checking - this doesn't allow more than 256 (2^8) buckets. What's the rationale behind this?
Actually, level is the exponent, not the bucket count: the bound is 2^level buckets, with level <= 255 since it is a u8.
aptos-move/move-examples/data_structures/sources/smart_table.move (outdated thread, resolved)
✅ Forge suite
Description
I researched linear hashing, spiral storage, and extendible hashing schemes again, and I still think linear hashing is the best option for us. So I made a similar change to bucket_table so that it sets configurations such as bucket_size and split_threshold intelligently.
I also added two public functions to change these two values at any time, since they do not have to be fixed after creation. These methods give users more flexibility to adjust the table to their needs on the fly.
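One plausible way bucket_size and split_threshold could drive growth, sketched as a pure function: split one more bucket whenever the average number of entries per bucket exceeds split_threshold percent of bucket_size. The percentage interpretation, the module, and the function name are assumptions, not taken from the PR. Since the rule reads both values on every insert, changing them later would only shift when the next split happens.

```move
// Hypothetical sketch: the function name and the reading of split_threshold
// as a percentage of bucket_size are assumptions, not the PR's code.
module 0x42::split_policy_sketch {
    /// Returns true when the average load per bucket, size / num_buckets,
    /// exceeds split_threshold percent of bucket_size.
    public fun should_split(size: u64, num_buckets: u64, bucket_size: u64, split_threshold: u8): bool {
        // Cross-multiplied to avoid integer-division rounding:
        // size / num_buckets > bucket_size * split_threshold / 100
        size * 100 > num_buckets * bucket_size * (split_threshold as u64)
    }
}
```

With 2 buckets, bucket_size 10, and split_threshold 75, 15 entries do not trigger a split (1500 is not greater than 1500), while 16 entries do (1600 > 1500).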
Test Plan
cargo test