Add complexity estimation of iterating over HashSet and HashMap #97215
Conversation
Hey! It looks like you've submitted a new PR for the library teams!

r? @thomcc (rust-highfive has picked a reviewer for you, use r? to override)
I also tested what happens if you limit the number of iterations to the length of the hash set, and discovered that this slightly improves performance, because hashbrown still searches for items in empty buckets even after all elements have already been yielded. I don't think that this should be mentioned because
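A minimal standalone sketch of the limiting trick described above (the capacity and element values are chosen arbitrarily for illustration):

```rust
use std::collections::HashSet;

fn main() {
    // A small set allocated with a deliberately oversized capacity.
    let mut set: HashSet<usize> = HashSet::with_capacity(16_384);
    set.extend(0..50);

    // `take(set.len())` lets the iterator stop right after the last live
    // element has been yielded, instead of scanning the remaining empty
    // buckets up to the full capacity.
    let sum: usize = set.iter().copied().take(set.len()).sum();
    assert_eq!(sum, (0..50).sum::<usize>());
}
```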
Benchmarks code
```rust
use std::collections::HashSet;

use criterion::{black_box, criterion_group, criterion_main, BatchSize, BenchmarkId, Criterion};
use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha20Rng;

fn get_random_ints(amount: usize) -> Vec<usize> {
    let mut v = vec![0; amount];
    ChaCha20Rng::seed_from_u64(500).fill(v.as_mut_slice());
    v
}

pub fn bench_iterations(c: &mut Criterion) {
    let random_ints = get_random_ints(50);
    let make_hash_set = |capacity: usize| -> HashSet<usize> {
        let mut set = HashSet::with_capacity(capacity);
        set.extend(random_ints.iter());
        set
    };

    let mut group = c.benchmark_group("HashTableIteration");
    let caps = [64, 256, 1024, 4096, 16384];

    for capacity in caps {
        group.bench_with_input(
            BenchmarkId::new("HashTableIterationFull", capacity),
            &capacity,
            |b, &capacity| {
                b.iter_batched(
                    || make_hash_set(capacity),
                    |set| {
                        let sum: usize = set.iter().copied().sum();
                        (black_box(set), black_box(sum))
                    },
                    BatchSize::LargeInput,
                )
            },
        );
    }

    for capacity in caps {
        group.bench_with_input(
            BenchmarkId::new("HashTableIterationLimited", capacity),
            &capacity,
            |b, &capacity| {
                b.iter_batched(
                    || make_hash_set(capacity),
                    |set| {
                        let sum: usize = set.iter().copied().take(set.len()).sum();
                        (black_box(set), black_box(sum))
                    },
                    BatchSize::LargeInput,
                )
            },
        );
    }
}

criterion_group!(benches, bench_iterations);
criterion_main!(benches);
```

Criterion output
I don't think there is any reason to include these benchmarks in our codebase, because they don't have much value by themselves.
It is not obvious (at least to me) that the complexity of iterating over hash tables depends on capacity rather than length, especially compared with other containers like Vec or String. I think this behaviour is worth mentioning. I ran a benchmark which measures iteration time for sets of length 50 with different capacities and got these results:

```
capacity - time
   64    - 203.87 ns
  256    - 351.78 ns
 1024    - 607.87 ns
 4096    - 965.82 ns
16384    - 3.1188 us
```

If you want to dig into why it behaves this way, you can look at the current implementation in [hashbrown code](https://github.com/rust-lang/hashbrown/blob/f3a9f211d06f78c5beb81ac22ea08fdc269e068f/src/raw/mod.rs#L1933). The benchmark code will be presented in the PR related to this commit.
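The capacity-vs-length distinction can be observed directly (a small sketch; the capacities mirror the benchmark above, and the exact `capacity()` values returned are implementation-dependent):

```rust
use std::collections::HashSet;

fn main() {
    // Two sets holding the same 50 elements, but with very different
    // reserved capacities.
    let mut sparse: HashSet<usize> = HashSet::with_capacity(16_384);
    sparse.extend(0..50);

    let mut dense: HashSet<usize> = HashSet::with_capacity(64);
    dense.extend(0..50);

    // Same length...
    assert_eq!(sparse.len(), dense.len());
    // ...but the sparse set reserves far more buckets, all of which an
    // iterator has to visit.
    assert!(sparse.capacity() >= 16_384);
    assert!(dense.capacity() < 16_384);

    // The results of iteration are identical; only the time to produce
    // them differs, proportionally to capacity.
    let a: usize = sparse.iter().sum();
    let b: usize = dense.iter().sum();
    assert_eq!(a, b);
}
```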
@bors r+ rollup

📌 Commit de97d73 has been approved by
…askrgr Rollup of 7 pull requests

Successful merges:
- rust-lang#97109 (Fix misleading `cannot infer type for type parameter` error)
- rust-lang#97187 (Reverse condition in Vec::retain_mut doctest)
- rust-lang#97201 (Fix typo)
- rust-lang#97203 (Minor tweaks to rustc book summary formatting.)
- rust-lang#97208 (Do not emit the lint `unused_attributes` for *inherent* `#[doc(hidden)]` associated items)
- rust-lang#97215 (Add complexity estimation of iterating over HashSet and HashMap)
- rust-lang#97220 (Add regression test for #81827)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
Add shortcircuit in iteration if we yielded all elements

The current implementation works a little slower than `set.iter().take(set.len())`. See my comment [here](rust-lang/rust#97215 (comment)). So why not avoid the extra integer added by `Iterator::take` if we can add the limiting logic into the iterator itself? I don't really know how this change affects [reflect_toogle_full](https://github.com/rust-lang/hashbrown/blob/f3a9f211d06f78c5beb81ac22ea08fdc269e068f/src/raw/mod.rs#L2019) or the implementation of [FusedIterator](https://github.com/rust-lang/hashbrown/blob/f3a9f211d06f78c5beb81ac22ea08fdc269e068f/src/raw/mod.rs#L2150). Maybe I should make the inner iterator "jump" to the end of its memory block?
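The shortcircuit idea can be sketched with a hypothetical wrapper (this is not hashbrown's actual implementation; `ShortCircuit` and its fields are illustrative names):

```rust
use std::collections::HashSet;

// Wraps an iterator and stops as soon as `remaining` items have been
// yielded, instead of continuing to scan the rest of the bucket array.
struct ShortCircuit<I> {
    inner: I,
    remaining: usize,
}

impl<I: Iterator> Iterator for ShortCircuit<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.remaining == 0 {
            // All live elements have been yielded; skip the remaining
            // (empty) buckets entirely.
            return None;
        }
        let item = self.inner.next();
        if item.is_some() {
            self.remaining -= 1;
        }
        item
    }
}

fn main() {
    let set: HashSet<usize> = (0..50).collect();
    // Seeding `remaining` with `set.len()` plays the role of the
    // length bound that `take` otherwise tracks in a separate integer.
    let iter = ShortCircuit { inner: set.iter(), remaining: set.len() };
    assert_eq!(iter.count(), 50);
}
```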