
Remove probe_loop in favor of Iterator #265

Merged · 1 commit · Jun 7, 2023

Conversation

Swatinem
Contributor

This tries to reduce compile times and slightly improve the code by avoiding the closure-based `probe_loop` function and the related control-flow types/matches. Instead, it uses an `Iterator` with a special `reload` flag.

It's another attempt to slightly improve upon #203.
As a testcase, I used https://github.com/getsentry/symbolicator; here is the difference in the output of `cargo llvm-lines -p symbolicator-service`:

before:

  Lines                  Copies               Function name
  -----                  ------               -------------
  2460076                63753                (TOTAL)
    44782 (1.8%,  8.8%)    298 (0.5%,  2.3%)  moka::cht::map::bucket::BucketArray<K,V>::probe_loop
    15704 (0.6%, 21.2%)    151 (0.2%,  7.6%)  moka::cht::map::bucket::BucketArray<K,V>::remove_if::{{closure}}
    11270 (0.5%, 28.7%)    151 (0.2%, 12.4%)  moka::cht::map::bucket::BucketArray<K,V>::remove_if
     5580 (0.2%, 46.6%)     90 (0.1%, 24.6%)  moka::cht::map::bucket::BucketArray<K,V>::get
     5580 (0.2%, 46.9%)     90 (0.1%, 24.7%)  moka::cht::map::bucket::BucketArray<K,V>::get::{{closure}}

after:

  Lines                  Copies               Function name
  -----                  ------               -------------
  2370611                61032                (TOTAL)
    23652 (1.0%, 15.2%)    151 (0.2%,  4.8%)  moka::cht::map::bucket::BucketArray<K,V>::remove_if
    10080 (0.4%, 30.3%)     90 (0.1%, 13.1%)  moka::cht::map::bucket::BucketArray<K,V>::get
     3360 (0.1%, 53.0%)     32 (0.1%, 26.2%)  <moka::cht::map::bucket::Probe<K,V> as core::iter::traits::iterator::Iterator>::next

I focused this on two of the functions that were touched. It's certainly possible that going by this measure alone does not paint an exact picture either. Maybe I am focusing on something that does not matter at all in the end.
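The iterator-with-`reload`-flag idea described above could be sketched roughly as follows. This is an illustrative toy, not the merged moka code: the real `Probe` in `moka::cht::map::bucket` deals with atomics and bucket relocation, which are omitted here.

```rust
// Illustrative sketch of an iterator-based probe loop. The names `Probe`
// and `reload` follow the PR description; the logic is simplified.
struct Probe<'a, T> {
    buckets: &'a [T],
    offset: usize,    // current probe position
    remaining: usize, // buckets left to visit before giving up
    reload: bool,     // when set, restart probing from scratch
}

impl<'a, T> Probe<'a, T> {
    fn new(buckets: &'a [T], hash: usize) -> Self {
        // Assumes `buckets.len()` is a non-zero power of two, as the
        // discussion further down confirms for moka's bucket arrays.
        let offset = hash & (buckets.len() - 1);
        Probe { buckets, offset, remaining: buckets.len(), reload: false }
    }
}

impl<'a, T> Iterator for Probe<'a, T> {
    type Item = (usize, &'a T);

    fn next(&mut self) -> Option<Self::Item> {
        if self.reload {
            // Something invalidated the probe (e.g. a concurrent resize);
            // start a fresh pass over the array.
            self.reload = false;
            self.remaining = self.buckets.len();
        }
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        let i = self.offset;
        self.offset = (self.offset + 1) & (self.buckets.len() - 1);
        Some((i, &self.buckets[i]))
    }
}

fn main() {
    let buckets = [10, 20, 30, 40];
    // hash 6 & (4 - 1) == 2, so probing starts at index 2 and wraps.
    let visited: Vec<usize> = Probe::new(&buckets, 6).map(|(i, _)| i).collect();
    assert_eq!(visited, vec![2, 3, 0, 1]);

    // Setting `reload` mid-probe restarts the pass.
    let mut probe = Probe::new(&buckets, 0);
    probe.next();
    probe.next();
    probe.reload = true;
    assert_eq!(probe.count(), 4);
}
```

The point of the shape is that the callers (`get`, `remove_if`, `insert_or_modify`, …) become plain `for`/`while let` loops over a shared iterator type instead of each instantiating `probe_loop` with its own closure, which is where the monomorphization savings come from.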

@tatsuya6502
Member

Hi. Thank you so much for working on this. I am planning to accept this PR for v0.11.2 for the following reasons:

  • It reduces the size of the generated code.
  • It reduces compile time a bit (~5%?)
  • The new code would be easier to understand than the original.

As for runtime performance, this change made the benchmark results a bit worse. The new code is ~5% to ~7% slower than the original. I think this is because the new code prevents the compiler from inlining some function calls (which is also why the size of the generated code has shrunk). It is a trade-off between generated code size and performance.

I like the new code better because it's clearer and easier to understand, so it will be a good trade-off. I also think the current probing algorithm can be improved, which could win back more performance than we lose now. So I would rather keep the code simple and improve the probing algorithm later.

@tatsuya6502
Member

Here are the benchmark results:

  • Ubuntu 22.04 x86_64
  • Intel Core i7-12700F
    • 8 × performance cores × 2 hyper threads
    • 4 × efficient cores

single-client (benchmark chart omitted)

multi-clients (benchmark chart omitted)

@tatsuya6502
Member

All checks should pass after merging into the master branch (which has the upgraded MSRV 1.65).

https://github.com/moka-rs/moka/compare/probe-iter-2 (see that the last commit passed)

Contributor Author

@Swatinem Swatinem left a comment


That is a very thorough analysis; I'm always surprised how much of a difference such small changes can make. But at the same time I'm also curious how stable those numbers are.

Comment on lines +404 to +405
// FIXME: this will panic if `len() == 0`
let this_bucket = (offset, &buckets[offset]);
Contributor Author


This is a change I'm a bit uncertain about; maybe using an `Option` for `this_bucket` could avoid that, and remove some panic code as well. Since you mention you want to optimize the probing itself anyway, it might not be worth doing this immediately, if indeed `len` will never be 0.
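For illustration, a hypothetical `Option`-based lookup (not the merged code; `probe_bucket` and its signature are made up here) could use `slice::get`, so an out-of-range offset, including the `len() == 0` case from the FIXME, yields `None` instead of panicking:

```rust
// Hypothetical sketch: look up a probed bucket without the indexing
// panic by returning Option instead. Not moka's actual API.
fn probe_bucket<T>(buckets: &[T], offset: usize) -> Option<(usize, &T)> {
    // `slice::get` returns None on out-of-bounds rather than panicking,
    // which also covers an empty bucket array.
    buckets.get(offset).map(|bucket| (offset, bucket))
}

fn main() {
    let buckets = [10, 20, 30];
    assert_eq!(probe_bucket(&buckets, 1), Some((1, &20)));

    // An empty bucket array no longer panics:
    let empty: [u32; 0] = [];
    assert_eq!(probe_bucket(&empty, 0), None);
}
```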

Member

@tatsuya6502 tatsuya6502 Jun 6, 2023


I reviewed the cht code and confirmed that `len` will never be 0, so this line will never cause a panic.

A `BucketArray` can be created via the `default` method or the `with_length` function.

`default` uses 128 as `len`:

Self::with_length(0, BUCKET_ARRAY_DEFAULT_LENGTH)

`with_length` takes an arbitrary `len` in `usize`, but its caller (below) never calls it with 0:

moka/src/cht/segment.rs

Lines 170 to 184 in e5690f8

if capacity == 0 {
    unsafe {
        ptr::write_bytes(segments.as_mut_ptr(), 0, actual_num_segments);
        segments.set_len(actual_num_segments);
    }
} else {
    let actual_capacity = (capacity * 2 / actual_num_segments).next_power_of_two();
    for _ in 0..actual_num_segments {
        segments.push(Segment {
            bucket_array: Atomic::new(BucketArray::with_length(0, actual_capacity)),
            len: AtomicUsize::new(0),
        });
    }
}

  capacity  actual_num_segments  actual_capacity (len)
  --------  -------------------  ---------------------
  0         any                  n/a (does not call with_length)
  1         1                    2
  1         2                    1
  1         4                    1
  2         1                    4
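The `actual_capacity` column can be checked directly against the expression from the snippet above. One subtle point (my observation, not stated in the thread): in Rust, `0usize.next_power_of_two()` returns 1, which is why `capacity = 1, actual_num_segments = 4` still produces a non-zero `len`:

```rust
// Reproduces the `actual_capacity` computation from segment.rs.
fn actual_capacity(capacity: usize, actual_num_segments: usize) -> usize {
    (capacity * 2 / actual_num_segments).next_power_of_two()
}

fn main() {
    assert_eq!(actual_capacity(1, 1), 2);
    assert_eq!(actual_capacity(1, 2), 1);
    // 1 * 2 / 4 == 0, and 0usize.next_power_of_two() == 1,
    // so `with_length` is never called with len == 0.
    assert_eq!(actual_capacity(1, 4), 1);
    assert_eq!(actual_capacity(2, 1), 4);
}
```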

@tatsuya6502
Member

tatsuya6502 commented Jun 6, 2023

But at the same time I’m also curious how stable those numbers are.

I am not sure what you mean by stable. If it is about the stability of the numbers between different runs of the same benchmark, then I would say those numbers are stable.

However, the numbers do not mean much for other workloads, especially ones with both a high cache miss rate and a large miss penalty (the time spent to calculate/get the value). For such workloads, latency differences in a cache operation will be negligible because the miss penalty is much larger than the latency difference.

I think we will need to talk about the direction of the optimization: smaller code size/shorter compile time vs. runtime performance. Many cases seem to be trade-offs between them. These benchmark numbers may be meaningful for some workloads (e.g., with very high hit rates) but not for others. So I think it is okay to give runtime performance lower priority than code size/compile time.

(I will accept this PR anyway)

Member

@tatsuya6502 tatsuya6502 left a comment


It looks good to me. Thank you for working on this!

@tatsuya6502 tatsuya6502 added this pull request to the merge queue Jun 7, 2023
Merged via the queue into moka-rs:master with commit 90fa847 Jun 7, 2023
@tatsuya6502 tatsuya6502 mentioned this pull request Jun 7, 2023
github-merge-queue bot pushed a commit that referenced this pull request Jun 7, 2023
Labels: enhancement (New feature or request)

2 participants