Simplify HashMap Bucket interface #40561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

bors merged 1 commit into rust-lang:master from arthurprs:hm-adapt2

Apr 6, 2017

Contributor

arthurprs commented Mar 15, 2017 •

edited

Loading

Simplify HashMap Bucket interface

Store capacity_mask instead of capacity

Move bucket index into RawBucket

Valid bucket index is now always within [0..table_capacity)

Simplify iterators by moving logic into RawBuckets

Clone RawTable using RawBucket

Make retain aware of the number of elements

The idea was to put idx in RawBucket instead of the other Bucket types and simplify next() and prev() as much as possible. The rest was a side-effect of that change, except maybe the last 2.

This change makes iteration and other next/prev() heavy operations noticeably faster. Clone is way faster.

➜  hashmap2 git:(adapt) ✗ cargo benchcmp pre:: adp:: bench.txt 
 name                        pre:: ns/iter  adp:: ns/iter  diff ns/iter   diff % 
 clone_10_000                74,364         39,736              -34,628  -46.57% 
 grow_100_000                8,343,553      8,233,785          -109,768   -1.32% 
 grow_10_000                 817,825        723,958             -93,867  -11.48% 
 grow_big_value_100_000      18,418,979     17,906,186         -512,793   -2.78% 
 grow_big_value_10_000       1,219,242      1,103,334          -115,908   -9.51% 
 insert_1000                 74,546         58,343              -16,203  -21.74% 
 insert_100_000              6,743,770      6,238,017          -505,753   -7.50% 
 insert_10_000               798,079        719,123             -78,956   -9.89% 
 insert_1_000_000            275,215,605    266,975,875      -8,239,730   -2.99% 
 insert_int_bigvalue_10_000  1,517,387      1,419,838           -97,549   -6.43% 
 insert_str_10_000           316,179        278,896             -37,283  -11.79% 
 insert_string_10_000        770,927        747,449             -23,478   -3.05% 
 iter_keys_100_000           386,099        333,104             -52,995  -13.73% 
 iterate_100_000             387,320        355,707             -31,613   -8.16% 
 lookup_100_000              206,757        193,063             -13,694   -6.62% 
 lookup_100_000_unif         219,366        193,180             -26,186  -11.94% 
 lookup_1_000_000            206,456        205,716                -740   -0.36% 
 lookup_1_000_000_unif       659,934        629,659             -30,275   -4.59% 
 lru_sim                     20,194,334     18,442,149       -1,752,185   -8.68% 
 merge_shuffle               1,168,044      1,063,055          -104,989   -8.99%

Note 2: I may have messed up porting the diff, let's see what CI says.

rust-highfive assigned alexcrichton

Contributor

rust-highfive commented Mar 15, 2017

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

gereeter reviewed

View reviewed changes

src/libstd/collections/hash/table.rs Outdated

    
            @@ -126,9 +126,10 @@ unsafe impl<K: Send, V: Send> Send for RawTable<K, V> {}
          
              unsafe impl<K: Sync, V: Sync> Sync for RawTable<K, V> {}

              pub struct RawBucket<K, V> {

                  hash: *mut HashUint,

                  _hash: *mut HashUint,

Contributor

gereeter Mar 15, 2017

Why was this renamed? If it was to disambiguate with the method, I think hash_ would be more appropriate than _hash, since the standard practice is to use prefix _ to denote "unused."

Contributor Author

arthurprs Mar 16, 2017 •

edited

Loading

It's for disambiguation. I think we can come up with a better name instead, moving the _ to the end hurts the eye. Like hash_start, hash_base, first_hash?

src/libstd/collections/hash/table.rs Outdated

    
                  fn first_bucket_raw(&self) -> RawBucket<K, V> {

                      let hashes_size = self.capacity * size_of::<HashUint>();

                      let pairs_size = self.capacity * size_of::<(K, V)>();

                  fn raw_bucket(&self) -> RawBucket<K, V> {

Contributor

gereeter Mar 15, 2017

This seems like a worse name, since it isn't clear which raw bucket is returned. Maybe first_raw_bucket or raw_bucket_first or stick with the original first_bucket_raw (presumably paired with bucket_at_raw).

Contributor Author

arthurprs Mar 16, 2017

I'll switch callers to raw_bucket_at(0).

src/libstd/collections/hash/table.rs

    
                          pair: self.pair.offset(count),

                          _marker: marker::PhantomData,

                      }

                  unsafe fn hash(&self) -> *mut HashUint {

Contributor

gereeter Mar 15, 2017

Could you add a comment noting that these functions are unsafe because you can safely create out-of-bounds buckets?

src/libstd/collections/hash/table.rs Outdated

    
                  // We use *const to ensure covariance with respect to K and V

                  pair: *const (K, V),

                  _pair: *const (K, V),

                  idx: usize,

Contributor

gereeter Mar 15, 2017

If this is always within [0..table_capacity), could you add a comment mentioning that.

src/libstd/collections/hash/table.rs Outdated

    
                      RevMoveBuckets {

                          raw: raw_bucket.offset(self.capacity as isize),

                          hashes_end: raw_bucket.hash,

                          raw: self.raw_bucket_at(self.capacity()),

Contributor

gereeter Mar 15, 2017

You said in your description that bucket indices are in [0..table_capacity), but this is one past that. Did you mean [0..table_capacity]?

Contributor Author

arthurprs Mar 16, 2017

This is intentionally set to one past the end for the reverse style iterator. I will add a comment line on that.

src/libstd/collections/hash/table.rs Outdated

    
                          marker: marker::PhantomData,

                      }

                  }

              }

              impl<'a, K, V> Iterator for RawBuckets<'a, K, V> {

                  type Item = RawBucket<K, V>;

                  type Item = (*mut HashUint, *mut (K, V));

Contributor

gereeter Mar 15, 2017

It seems odd to have an iterator named RawBuckets that doesn't return RawBuckets.

Contributor Author

arthurprs Mar 16, 2017

Good point, hopefully that won't affect performance negatively.

src/libstd/collections/hash/table.rs Outdated

    
                      if self.elems_left == 0 {

                          return None;

                      }

                      loop {

Contributor

gereeter Mar 16, 2017

Nit: could there be a blank line before this loop?

src/libstd/collections/hash/table.rs Outdated

    
                      (self.elems_left, Some(self.elems_left))

                  }

              }

              impl<'a, K, V> ExactSizeIterator for RawBuckets<'a, K, V> {

Contributor

gereeter Mar 16, 2017

Nit: could there be a blank line before this impl.

Member

alexcrichton commented Mar 16, 2017

@bors: delegate=gereeter

Collaborator

bors commented Mar 16, 2017

✌️ @gereeter can now approve this pull request

Contributor Author

arthurprs commented Mar 16, 2017

Thanks for the review @gereeter. I'll updated the PR later.

arthurprs force-pushed the hm-adapt2 branch from bc0ce25 to 97340c9 Compare

March 16, 2017 14:11

Collaborator

bors commented Mar 19, 2017

☔ The latest upstream changes (presumably #40538) made this pull request unmergeable. Please resolve the merge conflicts.

arthurprs force-pushed the hm-adapt2 branch from 97340c9 to 970bf5a Compare

March 19, 2017 22:26

Contributor Author

arthurprs commented Mar 19, 2017

Rebased.

arthurprs force-pushed the hm-adapt2 branch from 970bf5a to 3d8b25d Compare

March 20, 2017 12:35

Member

alexcrichton commented Mar 20, 2017

cc @bluss, @pczarn

Collaborator

bors commented Mar 25, 2017

☔ The latest upstream changes (presumably #40806) made this pull request unmergeable. Please resolve the merge conflicts.

arthurprs force-pushed the hm-adapt2 branch 2 times, most recently from e190879 to 39802b2 Compare

March 25, 2017 22:13

Contributor Author

arthurprs commented Apr 4, 2017 •

edited

Loading

(gentle ping)

pczarn reviewed

View reviewed changes

src/libstd/collections/hash/map.rs Outdated

    
            @@ -503,7 +498,7 @@ fn robin_hood<'a, K: 'a, V: 'a>(bucket: FullBucketMut<'a, K, V>,
          
                      loop {

                          displacement += 1;

                          let probe = bucket.next();

                          debug_assert!(probe.index() != idx_end);

                          debug_assert!(probe.index() != start_index);

Contributor

pczarn Apr 4, 2017

Is this invariant not as strict as the old one? Either way it's ok. You could add a comment that this assertion is simplified.

Contributor Author

arthurprs Apr 4, 2017 •

edited

Loading

It's not. I'll updat the code to do the same check as before.

pczarn approved these changes

View reviewed changes

Contributor

pczarn left a comment •

edited

Loading

Excellent. ~~Please make a rebase~~

arthurprs force-pushed the hm-adapt2 branch 2 times, most recently from 3451af6 to fdb88fe Compare

April 4, 2017 17:37

Contributor Author

arthurprs commented Apr 4, 2017

Updated with @pczarn review.


          Simplify HashMap Bucket interface

f07ebd6

* Store capacity_mask instead of capacity
* Move bucket index into RawBucket
* Bucket index is now always within [0..table_capacity)
* Clone RawTable using RawBucket
* Simplify iterators by moving logic into RawBuckets
* Make retain aware of the number of elements

arthurprs force-pushed the hm-adapt2 branch from fdb88fe to f07ebd6 Compare

April 4, 2017 20:43

Member

alexcrichton commented Apr 5, 2017

@bors: r=pczarn

Collaborator

bors commented Apr 5, 2017

📌 Commit f07ebd6 has been approved by pczarn

Member

alexcrichton commented Apr 5, 2017

@bors: delegate=pczarn

Collaborator

bors commented Apr 5, 2017

✌️ @pczarn can now approve this pull request

Collaborator

bors commented Apr 5, 2017

⌛ Testing commit f07ebd6 with merge b181396...

Collaborator

bors commented Apr 5, 2017

💔 Test failed - status-appveyor

Member

alexcrichton commented Apr 5, 2017 via email

@bors: retry * network failure #40474

On Wed, Apr 5, 2017 at 2:08 AM, bors ***@***.***> wrote: 💔 Test failed - status-appveyor <https://ci.appveyor.com/project/rust-lang/rust/build/1.0.2769> — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#40561 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAD95G7JvqimISUpYtdiDu2ak4yAt2-Dks5rs0v6gaJpZM4Menm5> .

frewsxcv added a commit to frewsxcv/rust that referenced this pull request


          Rollup merge of rust-lang#40561 - arthurprs:hm-adapt2, r=pczarn

c7bdcce

Simplify HashMap Bucket interface

> Simplify HashMap Bucket interface
>
> * Store capacity_mask instead of capacity
> * Move bucket index into RawBucket
> * Valid bucket index is now always within [0..table_capacity)
> * Simplify iterators by moving logic into RawBuckets
> * Clone RawTable using RawBucket
> * Make retain aware of the number of elements

The idea was to put idx in RawBucket instead of the other Bucket types and simplify next() and prev() as much as possible. The rest was a side-effect of that change, except maybe the last 2.

This change makes iteration and other next/prev() heavy operations noticeably faster. Clone is way faster.

```
➜  hashmap2 git:(adapt) ✗ cargo benchcmp pre:: adp:: bench.txt
 name                        pre:: ns/iter  adp:: ns/iter  diff ns/iter   diff %
 clone_10_000                74,364         39,736              -34,628  -46.57%
 grow_100_000                8,343,553      8,233,785          -109,768   -1.32%
 grow_10_000                 817,825        723,958             -93,867  -11.48%
 grow_big_value_100_000      18,418,979     17,906,186         -512,793   -2.78%
 grow_big_value_10_000       1,219,242      1,103,334          -115,908   -9.51%
 insert_1000                 74,546         58,343              -16,203  -21.74%
 insert_100_000              6,743,770      6,238,017          -505,753   -7.50%
 insert_10_000               798,079        719,123             -78,956   -9.89%
 insert_1_000_000            275,215,605    266,975,875      -8,239,730   -2.99%
 insert_int_bigvalue_10_000  1,517,387      1,419,838           -97,549   -6.43%
 insert_str_10_000           316,179        278,896             -37,283  -11.79%
 insert_string_10_000        770,927        747,449             -23,478   -3.05%
 iter_keys_100_000           386,099        333,104             -52,995  -13.73%
 iterate_100_000             387,320        355,707             -31,613   -8.16%
 lookup_100_000              206,757        193,063             -13,694   -6.62%
 lookup_100_000_unif         219,366        193,180             -26,186  -11.94%
 lookup_1_000_000            206,456        205,716                -740   -0.36%
 lookup_1_000_000_unif       659,934        629,659             -30,275   -4.59%
 lru_sim                     20,194,334     18,442,149       -1,752,185   -8.68%
 merge_shuffle               1,168,044      1,063,055          -104,989   -8.99%
```

Note 2: I may have messed up porting the diff, let's see what CI says.

frewsxcv mentioned this pull request

Rollup of 9 pull requests #41093

Closed

arielb1 mentioned this pull request

Rollup of 12 pull requests #41094

Closed

arielb1 pushed a commit to arielb1/rust that referenced this pull request


          Rollup merge of rust-lang#40561 - arthurprs:hm-adapt2, r=pczarn

327b9be

Simplify HashMap Bucket interface

> Simplify HashMap Bucket interface
>
> * Store capacity_mask instead of capacity
> * Move bucket index into RawBucket
> * Valid bucket index is now always within [0..table_capacity)
> * Simplify iterators by moving logic into RawBuckets
> * Clone RawTable using RawBucket
> * Make retain aware of the number of elements

The idea was to put idx in RawBucket instead of the other Bucket types and simplify next() and prev() as much as possible. The rest was a side-effect of that change, except maybe the last 2.

This change makes iteration and other next/prev() heavy operations noticeably faster. Clone is way faster.

```
➜  hashmap2 git:(adapt) ✗ cargo benchcmp pre:: adp:: bench.txt
 name                        pre:: ns/iter  adp:: ns/iter  diff ns/iter   diff %
 clone_10_000                74,364         39,736              -34,628  -46.57%
 grow_100_000                8,343,553      8,233,785          -109,768   -1.32%
 grow_10_000                 817,825        723,958             -93,867  -11.48%
 grow_big_value_100_000      18,418,979     17,906,186         -512,793   -2.78%
 grow_big_value_10_000       1,219,242      1,103,334          -115,908   -9.51%
 insert_1000                 74,546         58,343              -16,203  -21.74%
 insert_100_000              6,743,770      6,238,017          -505,753   -7.50%
 insert_10_000               798,079        719,123             -78,956   -9.89%
 insert_1_000_000            275,215,605    266,975,875      -8,239,730   -2.99%
 insert_int_bigvalue_10_000  1,517,387      1,419,838           -97,549   -6.43%
 insert_str_10_000           316,179        278,896             -37,283  -11.79%
 insert_string_10_000        770,927        747,449             -23,478   -3.05%
 iter_keys_100_000           386,099        333,104             -52,995  -13.73%
 iterate_100_000             387,320        355,707             -31,613   -8.16%
 lookup_100_000              206,757        193,063             -13,694   -6.62%
 lookup_100_000_unif         219,366        193,180             -26,186  -11.94%
 lookup_1_000_000            206,456        205,716                -740   -0.36%
 lookup_1_000_000_unif       659,934        629,659             -30,275   -4.59%
 lru_sim                     20,194,334     18,442,149       -1,752,185   -8.68%
 merge_shuffle               1,168,044      1,063,055          -104,989   -8.99%
```

Note 2: I may have messed up porting the diff, let's see what CI says.

arielb1 mentioned this pull request

Rollup of 12 pull requests #41098

Merged

Collaborator

bors commented Apr 5, 2017

⌛ Testing commit f07ebd6 with merge 1e10204...

Contributor

arielb1 commented Apr 5, 2017

@bors retry - prioritizing rollup

bors added a commit that referenced this pull request


          Auto merge of #41098 - arielb1:rollup, r=arielb1

6cd15a0

Rollup of 12 pull requests

- Successful merges: #40479, #40561, #40709, #40815, #40909, #40927, #40943, #41015, #41028, #41052, #41054, #41065
- Failed merges:

bors merged commit f07ebd6 into rust-lang:master

arthurprs mentioned this pull request

Decrease HashMap Load Factor #38003

Closed

arthurprs mentioned this pull request

*: use FnvHashMap again tikv/tikv#2211

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet