Rollup of 10 pull requests #37093


SimonSapin and others added 29 commits September 26, 2016 18:15
test_dedup_shared has been exactly the same as test_dedup_unique since
6f16df4, three years ago.
They show up separately in rustdoc.

This is a separate commit to keep the previous one’s diff shorter.
…chton

Cache conscious hashmap table

Right now the internal HashMap representation is three unzipped arrays (hhhkkkvvv). I propose changing it to hhhkvkvkv (in further iterations, kvkvkvhhh may allow in-place growth). A previous attempt is at rust-lang#21973.

This layout is generally more cache conscious, as it makes the value immediately accessible after a key matches. Keeping the hash array separate is a _no-brainer_ because of how the Robin Hood (RH) algorithm works, and that is unchanged.

**Lookups**: Upon a successful match in the hash array, the code can check the key and immediately access the value in the same or the next cache line (effectively saving an L1/L2/L3 cache miss compared to the current layout).
**Inserts/Deletes/Resize**: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses fewer instructions.
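To make the proposed layout concrete, here is a minimal sketch of a table that keeps hashes in one array and interleaved key/value pairs in another. It is not the code from this PR: `InterleavedTable` and its methods are hypothetical names, the capacity is fixed, and it uses plain linear probing rather than Robin Hood hashing to stay short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical fixed-capacity table using the `hhhkvkvkv` layout:
/// one array of hashes plus one array of interleaved (key, value) pairs.
struct InterleavedTable<K, V> {
    hashes: Vec<u64>,           // 0 marks an empty bucket
    pairs: Vec<Option<(K, V)>>, // each key sits right next to its value
}

impl<K: Hash + Eq, V> InterleavedTable<K, V> {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        InterleavedTable {
            hashes: vec![0; capacity],
            pairs: (0..capacity).map(|_| None).collect(),
        }
    }

    /// Hashes the key and sets the top bit so a real hash is never 0.
    fn hash_of(key: &K) -> u64 {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish() | (1 << 63)
    }

    /// Inserts or overwrites; returns false if the table is full.
    /// Assumption: linear probing, not the Robin Hood scheme std used.
    fn insert(&mut self, key: K, value: V) -> bool {
        let hash = Self::hash_of(&key);
        let mask = self.hashes.len() - 1;
        let mut idx = hash as usize & mask;
        for _ in 0..self.hashes.len() {
            if self.hashes[idx] == 0 {
                self.hashes[idx] = hash;
                self.pairs[idx] = Some((key, value));
                return true;
            }
            if self.hashes[idx] == hash {
                if let Some((k, v)) = self.pairs[idx].as_mut() {
                    if *k == key {
                        *v = value;
                        return true;
                    }
                }
            }
            idx = (idx + 1) & mask;
        }
        false
    }

    /// Scans the compact hash array first; only on a hash match does it
    /// touch the pair, where key and value are adjacent in memory.
    fn get(&self, key: &K) -> Option<&V> {
        let hash = Self::hash_of(key);
        let mask = self.hashes.len() - 1;
        let mut idx = hash as usize & mask;
        for _ in 0..self.hashes.len() {
            match self.hashes[idx] {
                0 => return None,
                h if h == hash => {
                    if let Some((k, v)) = self.pairs[idx].as_ref() {
                        if k == key {
                            return Some(v);
                        }
                    }
                }
                _ => {}
            }
            idx = (idx + 1) & mask;
        }
        None
    }
}
```

In `get`, the key comparison and the returned value reference come from the same `pairs[idx]` slot, which is the locality benefit described for lookups.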

Some supporting benchmarks for the benefits of this layout (besides the ones below) can be seen here as well: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/

The obvious drawback is that padding can be wasted between the key and the value. Because of that, keys(), values() and contains() can consume more cache and be slower.

Total wasted padding between items (C being the capacity of the table):
* Old layout: C * (K-K padding) + C * (V-V padding)
* Proposed: C * (K-V padding) + C * (V-K padding)

In practice, padding between K-K and V-V *can* be smaller than between K-V and V-K. The overhead is capped(ish) at `sizeof u64 - 1`, so we can actually measure the worst case (a u8 at the end of the key type and a value with an alignment of 1, _hardly the average case in practice_).
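The K-V padding can be checked directly with `std::mem::size_of`. The sketch below is only an illustration: `pair_padding` is a hypothetical helper, and it assumes a target where `u64` is 8-byte aligned (for example x86_64) and that a key and its value are stored together as a `(K, V)` pair.

```rust
use std::mem::size_of;

/// Bytes of padding the compiler adds when one key and one value are
/// stored together as a `(K, V)` pair (hypothetical helper).
fn pair_padding<K, V>() -> usize {
    size_of::<(K, V)>() - size_of::<K>() - size_of::<V>()
}

/// Padding for the key/value combinations discussed in the text,
/// assuming `u64` has 8-byte alignment on the target.
fn example_paddings() -> [usize; 4] {
    [
        pair_padding::<u64, u8>(),  // worst case: sizeof u64 - 1
        pair_padding::<u64, u16>(),
        pair_padding::<u64, u32>(),
        pair_padding::<u64, u64>(), // same type for K and V: no padding
    ]
}
```

On such a target the worst case is 7 bytes per pair, which matches the `sizeof u64 - 1` cap mentioned above.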

Starting from the worst case the memory overhead is:
* `HashMap<u64, u8>` 46% memory overhead. (aka *worst case*)
* `HashMap<u64, u16>` 33% memory overhead.
* `HashMap<u64, u32>` 20% memory overhead.
* `HashMap<T, T>` 0% memory overhead
* Worst case based on sizeof K + sizeof V:

| x = sizeof K + sizeof V (bytes)      |  16    |  24    |  32    |  64   |  128  |
|--------------------------------------|--------|--------|--------|-------|-------|
| worst-case ratio (8+x+7)/(8+x)       |  1.29  |  1.22  |  1.18  |  1.1  |  1.05 |
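The table row can be reproduced with a few lines of arithmetic. This is a sketch of the formula only; the function and constant names are made up for illustration, and the 8 and 7 are read here as the per-bucket hash size and the worst-case padding.

```rust
/// Bytes taken by the hash stored for each bucket (read from the formula).
const HASH_BYTES: f64 = 8.0;
/// Worst-case padding per entry: sizeof u64 - 1.
const MAX_PADDING_BYTES: f64 = 7.0;

/// Worst-case size ratio of the proposed layout to the old one,
/// where `x` is sizeof K + sizeof V in bytes: (8 + x + 7) / (8 + x).
fn worst_case_ratio(x: u32) -> f64 {
    let x = f64::from(x);
    (HASH_BYTES + x + MAX_PADDING_BYTES) / (HASH_BYTES + x)
}
```

The ratio shrinks toward 1 as `x` grows, because the fixed 7 bytes of padding are spread over a larger entry.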

I have a test repo for running the benchmarks here: https://github.com/arthurprs/hashmap2/tree/layout

```
 ➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                            hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 grow_10_000                     922,064           783,933               -138,131  -14.98%
 grow_big_value_10_000           1,901,909         1,171,862             -730,047  -38.38%
 grow_fnv_10_000                 443,544           418,674                -24,870   -5.61%
 insert_100                      2,469             2,342                     -127   -5.14%
 insert_1000                     23,331            21,536                  -1,795   -7.69%
 insert_100_000                  4,748,048         3,764,305             -983,743  -20.72%
 insert_10_000                   321,744           290,126                -31,618   -9.83%
 insert_int_bigvalue_10_000      749,764           407,547               -342,217  -45.64%
 insert_str_10_000               337,425           334,009                 -3,416   -1.01%
 insert_string_10_000            788,667           788,262                   -405   -0.05%
 iter_keys_100_000               394,484           374,161                -20,323   -5.15%
 iter_keys_big_value_100_000     402,071           620,810                218,739   54.40%
 iter_values_100_000             424,794           373,004                -51,790  -12.19%
 iterate_100_000                 424,297           389,950                -34,347   -8.10%
 lookup_100_000                  189,997           186,554                 -3,443   -1.81%
 lookup_100_000_bigvalue         192,509           189,695                 -2,814   -1.46%
 lookup_10_000                   154,251           145,731                 -8,520   -5.52%
 lookup_10_000_bigvalue          162,315           146,527                -15,788   -9.73%
 lookup_10_000_exist             132,769           128,922                 -3,847   -2.90%
 lookup_10_000_noexist           146,880           144,504                 -2,376   -1.62%
 lookup_1_000_000                137,167           132,260                 -4,907   -3.58%
 lookup_1_000_000_bigvalue       141,130           134,371                 -6,759   -4.79%
 lookup_1_000_000_bigvalue_unif  567,235           481,272                -85,963  -15.15%
 lookup_1_000_000_unif           589,391           453,576               -135,815  -23.04%
 merge_shuffle                   1,253,357         1,207,387              -45,970   -3.67%
 merge_simple                    40,264,690        37,996,903          -2,267,787   -5.63%
 new                             6                 5                           -1  -16.67%
 with_capacity_10e5              3,214             3,256                       42    1.31%
```

```
➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26%
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44%
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68%
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92%
 iter_values_100_000            440,445           377,080                -63,365  -14.39%
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71%
 iterate_100_000                428,644           388,509                -40,135   -9.36%
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%
```
Add two functions to check type of SockAddr

These can be used to determine the type of the underlying IP
address

r? @alexcrichton
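As an illustration of the kind of check described above, the sketch below uses the `is_ipv4` and `is_ipv6` methods that `std::net::SocketAddr` provides today. Treating those as the two functions this change adds is an assumption, and `family` is a hypothetical helper.

```rust
use std::net::SocketAddr;

/// Returns a label for the IP family of a socket address
/// (hypothetical helper for illustration).
fn family(addr: &SocketAddr) -> &'static str {
    if addr.is_ipv4() {
        "IPv4"
    } else {
        // A SocketAddr is either V4 or V6, so this branch is IPv6.
        debug_assert!(addr.is_ipv6());
        "IPv6"
    }
}
```

This avoids a `match` on the enum variants when only the address family matters.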
Move IdxSetBuf and BitSlice to rustc_data_structures

Resolves a FIXME
…pat, r=nrc

Fix importing inaccessible `extern crate`s (with a warning)

Fixes rust-lang#36747, fixes rust-lang#37020, and fixes rust-lang#37021.
r? @nrc
…, r=bluss

Add comparison operators to boolean const eval.

I think it might be worth adding tests here, but since I don't know how or where to do that, I have not done so yet. Willing to do so if asked and given an explanation as to how.

Fixes rust-lang#37047.
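For context, these are the kinds of constant expressions that compare booleans. This is a sketch of what such code looks like on a current compiler; whether these exact forms are the ones covered by the fix is an assumption.

```rust
// Comparison operators applied to booleans in constant expressions.
const LESS: bool = false < true;
const NOT_EQUAL: bool = true != false;
const GREATER_OR_EQUAL: bool = false >= true;

// A boolean comparison used where a constant is required (array length).
const FLAGS: [u8; (false < true) as usize] = [0];
```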
Avoid allocations in `Decoder::read_str`.

`opaque::Decoder::read_str` is very hot within `rustc` due to its use in
the reading of crate metadata, and it currently returns a `String`. This
commit changes it to instead return a `Cow<str>`, which avoids a heap
allocation.

This change reduces the number of calls to `malloc` by almost 10% in
some benchmarks.

This is a [breaking-change] to libserialize.
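The idea of returning `Cow<str>` instead of `String` can be sketched as follows. This is not the libserialize code: `Decoder` here is a hypothetical, simplified type, and the one-byte length prefix is an assumption chosen to keep the example short.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified decoder over a borrowed byte buffer.
struct Decoder<'a> {
    data: &'a [u8],
    position: usize,
}

impl<'a> Decoder<'a> {
    fn new(data: &'a [u8]) -> Self {
        Decoder { data, position: 0 }
    }

    /// Reads a string stored as one length byte followed by UTF-8 bytes
    /// (assumed encoding). Returns `None` on truncated or invalid input.
    /// The result borrows from the buffer, so no heap allocation occurs.
    fn read_str(&mut self) -> Option<Cow<'a, str>> {
        let data: &'a [u8] = self.data;
        let len = *data.get(self.position)? as usize;
        let start = self.position + 1;
        let bytes = data.get(start..start + len)?;
        let text = std::str::from_utf8(bytes).ok()?;
        self.position = start + len;
        Some(Cow::Borrowed(text))
    }
}
```

Callers that only inspect the string use it through `Deref<Target = str>`; callers that need ownership can call `into_owned()`, which allocates only at that point.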
Error monitor should emit error to stderr instead of stdout

We are pretty consistent about emitting to stderr, except for when there is actually an error, in which case we emit to stdout. This seems a bit backwards. This PR just changes that exception to emit to stderr. This is useful for the RLS since the LS protocol uses stdout (grrr).

r? @alexcrichton
…excrichton

macros: expand `#[derive]`s after other attribute macros and improve intra-`#[derive]` ordering

Fixes serde-rs/serde#577.
cc rust-lang#35900
r? @alexcrichton
@rust-highfive
Collaborator

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

@sophiajt
Contributor Author

@bors r+ p=1

@bors
Contributor

bors commented Oct 11, 2016

📌 Commit 3cf07a8 has been approved by jonathandturner

@bors
Contributor

bors commented Oct 12, 2016

⌛ Testing commit 3cf07a8 with merge 278f9ff...

@alexcrichton
Member

@bors: retry force clean

@bors
Contributor

bors commented Oct 12, 2016

⌛ Testing commit 3cf07a8 with merge ccb8b3e...

bors added a commit that referenced this pull request Oct 12, 2016
Rollup of 10 pull requests

- Successful merges: #36692, #36743, #36762, #36991, #37023, #37050, #37056, #37064, #37066, #37067
- Failed merges:
@bors
Contributor

bors commented Oct 12, 2016

💔 Test failed - auto-linux-64-nopt-t

@Centril Centril added the rollup A PR which is a rollup label Oct 24, 2019