-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Optimize the is_permutation family and _Hash::operator== for multicontainers #423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the is_permutation family and _Hash::operator== for multicontainers #423
Conversation
…taniers slightly.
<xutility>
4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.
<xhash>
1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:
* For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large.
* Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths.
* We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range.
The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.
Performance results:
```
Benchmark Before (ns) After (ns) Percent Better
HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83%
HashRandomUnequal<unordered_multimap>/10 137 97 41.24%
HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98%
HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61%
HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58%
HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47%
HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51%
HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17%
HashRandomUnequal<hash_multimap>/10 138 101 36.63%
HashRandomUnequal<hash_multimap>/100 1613 1154 39.77%
HashRandomUnequal<hash_multimap>/512 10385 7178 44.68%
HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96%
HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97%
HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12%
HashRandomEqual<unordered_multimap>/1 16 9.4 70.21%
HashRandomEqual<unordered_multimap>/10 126 89.2 41.26%
HashRandomEqual<unordered_multimap>/100 1644 1133 45.10%
HashRandomEqual<unordered_multimap>/512 10532 7183 46.62%
HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45%
HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30%
HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65%
HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51%
HashRandomEqual<hash_multimap>/10 123 94.1 30.71%
HashRandomEqual<hash_multimap>/100 1645 1151 42.92%
HashRandomEqual<hash_multimap>/512 10177 7144 42.46%
HashRandomEqual<hash_multimap>/4096 172994 121381 42.52%
HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85%
HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11%
HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14%
HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98%
HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50%
HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10%
HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46%
HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49%
HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27%
HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59%
HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33%
HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10%
HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02%
HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53%
HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94%
HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84%
```
Benchmark code:
```
#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <assert.h>
#include <benchmark/benchmark.h>
#include <hash_map>
#include <random>
#include <stddef.h>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;
template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) {
std::minstd_rand rng(std::random_device{}());
const auto range0 = static_cast<ptrdiff_t>(state.range(0));
vector<pair<unsigned, unsigned>> testData;
testData.resize(range0 * 5);
const auto dataEnd = testData.begin() + range0;
std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; });
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
testData.clear();
std::unordered_multimap<unsigned, unsigned> b = a;
next(b.begin(), b.size() - 1)->second = 1u;
for (auto &&_ : state) {
(void)_;
assert(a != b);
}
}
BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) {
std::minstd_rand rng(std::random_device{}());
const auto range0 = static_cast<ptrdiff_t>(state.range(0));
vector<pair<unsigned, unsigned>> testData;
testData.resize(range0 * 5);
const auto dataEnd = testData.begin() + range0;
std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; });
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
testData.clear();
std::unordered_multimap<unsigned, unsigned> b = a;
for (auto &&_ : state) {
(void)_;
assert(a == b);
}
}
BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) {
std::unordered_multimap<unsigned, unsigned> a;
std::unordered_multimap<unsigned, unsigned> b;
const auto range0 = static_cast<ptrdiff_t>(state.range(0));
for (ptrdiff_t idx = 0; idx < range0; ++idx) {
a.emplace(0, 1);
b.emplace(1, 0);
}
a.emplace(1, 0);
b.emplace(0, 1);
for (auto &&_ : state) {
(void)_;
assert(a != b);
}
}
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_MAIN();
|
I am specifically interested in more perf tests folks think I should run. |
This comment has been minimized.
This comment has been minimized.
Fixed, feel free to just edit stuff like that :P |
I started to, but was concerned you might have a dual internal PR that needs to be corrected also. |
There are no test changes needed for this so I'm going to let it be approved here before making a dual one |
Resolves microsoftGH-6, microsoftGH-38, and drive-by fixes microsoftGH-414. Constexprizes the following algorithms and enables all relevant libc++ tests: * adjacent_find * all_of * any_of * binary_search * copy * copy_backward * copy_if * copy_n * count * count_if * equal * equal_range * exchange * fill * fill_n * find * find_end * find_first_of * find_if * find_if_not * for_each * for_each_n * generate * generate_n * includes * is_heap * is_heap * is_heap_until * is_partitioned * is_permutation * is_sorted * is_sorted_until * iter_swap * lexicographical_compare * lower_bound * make_heap * merge * mismatch * move * move_backward * next_permutation * none_of * nth_element * partial_sort * partial_sort_copy * partition * partition_copy * partition_point * pop_heap * prev_permutation * push_heap * remove * remove_copy * remove_copy_if * remove_if * replace * replace_copy * replace_copy_if * replace_if * reverse_copy * revese * rotate * rotate_copy * search * search_n * set_difference * set_intersection * set_symmetric_difference * set_union * sort * sort_heap * swap * swap_ranges * transform * unique * unique_copy * upper_bound This commit also contains the contents of microsoft#423 to workaround DevCom-883631, it will be rebased and that content removed once that is merged (the other one needs to be merged first). Specific notes: `<xutility>`: * The `_Ptr_copy_cat` family are moved down next to std::copy as that is their first consumer. * The core language loop for `copy_n` is fairly long and so it was extracted into its own function, `_Copy_n_core` (note similar name schema to `_Copy_memmove`) * `reverse` was changed to use early-returns for its optimization passes; this removes the nice thing of having if constexpr not instantiate the loop. However, this form allows the loop to not be duplicated.
| // check the first elements for equivalence when !_Standard | ||
| if (_Right._Traitsobj(_Keyval, _Traits::_Kfn(*_First2))) { | ||
| return {}; | ||
| } | ||
|
|
||
| // trim matching prefixes | ||
| const size_t _LHashval = _Traitsobj(_Keyval); | ||
| const size_type _LBucket = _LHashval & _Mask; | ||
| const auto _LBucket_hi = _Vec._Mypair._Myval2._Myfirst[(_LBucket << 1) + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I had sufficient coffee this time this is the main difference between the two if constexpr branches.
I you would store the pointer after _LBucket_hi you could also write
++_First1;
const bool _Left_range_end =
_First1 == _LBucket_hi_next || _Traitsobj(_Keyval, _Traits::_Kfn(*_First1));With that the only difference between the two loops (besides comments) is _LBucket_hi_next vs _Unchecked_end(). Given that you use a temporary variable anyway you could assign _Unchecked_end() to _LBucket_hi_next in the _Traits::_Standard case
Dunno if that is truly worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will do one extra pointer dereference per bucket (to find _LBucket_hi_next); but maybe we don't care about the perf of !_Standard (that is, hash_foo rather than unordered_foo) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, applied.
CaseyCarter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My non-resolved comments are suggestions, so I "Approve with suggestions."
Is there anything I missed other than the |
No - I think I was confused by clicking the through the comments and observing old state which I thought was current state. |
| #endif // _HAS_CXX17 | ||
|
|
||
| // FUNCTION TEMPLATE _Count_pr | ||
| // FUNCTION TEMPLATE is_permutation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, #279 tracks a possible improvement to is_permutation, although it isn't necessary to implement at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that looks interesting but would prefer that be a separate change (as it'll need its own perf tests and similar justification)
stl/inc/xhash
Outdated
| } | ||
|
|
||
| struct _Multi_equal_check_result { | ||
| bool _Equal_possible; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mildly uncomfortable with the lack of DMIs here; return {} initializes this bool to false but the risk of garbage-init is higher than I would like. Would it be better to add DMIs for {false} and {} for the iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my tests all the functions returning this type were inlined so I have no objection to DMIs.
stl/inc/xhash
Outdated
|
|
||
| _NODISCARD _Multi_equal_check_result _Multi_equal_check_equal_range( | ||
| const _Hash& _Right, _Unchecked_const_iterator _First1) const { | ||
| // check equal_range of elements matching *_First1 match the equal_range of elements in _Right |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment grammar is difficult to parse.
stl/inc/xhash
Outdated
| const auto _Last1 = _Unchecked_end(); | ||
| auto _First1 = _Unchecked_begin(); | ||
| while (_First1 != _Last1) { | ||
| auto _Result = _Multi_equal_check_equal_range(_Right, _First1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can _Result be const?
stl/inc/xutility
Outdated
| // narrowing _Iter_diff_t<_FwdIt1> to _Iter_diff_t<_FwdIt2> is OK because if the 2nd range is shorter than | ||
| // the 1st, the user already triggered UB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: I recommend saying "second" and "first" instead of aggressively abbreviating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, though technically I didn't edit that here :P. Will fix!
stl/inc/xutility
Outdated
| for (; _First1 != _Last1 && _First2 != _Last2; ++_First1, (void) ++_First2) { | ||
| for (;;) { // trim matching prefix | ||
| if (_First1 == _Last1) { | ||
| return _First2 == _Last2; // sequences are equal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is only true if we return true, although I suppose it is fairly obvious.
…tainers (microsoft#423) * Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN(); * Apply a bunch of code review comments from Casey. * clang-format * Apply @miscco's code deduplication idea for <xhash>. * Fix code review comments from Stephan: comments and add DMIs.
…tainers (microsoft#423) * Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN(); * Apply a bunch of code review comments from Casey. * clang-format * Apply @miscco's code deduplication idea for <xhash>. * Fix code review comments from Stephan: comments and add DMIs.
<xutility>4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implementation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.
<xhash>1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:
The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.
Performance results:
Benchmark code:
Checklist
Be sure you've read README.md and understand the scope of this repo.
If you're unsure about a box, leave it unchecked. A maintainer will help you.
_Uglyas perhttps://eel.is/c++draft/lex.name#3.1 or there are no product code changes.
verified by an STL maintainer before automated testing is enabled on GitHub,
leave this unchecked for initial submission).
members, adding virtual functions, changing whether a type is an aggregate
or trivially copyable, etc.).
the C++ Working Draft (including any cited standards), other WG21 papers
(excluding reference implementations outside of proposed standard wording),
and LWG issues as reference material. If they were derived from a project
that's already listed in NOTICE.txt, that's fine, but please mention it.
If they were derived from any other project (including Boost and libc++,
which are not yet listed in NOTICE.txt), you must mention it here,
so we can determine whether the license is compatible and what else needs
to be done.