Skip to content

Conversation

@BillyONeal
Copy link
Member

@BillyONeal BillyONeal commented Jan 15, 2020

<xutility>
4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implementation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.

<xhash>
1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:

  • For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we started with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large.
  • Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths.
  • We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range.

The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.

Performance results:

image

Benchmark code:

#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <assert.h>
#include <benchmark/benchmark.h>
#include <hash_map>
#include <random>
#include <stddef.h>
#include <unordered_map>
#include <utility>
#include <vector>

using namespace std;

template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    next(b.begin(), b.size() - 1)->second = 1u;
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    for (auto &&_ : state) {
        (void)_;
        assert(a == b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) {
    std::unordered_multimap<unsigned, unsigned> a;
    std::unordered_multimap<unsigned, unsigned> b;
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    for (ptrdiff_t idx = 0; idx < range0; ++idx) {
        a.emplace(0, 1);
        b.emplace(1, 0);
    }

    a.emplace(1, 0);
    b.emplace(0, 1);
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);

BENCHMARK_MAIN();

Checklist

Be sure you've read README.md and understand the scope of this repo.

If you're unsure about a box, leave it unchecked. A maintainer will help you.

  • Identifiers in product code changes are properly _Ugly as per
    https://eel.is/c++draft/lex.name#3.1 or there are no product code changes.
  • The STL builds successfully and all tests have passed (must be manually
    verified by an STL maintainer before automated testing is enabled on GitHub,
    leave this unchecked for initial submission).
  • These changes introduce no known ABI breaks (adding members, renaming
    members, adding virtual functions, changing whether a type is an aggregate
    or trivially copyable, etc.).
  • These changes were written from scratch using only this repository,
    the C++ Working Draft (including any cited standards), other WG21 papers
    (excluding reference implementations outside of proposed standard wording),
    and LWG issues as reference material. If they were derived from a project
    that's already listed in NOTICE.txt, that's fine, but please mention it.
    If they were derived from any other project (including Boost and libc++,
    which are not yet listed in NOTICE.txt), you must mention it here,
    so we can determine whether the license is compatible and what else needs
    to be done.

…taniers slightly.

<xutility>
4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.

<xhash>
1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:
* For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large.
* Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths.
* We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range.
The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.

Performance results:

```
Benchmark	Before (ns)	After (ns)	Percent Better
HashRandomUnequal<unordered_multimap>/1	18.7	11.7	59.83%
HashRandomUnequal<unordered_multimap>/10	137	97	41.24%
HashRandomUnequal<unordered_multimap>/100	1677	1141	46.98%
HashRandomUnequal<unordered_multimap>/512	10386	7036	47.61%
HashRandomUnequal<unordered_multimap>/4096	173807	119391	45.58%
HashRandomUnequal<unordered_multimap>/32768	2898405	1529710	89.47%
HashRandomUnequal<unordered_multimap>/100000	27441112	18602792	47.51%
HashRandomUnequal<hash_multimap>/1	18.9	11.8	60.17%
HashRandomUnequal<hash_multimap>/10	138	101	36.63%
HashRandomUnequal<hash_multimap>/100	1613	1154	39.77%
HashRandomUnequal<hash_multimap>/512	10385	7178	44.68%
HashRandomUnequal<hash_multimap>/4096	171718	120115	42.96%
HashRandomUnequal<hash_multimap>/32768	3352231	1510245	121.97%
HashRandomUnequal<hash_multimap>/100000	26532471	19209741	38.12%
HashRandomEqual<unordered_multimap>/1	16	9.4	70.21%
HashRandomEqual<unordered_multimap>/10	126	89.2	41.26%
HashRandomEqual<unordered_multimap>/100	1644	1133	45.10%
HashRandomEqual<unordered_multimap>/512	10532	7183	46.62%
HashRandomEqual<unordered_multimap>/4096	174580	120029	45.45%
HashRandomEqual<unordered_multimap>/32768	3031653	1455416	108.30%
HashRandomEqual<unordered_multimap>/100000	26100504	19240571	35.65%
HashRandomEqual<hash_multimap>/1	15.9	9.38	69.51%
HashRandomEqual<hash_multimap>/10	123	94.1	30.71%
HashRandomEqual<hash_multimap>/100	1645	1151	42.92%
HashRandomEqual<hash_multimap>/512	10177	7144	42.46%
HashRandomEqual<hash_multimap>/4096	172994	121381	42.52%
HashRandomEqual<hash_multimap>/32768	3045242	1966513	54.85%
HashRandomEqual<hash_multimap>/100000	26013781	22025482	18.11%
HashUnequalDifferingBuckets<unordered_multimap>/2	5.87	3.41	72.14%
HashUnequalDifferingBuckets<unordered_multimap>/10	12	3.39	253.98%
HashUnequalDifferingBuckets<unordered_multimap>/100	106	3.41	3008.50%
HashUnequalDifferingBuckets<unordered_multimap>/512	691	3.46	19871.10%
HashUnequalDifferingBuckets<unordered_multimap>/4096	6965	3.47	200620.46%
HashUnequalDifferingBuckets<unordered_multimap>/32768	91451	3.46	2642992.49%
HashUnequalDifferingBuckets<unordered_multimap>/100000	290430	3.52	8250752.27%
HashUnequalDifferingBuckets<hash_multimap>/2	5.97	3.4	75.59%
HashUnequalDifferingBuckets<hash_multimap>/10	11.8	3.54	233.33%
HashUnequalDifferingBuckets<hash_multimap>/100	105	3.54	2866.10%
HashUnequalDifferingBuckets<hash_multimap>/512	763	3.46	21952.02%
HashUnequalDifferingBuckets<hash_multimap>/4096	6862	3.4	201723.53%
HashUnequalDifferingBuckets<hash_multimap>/32768	94583	3.4	2781752.94%
HashUnequalDifferingBuckets<hash_multimap>/100000	287996	3.43	8396284.84%
```

Benchmark code:
```
#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <assert.h>
#include <benchmark/benchmark.h>
#include <hash_map>
#include <random>
#include <stddef.h>
#include <unordered_map>
#include <utility>
#include <vector>

using namespace std;

template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    next(b.begin(), b.size() - 1)->second = 1u;
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    for (auto &&_ : state) {
        (void)_;
        assert(a == b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) {
    std::unordered_multimap<unsigned, unsigned> a;
    std::unordered_multimap<unsigned, unsigned> b;
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    for (ptrdiff_t idx = 0; idx < range0; ++idx) {
        a.emplace(0, 1);
        b.emplace(1, 0);
    }

    a.emplace(1, 0);
    b.emplace(0, 1);
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);

BENCHMARK_MAIN();
@BillyONeal BillyONeal requested a review from a team as a code owner January 15, 2020 21:24
@BillyONeal BillyONeal self-assigned this Jan 15, 2020
@BillyONeal BillyONeal changed the title Optimize the is_permutation family and _Hash::operator== for multicontainers slightly. Optimize the is_permutation family and _Hash::operator== for multicontainers Jan 15, 2020
@BillyONeal
Copy link
Member Author

I am specifically interested in more perf tests folks think I should run.

@CaseyCarter

This comment has been minimized.

@BillyONeal
Copy link
Member Author

The SHOUTY banners were attached to functions which were implmentation details

s/implmentation/implementation/

Fixed, feel free to just edit stuff like that :P

@CaseyCarter
Copy link
Contributor

Fixed, feel free to just edit stuff like that :P

I started to, but was concerned you might have a dual internal PR that needs to be corrected also.

@BillyONeal
Copy link
Member Author

Fixed, feel free to just edit stuff like that :P

I started to, but was concerned you might have a dual internal PR that needs to be corrected also.

There are no test changes needed for this so I'm going to let it be approved here before making a dual one

BillyONeal added a commit to BillyONeal/STL that referenced this pull request Jan 16, 2020
Resolves microsoftGH-6, microsoftGH-38, and drive-by fixes microsoftGH-414.

Constexprizes the following algorithms and enables all relevant libc++ tests:

* adjacent_find
* all_of
* any_of
* binary_search
* copy
* copy_backward
* copy_if
* copy_n
* count
* count_if
* equal
* equal_range
* exchange
* fill
* fill_n
* find
* find_end
* find_first_of
* find_if
* find_if_not
* for_each
* for_each_n
* generate
* generate_n
* includes
* is_heap
* is_heap
* is_heap_until
* is_partitioned
* is_permutation
* is_sorted
* is_sorted_until
* iter_swap
* lexicographical_compare
* lower_bound
* make_heap
* merge
* mismatch
* move
* move_backward
* next_permutation
* none_of
* nth_element
* partial_sort
* partial_sort_copy
* partition
* partition_copy
* partition_point
* pop_heap
* prev_permutation
* push_heap
* remove
* remove_copy
* remove_copy_if
* remove_if
* replace
* replace_copy
* replace_copy_if
* replace_if
* reverse_copy
* revese
* rotate
* rotate_copy
* search
* search_n
* set_difference
* set_intersection
* set_symmetric_difference
* set_union
* sort
* sort_heap
* swap
* swap_ranges
* transform
* unique
* unique_copy
* upper_bound

This commit also contains the contents of microsoft#423 to workaround DevCom-883631, it will be rebased and that content removed once that is merged (the other one needs to be merged first).

Specific notes:

`<xutility>`:
* The `_Ptr_copy_cat` family are moved down next to std::copy as that is their first consumer.
* The core language loop for `copy_n` is fairly long and so it was extracted into its own function, `_Copy_n_core` (note similar name schema to `_Copy_memmove`)
* `reverse` was changed to use early-returns for its optimization passes; this removes the nice thing of having if constexpr not instantiate the loop. However, this form allows the loop to not be duplicated.
@BillyONeal BillyONeal mentioned this pull request Jan 16, 2020
4 tasks
Comment on lines 1997 to 2005
// check the first elements for equivalence when !_Standard
if (_Right._Traitsobj(_Keyval, _Traits::_Kfn(*_First2))) {
return {};
}

// trim matching prefixes
const size_t _LHashval = _Traitsobj(_Keyval);
const size_type _LBucket = _LHashval & _Mask;
const auto _LBucket_hi = _Vec._Mypair._Myval2._Myfirst[(_LBucket << 1) + 1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had sufficient coffee this time this is the main difference between the two if constexpr branches.

I you would store the pointer after _LBucket_hi you could also write

    ++_First1;
    const bool _Left_range_end =  
        _First1 == _LBucket_hi_next || _Traitsobj(_Keyval, _Traits::_Kfn(*_First1));

With that the only difference between the two loops (besides comments) is _LBucket_hi_next vs _Unchecked_end(). Given that you use a temporary variable anyway you could assign _Unchecked_end() to _LBucket_hi_next in the _Traits::_Standard case

Dunno if that is truly worth it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will do one extra pointer dereference per bucket (to find _LBucket_hi_next); but maybe we don't care about the perf of !_Standard (that is, hash_foo rather than unordered_foo) ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, applied.

Copy link
Contributor

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My non-resolved comments are suggestions, so I "Approve with suggestions."

@BillyONeal
Copy link
Member Author

My non-resolved comments are suggestions, so I "Approve with suggestions."

Is there anything I missed other than the !_Pred thing?

@CaseyCarter
Copy link
Contributor

Is there anything I missed other than the !_Pred thing?

No - I think I was confused by clicking the through the comments and observing old state which I thought was current state.

#endif // _HAS_CXX17

// FUNCTION TEMPLATE _Count_pr
// FUNCTION TEMPLATE is_permutation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, #279 tracks a possible improvement to is_permutation, although it isn't necessary to implement at this time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that looks interesting but would prefer that be a separate change (as it'll need its own perf tests and similar justification)

stl/inc/xhash Outdated
}

struct _Multi_equal_check_result {
bool _Equal_possible;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mildly uncomfortable with the lack of DMIs here; return {} initializes this bool to false but the risk of garbage-init is higher than I would like. Would it be better to add DMIs for {false} and {} for the iterator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests all the functions returning this type were inlined so I have no objection to DMIs.

stl/inc/xhash Outdated

_NODISCARD _Multi_equal_check_result _Multi_equal_check_equal_range(
const _Hash& _Right, _Unchecked_const_iterator _First1) const {
// check equal_range of elements matching *_First1 match the equal_range of elements in _Right
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment grammar is difficult to parse.

stl/inc/xhash Outdated
const auto _Last1 = _Unchecked_end();
auto _First1 = _Unchecked_begin();
while (_First1 != _Last1) {
auto _Result = _Multi_equal_check_equal_range(_Right, _First1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can _Result be const?

stl/inc/xutility Outdated
Comment on lines 4772 to 4773
// narrowing _Iter_diff_t<_FwdIt1> to _Iter_diff_t<_FwdIt2> is OK because if the 2nd range is shorter than
// the 1st, the user already triggered UB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: I recommend saying "second" and "first" instead of aggressively abbreviating.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, though technically I didn't edit that here :P. Will fix!

stl/inc/xutility Outdated
for (; _First1 != _Last1 && _First2 != _Last2; ++_First1, (void) ++_First2) {
for (;;) { // trim matching prefix
if (_First1 == _Last1) {
return _First2 == _Last2; // sequences are equal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is only true if we return true, although I suppose it is fairly obvious.

@BillyONeal BillyONeal merged commit ff403e3 into microsoft:master Jan 19, 2020
@BillyONeal BillyONeal deleted the optimize_permutations branch January 19, 2020 23:48
fengjixuchui pushed a commit to fengjixuchui/STL that referenced this pull request Jan 20, 2020
…tainers (microsoft#423)

* Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly.

<xutility>
4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.

<xhash>
1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:
* For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large.
* Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths.
* We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range.
The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.

Performance results:

```
Benchmark	Before (ns)	After (ns)	Percent Better
HashRandomUnequal<unordered_multimap>/1	18.7	11.7	59.83%
HashRandomUnequal<unordered_multimap>/10	137	97	41.24%
HashRandomUnequal<unordered_multimap>/100	1677	1141	46.98%
HashRandomUnequal<unordered_multimap>/512	10386	7036	47.61%
HashRandomUnequal<unordered_multimap>/4096	173807	119391	45.58%
HashRandomUnequal<unordered_multimap>/32768	2898405	1529710	89.47%
HashRandomUnequal<unordered_multimap>/100000	27441112	18602792	47.51%
HashRandomUnequal<hash_multimap>/1	18.9	11.8	60.17%
HashRandomUnequal<hash_multimap>/10	138	101	36.63%
HashRandomUnequal<hash_multimap>/100	1613	1154	39.77%
HashRandomUnequal<hash_multimap>/512	10385	7178	44.68%
HashRandomUnequal<hash_multimap>/4096	171718	120115	42.96%
HashRandomUnequal<hash_multimap>/32768	3352231	1510245	121.97%
HashRandomUnequal<hash_multimap>/100000	26532471	19209741	38.12%
HashRandomEqual<unordered_multimap>/1	16	9.4	70.21%
HashRandomEqual<unordered_multimap>/10	126	89.2	41.26%
HashRandomEqual<unordered_multimap>/100	1644	1133	45.10%
HashRandomEqual<unordered_multimap>/512	10532	7183	46.62%
HashRandomEqual<unordered_multimap>/4096	174580	120029	45.45%
HashRandomEqual<unordered_multimap>/32768	3031653	1455416	108.30%
HashRandomEqual<unordered_multimap>/100000	26100504	19240571	35.65%
HashRandomEqual<hash_multimap>/1	15.9	9.38	69.51%
HashRandomEqual<hash_multimap>/10	123	94.1	30.71%
HashRandomEqual<hash_multimap>/100	1645	1151	42.92%
HashRandomEqual<hash_multimap>/512	10177	7144	42.46%
HashRandomEqual<hash_multimap>/4096	172994	121381	42.52%
HashRandomEqual<hash_multimap>/32768	3045242	1966513	54.85%
HashRandomEqual<hash_multimap>/100000	26013781	22025482	18.11%
HashUnequalDifferingBuckets<unordered_multimap>/2	5.87	3.41	72.14%
HashUnequalDifferingBuckets<unordered_multimap>/10	12	3.39	253.98%
HashUnequalDifferingBuckets<unordered_multimap>/100	106	3.41	3008.50%
HashUnequalDifferingBuckets<unordered_multimap>/512	691	3.46	19871.10%
HashUnequalDifferingBuckets<unordered_multimap>/4096	6965	3.47	200620.46%
HashUnequalDifferingBuckets<unordered_multimap>/32768	91451	3.46	2642992.49%
HashUnequalDifferingBuckets<unordered_multimap>/100000	290430	3.52	8250752.27%
HashUnequalDifferingBuckets<hash_multimap>/2	5.97	3.4	75.59%
HashUnequalDifferingBuckets<hash_multimap>/10	11.8	3.54	233.33%
HashUnequalDifferingBuckets<hash_multimap>/100	105	3.54	2866.10%
HashUnequalDifferingBuckets<hash_multimap>/512	763	3.46	21952.02%
HashUnequalDifferingBuckets<hash_multimap>/4096	6862	3.4	201723.53%
HashUnequalDifferingBuckets<hash_multimap>/32768	94583	3.4	2781752.94%
HashUnequalDifferingBuckets<hash_multimap>/100000	287996	3.43	8396284.84%
```

Benchmark code:
```
#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <assert.h>
#include <benchmark/benchmark.h>
#include <hash_map>
#include <random>
#include <stddef.h>
#include <unordered_map>
#include <utility>
#include <vector>

using namespace std;

template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    next(b.begin(), b.size() - 1)->second = 1u;
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    for (auto &&_ : state) {
        (void)_;
        assert(a == b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) {
    std::unordered_multimap<unsigned, unsigned> a;
    std::unordered_multimap<unsigned, unsigned> b;
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    for (ptrdiff_t idx = 0; idx < range0; ++idx) {
        a.emplace(0, 1);
        b.emplace(1, 0);
    }

    a.emplace(1, 0);
    b.emplace(0, 1);
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);

BENCHMARK_MAIN();

* Apply a bunch of code review comments from Casey.

* clang-format

* Apply @miscco's code deduplication idea for <xhash>.

* Fix code review comments from Stephan: comments and add DMIs.
fengjixuchui pushed a commit to fengjixuchui/STL that referenced this pull request Jan 20, 2020
…tainers (microsoft#423)

* Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly.

<xutility>
4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.

<xhash>
1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:
* For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large.
* Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths.
* We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range.
The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.

Performance results:

```
Benchmark	Before (ns)	After (ns)	Percent Better
HashRandomUnequal<unordered_multimap>/1	18.7	11.7	59.83%
HashRandomUnequal<unordered_multimap>/10	137	97	41.24%
HashRandomUnequal<unordered_multimap>/100	1677	1141	46.98%
HashRandomUnequal<unordered_multimap>/512	10386	7036	47.61%
HashRandomUnequal<unordered_multimap>/4096	173807	119391	45.58%
HashRandomUnequal<unordered_multimap>/32768	2898405	1529710	89.47%
HashRandomUnequal<unordered_multimap>/100000	27441112	18602792	47.51%
HashRandomUnequal<hash_multimap>/1	18.9	11.8	60.17%
HashRandomUnequal<hash_multimap>/10	138	101	36.63%
HashRandomUnequal<hash_multimap>/100	1613	1154	39.77%
HashRandomUnequal<hash_multimap>/512	10385	7178	44.68%
HashRandomUnequal<hash_multimap>/4096	171718	120115	42.96%
HashRandomUnequal<hash_multimap>/32768	3352231	1510245	121.97%
HashRandomUnequal<hash_multimap>/100000	26532471	19209741	38.12%
HashRandomEqual<unordered_multimap>/1	16	9.4	70.21%
HashRandomEqual<unordered_multimap>/10	126	89.2	41.26%
HashRandomEqual<unordered_multimap>/100	1644	1133	45.10%
HashRandomEqual<unordered_multimap>/512	10532	7183	46.62%
HashRandomEqual<unordered_multimap>/4096	174580	120029	45.45%
HashRandomEqual<unordered_multimap>/32768	3031653	1455416	108.30%
HashRandomEqual<unordered_multimap>/100000	26100504	19240571	35.65%
HashRandomEqual<hash_multimap>/1	15.9	9.38	69.51%
HashRandomEqual<hash_multimap>/10	123	94.1	30.71%
HashRandomEqual<hash_multimap>/100	1645	1151	42.92%
HashRandomEqual<hash_multimap>/512	10177	7144	42.46%
HashRandomEqual<hash_multimap>/4096	172994	121381	42.52%
HashRandomEqual<hash_multimap>/32768	3045242	1966513	54.85%
HashRandomEqual<hash_multimap>/100000	26013781	22025482	18.11%
HashUnequalDifferingBuckets<unordered_multimap>/2	5.87	3.41	72.14%
HashUnequalDifferingBuckets<unordered_multimap>/10	12	3.39	253.98%
HashUnequalDifferingBuckets<unordered_multimap>/100	106	3.41	3008.50%
HashUnequalDifferingBuckets<unordered_multimap>/512	691	3.46	19871.10%
HashUnequalDifferingBuckets<unordered_multimap>/4096	6965	3.47	200620.46%
HashUnequalDifferingBuckets<unordered_multimap>/32768	91451	3.46	2642992.49%
HashUnequalDifferingBuckets<unordered_multimap>/100000	290430	3.52	8250752.27%
HashUnequalDifferingBuckets<hash_multimap>/2	5.97	3.4	75.59%
HashUnequalDifferingBuckets<hash_multimap>/10	11.8	3.54	233.33%
HashUnequalDifferingBuckets<hash_multimap>/100	105	3.54	2866.10%
HashUnequalDifferingBuckets<hash_multimap>/512	763	3.46	21952.02%
HashUnequalDifferingBuckets<hash_multimap>/4096	6862	3.4	201723.53%
HashUnequalDifferingBuckets<hash_multimap>/32768	94583	3.4	2781752.94%
HashUnequalDifferingBuckets<hash_multimap>/100000	287996	3.43	8396284.84%
```

Benchmark code:
```
#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <assert.h>
#include <benchmark/benchmark.h>
#include <hash_map>
#include <random>
#include <stddef.h>
#include <unordered_map>
#include <utility>
#include <vector>

using namespace std;

template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    next(b.begin(), b.size() - 1)->second = 1u;
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    for (auto &&_ : state) {
        (void)_;
        assert(a == b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) {
    std::unordered_multimap<unsigned, unsigned> a;
    std::unordered_multimap<unsigned, unsigned> b;
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    for (ptrdiff_t idx = 0; idx < range0; ++idx) {
        a.emplace(0, 1);
        b.emplace(1, 0);
    }

    a.emplace(1, 0);
    b.emplace(0, 1);
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);

BENCHMARK_MAIN();

* Apply a bunch of code review comments from Casey.

* clang-format

* Apply @miscco's code deduplication idea for <xhash>.

* Fix code review comments from Stephan: comments and add DMIs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants