From fea9cba412933da902e6392227e4aac034eddb86 Mon Sep 17 00:00:00 2001
From: Billy Robert O'Neal III
Date: Wed, 15 Jan 2020 13:22:15 -0800
Subject: [PATCH 1/5] Optimize the is_permutation family and _Hash::operator== for multicontainers slightly.

4660: _Find_pr is a helper for is_permutation, so move it down to that
area.

4684: The SHOUTY banners were attached to functions which were
implementation details of is_permutation, so I fixed them up to say
is_permutation and removed the banners for helper functions.

4711: Use if constexpr to avoid a tag dispatch call for
_Trim_matching_suffixes. Optimizers will like this because they
generally hate reference-to-pointer, and it also serves to work around
DevCom-883631 when this algorithm is constexprized.

4766: Indicate that we are trimming matching prefixes in this loop
body, and break apart comment block that was incorrectly merged by
clang-format.

4817: In the dual range forward version of the algorithm, calculate the
distances concurrently to avoid wasting lots of time when the distances
vary by a lot. For example, is_permutation( a forward range of length
1, a forward range of length 1'000'000 ) used to do the million
increments, now it stops at 1 increment.

4862: In the dual range random-access version, avoid recalculating
_Last2 when it has already been supplied to us.

1404: Move down construction of _Bucket_hi in _Equal_range to before
the first loop body using it.

1918: Added a new function to calculate equality for unordered
multicontainers. We loop over the elements in the left container, find
corresponding ranges in the right container, trim prefixes, then
dispatch to is_permutation's helper _Check_match_counts.

Improvements over the old implementation:

* For standard containers, we no longer need to hash any elements from
  the left container; we know that we've found the "run" of equivalent
  elements because we *started* with an element in that container.
  We also never go "backwards" or multiply enumerate _Left (even for
  !_Standard), which improves cache use when the container becomes
  large.
* Just like the dual range is_permutation improvement above, when the
  equal_ranges of the containers are of wildly varying lengths, this
  will stop on the shorter of the lengths.
* We avoid the 3-arg is_permutation doing a linear time operation to
  discover _Last2 that we already had calculated in determining
  _Right's equal_range.

The function _Multi_equal_check_equal_range tests one equal_range from
the left container against the corresponding equal_range from the right
container, while _Multi_equal invokes _Multi_equal_check_equal_range
for each equal_range.

Performance results:

```
Benchmark                           Before (ns)  After (ns)  Percent Better
HashRandomUnequal/1                 18.7         11.7        59.83%
HashRandomUnequal/10                137          97          41.24%
HashRandomUnequal/100               1677         1141        46.98%
HashRandomUnequal/512               10386        7036        47.61%
HashRandomUnequal/4096              173807       119391      45.58%
HashRandomUnequal/32768             2898405      1529710     89.47%
HashRandomUnequal/100000            27441112     18602792    47.51%
HashRandomUnequal/1                 18.9         11.8        60.17%
HashRandomUnequal/10                138          101         36.63%
HashRandomUnequal/100               1613         1154        39.77%
HashRandomUnequal/512               10385        7178        44.68%
HashRandomUnequal/4096              171718       120115      42.96%
HashRandomUnequal/32768             3352231      1510245     121.97%
HashRandomUnequal/100000            26532471     19209741    38.12%
HashRandomEqual/1                   16           9.4         70.21%
HashRandomEqual/10                  126          89.2        41.26%
HashRandomEqual/100                 1644         1133        45.10%
HashRandomEqual/512                 10532        7183        46.62%
HashRandomEqual/4096                174580       120029      45.45%
HashRandomEqual/32768               3031653      1455416     108.30%
HashRandomEqual/100000              26100504     19240571    35.65%
HashRandomEqual/1                   15.9         9.38        69.51%
HashRandomEqual/10                  123          94.1        30.71%
HashRandomEqual/100                 1645         1151        42.92%
HashRandomEqual/512                 10177        7144        42.46%
HashRandomEqual/4096                172994       121381      42.52%
HashRandomEqual/32768               3045242      1966513     54.85%
HashRandomEqual/100000              26013781     22025482    18.11%
HashUnequalDifferingBuckets/2       5.87         3.41        72.14%
HashUnequalDifferingBuckets/10      12           3.39        253.98%
HashUnequalDifferingBuckets/100     106          3.41        3008.50%
HashUnequalDifferingBuckets/512     691          3.46        19871.10%
HashUnequalDifferingBuckets/4096    6965         3.47        200620.46%
HashUnequalDifferingBuckets/32768   91451        3.46        2642992.49%
HashUnequalDifferingBuckets/100000  290430       3.52        8250752.27%
HashUnequalDifferingBuckets/2       5.97         3.4         75.59%
HashUnequalDifferingBuckets/10      11.8         3.54        233.33%
HashUnequalDifferingBuckets/100     105          3.54        2866.10%
HashUnequalDifferingBuckets/512     763          3.46        21952.02%
HashUnequalDifferingBuckets/4096    6862         3.4         201723.53%
HashUnequalDifferingBuckets/32768   94583        3.4         2781752.94%
HashUnequalDifferingBuckets/100000  287996       3.43        8396284.84%
```

Benchmark code:

```
#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include
#include
#include
#include
#include
#include
#include
#include

using namespace std;

template