Improve shuffle quality #1309

RAMitchell · 2020-10-10T03:45:39Z

This is another version of #1257, without using any proprietary hash functions.

I have included thrust::random::ranlux48 as an additional hash, as well as improved the mixing function (before I used XOR alone, which turns out to be a poor choice) to match boost::hash_combine.

I also increased the number of rounds from 8->16.

This passes all tests from #1257 and fixes #1256

@djns99 please review.

djns99 · 2020-10-10T04:04:37Z

thrust/system/detail/generic/shuffle.inl

-    return (uint32_t)((1ull << (uint64_t)w) - 1ull);
+  // Equivalent to boost::hash_combine
+  __host__ __device__ size_t hash_combine(uint64_t lhs, uint64_t rhs) const {
+    lhs ^= rhs + 0x9e3779b9 + (lhs << 6) + (lhs >> 2);


I think this is intended for 32 bit numbers. Though this is probably fine since the inputs can be uint32_t, but we would get less randomness from the key so its a trade-off.
I believe boost uses a variant murmur hash 2 for uint64_t.

inline void hash_combine_impl(boost::uint64_t& h, boost::uint64_t k) { const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995); const int r = 47; k *= m; k ^= k >> r; k *= m; h ^= k; h *= m; // Completely arbitrary number, to prevent 0's // from hashing to 0. h += 0xe6546b64; }

Ok will use this one.

Using the above function causes tests to fail for some reason. I will keep my version for now.

If you are going to do that might be worth changing the key to uint32_t so it takes less space

We can leave that for future work.

Ok no worries. I am happy with it then.

thrust/system/detail/generic/shuffle.inl

alliepiper

Thanks for redoing this!

I've requested a few minor changes (mostly style and docs) inline, once those are addressed I'll start testing.

testing/shuffle.cu

Changes addressed.

alliepiper · 2020-10-13T21:43:32Z

LGTM, thanks for the quick updates!

DVS CL 29194481.

alliepiper · 2020-10-14T19:24:59Z

This is failing our internal CI. Can you take a look at this error? Looks like an easy fix.


/dvs/p4/build/sw/gpgpu/thrust/testing/unittest/assertions.h: In instantiation of 'void unittest::assert_equal(T1, T2, const string&, int) [with T1 = long unsigned int; T2 = int; std::__cxx11::string = std::__cxx11::basic_string]':
/dvs/p4/build/sw/gpgpu/thrust/testing//shuffle.cu:407:109: required from here
/dvs/p4/build/sw/gpgpu/thrust/testing/unittest/assertions.h:120:9: error: comparison of integer expressions of different signedness: 'long unsigned int' and 'int' [-Werror=sign-compare]

alliepiper · 2020-10-30T17:30:13Z

DVS CL: 29265465

alliepiper · 2020-11-03T19:21:24Z

CI found some janky casts in the cephes code. Added some alignment hints to make sure these work out in practice.

New DVS CL: 29278660

alliepiper · 2020-11-05T16:21:53Z

More XLC fixes, new DVS CL: 29287861

alliepiper · 2020-11-09T21:20:04Z

New DVS CL: 29301680

alliepiper · 2020-11-10T17:27:09Z

testing/shuffle.cu

+  static constexpr unsigned short A[] = {
+      0x6661, 0x2733, 0x9850, 0x3f4a, 0xe943, 0xb580, 0x7fbd,
+      0xbf43, 0x5ebb, 0x20dc, 0x019f, 0x3f4a, 0xa5a1, 0x16b0,
+      0xc16c, 0xbf66, 0x554b, 0x5555, 0x5555, 0x3fb5};
+  static constexpr unsigned short B[] = {
+      0x6761, 0x8ff3, 0x8901, 0xc095, 0xb93e, 0x355b, 0xf234, 0xc0e2,
+      0x89e5, 0xf890, 0x3d73, 0xc114, 0xdb51, 0xf994, 0xbc82, 0xc131,
+      0xf20b, 0x0219, 0x4589, 0xc13a, 0x055e, 0x5418, 0x0c67, 0xc12a};
+  static constexpr unsigned short C[] = {
+      /*0x0000,0x0000,0x0000,0x3ff0,*/
+      0x12b2, 0x1cf3, 0xfd0d, 0xc075, 0xd757, 0x7b89, 0xaa0d, 0xc0d0,
+      0x4c9b, 0xb974, 0xeb84, 0xc10a, 0x0043, 0x7195, 0x6286, 0xc131,
+      0xf34c, 0x892f, 0x5255, 0xc143, 0xe14a, 0x6a11, 0xce4b, 0xc13e};


We need to come up with a different implementation for this. I'm not sure why this implementation is written this way, but it's defining a buffer of 16-bit ints that are then bitcast into 64-bit floats.

I've tried using some alignment hints to work around this, but some compilers still aren't happy, and tbh making these sorts of assumptions about alignment, endianness, and type representations makes me nervous.

@RAMitchell @djns99 please find an alternate implementation for this -- if we can get something that passes tests by Friday I can include this PR in 1.11.0, otherwise it may slip to 1.12.0.

@RAMitchell @allisonvacanti
I am not able to test right now, but here are the tables as a double. This should resolve the issues I believe:

static constexpr double A[] = {0.000811614167470508488140545910738410384510643780,-0.000595061904284301438315674115386855191900394857,0.000793650340457716942620114419781884862459264696,-0.002777777777300996942672073330982129846233874559,0.083333333333333189929525985917280195280909538269}; static constexpr double B[] = {-1378.251525691208598800585605204105377197265625,-38801.631513463784358464181423187255859375,-331612.9927388711948879063129425048828125,-1162370.97492762305773794651031494140625,-1721737.00820839661173522472381591796875,-853555.66424576542340219020843505859375}; static constexpr double C[] = {-351.8157014365234545039129443466663360595703125,-17064.21066518811494461260735988616943359375,-220528.59055385444662533700466156005859375,-1139334.44367982516996562480926513671875,-2532523.07177582941949367523193359375,-2018891.4143353276886045932769775390625};

alliepiper · 2020-11-10T17:31:39Z

testing/shuffle.cu

+  static constexpr unsigned short A[] = {
+      0x6661, 0x2733, 0x9850, 0x3f4a, 0xe943, 0xb580, 0x7fbd,
+      0xbf43, 0x5ebb, 0x20dc, 0x019f, 0x3f4a, 0xa5a1, 0x16b0,
+      0xc16c, 0xbf66, 0x554b, 0x5555, 0x5555, 0x3fb5};
+  static constexpr unsigned short B[] = {
+      0x6761, 0x8ff3, 0x8901, 0xc095, 0xb93e, 0x355b, 0xf234, 0xc0e2,
+      0x89e5, 0xf890, 0x3d73, 0xc114, 0xdb51, 0xf994, 0xbc82, 0xc131,
+      0xf20b, 0x0219, 0x4589, 0xc13a, 0x055e, 0x5418, 0x0c67, 0xc12a};
+  static constexpr unsigned short C[] = {
+      /*0x0000,0x0000,0x0000,0x3ff0,*/
+      0x12b2, 0x1cf3, 0xfd0d, 0xc075, 0xd757, 0x7b89, 0xaa0d, 0xc0d0,
+      0x4c9b, 0xb974, 0xeb84, 0xc10a, 0x0043, 0x7195, 0x6286, 0xc131,
+      0xf34c, 0x892f, 0x5255, 0xc143, 0xe14a, 0x6a11, 0xce4b, 0xc13e};


I've tried using some alignment hints to work around this, but some compilers still aren't happy, and tbh making these sorts of assumptions about alignment, endianness, and type representations makes me nervous.

@RAMitchell @djns99 please find an alternate implementation for this -- if we can get something that passes tests by Friday I can include this PR in 1.11.0, otherwise it may slip to 1.12.0.

alliepiper · 2020-11-12T16:37:52Z

Thanks for quick fix -- restarted testing under CL 29315896

alliepiper · 2020-11-12T22:50:06Z

Tests are passing 👍 Merging now.

Thanks to both of you for helping me get this PR over all of the hurdles :)

alliepiper · 2020-11-19T17:40:04Z

@RAMitchell @djns99 QA noticed that this patch drops the performance of thrust::shuffle by 60-95%. I don't think we want to revert the patch over this -- the shuffle quality improvements are worth paying a bit of perf, so I'll have them update their baselines.

Would either of you be interested in optimizing the new implementation to see if some of the performance can be recovered?

RAMitchell · 2020-11-19T21:31:33Z

Yep, we will be doing more work on it over summer. Most importantly we now have an accurate baseline. Can I ask how the 60-95% is measured?

alliepiper · 2020-11-20T16:50:19Z

They're running the shuffle benchmarks in internal/benchmark/bench.cu and comparing the average throughput before and after this PR.

djns99 reviewed Oct 10, 2020

View reviewed changes

alliepiper added this to the 1.11.0 milestone Oct 12, 2020

alliepiper added the type: enhancement New feature or request. label Oct 12, 2020

alliepiper previously requested changes Oct 12, 2020

View reviewed changes

testing/shuffle.cu Show resolved Hide resolved

testing/shuffle.cu Outdated Show resolved Hide resolved

testing/shuffle.cu Outdated Show resolved Hide resolved

testing/shuffle.cu Outdated Show resolved Hide resolved

testing/shuffle.cu Outdated Show resolved Hide resolved

alliepiper mentioned this pull request Oct 12, 2020

Updated shuffle implementation to use better hash function #1257

Closed

RAMitchell force-pushed the shuffle branch from b786614 to fe351fc Compare October 12, 2020 21:12

alliepiper added testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels Oct 13, 2020

alliepiper removed testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels Oct 14, 2020

alliepiper added the testing: gpuCI in progress Started gpuCI testing. label Oct 15, 2020

alliepiper added testing: gpuCI passed Passed gpuCI testing. and removed testing: gpuCI in progress Started gpuCI testing. labels Oct 15, 2020

alliepiper added the testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). label Oct 30, 2020

alliepiper removed testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI passed Passed gpuCI testing. labels Nov 3, 2020

alliepiper added the testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). label Nov 3, 2020

alliepiper added the testing: gpuCI passed Passed gpuCI testing. label Nov 5, 2020

alliepiper self-assigned this Nov 9, 2020

alliepiper force-pushed the shuffle branch from 528904a to bf48992 Compare November 10, 2020 17:24

alliepiper reviewed Nov 10, 2020

View reviewed changes

alliepiper suggested changes Nov 10, 2020

View reviewed changes

alliepiper removed testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI passed Passed gpuCI testing. labels Nov 10, 2020

alliepiper assigned RAMitchell and unassigned alliepiper Nov 10, 2020

Change to doubles

0253cf1

alliepiper approved these changes Nov 12, 2020

View reviewed changes

alliepiper merged commit a61e4e1 into NVIDIA:main Nov 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve shuffle quality #1309

Improve shuffle quality #1309

RAMitchell commented Oct 10, 2020 •

edited

Loading

djns99 Oct 10, 2020 •

edited

Loading

RAMitchell Oct 10, 2020

RAMitchell Oct 10, 2020

djns99 Oct 10, 2020

RAMitchell Oct 10, 2020

djns99 Oct 12, 2020

alliepiper left a comment

alliepiper commented Oct 13, 2020

alliepiper commented Oct 14, 2020

alliepiper commented Oct 30, 2020

alliepiper commented Nov 3, 2020

alliepiper commented Nov 5, 2020

alliepiper commented Nov 9, 2020

alliepiper Nov 10, 2020

alliepiper Nov 10, 2020

djns99 Nov 12, 2020

alliepiper Nov 10, 2020

alliepiper commented Nov 12, 2020

alliepiper commented Nov 12, 2020

alliepiper commented Nov 19, 2020

RAMitchell commented Nov 19, 2020

alliepiper commented Nov 20, 2020

Improve shuffle quality #1309

Improve shuffle quality #1309

Conversation

RAMitchell commented Oct 10, 2020 • edited Loading

djns99 Oct 10, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alliepiper left a comment

Choose a reason for hiding this comment

alliepiper commented Oct 13, 2020

alliepiper commented Oct 14, 2020

alliepiper commented Oct 30, 2020

alliepiper commented Nov 3, 2020

alliepiper commented Nov 5, 2020

alliepiper commented Nov 9, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alliepiper commented Nov 12, 2020

alliepiper commented Nov 12, 2020

alliepiper commented Nov 19, 2020

RAMitchell commented Nov 19, 2020

alliepiper commented Nov 20, 2020

RAMitchell commented Oct 10, 2020 •

edited

Loading

djns99 Oct 10, 2020 •

edited

Loading