-
Notifications
You must be signed in to change notification settings - Fork 758
Conversation
return (uint32_t)((1ull << (uint64_t)w) - 1ull); | ||
// Equivalent to boost::hash_combine | ||
__host__ __device__ size_t hash_combine(uint64_t lhs, uint64_t rhs) const { | ||
lhs ^= rhs + 0x9e3779b9 + (lhs << 6) + (lhs >> 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is intended for 32 bit numbers. Though this is probably fine since the inputs can be uint32_t, but we would get less randomness from the key so its a trade-off.
I believe boost uses a variant murmur hash 2 for uint64_t.
inline void hash_combine_impl(boost::uint64_t& h,
boost::uint64_t k)
{
const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
const int r = 47;
k *= m;
k ^= k >> r;
k *= m;
h ^= k;
h *= m;
// Completely arbitrary number, to prevent 0's
// from hashing to 0.
h += 0xe6546b64;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok will use this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the above function causes tests to fail for some reason. I will keep my version for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to do that might be worth changing the key to uint32_t so it takes less space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave that for future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok no worries. I am happy with it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for redoing this!
I've requested a few minor changes (mostly style and docs) inline, once those are addressed I'll start testing.
LGTM, thanks for the quick updates! DVS CL 29194481. |
This is failing our internal CI. Can you take a look at this error? Looks like an easy fix.
|
DVS CL: 29265465 |
CI found some janky casts in the cephes code. Added some alignment hints to make sure these work out in practice. New DVS CL: 29278660 |
More XLC fixes, new DVS CL: 29287861 |
New DVS CL: 29301680 |
testing/shuffle.cu
Outdated
static constexpr unsigned short A[] = { | ||
0x6661, 0x2733, 0x9850, 0x3f4a, 0xe943, 0xb580, 0x7fbd, | ||
0xbf43, 0x5ebb, 0x20dc, 0x019f, 0x3f4a, 0xa5a1, 0x16b0, | ||
0xc16c, 0xbf66, 0x554b, 0x5555, 0x5555, 0x3fb5}; | ||
static constexpr unsigned short B[] = { | ||
0x6761, 0x8ff3, 0x8901, 0xc095, 0xb93e, 0x355b, 0xf234, 0xc0e2, | ||
0x89e5, 0xf890, 0x3d73, 0xc114, 0xdb51, 0xf994, 0xbc82, 0xc131, | ||
0xf20b, 0x0219, 0x4589, 0xc13a, 0x055e, 0x5418, 0x0c67, 0xc12a}; | ||
static constexpr unsigned short C[] = { | ||
/*0x0000,0x0000,0x0000,0x3ff0,*/ | ||
0x12b2, 0x1cf3, 0xfd0d, 0xc075, 0xd757, 0x7b89, 0xaa0d, 0xc0d0, | ||
0x4c9b, 0xb974, 0xeb84, 0xc10a, 0x0043, 0x7195, 0x6286, 0xc131, | ||
0xf34c, 0x892f, 0x5255, 0xc143, 0xe14a, 0x6a11, 0xce4b, 0xc13e}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to come up with a different implementation for this. I'm not sure why this implementation is written this way, but it's defining a buffer of 16-bit ints that are then bitcast into 64-bit floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried using some alignment hints to work around this, but some compilers still aren't happy, and tbh making these sorts of assumptions about alignment, endianness, and type representations makes me nervous.
@RAMitchell @djns99 please find an alternate implementation for this -- if we can get something that passes tests by Friday I can include this PR in 1.11.0, otherwise it may slip to 1.12.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RAMitchell @allisonvacanti
I am not able to test right now, but here are the tables as a double. This should resolve the issues I believe:
static constexpr double A[] = {0.000811614167470508488140545910738410384510643780,-0.000595061904284301438315674115386855191900394857,0.000793650340457716942620114419781884862459264696,-0.002777777777300996942672073330982129846233874559,0.083333333333333189929525985917280195280909538269};
static constexpr double B[] = {-1378.251525691208598800585605204105377197265625,-38801.631513463784358464181423187255859375,-331612.9927388711948879063129425048828125,-1162370.97492762305773794651031494140625,-1721737.00820839661173522472381591796875,-853555.66424576542340219020843505859375};
static constexpr double C[] = {-351.8157014365234545039129443466663360595703125,-17064.21066518811494461260735988616943359375,-220528.59055385444662533700466156005859375,-1139334.44367982516996562480926513671875,-2532523.07177582941949367523193359375,-2018891.4143353276886045932769775390625};
testing/shuffle.cu
Outdated
static constexpr unsigned short A[] = { | ||
0x6661, 0x2733, 0x9850, 0x3f4a, 0xe943, 0xb580, 0x7fbd, | ||
0xbf43, 0x5ebb, 0x20dc, 0x019f, 0x3f4a, 0xa5a1, 0x16b0, | ||
0xc16c, 0xbf66, 0x554b, 0x5555, 0x5555, 0x3fb5}; | ||
static constexpr unsigned short B[] = { | ||
0x6761, 0x8ff3, 0x8901, 0xc095, 0xb93e, 0x355b, 0xf234, 0xc0e2, | ||
0x89e5, 0xf890, 0x3d73, 0xc114, 0xdb51, 0xf994, 0xbc82, 0xc131, | ||
0xf20b, 0x0219, 0x4589, 0xc13a, 0x055e, 0x5418, 0x0c67, 0xc12a}; | ||
static constexpr unsigned short C[] = { | ||
/*0x0000,0x0000,0x0000,0x3ff0,*/ | ||
0x12b2, 0x1cf3, 0xfd0d, 0xc075, 0xd757, 0x7b89, 0xaa0d, 0xc0d0, | ||
0x4c9b, 0xb974, 0xeb84, 0xc10a, 0x0043, 0x7195, 0x6286, 0xc131, | ||
0xf34c, 0x892f, 0x5255, 0xc143, 0xe14a, 0x6a11, 0xce4b, 0xc13e}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried using some alignment hints to work around this, but some compilers still aren't happy, and tbh making these sorts of assumptions about alignment, endianness, and type representations makes me nervous.
@RAMitchell @djns99 please find an alternate implementation for this -- if we can get something that passes tests by Friday I can include this PR in 1.11.0, otherwise it may slip to 1.12.0.
Thanks for quick fix -- restarted testing under CL 29315896 |
Tests are passing 👍 Merging now. Thanks to both of you for helping me get this PR over all of the hurdles :) |
@RAMitchell @djns99 QA noticed that this patch drops the performance of Would either of you be interested in optimizing the new implementation to see if some of the performance can be recovered? |
Yep, we will be doing more work on it over summer. Most importantly we now have an accurate baseline. Can I ask how the 60-95% is measured? |
They're running the shuffle benchmarks in |
This is another version of #1257, without using any proprietary hash functions.
I have included thrust::random::ranlux48 as an additional hash, as well as improved the mixing function (before I used XOR alone, which turns out to be a poor choice) to match boost::hash_combine.
I also increased the number of rounds from 8->16.
This passes all tests from #1257 and fixes #1256
@djns99 please review.