[SYCL][CUDA][USM] Improve CUDA USM memory allocation functions #1577
Fixes #1467 ?
Strictly speaking, alignment == 0 doesn't mean there is no requirement. It means the alignment should equal the size of the largest data type supported by OpenCL (CUDA in our case).
The intent of alignment == 0 is "just do the default thing", whatever that may be.
In USM.adoc#aligned_alloc I see only
for the various allocation functions. |
Ah, in cl_intel_unified_shared_memory.asciidoc I found
So, should the functions in For CUDA I could set the maximum alignment to |
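To make the alignment == 0 convention discussed above concrete, here is a minimal sketch of a pointer check that treats 0 as "use the default". The helper name `satisfies_alignment` is hypothetical and is not taken from pi_cuda.cpp; it only mirrors the shape of the assert in the diff below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of pi_cuda.cpp): returns true when `ptr`
// satisfies `alignment`, where 0 means "no specific request, use the default".
inline bool satisfies_alignment(const void *ptr, std::uint32_t alignment) {
  if (alignment == 0) {
    return true; // nothing specific was requested, so any pointer is accepted
  }
  // The early return above also guards this modulo against division by zero.
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Checking for 0 first matters: evaluating `% alignment` with a zero alignment would be undefined behaviour.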
sycl/plugins/cuda/pi_cuda.cpp
Outdated
@@ -3406,7 +3406,7 @@ pi_result cuda_piextUSMHostAlloc(void **result_ptr, pi_context context,
} catch (pi_result error) {
result = error;
}
assert(reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0);
assert((alignment == 0) or (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));
assert((alignment == 0) or (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));
assert((alignment == 0) || (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));
"or" confuses a lot of people.
Mhm.. OK
The primary reason for this change is to avoid build issues. See #1501.
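For reference, `or` is a standard C++ alternative token, so the two spellings are equivalent to a conforming compiler. The sketch below shows both; the assumption (not checked against #1501) is that the build issues arise on toolchains that accept alternative tokens only in a stricter conformance mode or after including `<ciso646>`. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Spelled with the alternative token; may not compile on every toolchain
// in its default mode (assumption, see the note above).
inline bool check_with_or(std::uintptr_t address, std::uint32_t alignment) {
  return (alignment == 0) or (address % alignment == 0);
}

// Spelled with the primary token; accepted everywhere.
inline bool check_with_pipes(std::uintptr_t address, std::uint32_t alignment) {
  return (alignment == 0) || (address % alignment == 0);
}
```

Both forms short-circuit, so the modulo is never evaluated when the alignment is 0.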
Actually, after checking on a different machine, it looks like the |
I took the opportunity of changing most checks from |
@fwyzard , please add this in the comment about USM alignment in corresponding API in pi.h |
Does it look reasonable ? |
sycl/plugins/cuda/pi_cuda.cpp
Outdated
assert(context != nullptr);
assert(properties == nullptr);
// check that the context is valid
if (context == nullptr)
Please, keep asserts. It's expected that the higher level passes right arguments, otherwise it is an internal bug.
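The two styles under discussion can be contrasted in a short sketch. The `sketch_result` enum and both function names are hypothetical stand-ins; they are not the PI types or entry points.

```cpp
#include <cassert>

// Hypothetical stand-in for a result code type.
enum class sketch_result { success, invalid_value };

// Reviewer's preference: a null argument is an internal bug in the caller,
// so it is caught by an assert in debug builds.
inline sketch_result release_with_assert(void *ptr) {
  assert(ptr != nullptr);
  (void)ptr; // silence the unused-parameter warning when asserts are disabled
  return sketch_result::success;
}

// Alternative from the diff: report the bad argument to the caller instead.
inline sketch_result release_with_check(void *ptr) {
  if (ptr == nullptr) {
    return sketch_result::invalid_value;
  }
  return sketch_result::success;
}
```

The assert form documents a precondition and costs nothing in release builds; the checked form keeps working when asserts are disabled, at the price of treating an internal bug as a recoverable error.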
Sorry, updated to apply the |
b767511 to 522810b
@bader could you trigger the Lit_With_Cuda test ? |
@tfzhu, @vladimirlaz, could you trigger the Lit_With_Cuda test for this PR, please? |
assert(queue != nullptr);
assert(ptr != nullptr);
// check that the pointer is valid
if (ptr == nullptr) {
Please restore the assert.
@@ -3396,17 +3396,29 @@ pi_result cuda_piEnqueueMemUnmap(pi_queue command_queue, pi_mem memobj,
pi_result cuda_piextUSMHostAlloc(void **result_ptr, pi_context context,
pi_usm_mem_properties *properties, size_t size,
pi_uint32 alignment) {
// from empirical testing with CUDA 10.2 on a Tesla K40
static constexpr pi_uint32 max_alignment = 0x200;
Trying to find any CUDA docs that specify the alignment but only finding queryable values for textures and the ominous "The allocated memory is suitably aligned for any kind of variable."...
Looking at CUDA types I wonder if that means that 16-byte alignment is the minimum (as I guess needed to align a double4 - see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#built-in-vector-types)?
Trying to find any CUDA docs that specify the alignment but only finding queryable values for textures and the ominous "The allocated memory is suitably aligned for any kind of variable."...
I asked NVIDIA about it, they opened a ticket internally to clarify the documentation, but no idea of the time scale.
Thank you for prodding them! 🤞 that they see the priority in it!
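The 16-byte guess from this thread can be illustrated on the host with a stand-in type. `double4_like` is a hypothetical struct, not CUDA's `double4`, and the 16-byte figure is the assumption raised in the comment above rather than a documented guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-in for a four-double vector with 16-byte alignment
// (assumption taken from the discussion, not from the CUDA headers).
struct alignas(16) double4_like {
  double x, y, z, w;
};

static_assert(alignof(double4_like) == 16, "stand-in must be 16-byte aligned");
static_assert(sizeof(double4_like) == 32, "four doubles occupy 32 bytes");

// True when `ptr` could hold a double4_like without misalignment.
inline bool aligned_for_double4(const void *ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignof(double4_like) == 0;
}
```

Since 0x200 is a multiple of 16, an allocation aligned to the empirical 0x200 value would also satisfy this requirement.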
@@ -3416,18 +3428,31 @@ pi_result cuda_piextUSMDeviceAlloc(void **result_ptr, pi_context context,
pi_device device,
pi_usm_mem_properties *properties,
size_t size, pi_uint32 alignment) {
// from empirical testing with CUDA 10.2 on a Tesla K40
static constexpr pi_uint32 max_alignment = 0x200;
Should we make the constant global to share it between the different functions?
I was considering if host and device could have different default alignments - but given the lack of details, I guess we could use a global static for the moment.
That is a valid thought!
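A shared constant along the lines suggested here might look like the sketch below. The names `max_usm_alignment` and `is_supported_alignment` are hypothetical, and the check assumes that every allocation is aligned to the empirical 0x200 value, in which case any alignment that divides 0x200 is satisfied as well.

```cpp
#include <cassert>
#include <cstdint>

// Single file-scope constant shared by the host and device allocation paths.
// 0x200 is the empirical value quoted in the diff (CUDA 10.2, Tesla K40).
static constexpr std::uint32_t max_usm_alignment = 0x200;

// Hypothetical validation: 0 means "default"; otherwise the request must
// divide the alignment that allocations are assumed to have.
inline bool is_supported_alignment(std::uint32_t alignment) {
  if (alignment == 0) {
    return true;
  }
  return max_usm_alignment % alignment == 0;
}
```

If host and device allocations later turn out to have different guarantees, the single constant could be split into two without changing the callers of the check.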
Awesome to see the XFAILs go for USM! |
Is this work related to #1603 (as that issue puzzles me)? |
Yes, as I added at the bottom there, most (all?) CUDA USM XFAILs are fixed by this PR. My bad for not noticing it when I made the initial implementation: I had asserts disabled in my build, which hid most of the failures. |
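The effect described here, asserts hiding failures when they are compiled out, follows from how `assert` behaves under `NDEBUG`. The sketch below is a generic illustration with hypothetical function names; it does not reproduce the actual build configuration.

```cpp
#include <cassert>

// Reports whether this translation unit was compiled with asserts enabled.
inline bool asserts_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// The expression inside assert() is not evaluated at all when NDEBUG is
// defined, so a failing condition goes completely unnoticed in such builds.
inline int evaluations_inside_assert() {
  int count = 0;
  assert(++count > 0); // side effect happens only when asserts are enabled
  return count;
}
```

In a build with asserts enabled the function returns 1; with `NDEBUG` defined it returns 0, which is why a misaligned pointer would only be caught in the former.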
@@ -1,4 +1,3 @@
// XFAIL: cuda
// piextUSM*Alloc functions for CUDA are not behaving as described in
This test is failing in our CI.
http://ci.llvm.intel.com:8010/#/builders/37/builds/705/steps/15/logs/FAIL__SYCL__allocator_vector_cpp
Any ideas why?
I can have a look later today or tomorrow.
+@bjoernknafla, just in case he might know if this is a known issue.
That error could come from CUDA event handling needing a thread pool as the CUDA callbacks currently used cannot issue CUDA API calls - and in some cases the callback holds the last existing reference to an event.
If I understand correctly, the following PR is meant as a step towards a solution: #1471
However, the event problem does not fail tests, as the CUDA implementation "just" leaks the event in such cases and ignores the error... So it might be a different problem that I am not aware of.
Any update on where we are with this? |
I am trying to test this PR and had the crazy idea to rebase it on top of the latest sycl branch, which is taking forever. It would be really nice if the maintainers could deal with this, since #1467 and this PR have been open since April. |
@@ -1298,7 +1298,8 @@ using pi_usm_migration_flags = _pi_usm_migration_flags;
/// \param context is the pi_context
/// \param pi_usm_mem_properties are optional allocation properties
/// \param size_t is the size of the allocation
/// \param alignment is the desired alignment of the allocation
/// \param alignment is the desired alignment of the allocation. 0 indicates no
Please, fix LIT tests.
@fwyzard, I think USM memory allocation functions are working on CUDA. Can we close this pull request? |
The base branch was changed.
@bader unfortunately I won't have time to look back at this for the next weeks. I'm OK with closing the PR, and eventually resurrect it when I can work on it again. |
Allow memory allocations with 0 alignment, to signify no alignment requirements.
Return PI_INVALID_VALUE for memory operations on a nullptr, instead of failing with an assert.