[SYCL][CUDA][USM] Improve CUDA USM memory allocation functions #1577


Closed
fwyzard wants to merge 4 commits

Conversation

fwyzard
Contributor

@fwyzard fwyzard commented Apr 23, 2020

Allow memory allocations with 0 alignment, to signify no alignment requirements.

Return PI_INVALID_VALUE for memory operations on a nullptr, instead of failing with an assert.

@fwyzard fwyzard requested a review from smaslov-intel as a code owner April 23, 2020 13:47
@fwyzard
Contributor Author

fwyzard commented Apr 23, 2020

Fixes #1467?

@romanovvlad
Contributor

Strictly speaking alignment == 0 doesn't mean there is no requirement. It means the alignment should be equal to the largest type supported by OpenCL (CUDA in our case).
@jbrodman Could you please comment?

@jbrodman
Contributor

Strictly speaking alignment == 0 doesn't mean there is no requirement. It means the alignment should be equal to the largest type supported by OpenCL (CUDA in our case).
@jbrodman Could you please comment?

The intent of alignment == 0 is "just do the default thing" whatever that may be.

@fwyzard
Contributor Author

fwyzard commented Apr 23, 2020

In USM.adoc#aligned_alloc I see only

size_t alignment - specifies the byte alignment. Must be a valid alignment supported by the implementation

for the various allocation functions.

@fwyzard
Contributor Author

fwyzard commented Apr 23, 2020

Ah, in cl_intel_unified_shared_memory.asciidoc I found

alignment is the minimum alignment in bytes for the requested host allocation. It must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in context. If alignment is 0, a default alignment will be used that is equal to the size of the largest data type supported by any OpenCL device in context.

So, should the functions in pi_cuda.cpp follow the same guidelines about valid values and error reporting?

For CUDA I could set the maximum alignment to 1024 (empirical value) or at least 128 (CUDA cache line size).
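The rule quoted from cl_intel_unified_shared_memory can be sketched as a small validation helper. This is a hedged illustration, not the actual pi_cuda.cpp code: the names `is_valid_usm_alignment`, `effective_alignment`, and `kMaxAlignment` are ours, and the 0x200 maximum is only the empirical value discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Empirical maximum from this thread (512 bytes); an assumption, not a
// documented CUDA guarantee.
constexpr std::uint32_t kMaxAlignment = 0x200;

// Valid per the quoted spec wording: a power of two no larger than the
// maximum supported alignment, or 0 to request the default.
inline bool is_valid_usm_alignment(std::uint32_t alignment) {
  if (alignment == 0)
    return true; // 0 means "use the default alignment"
  return (alignment & (alignment - 1)) == 0 && alignment <= kMaxAlignment;
}

// 0 maps to the default (largest supported) alignment.
inline std::uint32_t effective_alignment(std::uint32_t alignment) {
  return alignment == 0 ? kMaxAlignment : alignment;
}
```
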

@@ -3406,7 +3406,7 @@ pi_result cuda_piextUSMHostAlloc(void **result_ptr, pi_context context,
} catch (pi_result error) {
result = error;
}
assert(reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0);
assert((alignment == 0) or (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));
Contributor

Suggested change
assert((alignment == 0) or (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));
assert((alignment == 0) || (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));

"or" confuses a lot of people.
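For context: in standard C++, `or` is an alternative token for `||`, so the two spellings of the assert are semantically identical; the suggestion is about readability (and, historically, some compilers in non-conforming modes reject the alternative tokens). A minimal demonstration, with helper names of our own:

```cpp
#include <cassert>

// Both functions compute the same logical OR; only the token differs.
inline bool either_alt(bool a, bool b) { return a or b; }   // alternative token
inline bool either_std(bool a, bool b) { return a || b; }   // conventional spelling
```
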

Contributor Author

Mhm.. OK

Contributor

The primary reason to do this change is to avoid build issues. See #1501.

@fwyzard
Contributor Author

fwyzard commented Apr 23, 2020

Actually, after checking on a different machine, it looks like cuMemAlloc and cuMemAllocHost return memory aligned to 512 bytes (0x200).

@fwyzard
Contributor Author

fwyzard commented Apr 23, 2020

I took the opportunity to change most checks from asserts to returning the appropriate error value.

@fwyzard fwyzard changed the title [SYCL][CUDA][USM] Allow unaligned memory allocations [SYCL][CUDA][USM] Improve CUDA USM memory allocation functions Apr 23, 2020
@bader bader requested a review from romanovvlad April 23, 2020 15:58
@smaslov-intel
Contributor

Strictly speaking alignment == 0 doesn't mean there is no requirement. It means the alignment should be equal to the largest type supported by OpenCL (CUDA in our case).
@jbrodman Could you please comment?

The intent of alignment == 0 is "just do the default thing" whatever that may be.

@fwyzard , please add this in the comment about USM alignment in corresponding API in pi.h

@fwyzard
Contributor Author

fwyzard commented Apr 24, 2020

Does it look reasonable?

assert(context != nullptr);
assert(properties == nullptr);
// check that the context is valid
if (context == nullptr)
Contributor

Please, keep asserts. It's expected that the higher level passes right arguments, otherwise it is an internal bug.
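The reviewer's distinction can be sketched as follows. This is an illustrative mock, not the real PI API: arguments that upper SYCL layers guarantee are internal contracts and should trip an assert in debug builds, while conditions a caller can legitimately produce should be reported through an error code. The names `pi_result_t` and `usm_free_sketch` are our own.

```cpp
#include <cassert>
#include <cstddef>

enum pi_result_t { PI_SUCCESS = 0, PI_INVALID_VALUE = 1 };

pi_result_t usm_free_sketch(void *context, void *ptr) {
  // Internal contract: the higher level always passes a valid context,
  // so a null context is a bug in the runtime itself.
  assert(context != nullptr && "internal bug: context checked by upper layers");
  // Caller-visible condition: report it instead of aborting.
  if (ptr == nullptr)
    return PI_INVALID_VALUE;
  // ... release the allocation ...
  return PI_SUCCESS;
}
```
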

smaslov-intel
smaslov-intel previously approved these changes Apr 24, 2020
@fwyzard
Contributor Author

fwyzard commented Apr 24, 2020

Sorry, updated to apply the clang-format patch.

smaslov-intel
smaslov-intel previously approved these changes Apr 25, 2020
bader
bader previously approved these changes May 1, 2020
smaslov-intel
smaslov-intel previously approved these changes May 2, 2020
@bader bader requested a review from romanovvlad May 2, 2020 08:27
@fwyzard
Contributor Author

fwyzard commented May 6, 2020

@bader could you trigger the Lit_With_Cuda test?

@bader
Contributor

bader commented May 6, 2020

@tfzhu, @vladimirlaz, could you trigger the Lit_With_Cuda test for this PR, please?

assert(queue != nullptr);
assert(ptr != nullptr);
// check that the pointer is valid
if (ptr == nullptr) {
Contributor

Please, return assert.

@@ -3396,17 +3396,29 @@ pi_result cuda_piEnqueueMemUnmap(pi_queue command_queue, pi_mem memobj,
pi_result cuda_piextUSMHostAlloc(void **result_ptr, pi_context context,
pi_usm_mem_properties *properties, size_t size,
pi_uint32 alignment) {
// from empirical testing with CUDA 10.2 on a Tesla K40
static constexpr pi_uint32 max_alignment = 0x200;
Contributor

Trying to find any CUDA docs that specify the alignment but only finding queryable values for textures and the ominous "The allocated memory is suitably aligned for any kind of variable."...

Looking at CUDA types I wonder if that means that 16-byte alignment is the minimum (as I guess needed to align a double4 - see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#built-in-vector-types)?

Contributor Author

Trying to find any CUDA docs that specify the alignment but only finding queryable values for textures and the ominous "The allocated memory is suitably aligned for any kind of variable."...

I asked NVIDIA about it, they opened a ticket internally to clarify the documentation, but no idea of the time scale.

Contributor

Thank you for prodding them! 🤞 that they see the priority in it!
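For what it's worth, the 0x200 figure discussed above can be measured empirically: the largest power of two dividing a returned address is that allocation's observed alignment. A small sketch, using plain `aligned_alloc` as a stand-in for cuMemAllocHost; the helper name is ours, not part of CUDA or PI.

```cpp
#include <cstdint>
#include <cstdlib>

// Largest power of two dividing the address, i.e. the observed alignment.
inline std::uintptr_t observed_alignment(const void *p) {
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  return addr & (~addr + 1); // isolate the lowest set bit
}
```

For example, an address of 0x...200 reports 0x200, while 0x...300 reports only 0x100.
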

@@ -3416,18 +3428,31 @@ pi_result cuda_piextUSMDeviceAlloc(void **result_ptr, pi_context context,
pi_device device,
pi_usm_mem_properties *properties,
size_t size, pi_uint32 alignment) {
// from empirical testing with CUDA 10.2 on a Tesla K40
static constexpr pi_uint32 max_alignment = 0x200;
Contributor

Should we make the constant global to share it between the different functions?

Contributor Author

I was considering if host and device could have different default alignments - but given the lack of details, I guess we could use a global static for the moment.

Contributor

That is a valid thought!

@bjoernknafla
Contributor

Awesome to see the XFAILs go for USM!

@bjoernknafla
Contributor

Is this work related to #1603 (as that issue puzzles me)?

@fwyzard
Contributor Author

fwyzard commented May 7, 2020

Yes, as I added at the bottom there, most (all?) CUDA USM XFAILs are fixed by this PR.

My bad for not noticing it when I made the initial implementation: I had asserts disabled in my build, which hid most of the failures.

@@ -1,4 +1,3 @@
// XFAIL: cuda
// piextUSM*Alloc functions for CUDA are not behaving as described in
Contributor

Contributor Author

I can have a look later today or tomorrow.

Contributor

+@bjoernknafla, just in case he might know if this is a known issue.

Contributor

@bjoernknafla May 7, 2020

That error could come from CUDA event handling needing a thread pool as the CUDA callbacks currently used cannot issue CUDA API calls - and in some cases the callback holds the last existing reference to an event.

If I understand correctly, the following PR is meant as a way towards a solution: #1471

Though the event problem does not fail tests, as the CUDA implementation "just" leaks the event in such cases and ignores the error... So it might be a different problem I am not aware of.

@jbrodman
Contributor

Any update on where we are with this?

@jeffhammond
Contributor

I am trying to test this PR and had the crazy idea to rebase it on top of the latest sycl branch, which is taking forever. It would be really nice if the maintainers could deal with this, since #1467 and this PR have been open since April.

@romanovvlad romanovvlad self-requested a review August 26, 2020 13:50
@@ -1298,7 +1298,8 @@ using pi_usm_migration_flags = _pi_usm_migration_flags;
/// \param context is the pi_context
/// \param pi_usm_mem_properties are optional allocation properties
/// \param size_t is the size of the allocation
/// \param alignment is the desired alignment of the allocation
/// \param alignment is the desired alignment of the allocation. 0 indicates no
Contributor

Please, fix LIT tests.

@bader
Contributor

bader commented Apr 21, 2021

@fwyzard, I think USM memory allocation functions are working on CUDA. Can we close this pull request?

@fwyzard fwyzard changed the base branch from sycl to intel April 21, 2021 07:53
@fwyzard fwyzard dismissed stale reviews from smaslov-intel and bader April 21, 2021 07:53

The base branch was changed.

@fwyzard fwyzard changed the base branch from intel to sycl April 21, 2021 07:53
@fwyzard fwyzard requested a review from a team as a code owner April 21, 2021 07:53
@fwyzard
Contributor Author

fwyzard commented Apr 21, 2021

@bader unfortunately I won't have time to look back at this for the next weeks.
From a quick look here on GitHub it seems that some of the changes could still be useful - but it would take more work to rebase them, address the conflicts, retest, etc.

I'm OK with closing the PR and eventually resurrecting it when I can work on it again.

@fwyzard fwyzard closed this Apr 21, 2021
Labels: cuda CUDA back-end
9 participants