[SYCL][ABI-Break] Improve Queue fill #13788

keyradical · 2024-05-15T09:41:04Z

Changed the queue.fill() implementation to use the native fill functions of the specific backend. Also unified it with the implementation of memset, since memset is just the 8-bit (single-byte pattern) case of fill.

In the CUDA case, both memset and fill now call urEnqueueUSMFill, which, depending on the size of the fill pattern, calls cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32Async or commonMemSetLargePattern. Before this patch, memset already used the same path, but always set patternSize to 1 byte beforehand, which resulted in a call to cuMemsetD8Async. In other backends, the behaviour is analogous.
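To make the size-based dispatch concrete, here is a minimal host-only sketch. It assumes the selection depends only on the pattern size, with 1, 2 and 4 bytes mapping to the D8, D16 and D32 calls and everything else going to the large-pattern path. The FillPath enum and selectFillPath are hypothetical names for illustration, not identifiers from the adapter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the four paths named above. They only stand in for
// cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32Async and
// commonMemSetLargePattern; no CUDA call is made here.
enum class FillPath { D8, D16, D32, LargePattern };

// Assumed dispatch rule: pick the path from the pattern size in bytes.
inline FillPath selectFillPath(std::size_t patternSize) {
  assert(patternSize > 0 && "a fill pattern needs at least one byte");
  switch (patternSize) {
  case 1:
    return FillPath::D8; // the path memset always takes
  case 2:
    return FillPath::D16;
  case 4:
    return FillPath::D32;
  default:
    return FillPath::LargePattern; // e.g. 3-, 8- or 16-byte patterns
  }
}
```

Under this assumption, memset is simply the patternSize == 1 case, which is why it ends up on the D8 path.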

Before this patch, the fill method just invoked a parallel_for to write the pattern into memory, which made this operation quite slow.
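The relationship between fill and memset can be shown with a host-only reference. This is a sketch of the semantics, not the runtime code: fillBytes and memsetViaFill are hypothetical helpers, and the assumption is that a fill tiles the pattern bytes across the destination while memset is the same operation with a one-byte pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Reference semantics of a pattern fill: tile `pattern` across `numBytes`
// bytes starting at `dst`. Assumes numBytes is a multiple of the pattern size.
inline void fillBytes(unsigned char *dst, std::size_t numBytes,
                      const std::vector<unsigned char> &pattern) {
  assert(!pattern.empty() && numBytes % pattern.size() == 0);
  for (std::size_t off = 0; off < numBytes; off += pattern.size())
    std::memcpy(dst + off, pattern.data(), pattern.size());
}

// memset expressed as a fill with a single-byte pattern.
inline void memsetViaFill(unsigned char *dst, int value, std::size_t numBytes) {
  fillBytes(dst, numBytes, {static_cast<unsigned char>(value)});
}
```

With these helpers, memsetViaFill(buf, 0xAB, n) leaves the buffer in the same state as std::memset(buf, 0xAB, n).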

keyradical · 2024-06-28T13:12:49Z

Only unsigned char is guaranteed to have pure binary representation with no trap behaviour and no padding - and is therefore the only type suitable for type punning/copying other types. See the description of memset

Thus: those vector<char> should become vector<unsigned char> for consistency with memset et al, as well as to ensure that you don't violate aliasing rules

Thanks for this tip! I didn't know this before. The thing is, this depends on an already existing member of the handler class, the std::vector<char> MPattern, which is used to create command group classes such as CGFillUSM, CGFill, CGMemset, CGFill2DUSM and CGMemset2DUSM (not sure if that's all). They all seem to use char for the MPattern. I agree though that this should be changed. I can do it in a separate PR, or here if you think that it won't pollute this PR with too many unrelated changes. What do you think @ldrumm ?

edit:
or I could make the changes only to CGFillUSM which is relevant here and make this temporary workaround in handler.cpp:367:

diff --git a/sycl/source/handler.cpp b/sycl/source/handler.cpp
index 872e31d9b440..51564336ef31 100644
--- a/sycl/source/handler.cpp
+++ b/sycl/source/handler.cpp
@@ -364,10 +364,13 @@ event handler::finalize() {
     CommandGroup.reset(new detail::CGCopyUSM(MSrcPtr, MDstPtr, MLength,
                                              std::move(CGData), MCodeLoc));
     break;
-  case detail::CG::FillUSM:
+  case detail::CG::FillUSM: {
+    std::vector<unsigned char> MPatternU(MPattern.size());
+    std::memcpy(MPatternU.data(), MPattern.data(), MPattern.size());
     CommandGroup.reset(new detail::CGFillUSM(
-        std::move(MPattern), MDstPtr, MLength, std::move(CGData), MCodeLoc));
+        std::move(MPatternU), MDstPtr, MLength, std::move(CGData), MCodeLoc));
     break;
+  }
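The byte-level conversion in the diff above can be sketched in isolation. This is an illustration only: toPattern and toUnsigned are hypothetical helpers, not part of handler.cpp. The first copies a value's object representation into unsigned char storage, the second converts an existing std::vector<char> pattern and guards the empty case before calling memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Copy the object representation of a trivially copyable value into
// unsigned char storage, the type memset-style APIs operate on.
template <typename T>
std::vector<unsigned char> toPattern(const T &value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "pattern type must be trivially copyable");
  std::vector<unsigned char> pattern(sizeof(T));
  std::memcpy(pattern.data(), &value, sizeof(T));
  return pattern;
}

// Convert an existing char-based pattern byte for byte.
inline std::vector<unsigned char> toUnsigned(const std::vector<char> &in) {
  std::vector<unsigned char> out(in.size());
  if (!in.empty()) // skip memcpy entirely for an empty pattern
    std::memcpy(out.data(), in.data(), in.size());
  return out;
}
```

A pattern built this way can be copied back into a T with memcpy to recover the original value.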

aelovikov-intel · 2024-06-28T15:31:12Z

From https://en.cppreference.com/w/cpp/language/types:

char — type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type). Multibyte characters strings use this type to represent code units. For every value of type unsigned char in range [0, 255], converting the value to char and then back to unsigned char produces the original value.(since C++11) The signedness of char depends on the compiler and the target platform: the defaults for ARM and PowerPC are typically unsigned, the defaults for x86 and x64 are typically signed.

I don't see how unsigned char can be better than just char. And if you want to be really fancy about byte manipulations, just use std::byte.
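The round-trip property quoted above can be checked directly. The following is a small sketch (charRoundTripsAllBytes is a hypothetical helper name) that converts every unsigned char value to char and back, and reports whether plain char is signed on the current target.

```cpp
#include <cassert>
#include <limits>

// Returns true if every value in [0, 255] survives
// unsigned char -> char -> unsigned char unchanged.
inline bool charRoundTripsAllBytes() {
  for (int v = 0; v <= 255; ++v) {
    const unsigned char original = static_cast<unsigned char>(v);
    const char asChar = static_cast<char>(original);
    if (static_cast<unsigned char>(asChar) != original)
      return false;
  }
  return true;
}

// Whether plain char is signed depends on the compiler and target platform.
inline bool plainCharIsSigned() {
  return std::numeric_limits<char>::is_signed;
}
```

The round trip holds regardless of what plainCharIsSigned() returns, which is the point of the quoted guarantee.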

keyradical · 2024-07-01T12:11:16Z

To make this change from char to unsigned char or std::byte I'd need to change the type of std::vector<char> MPattern, and that in turn requires changing all the command group classes which make use of it. All of them, such as CGMemset, use char, so this would require changes in a lot of different places which are not really related to this PR.

keyradical · 2024-07-02T10:09:14Z

@intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers could you have a look at this PR? This comment summarizes what has been done since #12702.

PietroGhg

Native CPU LGTM, thank you

keyradical · 2024-07-05T09:42:21Z

LGTM after s/char/unsigned char/g,

This wasn't actually that complicated. This is done now @ldrumm.

keyradical · 2024-07-05T14:15:07Z

@intel/llvm-gatekeepers this is now ready to be merged

Konrad Kusiak added 16 commits March 27, 2024 16:19

Implemented native USM Fill and removed piextUSMEnqueueMemset symbol

a5bb5d4

Added overload to fill usm and pointed to UR PR tag

bd59ffc

Fixed handling ABI breaking interfaces

c0a5439

Adjusted graph test to treat memset as memfill

221e852

Updated windows symbols

c696e69

Merge branch 'sycl' into improvedQueueFill

cb8c475

Changed to correctly setting memset node and modified native_cpu symbols

0fc3f06

Updated graph usm fill tests and Command Graph docs

d98d82e

Merge branch 'sycl' into improvedQueueFill

ea6a3b4

Merge branch 'sycl' into improvedQueueFill

ef43753

Merge branch 'sycl' into improvedQueueFill

c832525

Merge branch 'sycl' into improvedQueueFill

37e4cf9

Merge branch 'sycl' into improvedQueueFill

b15e7cf

Fixed a comment

00e451f

Merge branch 'sycl' into improvedQueueFill

545f4e7

Merge branch 'sycl' into improvedQueueFill

a972ce2

keyradical temporarily deployed to WindowsCILock May 15, 2024 09:41 — with GitHub Actions Inactive

keyradical temporarily deployed to WindowsCILock May 15, 2024 10:14 — with GitHub Actions Inactive

keyradical marked this pull request as ready for review May 16, 2024 08:21

keyradical requested review from a team as code owners May 16, 2024 08:21

keyradical requested review from ldrumm and uditagarwal97 May 16, 2024 08:21

keyradical mentioned this pull request May 16, 2024

Q.fill() improvements fail on gen12 #13787

Closed

Merge branch 'sycl' into improvedQueueFill

1ba386a

keyradical temporarily deployed to WindowsCILock May 21, 2024 12:06 — with GitHub Actions Inactive

keyradical temporarily deployed to WindowsCILock May 21, 2024 13:23 — with GitHub Actions Inactive

keyradical had a problem deploying to WindowsCILock July 1, 2024 11:27 — with GitHub Actions Error

keyradical force-pushed the improvedQueueFill branch from 315b265 to ca2f561 Compare July 1, 2024 11:46

Konrad Kusiak added 2 commits July 1, 2024 12:52

Merge branch 'sycl' into improvedQueueFill

1586217

merged sycl tip and resolved conflicts

51a9136

keyradical had a problem deploying to WindowsCILock July 1, 2024 11:57 — with GitHub Actions Failure

keyradical temporarily deployed to WindowsCILock July 1, 2024 13:14 — with GitHub Actions Inactive

PietroGhg approved these changes Jul 2, 2024

aarongreig approved these changes Jul 2, 2024

Konrad Kusiak added 2 commits July 5, 2024 08:41

Merge branch 'sycl' into improvedQueueFill

fe3ab0b

Updated symbols linux

6c6d81a

keyradical had a problem deploying to WindowsCILock July 5, 2024 07:58 — with GitHub Actions Failure

keyradical had a problem deploying to WindowsCILock July 5, 2024 08:19 — with GitHub Actions Error

Fixed EnqueueFunctionsEvents unittest

f4c59ca

keyradical had a problem deploying to WindowsCILock July 5, 2024 08:20 — with GitHub Actions Failure

keyradical had a problem deploying to WindowsCILock July 5, 2024 08:44 — with GitHub Actions Error

Changed char to unsigned char

648d3b9

keyradical had a problem deploying to WindowsCILock July 5, 2024 09:06 — with GitHub Actions Error

Updated symbols windows

e2a9aa0

keyradical temporarily deployed to WindowsCILock July 5, 2024 09:30 — with GitHub Actions Inactive

keyradical added the abi-break label (change that's breaking abi and waiting for the next window to be able to merge) Jul 5, 2024

keyradical temporarily deployed to WindowsCILock July 5, 2024 10:24 — with GitHub Actions Inactive

ldrumm approved these changes Jul 5, 2024

martygrant merged commit 0ccb0b7 into intel:sycl Jul 5, 2024

coldav mentioned this pull request Jul 9, 2024

Use of urEnqueueUSMFill on OpenCL enforces requirement on Host Mem Allocs for odd sizes of pattern oneapi-src/unified-runtime#1837

Closed

9 participants
