Skip to content

Conversation

@keyradical
Copy link
Contributor

@keyradical keyradical commented May 15, 2024

Changed the queue.fill() implementation to make use of the native functions for a specific backend. Also, unified the implementation with the one for memset, since it is just an 8-bit subset operation of fill.

In the CUDA case, both memset and fill are currently calling urEnqueueUSMFill which depending on the size of the filling pattern calls either cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32Async or commonMemSetLargePattern. Before this patch memset was using the same thing, just beforehand setting patternSize always to 1 byte which resulted in calling cuMemsetD8Async. In other backends, the behaviour is analogous.

The fill method was just invoking a parallel_for to fill the memory with the pattern which was making this operation quite slow.

@keyradical keyradical marked this pull request as ready for review May 16, 2024 08:21
@keyradical keyradical requested review from a team as code owners May 16, 2024 08:21
@keyradical
Copy link
Contributor Author

keyradical commented Jun 28, 2024

Only unsigned char is guaranteed to have pure binary representation with no trap behaviour and no padding - and is therefore the only type suitable for type punning/copying other types. See the description of memset

Thus: those vector<char> should become vector<unsigned char> for consistency with memset et al, as well as to ensure that you don't violate aliasing rules

Thanks for this tip! I didn't know this before. The thing is, this depends on the already existing member of the handler class, the std::vector<char> MPattern which is used to create command group classes such such as CGFillUSM, CGFill, CGMemset, CGFill2DUSM, CGMemset2DUSM, not sure if that's all. They all seem to use char for the MPattern. I agree though that this should be changed. I can do it in a separate PR or here if you think that it won't pollute this PR with too many unrelated changes. What do you think @ldrumm ?

edit:
or I could make the changes only to CGFillUSM which is relevant here and make this temporary workaround in handler.cpp:367:

diff --git a/sycl/source/handler.cpp b/sycl/source/handler.cpp
index 872e31d9b440..51564336ef31 100644
--- a/sycl/source/handler.cpp
+++ b/sycl/source/handler.cpp
@@ -364,10 +364,13 @@ event handler::finalize() {
     CommandGroup.reset(new detail::CGCopyUSM(MSrcPtr, MDstPtr, MLength,
                                              std::move(CGData), MCodeLoc));
     break;
-  case detail::CG::FillUSM:
+  case detail::CG::FillUSM: {
+    std::vector<unsigned char> MPatternU(MPattern.size());
+    std::memcpy(MPatternU.data(), MPattern.data(), MPattern.size());
     CommandGroup.reset(new detail::CGFillUSM(
-        std::move(MPattern), MDstPtr, MLength, std::move(CGData), MCodeLoc));
+        std::move(MPatternU), MDstPtr, MLength, std::move(CGData), MCodeLoc));
     break;
+  }

@aelovikov-intel
Copy link
Contributor

From https://en.cppreference.com/w/cpp/language/types:

char — type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type). Multibyte characters strings use this type to represent code units. For every value of type unsigned char in range [​0​, 255], converting the value to char and then back to unsigned char produces the original value.(since C++11) The signedness of char depends on the compiler and the target platform: the defaults for ARM and PowerPC are typically unsigned, the defaults for x86 and x64 are typically signed.

I don't see how unsigned char can be better than just char. And if you want to be really fancy about byte manipulations, just use std::byte.

@keyradical
Copy link
Contributor Author

To make this change from char to unsigned char or std::byte I'd need to change the type of std::vector<char> MPattern, and that requires then changing all the following command group classes which make use of it. All of them such as CGMemset etc. make use of char so this would require changing it in a lot of different places which are not really related to this PR.

@keyradical
Copy link
Contributor Author

@intel/dpcpp-nativecpu-pi-reviewers @intel/unified-runtime-reviewers could you have a look at this PR? This comment summarizes what has been done since #12702.

Copy link
Contributor

@PietroGhg PietroGhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native CPU LGTM, thank you

@keyradical keyradical added the abi-break change that's breaking abi and waiting for the next window to be able to merge label Jul 5, 2024
@keyradical
Copy link
Contributor Author

LGTM afters s/char/unsigned char/g,

This wasn't actually that complicated. This is done now @ldrumm.

@keyradical
Copy link
Contributor Author

@intel/llvm-gatekeepers this is now ready to be merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

abi-break change that's breaking abi and waiting for the next window to be able to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants