Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Select device image based on compile_target device image property #14909

Merged
merged 18 commits into from
Aug 15, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 62 additions & 13 deletions sycl/source/detail/program_manager/program_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1243,6 +1243,26 @@ void CheckJITCompilationForImage(const RTDeviceBinaryImage *const &Image,
}
}

const char *getArchName(const device &Device) {
namespace syclex = sycl::ext::oneapi::experimental;
auto Arch = Device.get_info<syclex::info::device::architecture>();
switch (Arch) {
#define __SYCL_ARCHITECTURE(ARCH, VAL) \
case syclex::architecture::ARCH: \
return #ARCH;
#define __SYCL_ARCHITECTURE_ALIAS(ARCH, VAL)
#include <sycl/ext/oneapi/experimental/architectures.def>
#undef __SYCL_ARCHITECTURE
#undef __SYCL_ARCHITECTURE_ALIAS
}
return "unknown";
}

sycl_device_binary getRawImg(RTDeviceBinaryImage *Img) {
return reinterpret_cast<sycl_device_binary>(
const_cast<sycl_device_binary>(&Img->getRawData()));
}

template <typename StorageKey>
RTDeviceBinaryImage *getBinImageFromMultiMap(
const std::unordered_multimap<StorageKey, RTDeviceBinaryImage *> &ImagesSet,
Expand All @@ -1251,16 +1271,48 @@ RTDeviceBinaryImage *getBinImageFromMultiMap(
if (ItBegin == ItEnd)
return nullptr;

std::vector<sycl_device_binary> RawImgs(std::distance(ItBegin, ItEnd));
auto It = ItBegin;
for (unsigned I = 0; It != ItEnd; ++It, ++I)
RawImgs[I] = reinterpret_cast<sycl_device_binary>(
const_cast<sycl_device_binary>(&It->second->getRawData()));
// Here, we aim to filter out all the device images from
// [ItBegin, ItEnd) range that are not AOT compiled for the
// Device (checked using info::device::architecture).
std::string_view ArchName = getArchName(Device);
std::vector<RTDeviceBinaryImage *> DeviceFilteredImgs;
DeviceFilteredImgs.reserve(std::distance(ItBegin, ItEnd));
for (auto It = ItBegin; It != ItEnd; ++It) {
auto PropRange = It->second->getDeviceRequirements();
auto PropIt =
std::find_if(PropRange.begin(), PropRange.end(), [&](const auto &Prop) {
return Prop->Name == std::string_view("compile_target");
});
auto AddImg = [&]() { DeviceFilteredImgs.push_back(It->second); };

std::vector<ur_device_binary_t> UrBinaries(RawImgs.size());
for (uint32_t BinaryCount = 0; BinaryCount < RawImgs.size(); BinaryCount++) {
UrBinaries[BinaryCount].pDeviceTargetSpec =
getUrDeviceTarget(RawImgs[BinaryCount]->DeviceTargetSpec);
// Device image has no compile_target property, so it is not even AOT
// compiled.
if (PropIt == PropRange.end()) {
AddImg();
maarquitos14 marked this conversation as resolved.
Show resolved Hide resolved
continue;
}

// Device image has the compile_target property, so make sure it is equal
// to the Device's architecture.
auto CompileTargetByteArray = DeviceBinaryProperty(*PropIt).asByteArray();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is asCString helper:

const char *asCString() const;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the property data guaranteed to be null terminated though? From my understanding the memory layout of these property values come from the PropertyValue class, which I don't think explicitly null terminates the data.

PropertyValue(const llvm::StringRef &Str)
: PropertyValue(reinterpret_cast<const byte *>(Str.data()),
Str.size() * sizeof(char) * /* bits in one byte */ 8) {}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the property data guaranteed to be null terminated though?

Good point. I still think that using asCString would be a better fit, but I assume it requires an update to the helper itself to be made safe first. Therefore, that seems like an improvement that can/should be done separately

CompileTargetByteArray.dropBytes(8);
std::string_view CompileTarget(
reinterpret_cast<const char *>(&CompileTargetByteArray[0]),
CompileTargetByteArray.size());
if ((ArchName == CompileTarget) ||
(ArchName == "x86_64" && CompileTarget == "spir64_x86_64")) {
jzc marked this conversation as resolved.
Show resolved Hide resolved
AddImg();
}
}

if (DeviceFilteredImgs.size() == 0)
jzc marked this conversation as resolved.
Show resolved Hide resolved
return nullptr;

std::vector<ur_device_binary_t> UrBinaries(DeviceFilteredImgs.size());
for (uint32_t BinaryCount = 0; BinaryCount < DeviceFilteredImgs.size();
BinaryCount++) {
UrBinaries[BinaryCount].pDeviceTargetSpec = getUrDeviceTarget(
getRawImg(DeviceFilteredImgs[BinaryCount])->DeviceTargetSpec);
}

uint32_t ImgInd = 0;
Expand All @@ -1269,8 +1321,7 @@ RTDeviceBinaryImage *getBinImageFromMultiMap(
getSyclObjImpl(Context)->getPlugin()->call(
urDeviceSelectBinary, getSyclObjImpl(Device)->getHandleRef(),
UrBinaries.data(), UrBinaries.size(), &ImgInd);
std::advance(ItBegin, ImgInd);
return ItBegin->second;
return DeviceFilteredImgs[ImgInd];
}

RTDeviceBinaryImage &
Expand Down Expand Up @@ -1299,10 +1350,8 @@ ProgramManager::getDeviceImage(const std::string &KernelName,
std::lock_guard<std::mutex> KernelIDsGuard(m_KernelIDsMutex);
if (auto KernelId = m_KernelName2KernelIDs.find(KernelName);
KernelId != m_KernelName2KernelIDs.end()) {
// Kernel ID presence guarantees that we have bin image in the storage.
Img = getBinImageFromMultiMap(m_KernelIDs2BinImage, KernelId->second,
Context, Device);
assert(Img && "No binary image found for kernel id");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that this assert is effectively replaced by an assert within vector::operator[]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this assert because it is now not guaranteed the Img will be non null in this case. Specifically, there could be no image compatible with the current device, and getBinImageFromMultiMap will return nullptr:
https://github.com/intel/llvm/pull/14909/files#diff-78dd7f7ba0b6120dece1ae4ab5a09c9936ff654a1de2c31ff2dbb1fc58d90393R1308-R1309
In this case, the control flow then flows down to the exception. This flow is demonstrated by the NoDeviceKernel test.

} else {
Img = getBinImageFromMultiMap(m_ServiceKernels, KernelName, Context,
Device);
Expand Down
1 change: 1 addition & 0 deletions sycl/unittests/program_manager/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
set(CMAKE_CXX_EXTENSIONS OFF)
add_sycl_unittest(ProgramManagerTests OBJECT
CompileTarget.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be alphabetically sorted? Not sure really, some of the elements don't seem to be.

BuildLog.cpp
DynamicLinking.cpp
itt_annotations.cpp
Expand Down
249 changes: 249 additions & 0 deletions sycl/unittests/program_manager/CompileTarget.cpp
jzc marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,249 @@
//==------------- CompileTarget.cpp --- CompileTarget unit test ------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#define SYCL2020_DISABLE_DEPRECATION_WARNINGS

#include <helpers/MockKernelInfo.hpp>
#include <helpers/UrImage.hpp>
#include <helpers/UrMock.hpp>

#include <gtest/gtest.h>

using namespace sycl;

namespace sycl::unittest {
static inline UrImage
generateImageWithCompileTarget(std::string KernelName,
std::string CompileTarget) {
std::vector<char> Data(8 + CompileTarget.size());
std::copy(CompileTarget.begin(), CompileTarget.end(), Data.data() + 8);
UrProperty CompileTargetProperty("compile_target", Data,
SYCL_PROPERTY_TYPE_BYTE_ARRAY);
UrPropertySet PropSet;
PropSet.insert(__SYCL_PROPERTY_SET_SYCL_DEVICE_REQUIREMENTS,
{CompileTargetProperty});

std::vector<unsigned char> Bin(CompileTarget.begin(), CompileTarget.end());
// Null terminate the data so it can be interpreted as c string.
Bin.push_back(0);

UrArray<UrOffloadEntry> Entries = makeEmptyKernels({KernelName});

UrImage Img{SYCL_DEVICE_BINARY_TYPE_NATIVE, // Format
__SYCL_DEVICE_BINARY_TARGET_SPIRV64_GEN, // DeviceTargetSpec
"", // Compile options
"", // Link options
std::move(Bin),
std::move(Entries),
std::move(PropSet)};

return Img;
}
} // namespace sycl::unittest

class SingleTaskKernel;
class NDRangeKernel;
class RangeKernel;
class NoDeviceKernel;

MOCK_INTEGRATION_HEADER(SingleTaskKernel)
MOCK_INTEGRATION_HEADER(NDRangeKernel)
MOCK_INTEGRATION_HEADER(RangeKernel)
MOCK_INTEGRATION_HEADER(NoDeviceKernel)

static sycl::unittest::UrImage Img[] = {
sycl::unittest::generateDefaultImage({"SingleTaskKernel"}),
sycl::unittest::generateImageWithCompileTarget("SingleTaskKernel",
"spir64_x86_64"),
sycl::unittest::generateImageWithCompileTarget("SingleTaskKernel",
"intel_gpu_pvc"),
sycl::unittest::generateImageWithCompileTarget("SingleTaskKernel",
"intel_gpu_skl"),
sycl::unittest::generateDefaultImage({"NDRangeKernel"}),
sycl::unittest::generateImageWithCompileTarget("NDRangeKernel",
"spir64_x86_64"),
sycl::unittest::generateImageWithCompileTarget("NDRangeKernel",
"intel_gpu_pvc"),
sycl::unittest::generateImageWithCompileTarget("NDRangeKernel",
"intel_gpu_skl"),
sycl::unittest::generateDefaultImage({"RangeKernel"}),
sycl::unittest::generateImageWithCompileTarget("RangeKernel",
"spir64_x86_64"),
sycl::unittest::generateImageWithCompileTarget("RangeKernel",
"intel_gpu_pvc"),
sycl::unittest::generateImageWithCompileTarget("RangeKernel",
"intel_gpu_skl"),
sycl::unittest::generateImageWithCompileTarget("NoDeviceKernel",
"intel_gpu_bdw"),
};

static sycl::unittest::UrImageArray<std::size(Img)> ImgArray{Img};
maarquitos14 marked this conversation as resolved.
Show resolved Hide resolved

ur_device_handle_t MockSklDeviceHandle =
reinterpret_cast<ur_device_handle_t>(1);
ur_device_handle_t MockPvcDeviceHandle =
reinterpret_cast<ur_device_handle_t>(2);
ur_device_handle_t MockX86DeviceHandle =
reinterpret_cast<ur_device_handle_t>(3);
int SklIp = 0x02400009;
jzc marked this conversation as resolved.
Show resolved Hide resolved
int PvcIp = 0x030f0000;
int X86Ip = 0;

ur_device_handle_t MockDevices[] = {
MockSklDeviceHandle,
MockPvcDeviceHandle,
MockX86DeviceHandle,
};

static ur_result_t redefinedDeviceGet(void *pParams) {
auto params = *static_cast<ur_device_get_params_t *>(pParams);
if (*params.ppNumDevices) {
**params.ppNumDevices = static_cast<uint32_t>(std::size(MockDevices));
return UR_RESULT_SUCCESS;
}

if (*params.pphDevices && *params.pNumEntries <= std::size(MockDevices)) {
jzc marked this conversation as resolved.
Show resolved Hide resolved
for (uint32_t i = 0; i < *params.pNumEntries; ++i) {
(*params.pphDevices)[i] = MockDevices[i];
}
}

return UR_RESULT_SUCCESS;
}

std::vector<std::string> createWithBinaryLog;
static ur_result_t redefinedProgramCreateWithBinary(void *pParams) {
auto params = *static_cast<ur_program_create_with_binary_params_t *>(pParams);
createWithBinaryLog.push_back(
reinterpret_cast<const char *>(*params.ppBinary));
return UR_RESULT_SUCCESS;
}

static ur_result_t redefinedDeviceGetInfo(void *pParams) {
auto params = *static_cast<ur_device_get_info_params_t *>(pParams);
if (*params.ppropName == UR_DEVICE_INFO_IP_VERSION && *params.ppPropValue) {
int &ret = *static_cast<int *>(*params.ppPropValue);
if (*params.phDevice == MockSklDeviceHandle)
ret = SklIp;
if (*params.phDevice == MockPvcDeviceHandle)
ret = PvcIp;
if (*params.phDevice == MockX86DeviceHandle)
ret = X86Ip;
}
if (*params.ppropName == UR_DEVICE_INFO_TYPE &&
*params.phDevice == MockX86DeviceHandle) {
if (*params.ppPropValue)
*static_cast<ur_device_type_t *>(*params.ppPropValue) =
UR_DEVICE_TYPE_CPU;
if (*params.ppPropSizeRet)
**params.ppPropSizeRet = sizeof(UR_DEVICE_TYPE_CPU);
}
return UR_RESULT_SUCCESS;
}

namespace syclex = sycl::ext::oneapi::experimental;
auto archSelector(syclex::architecture arch) {
return [=](const device &dev) {
if (dev.get_info<syclex::info::device::architecture>() == arch) {
return 1;
}
return -1;
};
}

class CompileTargetTest : public testing::Test {
protected:
sycl::unittest::UrMock<> Mock;
CompileTargetTest() {
mock::getCallbacks().set_before_callback("urProgramCreateWithBinary",
&redefinedProgramCreateWithBinary);
mock::getCallbacks().set_after_callback("urDeviceGetInfo",
&redefinedDeviceGetInfo);
mock::getCallbacks().set_after_callback("urDeviceGet", &redefinedDeviceGet);
}
};

template <typename F>
void checkUsedImageWithCompileTarget(const char *compile_target, F &&f) {
createWithBinaryLog.clear();
ASSERT_EQ(createWithBinaryLog.size(), 0U);
f();
ASSERT_EQ(createWithBinaryLog.size(), 1U);
EXPECT_EQ(createWithBinaryLog.back(), compile_target);
}

void launchSingleTaskKernel(queue q) {
q.single_task<SingleTaskKernel>([]() {});
}

TEST_F(CompileTargetTest, SingleTask) {
checkUsedImageWithCompileTarget("intel_gpu_skl", [&]() {
launchSingleTaskKernel(
queue{archSelector(syclex::architecture::intel_gpu_skl)});
});

checkUsedImageWithCompileTarget("intel_gpu_pvc", [&]() {
launchSingleTaskKernel(
queue{archSelector(syclex::architecture::intel_gpu_pvc)});
});

checkUsedImageWithCompileTarget("spir64_x86_64", [&]() {
launchSingleTaskKernel(queue{archSelector(syclex::architecture::x86_64)});
});
}

void launchNDRangeKernel(queue q) {
q.submit([&](handler &cgh) {
cgh.parallel_for<NDRangeKernel>(nd_range<1>(1, 1), [](auto) {});
});
}

TEST_F(CompileTargetTest, NDRangeKernel) {
checkUsedImageWithCompileTarget("intel_gpu_skl", [&]() {
launchNDRangeKernel(
queue{archSelector(syclex::architecture::intel_gpu_skl)});
});

checkUsedImageWithCompileTarget("intel_gpu_pvc", [&]() {
launchNDRangeKernel(
queue{archSelector(syclex::architecture::intel_gpu_pvc)});
});

checkUsedImageWithCompileTarget("spir64_x86_64", [&]() {
launchNDRangeKernel(queue{archSelector(syclex::architecture::x86_64)});
});
}

void launchRangeKernel(queue q) {
q.submit([&](handler &cgh) {
cgh.parallel_for<NDRangeKernel>(range<1>(1), [](auto) {});
});
}

TEST_F(CompileTargetTest, RangeKernel) {
checkUsedImageWithCompileTarget("intel_gpu_skl", [&]() {
launchRangeKernel(queue{archSelector(syclex::architecture::intel_gpu_skl)});
});

checkUsedImageWithCompileTarget("intel_gpu_pvc", [&]() {
launchRangeKernel(queue{archSelector(syclex::architecture::intel_gpu_pvc)});
});

checkUsedImageWithCompileTarget("spir64_x86_64", [&]() {
launchRangeKernel(queue{archSelector(syclex::architecture::x86_64)});
});
}

TEST_F(CompileTargetTest, NoDeviceKernel) {
try {
queue{}.single_task<NoDeviceKernel>([]() {});
} catch (sycl::exception &e) {
ASSERT_EQ(e.what(),
std::string("No kernel named NoDeviceKernel was found"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this exception message should be improved, because it can be quite confusing. A kernel may be right in front of a developer, but it is not that we couldn't find it, it is our program is not compatible with a device because of how it was compiler.

I suppose that this error message is what we will see even without this patch, so I'm fine with addressing that separately.

}
}
Loading