Skip to content

Commit

Permalink
add svm samples
Browse files Browse the repository at this point in the history
  • Loading branch information
bashbaug committed May 25, 2024
1 parent 631d81d commit 7ac1560
Show file tree
Hide file tree
Showing 18 changed files with 3,224 additions and 1,747 deletions.
3,912 changes: 2,165 additions & 1,747 deletions include/CL/opencl.hpp

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions samples/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ add_subdirectory( images )
add_subdirectory( opengl )
add_subdirectory( python )
add_subdirectory( vulkan )
add_subdirectory( svm )
add_subdirectory( usm )

add_subdirectory( 00_enumopencl )
Expand Down
11 changes: 11 additions & 0 deletions samples/svm/00_svmqueries/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Copyright (c) 2024 Ben Ashbaugh
#
# SPDX-License-Identifier: MIT

add_opencl_sample(
TEST
NUMBER 00
TARGET svmqueries
VERSION 200
CATEGORY svm
SOURCES main.cpp)
19 changes: 19 additions & 0 deletions samples/svm/00_svmqueries/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# usmqueries

## Sample Purpose

This sample queries and prints the Unified Shared Memory capabilities of a device.
Many USM samples require specific USM capabilities and this sample can be used to verify if it will or will not run on a device.

## Key APIs and Concepts

This sample demonstrates the new device queries for Unified Shared Memory capabilities.
This sample currently uses c APIs to perform the device queries because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.

## Command Line Options

| Option | Default Value | Description |
|:--|:-:|:--|
| `-d <index>` | 0 | Specify the index of the OpenCL device in the platform to execute on the sample on.
| `-p <index>` | 0 | Specify the index of the OpenCL platform to execute the sample on.
68 changes: 68 additions & 0 deletions samples/svm/00_svmqueries/main.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
/*
// Copyright (c) 2024 Ben Ashbaugh
//
// SPDX-License-Identifier: MIT
*/

#include <popl/popl.hpp>

#include <CL/opencl.hpp>

void PrintSVMCaps(
const char* label,
cl_device_svm_capabilities svmcaps )
{
printf("%s: %s%s%s%s\n",
label,
( svmcaps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER ) ? "\n\t\tCL_DEVICE_SVM_COARSE_GRAIN_BUFFER" : "",
( svmcaps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER ) ? "\n\t\tCL_DEVICE_SVM_FINE_GRAIN_BUFFER" : "",
( svmcaps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM ) ? "\n\t\tCL_DEVICE_SVM_FINE_GRAIN_SYSTEM" : "",
( svmcaps & CL_DEVICE_SVM_ATOMICS ) ? "\n\t\tCL_DEVICE_SVM_ATOMICS" : "" );
}

int main(
int argc,
char** argv )
{
int platformIndex = 0;
int deviceIndex = 0;

{
popl::OptionParser op("Supported Options");
op.add<popl::Value<int>>("p", "platform", "Platform Index", platformIndex, &platformIndex);
op.add<popl::Value<int>>("d", "device", "Device Index", deviceIndex, &deviceIndex);
bool printUsage = false;
try {
op.parse(argc, argv);
} catch (std::exception& e) {
fprintf(stderr, "Error: %s\n\n", e.what());
printUsage = true;
}

if (printUsage || !op.unknown_options().empty() || !op.non_option_args().empty()) {
fprintf(stderr,
"Usage: usmqueries [options]\n"
"%s", op.help().c_str());
return -1;
}
}

std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);

printf("Running on platform: %s\n",
platforms[platformIndex].getInfo<CL_PLATFORM_NAME>().c_str() );

std::vector<cl::Device> devices;
platforms[platformIndex].getDevices(CL_DEVICE_TYPE_ALL, &devices);

printf("Running on device: %s\n",
devices[deviceIndex].getInfo<CL_DEVICE_NAME>().c_str() );

cl_device_svm_capabilities svmcaps = devices[deviceIndex].getInfo<CL_DEVICE_SVM_CAPABILITIES>();
PrintSVMCaps( "CL_DEVICE_SVM_CAPABILITIES", svmcaps );

printf("Cleaning up...\n");

return 0;
}
11 changes: 11 additions & 0 deletions samples/svm/100_cgsvmhelloworld/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Copyright (c) 2024 Ben Ashbaugh
#
# SPDX-License-Identifier: MIT

add_opencl_sample(
TEST
NUMBER 100
TARGET cgsvmhelloworld
VERSION 200
CATEGORY svm
SOURCES main.cpp)
35 changes: 35 additions & 0 deletions samples/svm/100_cgsvmhelloworld/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# dmemhelloworld

## Sample Purpose

This is the first Unified Shared Memory sample that meaningfully stores and uses data in a Unified Shared Memory allocation.

This sample demonstrates usage of device memory allocations.
Other similar samples demonstrate usage of host memory and shared memory allocations.
Device memory allocations are owned by a specific device, and generally trade off high performance for limited access.
Kernels operating on device memory should perform just as well, if not better, than OpenCL buffers or Shared Virtual Memory allocations.

The sample initializes a source USM allocation, copies it to a destination USM allocation using a kernel, then checks on the host that the copy was performed correctly.

## Key APIs and Concepts

This sample allocates device memory using `clDeviceMemAllocINTEL` and frees it using `clMemFreeINTEL`.

Since device memory cannot be directly accessed by the host, this sample initializes the source buffer by copying into it using `clEnqueueMemcpyINTEL`.
This sample also uses `clEnqueueMemcpyINTEL` to copy out of the destination buffer to verify that the copy was performed correctly.

Within a kernel, a Unified Shared Memory allocation can be accessed similar to an OpenCL buffer (a `cl_mem`), or a Shared Virtual Memory allocation.
Unified Shared Memory allocations are set as an argument to a kernel using `clSetKernelArgMemPointerINTEL`.

Since Unified Shared Memory is an OpenCL extension, this sample uses the `OpenCLExt` extension loader library to query the extension APIs.
Please see the OpenCL Extension Loader [README](https://github.com/bashbaug/opencl-extension-loader) for more detail.

This sample currently uses c APIs because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.

## Command Line Options

| Option | Default Value | Description |
|:--|:-:|:--|
| `-d <index>` | 0 | Specify the index of the OpenCL device in the platform to execute on the sample on.
| `-p <index>` | 0 | Specify the index of the OpenCL platform to execute the sample on.
161 changes: 161 additions & 0 deletions samples/svm/100_cgsvmhelloworld/main.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
/*
// Copyright (c) 2024 Ben Ashbaugh
//
// SPDX-License-Identifier: MIT
*/

#include <popl/popl.hpp>

#include <CL/opencl.hpp>

const size_t gwx = 1024*1024;

static const char kernelString[] = R"CLC(
kernel void CopyBuffer( global uint* dst, global uint* src )
{
uint id = get_global_id(0);
dst[id] = src[id];
}
)CLC";

int main(
int argc,
char** argv )
{
int platformIndex = 0;
int deviceIndex = 0;

{
popl::OptionParser op("Supported Options");
op.add<popl::Value<int>>("p", "platform", "Platform Index", platformIndex, &platformIndex);
op.add<popl::Value<int>>("d", "device", "Device Index", deviceIndex, &deviceIndex);
bool printUsage = false;
try {
op.parse(argc, argv);
} catch (std::exception& e) {
fprintf(stderr, "Error: %s\n\n", e.what());
printUsage = true;
}

if (printUsage || !op.unknown_options().empty() || !op.non_option_args().empty()) {
fprintf(stderr,
"Usage: dmemhelloworld [options]\n"
"%s", op.help().c_str());
return -1;
}
}

std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);

printf("Running on platform: %s\n",
platforms[platformIndex].getInfo<CL_PLATFORM_NAME>().c_str() );

std::vector<cl::Device> devices;
platforms[platformIndex].getDevices(CL_DEVICE_TYPE_ALL, &devices);

printf("Running on device: %s\n",
devices[deviceIndex].getInfo<CL_DEVICE_NAME>().c_str() );

cl_device_svm_capabilities svmcaps = devices[deviceIndex].getInfo<CL_DEVICE_SVM_CAPABILITIES>();
if( svmcaps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER ) {
printf("Device supports CL_DEVICE_SVM_COARSE_GRAIN_BUFFER.\n");
} else {
printf("Device does not support CL_DEVICE_SVM_COARSE_GRAIN_BUFFER, exiting.\n");
return -1;
}

cl::Context context{devices[deviceIndex]};
cl::CommandQueue commandQueue{context, devices[deviceIndex]};

cl::Program program{ context, kernelString };
program.build();
cl::Kernel kernel = cl::Kernel{ program, "CopyBuffer" };

cl_uint* src = (cl_uint*)clSVMAlloc(
context(),
CL_MEM_READ_WRITE,
gwx * sizeof(cl_uint),
0 );
cl_uint* dst = (cl_uint*)clSVMAlloc(
context(),
CL_MEM_READ_WRITE,
gwx * sizeof(cl_uint),
0 );

if( src && dst )
{
// initialization
{
commandQueue.enqueueMapSVM(
src,
CL_TRUE,
CL_MAP_WRITE_INVALIDATE_REGION,
gwx * sizeof(cl_uint) );
for( size_t i = 0; i < gwx; i++ )
{
src[i] = (cl_uint)(i);
}

commandQueue.enqueueUnmapSVM( src );
}

// execution
kernel.setArg( 0, dst );
kernel.setArg( 1, src );
commandQueue.enqueueNDRangeKernel(
kernel,
cl::NullRange,
cl::NDRange{gwx} );

// verification
{
commandQueue.enqueueMapSVM(
dst,
CL_TRUE,
CL_MAP_READ,
gwx * sizeof(cl_uint) );

unsigned int mismatches = 0;

for( size_t i = 0; i < gwx; i++ )
{
if( dst[i] != i )
{
if( mismatches < 16 )
{
fprintf(stderr, "MisMatch! dst[%d] == %08X, want %08X\n",
(unsigned int)i,
dst[i],
(unsigned int)i );
}
mismatches++;
}
}

commandQueue.enqueueUnmapSVM( dst );

if( mismatches )
{
fprintf(stderr, "Error: Found %d mismatches / %d values!!!\n",
mismatches,
(unsigned int)gwx );
}
else
{
printf("Success.\n");
}
}
}
else
{
printf("Allocation failed - does this device support SVM?\n");
}

printf("Cleaning up...\n");

clSVMFree( context(), src );
clSVMFree( context(), dst );

return 0;
}
11 changes: 11 additions & 0 deletions samples/svm/101_cgsvmlinkedlist/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Copyright (c) 2024 Ben Ashbaugh
#
# SPDX-License-Identifier: MIT

add_opencl_sample(
TEST
NUMBER 101
TARGET cgsvmlinkedlist
VERSION 200
CATEGORY svm
SOURCES main.cpp)
26 changes: 26 additions & 0 deletions samples/svm/101_cgsvmlinkedlist/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# dmemlinkedlist

## Sample Purpose

This sample demonstrates how to build a linked list on the host in device Unified Shared Memory, access and modify the linked list in a kernel, then access and check the contents of the linked list on the host.

Because device Unified Shared Memory cannot be directly read from or written to on the host, the linked list must be constructed and verified using explicit memory copies.

## Key APIs and Concepts

This sample demonstrates how to indicate that a kernel may access any device Unified Shared Memory allocation using `clSetKernelExecInfo` and `CL_KERNEL_EXEC_INFO_INDIRECT_DEVICE_ACCESS_INTEL`, without specifying all allocations explicitly.
For kernels that operate on complex data structures consisting of many Unified Shared Memory allocations, this can considerably improve API efficiency.

Since Unified Shared Memory is an OpenCL extension, this sample uses the `OpenCLExt` extension loader library to query the extension APIs.
Please see the OpenCL Extension Loader [README](https://github.com/bashbaug/opencl-extension-loader) for more detail.

This sample currently uses c APIs because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.

## Command Line Options

| Option | Default Value | Description |
|:--|:-:|:--|
| `-d <index>` | 0 | Specify the index of the OpenCL device in the platform to execute on the sample on.
| `-p <index>` | 0 | Specify the index of the OpenCL platform to execute the sample on.
| `-n <number>` | 4 | Specify the number of linked list nodes to create.
Loading

0 comments on commit 7ac1560

Please sign in to comment.