update documentation
bashbaug committed May 26, 2024
1 parent 7ac1560 commit 1cd9541
Showing 10 changed files with 113 additions and 153 deletions.
11 changes: 11 additions & 0 deletions include/CL/opencl.hpp
Original file line number Diff line number Diff line change
@@ -1546,6 +1546,13 @@ inline cl_int getInfoHelper(Func f, cl_uint name, T* param, int, typename T::cl_
F(cl_command_queue_info, CL_QUEUE_FAMILY_INTEL, cl_uint) \
F(cl_command_queue_info, CL_QUEUE_INDEX_INTEL, cl_uint)

#define CL_HPP_PARAM_NAME_CL_INTEL_UNIFIED_SHARED_MEMORY_(F) \
F(cl_device_info, CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL, cl_device_unified_shared_memory_capabilities_intel ) \
F(cl_device_info, CL_DEVICE_DEVICE_MEM_CAPABILITIES_INTEL, cl_device_unified_shared_memory_capabilities_intel ) \
F(cl_device_info, CL_DEVICE_SINGLE_DEVICE_SHARED_MEM_CAPABILITIES_INTEL, cl_device_unified_shared_memory_capabilities_intel ) \
F(cl_device_info, CL_DEVICE_CROSS_DEVICE_SHARED_MEM_CAPABILITIES_INTEL, cl_device_unified_shared_memory_capabilities_intel ) \
F(cl_device_info, CL_DEVICE_SHARED_SYSTEM_MEM_CAPABILITIES_INTEL, cl_device_unified_shared_memory_capabilities_intel )

template <typename enum_type, cl_int Name>
struct param_traits {};

@@ -1809,6 +1816,10 @@ CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_NUM_THREADS_PER_EU_INTEL,
CL_HPP_DECLARE_PARAM_TRAITS_(cl_device_info, CL_DEVICE_FEATURE_CAPABILITIES_INTEL, cl_device_feature_capabilities_intel)
#endif // cl_intel_device_attribute_query

#if defined(cl_intel_unified_shared_memory)
CL_HPP_PARAM_NAME_CL_INTEL_UNIFIED_SHARED_MEMORY_(CL_HPP_DECLARE_PARAM_TRAITS_)
#endif // cl_intel_unified_shared_memory

// Convenience functions

template <typename Func, typename T>
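
The `CL_HPP_PARAM_NAME_CL_INTEL_UNIFIED_SHARED_MEMORY_` list above follows an X-macro pattern: each row names a query and its result type, and a second macro turns every row into a `param_traits` specialization. The following is a minimal, self-contained sketch of that pattern. Every `demo_` and `DEMO_` name is hypothetical and stands in for the real OpenCL types and query names; this is not the actual `opencl.hpp` implementation.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for an OpenCL info category, query names, and result types.
struct demo_device_info {};
constexpr int DEMO_HOST_MEM_CAPS = 1;
constexpr int DEMO_DEVICE_MEM_CAPS = 2;
constexpr int DEMO_MAX_COMPUTE_UNITS = 3;

using demo_bitfield = std::uint64_t;
using demo_uint = std::uint32_t;

// Primary template is empty, so an unlisted query has no param_type
// and fails to compile when it is used.
template <typename enum_type, int Name>
struct demo_param_traits {};

// One row per query: (info category, query name, result type).
#define DEMO_PARAM_NAMES_(F) \
    F(demo_device_info, DEMO_HOST_MEM_CAPS, demo_bitfield) \
    F(demo_device_info, DEMO_DEVICE_MEM_CAPS, demo_bitfield) \
    F(demo_device_info, DEMO_MAX_COMPUTE_UNITS, demo_uint)

// Turns one row into an explicit specialization of the traits template.
#define DEMO_DECLARE_PARAM_TRAITS_(token, param_name, T) \
    template <>                                          \
    struct demo_param_traits<token, param_name> {        \
        enum { value = param_name };                     \
        typedef T param_type;                            \
    };

// Applying the second macro to the list declares all specializations at once.
DEMO_PARAM_NAMES_(DEMO_DECLARE_PARAM_TRAITS_)

// Convenience alias: the result type associated with a device query.
template <int Name>
using demo_result_t =
    typename demo_param_traits<demo_device_info, Name>::param_type;

static_assert(std::is_same<demo_result_t<DEMO_HOST_MEM_CAPS>, demo_bitfield>::value,
              "capability queries map to the bitfield type");
```

With traits of this kind in place, a templated `getInfo<Name>()` can pick its return type at compile time, which is presumably what lets the samples call `getInfo` with the new query names.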
15 changes: 5 additions & 10 deletions samples/svm/00_svmqueries/README.md
@@ -1,19 +1,14 @@
# usmqueries
# svmqueries

## Sample Purpose

This sample queries and prints the Unified Shared Memory capabilities of a device.
Many USM samples require specific USM capabilities and this sample can be used to verify if it will or will not run on a device.
This sample queries and prints the Shared Virtual Memory (SVM) capabilities for all devices in the system.
Many SVM samples require specific SVM capabilities, and this sample can be used to check whether they will run on a device.

## Key APIs and Concepts

This sample demonstrates the new device queries for Unified Shared Memory capabilities.
This sample currently uses c APIs to perform the device queries because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.
This sample demonstrates the single device query for Shared Virtual Memory capabilities, `CL_DEVICE_SVM_CAPABILITIES`.
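
The value returned by the capabilities query is a bitfield, so interpreting it comes down to testing individual bits. Below is a minimal sketch that needs no OpenCL headers. The `kSvm...` constants and the `DescribeSVMCaps` helper are hypothetical names for this illustration, and the bit positions are assumed to mirror the `CL_DEVICE_SVM_*` definitions in `CL/cl.h`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for cl_device_svm_capabilities, assumed to be a 64-bit bitfield.
using svm_caps_t = std::uint64_t;

// Assumed to mirror the CL_DEVICE_SVM_* bit values from CL/cl.h.
constexpr svm_caps_t kSvmCoarseGrainBuffer = 1u << 0;
constexpr svm_caps_t kSvmFineGrainBuffer   = 1u << 1;
constexpr svm_caps_t kSvmFineGrainSystem   = 1u << 2;
constexpr svm_caps_t kSvmAtomics           = 1u << 3;

// Returns the names of all capability bits set in caps, lowest bit first.
std::vector<std::string> DescribeSVMCaps(svm_caps_t caps) {
    std::vector<std::string> names;
    if (caps & kSvmCoarseGrainBuffer) names.push_back("CL_DEVICE_SVM_COARSE_GRAIN_BUFFER");
    if (caps & kSvmFineGrainBuffer)   names.push_back("CL_DEVICE_SVM_FINE_GRAIN_BUFFER");
    if (caps & kSvmFineGrainSystem)   names.push_back("CL_DEVICE_SVM_FINE_GRAIN_SYSTEM");
    if (caps & kSvmAtomics)           names.push_back("CL_DEVICE_SVM_ATOMICS");
    return names;
}
```

An empty result means the device reports no SVM support at all, which is a quick signal that the other SVM samples are unlikely to run on it.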

## Command Line Options

| Option | Default Value | Description |
|:--|:-:|:--|
| `-d <index>` | 0 | Specify the index of the OpenCL device in the platform to execute the sample on.
| `-p <index>` | 0 | Specify the index of the OpenCL platform to execute the sample on.
None
37 changes: 22 additions & 15 deletions samples/svm/00_svmqueries/main.cpp
@@ -12,7 +12,7 @@ void PrintSVMCaps(
const char* label,
cl_device_svm_capabilities svmcaps )
{
printf("%s: %s%s%s%s\n",
printf("\t%s: %s%s%s%s\n",
label,
( svmcaps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER ) ? "\n\t\tCL_DEVICE_SVM_COARSE_GRAIN_BUFFER" : "",
( svmcaps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER ) ? "\n\t\tCL_DEVICE_SVM_FINE_GRAIN_BUFFER" : "",
@@ -24,13 +24,9 @@ int main(
int argc,
char** argv )
{
int platformIndex = 0;
int deviceIndex = 0;

{
popl::OptionParser op("Supported Options");
op.add<popl::Value<int>>("p", "platform", "Platform Index", platformIndex, &platformIndex);
op.add<popl::Value<int>>("d", "device", "Device Index", deviceIndex, &deviceIndex);

bool printUsage = false;
try {
op.parse(argc, argv);
@@ -41,7 +37,7 @@ int main(

if (printUsage || !op.unknown_options().empty() || !op.non_option_args().empty()) {
fprintf(stderr,
"Usage: usmqueries [options]\n"
"Usage: svmqueries [options]\n"
"%s", op.help().c_str());
return -1;
}
@@ -50,17 +46,28 @@ int main(
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);

printf("Running on platform: %s\n",
platforms[platformIndex].getInfo<CL_PLATFORM_NAME>().c_str() );
for( size_t i = 0; i < platforms.size(); i++ )
{
printf( "Platform[%zu]: %s\n",
i,
platforms[i].getInfo<CL_PLATFORM_NAME>().c_str());

std::vector<cl::Device> devices;
platforms[i].getDevices(CL_DEVICE_TYPE_ALL, &devices);

std::vector<cl::Device> devices;
platforms[platformIndex].getDevices(CL_DEVICE_TYPE_ALL, &devices);
for( size_t d = 0; d < devices.size(); d++ )
{
printf("\tDevice[%zu]: %s\n",
d,
devices[d].getInfo<CL_DEVICE_NAME>().c_str());

printf("Running on device: %s\n",
devices[deviceIndex].getInfo<CL_DEVICE_NAME>().c_str() );
cl_device_svm_capabilities svmcaps =
devices[d].getInfo<CL_DEVICE_SVM_CAPABILITIES>();
PrintSVMCaps( "CL_DEVICE_SVM_CAPABILITIES", svmcaps );

cl_device_svm_capabilities svmcaps = devices[deviceIndex].getInfo<CL_DEVICE_SVM_CAPABILITIES>();
PrintSVMCaps( "CL_DEVICE_SVM_CAPABILITIES", svmcaps );
printf( "\n" );
}
}

printf("Cleaning up...\n");

30 changes: 11 additions & 19 deletions samples/svm/100_cgsvmhelloworld/README.md
@@ -1,31 +1,23 @@
# dmemhelloworld
# cgsvmhelloworld

## Sample Purpose

This is the first Unified Shared Memory sample that meaningfully stores and uses data in a Unified Shared Memory allocation.
This is the first Shared Virtual Memory (SVM) sample that meaningfully stores and uses data in a Shared Virtual Memory allocation.
This sample demonstrates usage of coarse-grained SVM allocations.
Other similar samples demonstrate usage of fine-grained SVM allocations.
This sample may not run on all OpenCL devices because SVM is an optional feature, though many devices do support coarse-grained SVM.

This sample demonstrates usage of device memory allocations.
Other similar samples demonstrate usage of host memory and shared memory allocations.
Device memory allocations are owned by a specific device, and generally trade off high performance for limited access.
Kernels operating on device memory should perform just as well, if not better, than OpenCL buffers or Shared Virtual Memory allocations.

The sample initializes a source USM allocation, copies it to a destination USM allocation using a kernel, then checks on the host that the copy was performed correctly.
The sample initializes a source coarse-grained SVM allocation, copies it to a destination coarse-grained SVM allocation using a kernel, then checks on the host that the copy was performed correctly.

## Key APIs and Concepts

This sample allocates device memory using `clDeviceMemAllocINTEL` and frees it using `clMemFreeINTEL`.

Since device memory cannot be directly accessed by the host, this sample initializes the source buffer by copying into it using `clEnqueueMemcpyINTEL`.
This sample also uses `clEnqueueMemcpyINTEL` to copy out of the destination buffer to verify that the copy was performed correctly.

Within a kernel, a Unified Shared Memory allocation can be accessed similar to an OpenCL buffer (a `cl_mem`), or a Shared Virtual Memory allocation.
Unified Shared Memory allocations are set as an argument to a kernel using `clSetKernelArgMemPointerINTEL`.
This sample allocates coarse-grained SVM memory using `clSVMAlloc` and frees it using `clSVMFree`.

Since Unified Shared Memory is an OpenCL extension, this sample uses the `OpenCLExt` extension loader library to query the extension APIs.
Please see the OpenCL Extension Loader [README](https://github.com/bashbaug/opencl-extension-loader) for more detail.
Since the host cannot directly access a coarse-grained SVM allocation without mapping it, this sample initializes the source allocation by mapping it using `clEnqueueSVMMap`.
This sample also uses `clEnqueueSVMMap` to map the destination allocation to verify that the copy was performed correctly.

This sample currently uses c APIs because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.
Within a kernel, a Shared Virtual Memory allocation can be accessed similarly to an OpenCL buffer (a `cl_mem`).
Shared Virtual Memory allocations are set as an argument to a kernel using `clSetKernelArgSVMPointer`.
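
Because every host access to a coarse-grained allocation has to be bracketed by a map and an unmap, one way to keep the two paired is a small RAII guard. The sketch below is only an illustration of that idea and is not taken from the sample: `ScopedMap` is a hypothetical helper, and the two callables stand in for whatever actually performs the map and unmap.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical RAII helper: runs `map` on construction and `unmap` on
// destruction, so host access to the allocation is confined to the guard's scope.
class ScopedMap {
public:
    ScopedMap(std::function<void()> map, std::function<void()> unmap)
        : unmap_(std::move(unmap)) {
        assert(map && unmap_ && "both callables are required");
        map();
    }
    ~ScopedMap() { unmap_(); }

    // Copying would unmap twice, so the guard is non-copyable.
    ScopedMap(const ScopedMap&) = delete;
    ScopedMap& operator=(const ScopedMap&) = delete;

private:
    std::function<void()> unmap_;
};

// In a real program the callables might wrap calls such as (assumed usage):
//   clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, ptr, size, 0, nullptr, nullptr);
//   clEnqueueSVMUnmap(queue, ptr, 0, nullptr, nullptr);
```

The guard only makes the pairing explicit. The map would still need to be blocking, or otherwise synchronized, before the host touches the memory.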

## Command Line Options

16 changes: 6 additions & 10 deletions samples/svm/101_cgsvmlinkedlist/README.md
@@ -1,21 +1,17 @@
# dmemlinkedlist
# cgsvmlinkedlist

## Sample Purpose

This sample demonstrates how to build a linked list on the host in device Unified Shared Memory, access and modify the linked list in a kernel, then access and check the contents of the linked list on the host.
This sample demonstrates how to build a linked list on the host using coarse-grained Shared Virtual Memory (SVM) allocations, how to access and modify the linked list in a kernel, then how to access and check the contents of the linked list on the host.

Because device Unified Shared Memory cannot be directly read from or written to on the host, the linked list must be constructed and verified using explicit memory copies.
Because coarse-grained SVM cannot be directly read from or written to on the host, this example constructs and verifies the linked list using explicit memory copies.

## Key APIs and Concepts

This sample demonstrates how to indicate that a kernel may access any device Unified Shared Memory allocation using `clSetKernelExecInfo` and `CL_KERNEL_EXEC_INFO_INDIRECT_DEVICE_ACCESS_INTEL`, without specifying all allocations explicitly.
For kernels that operate on complex data structures consisting of many Unified Shared Memory allocations, this can considerably improve API efficiency.
This sample demonstrates how to use `clEnqueueSVMMemcpy` to explicitly copy between a Shared Virtual Memory allocation and an allocation on the host.

Since Unified Shared Memory is an OpenCL extension, this sample uses the `OpenCLExt` extension loader library to query the extension APIs.
Please see the OpenCL Extension Loader [README](https://github.com/bashbaug/opencl-extension-loader) for more detail.

This sample currently uses c APIs because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.
This sample also demonstrates how to specify a set of indirectly accessed SVM pointers using `clSetKernelExecInfo` and `CL_KERNEL_EXEC_INFO_SVM_PTRS`.
This is required for kernels that operate on complex data structures consisting of Shared Virtual Memory allocations that are not directly passed as kernel arguments.
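
To pass the set of indirectly accessed pointers, the host first has to gather them into a contiguous array. The sketch below shows one way to do that for a singly linked list. It is not the sample's code: `Node`, `AllocNode`, `BuildList`, `CollectNodePointers`, and `FreeList` are hypothetical names, and `std::malloc` stands in for `clSVMAlloc` so the example runs without an OpenCL device.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

struct Node {
    int value;
    Node* next;
};

// Hypothetical stand-in allocator: a real sample would call clSVMAlloc here.
// The host writes through the pointer directly only because this is plain
// host memory; coarse-grained SVM would need a map or an explicit copy.
Node* AllocNode(int value) {
    Node* node = static_cast<Node*>(std::malloc(sizeof(Node)));
    assert(node != nullptr && "allocation failed");
    node->value = value;
    node->next = nullptr;
    return node;
}

// Builds a list holding the values 0 .. count-1 and returns its head.
Node* BuildList(int count) {
    Node* head = nullptr;
    Node* tail = nullptr;
    for (int i = 0; i < count; i++) {
        Node* node = AllocNode(i);
        if (tail) {
            tail->next = node;
        } else {
            head = node;
        }
        tail = node;
    }
    return head;
}

// Walks the list and gathers every node pointer, in list order.
std::vector<void*> CollectNodePointers(Node* head) {
    std::vector<void*> ptrs;
    for (Node* node = head; node != nullptr; node = node->next) {
        ptrs.push_back(node);
    }
    return ptrs;
}

// Releases every node; a real sample would call clSVMFree instead.
void FreeList(Node* head) {
    while (head) {
        Node* next = head->next;
        std::free(head);
        head = next;
    }
}

// The collected array is what would be handed to the runtime (assumed usage):
//   clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_SVM_PTRS,
//                       ptrs.size() * sizeof(void*), ptrs.data());
```

Including the head node in the array is harmless even when it is also passed as a kernel argument. The nodes reachable only through `next` are the ones the runtime cannot otherwise know about.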

## Command Line Options

29 changes: 10 additions & 19 deletions samples/svm/200_fgsvmhelloworld/README.md
@@ -1,31 +1,22 @@
# dmemhelloworld
# fgsvmhelloworld

## Sample Purpose

This is the first Unified Shared Memory sample that meaningfully stores and uses data in a Unified Shared Memory allocation.
This sample demonstrates usage of fine-grained Shared Virtual Memory (SVM) allocations.
This sample may not run on all OpenCL devices because many devices do not support fine-grained SVM.

This sample demonstrates usage of device memory allocations.
Other similar samples demonstrate usage of host memory and shared memory allocations.
Device memory allocations are owned by a specific device, and generally trade off high performance for limited access.
Kernels operating on device memory should perform just as well, if not better, than OpenCL buffers or Shared Virtual Memory allocations.

The sample initializes a source USM allocation, copies it to a destination USM allocation using a kernel, then checks on the host that the copy was performed correctly.
The sample initializes a source fine-grained SVM allocation, copies it to a destination fine-grained SVM allocation using a kernel, then checks on the host that the copy was performed correctly.
Because fine-grained SVM does not require any API calls to access the contents of an allocation on the host, this sample is much simpler than the coarse-grained SVM sample.

## Key APIs and Concepts

This sample allocates device memory using `clDeviceMemAllocINTEL` and frees it using `clMemFreeINTEL`.

Since device memory cannot be directly accessed by the host, this sample initializes the source buffer by copying into it using `clEnqueueMemcpyINTEL`.
This sample also uses `clEnqueueMemcpyINTEL` to copy out of the destination buffer to verify that the copy was performed correctly.

Within a kernel, a Unified Shared Memory allocation can be accessed similar to an OpenCL buffer (a `cl_mem`), or a Shared Virtual Memory allocation.
Unified Shared Memory allocations are set as an argument to a kernel using `clSetKernelArgMemPointerINTEL`.
This sample allocates fine-grained SVM memory using `clSVMAlloc` and frees it using `clSVMFree`.

Since Unified Shared Memory is an OpenCL extension, this sample uses the `OpenCLExt` extension loader library to query the extension APIs.
Please see the OpenCL Extension Loader [README](https://github.com/bashbaug/opencl-extension-loader) for more detail.
This sample only needs to ensure the device is not accessing the fine-grained SVM allocation before initializing the contents of the source allocation or verifying that the copy was performed correctly.
For simplicity, this sample calls `clFinish` to ensure all execution is complete on the device.

This sample currently uses c APIs because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.
Within a kernel, a Shared Virtual Memory allocation can be accessed similarly to an OpenCL buffer (a `cl_mem`).
Shared Virtual Memory allocations are set as an argument to a kernel using `clSetKernelArgSVMPointer`.

## Command Line Options

17 changes: 7 additions & 10 deletions samples/svm/201_fgsvmlinkedlist/README.md
@@ -1,21 +1,18 @@
# dmemlinkedlist
# fgsvmlinkedlist

## Sample Purpose

This sample demonstrates how to build a linked list on the host in device Unified Shared Memory, access and modify the linked list in a kernel, then access and check the contents of the linked list on the host.
This sample demonstrates how to build a linked list on the host using fine-grained Shared Virtual Memory (SVM) allocations, how to access and modify the linked list in a kernel, then how to access and check the contents of the linked list on the host.

Because device Unified Shared Memory cannot be directly read from or written to on the host, the linked list must be constructed and verified using explicit memory copies.
Because fine-grained SVM does not require any API calls to access the contents of an allocation on the host, this sample is much simpler than the coarse-grained SVM sample.

## Key APIs and Concepts

This sample demonstrates how to indicate that a kernel may access any device Unified Shared Memory allocation using `clSetKernelExecInfo` and `CL_KERNEL_EXEC_INFO_INDIRECT_DEVICE_ACCESS_INTEL`, without specifying all allocations explicitly.
For kernels that operate on complex data structures consisting of many Unified Shared Memory allocations, this can considerably improve API efficiency.
This sample only needs to ensure the device is not accessing the fine-grained SVM allocation before initializing the contents of the source allocation or verifying that the copy was performed correctly.
For simplicity, this sample calls `clFinish` to ensure all execution is complete on the device.

Since Unified Shared Memory is an OpenCL extension, this sample uses the `OpenCLExt` extension loader library to query the extension APIs.
Please see the OpenCL Extension Loader [README](https://github.com/bashbaug/opencl-extension-loader) for more detail.

This sample currently uses c APIs because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.
This sample also demonstrates how to specify a set of indirectly accessed SVM pointers using `clSetKernelExecInfo` and `CL_KERNEL_EXEC_INFO_SVM_PTRS`.
This is still required for kernels that operate on complex data structures consisting of fine-grained Shared Virtual Memory allocations that are not directly passed as kernel arguments.

## Command Line Options

9 changes: 2 additions & 7 deletions samples/usm/00_usmqueries/README.md
@@ -2,18 +2,13 @@

## Sample Purpose

This sample queries and prints the Unified Shared Memory capabilities of a device.
This sample queries and prints the Unified Shared Memory (USM) capabilities for all devices in the system.
Many USM samples require specific USM capabilities, and this sample can be used to check whether they will run on a device.
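
Checking whether a sample can run on a device amounts to comparing the capability bits the device reports against the bits the sample needs. A minimal sketch of that check follows, with no OpenCL headers required. The `kUsm...` constants and `HasRequiredCaps` are hypothetical names, and the bit positions are assumed to mirror the `CL_UNIFIED_SHARED_MEMORY_*_INTEL` definitions in `CL/cl_ext.h`.

```cpp
#include <cstdint>

// Stand-in for cl_device_unified_shared_memory_capabilities_intel,
// assumed to be a 64-bit bitfield.
using usm_caps_t = std::uint64_t;

// Assumed to mirror the CL_UNIFIED_SHARED_MEMORY_*_INTEL bit values.
constexpr usm_caps_t kUsmAccess                 = 1u << 0;
constexpr usm_caps_t kUsmAtomicAccess           = 1u << 1;
constexpr usm_caps_t kUsmConcurrentAccess       = 1u << 2;
constexpr usm_caps_t kUsmConcurrentAtomicAccess = 1u << 3;

// True only if every bit set in `required` is also set in `available`.
bool HasRequiredCaps(usm_caps_t available, usm_caps_t required) {
    return (available & required) == required;
}
```

Comparing against `required` rather than testing for a non-zero result matters when a sample needs more than one capability, because a plain `available & required` is non-zero as soon as any one of the bits is present.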

## Key APIs and Concepts

This sample demonstrates the new device queries for Unified Shared Memory capabilities.
This sample currently uses c APIs to perform the device queries because the C++ bindings do not support Unified Shared Memory (yet).
When support for Unified Shared Memory is added to the C++ bindings the samples will be updated to use the C++ bindings instead, which should simplify the sample slightly.

## Command Line Options

| Option | Default Value | Description |
|:--|:-:|:--|
| `-d <index>` | 0 | Specify the index of the OpenCL device in the platform to execute the sample on.
| `-p <index>` | 0 | Specify the index of the OpenCL platform to execute the sample on.
None