Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Offload] Introduce offload-tblgen and initial new API implementation #108413

Merged
merged 22 commits into from
Nov 25, 2024

Conversation

callumfare
Copy link
Contributor

@callumfare callumfare commented Sep 12, 2024

Introduce offload-tblgen and an initial implementation of a subset of the new API. The tablegen files are intended to be the single source of truth for the new API, with the header files, documentation, and others bits of source all automatically generated.

TODO (based on review feedback so far):

  • Check in the generated headers
    • Add an offload-generate target to trigger the generation rather than building them every time
  • Decide how error handling should work
    • Finish up new error handling implementation
  • Decide naming convention
  • Add testing for the new API
  • Add tablegen specific testing
  • clang-tidy and use llvm:: types when possible
  • Add optional code location arguments
  • Avoid multiple returns from one function

offload-tblgen

See the included README for more information on how the API definition and generation works. I'm happy to answer any questions about it and plan to walk through it in a future LLVM Offload call.

It should be noted that struct definitions have not been fully implemented/tested as they aren't used by the initial API definitions, but finishing that off in the future shouldn't be too much work.

The tablegen tooling has been designed to be easily extended with new backends, using the classes in RecordTypes.hpp to abstract over the tablegen records.

New API

Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the urinfo and sycl-ls tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available here. Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.

Demoing the new API

$ git clone -b offload_adapter https://github.com/callumfare/unified-runtime.git
$ cd unified-runtime
$ mkdir build
$ cd build
$ cmake .. -GNinja -DUR_BUILD_ADAPTER_OFFLOAD=ON \
    -DUR_OFFLOAD_INSTALL_DIR=<offload build dir containing liboffload_new.so> \
    -DUR_OFFLOAD_INCLUDE_DIR=<offload build dir containing 'offload' headers directory>
$ ninja urinfo
export LD_LIBRARY_PATH=<offload build dir containing offload plugin libraries>
$ UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
[cuda:gpu][cuda:0] CUDA, NVIDIA GeForce GT 1030  [12030]
# Demo with tracing
$ OFFLOAD_TRACE=1 UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
---> offloadPlatformGet(.NumEntries = 0, .phPlatforms = {}, .pNumPlatforms = 0x7ffd05e4d6e0 (2))-> OFFLOAD_RESULT_SUCCESS
---> offloadPlatformGet(.NumEntries = 2, .phPlatforms = {0x564bf4040220, 0x564bf4040240}, .pNumPlatforms = nullptr)-> OFFLOAD_RESULT_SUCCESS
...

Open questions and future work

  • The new API is implemented in a separate library (liboffload_new.so). It could just as easily be part of the existing libomptarget library - I have no strong feelings on which is better.
  • Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
  • It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Sep 12, 2024

@llvm/pr-subscribers-offload

Author: Callum Fare (callumfare)

Changes

Introduce offload-tblgen and an initial implementation of a subset of the new API. The tablegen files are intended to be the single source of truth for the new API, with the header files, documentation, and others bits of source all automatically generated.

offload-tblgen

See the included README for more information on how the API definition and generation works. I'm happy to answer any questions about it and plan to walk through it in a future LLVM Offload call.

It should be noted that struct definitions have not been fully implemented/tested as they aren't used by the initial API definitions, but finishing that off in the future shouldn't be too much work.

The tablegen tooling has been designed to be easily extended with new backends, using the classes in RecordTypes.hpp to abstract over the tablegen records.

New API

Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the urinfo and sycl-ls tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available here. Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.

Demoing the new API

$ git clone -b offload_adapter https://github.com/callumfare/unified-runtime.git
$ cd unified-runtime
$ mkdir build
$ cd build
$ cmake .. -GNinja -DUR_BUILD_ADAPTER_OFFLOAD=ON \
    -DUR_OFFLOAD_INSTALL_DIR=&lt;offload build dir containing liboffload_new.so&gt; \
    -DUR_OFFLOAD_INCLUDE_DIR=&lt;offload build dir containing 'offload' headers directory&gt;
$ ninja urinfo
export LD_LIBRARY_PATH=&lt;offload build dir containing offload plugin libraries&gt;
$ UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
[cuda:gpu][cuda:0] CUDA, NVIDIA GeForce GT 1030  [12030]
# Demo with tracing
$ OFFLOAD_TRACE=1 UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
---&gt; offloadPlatformGet(.NumEntries = 0, .phPlatforms = {}, .pNumPlatforms = 0x7ffd05e4d6e0 (2))-&gt; OFFLOAD_RESULT_SUCCESS
---&gt; offloadPlatformGet(.NumEntries = 2, .phPlatforms = {0x564bf4040220, 0x564bf4040240}, .pNumPlatforms = nullptr)-&gt; OFFLOAD_RESULT_SUCCESS
...

Open questions and future work

  • The new API is implemented in a separate library (liboffload_new.so). It could just as easily be part of the existing libomptarget library - I have no strong feelings on which is better.
  • Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
  • It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.

Patch is 76.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108413.diff

22 Files Affected:

  • (modified) offload/CMakeLists.txt (+3)
  • (added) offload/new-api/API/APIDefs.td (+212)
  • (added) offload/new-api/API/Common.td (+75)
  • (added) offload/new-api/API/Device.td (+101)
  • (added) offload/new-api/API/OffloadAPI.td (+15)
  • (added) offload/new-api/API/Platform.td (+94)
  • (added) offload/new-api/API/README.md (+150)
  • (added) offload/new-api/CMakeLists.txt (+36)
  • (added) offload/new-api/README.md (+8)
  • (added) offload/new-api/src/helpers.hpp (+96)
  • (added) offload/new-api/src/offload_impl.cpp (+187)
  • (added) offload/new-api/src/offload_lib.cpp (+23)
  • (modified) offload/plugins-nextgen/common/include/PluginInterface.h (+7)
  • (added) offload/tools/offload-tblgen/APIGen.cpp (+196)
  • (added) offload/tools/offload-tblgen/CMakeLists.txt (+24)
  • (added) offload/tools/offload-tblgen/EntryPointGen.cpp (+86)
  • (added) offload/tools/offload-tblgen/FuncsGen.cpp (+54)
  • (added) offload/tools/offload-tblgen/GenCommon.hpp (+58)
  • (added) offload/tools/offload-tblgen/Generators.hpp (+18)
  • (added) offload/tools/offload-tblgen/PrintGen.cpp (+206)
  • (added) offload/tools/offload-tblgen/RecordTypes.hpp (+227)
  • (added) offload/tools/offload-tblgen/offload-tblgen.cpp (+95)
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 9ffe8f56b76e67..ba7be255cfe582 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -349,6 +349,9 @@ add_subdirectory(tools)
 # Build target agnostic offloading library.
 add_subdirectory(src)
 
+add_subdirectory(tools/offload-tblgen)
+add_subdirectory(new-api)
+
 # Add tests.
 add_subdirectory(test)
 
diff --git a/offload/new-api/API/APIDefs.td b/offload/new-api/API/APIDefs.td
new file mode 100644
index 00000000000000..410a28c4c90cfe
--- /dev/null
+++ b/offload/new-api/API/APIDefs.td
@@ -0,0 +1,212 @@
+//===-- APIDefs.td - Base definitions for Offload tablegen -*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains the class definitions used to implement the Offload API,
+// as well as helper functions used to help populate relevant records.
+// See offload/API/README.md for more detailed documentation.
+//
+//===----------------------------------------------------------------------===//
+
+// Prefix for API naming. This could be hard-coded in the future when a value
+// is agreed upon.
+defvar PREFIX = "OFFLOAD";
+defvar prefix = !tolower(PREFIX);
+
+// Parameter flags
+defvar PARAM_IN = 0x1;
+defvar PARAM_OUT = 0x2;
+defvar PARAM_OPTIONAL = 0x4;
+defvar PARAM_IN_OPTIONAL = !or(PARAM_IN, PARAM_OPTIONAL);
+defvar PARAM_OUT_OPTIONAL = !or(PARAM_OUT, PARAM_OPTIONAL);
+
+// Does the type end with '_handle_t'?
+class IsHandleType<string Type> {
+  // size("_handle_t") == 9
+  bit ret = !if(!lt(!size(Type), 9), 0,
+                !ne(!find(Type, "_handle_t", !sub(!size(Type), 9)), -1));
+}
+
+// Does the type end with '*'?
+class IsPointerType<string Type> {
+  bit ret = !ne(!find(Type, "*", !sub(!size(Type), 1)), -1);
+}
+
+// Describes the valid range of a pointer parameter that reperesents an array
+class Range<string Begin, string End> {
+  string begin = Begin;
+  string end = End;
+}
+
+// Names the parameters that indicate the type and size of the data pointed to
+// by an opaque pointer parameter
+class TypeInfo<string TypeEnum, string TypeSize> {
+  string enum = TypeEnum;
+  string size = TypeSize;
+}
+
+class Param<string Type, string Name, string Desc, bits<3> Flags = 0> {
+  string type = Type;
+  string name = Name;
+  string desc = Desc;
+  bits<3> flags = Flags;
+  Range range = Range<"", "">;
+  TypeInfo type_info = TypeInfo<"", "">;
+  bit IsHandle = IsHandleType<type>.ret;
+  bit IsPointer = IsPointerType<type>.ret;
+}
+
+// A parameter whose range is described by other parameters in the function.
+class RangedParam<string Type, string Name, string Desc, bits<3> Flags, Range ParamRange> : Param<Type, Name, Desc, Flags> {
+  let range = ParamRange;
+}
+
+// A parameter (normally of type void*) which has its pointee type and size
+// described by other parameters in the function.
+class TypeTaggedParam<string Type, string Name, string Desc, bits<3> Flags, TypeInfo ParamTypeInfo> : Param<Type, Name, Desc, Flags> {
+  let type_info = ParamTypeInfo;
+}
+
+class Return<string Value, list<string> Conditions = []> {
+  string value = Value;
+  list<string> conditions = Conditions;
+}
+
+class ShouldCheckHandle<Param P> {
+  bit ret = !and(P.IsHandle, !eq(!and(PARAM_OPTIONAL, P.flags), 0));
+}
+
+class ShouldCheckPointer<Param P> {
+  bit ret = !and(P.IsPointer, !eq(!and(PARAM_OPTIONAL, P.flags), 0));
+}
+
+// For a list of returns that contains a specific return code, find and append
+// new conditions to that return
+class AppendConditionsToReturn<list<Return> Returns, string ReturnValue,
+                               list<string> Conditions> {
+  list<Return> ret =
+      !foreach(Ret, Returns,
+               !if(!eq(Ret.value, ReturnValue),
+                   Return<Ret.value, Ret.conditions#Conditions>, Ret));
+}
+
+// Add null handle checks to a function's return value descriptions
+class AddHandleChecksToReturns<list<Param> Params, list<Return> Returns> {
+  list<string> handle_params =
+      !foreach(P, Params, !if(ShouldCheckHandle<P>.ret, P.name, ""));
+  list<string> handle_params_filt =
+      !filter(param, handle_params, !ne(param, ""));
+  list<string> handle_param_conds =
+      !foreach(handle, handle_params_filt, "`NULL == "#handle#"`");
+
+  // Does the list of returns already contain ERROR_INVALID_NULL_HANDLE?
+  bit returns_has_inv_handle = !foldl(
+      0, Returns, HasErr, Ret,
+      !or(HasErr, !eq(Ret.value, PREFIX#"_RESULT_ERROR_INVALID_NULL_HANDLE")));
+
+  list<Return> returns_out = !if(returns_has_inv_handle,
+        AppendConditionsToReturn<Returns, PREFIX # "_RESULT_ERROR_INVALID_NULL_HANDLE", handle_param_conds>.ret,
+        !listconcat(Returns, [Return<PREFIX # "_RESULT_ERROR_INVALID_NULL_HANDLE", handle_param_conds>])
+    );
+}
+
+// Add null pointer checks to a function's return value descriptions
+class AddPointerChecksToReturns<list<Param> Params, list<Return> Returns> {
+  list<string> ptr_params =
+      !foreach(P, Params, !if(ShouldCheckPointer<P>.ret, P.name, ""));
+  list<string> ptr_params_filt = !filter(param, ptr_params, !ne(param, ""));
+  list<string> ptr_param_conds =
+      !foreach(ptr, ptr_params_filt, "`NULL == "#ptr#"`");
+
+  // Does the list of returns already contain ERROR_INVALID_NULL_POINTER?
+  bit returns_has_inv_ptr = !foldl(
+      0, Returns, HasErr, Ret,
+      !or(HasErr, !eq(Ret.value, PREFIX#"_RESULT_ERROR_INVALID_NULL_POINTER")));
+  list<Return> returns_out = !if(returns_has_inv_ptr,
+        AppendConditionsToReturn<Returns, PREFIX # "_RESULT_ERROR_INVALID_NULL_POINTER", ptr_param_conds>.ret,
+        !listconcat(Returns, [Return<PREFIX # "_RESULT_ERROR_INVALID_NULL_POINTER", ptr_param_conds>])
+    );
+}
+
+defvar DefaultReturns = [Return<PREFIX#"_RESULT_SUCCESS">,
+                         Return<PREFIX#"_RESULT_ERROR_UNINITIALIZED">,
+                         Return<PREFIX#"_RESULT_ERROR_DEVICE_LOST">];
+
+class APIObject {
+  string name;
+  string desc;
+}
+
+class Function : APIObject {
+  list<Param> params;
+  list<Return> returns;
+  list<string> details = [];
+  list<string> analogues = [];
+
+  list<Return> returns_with_def = !listconcat(DefaultReturns, returns);
+  list<Return> all_returns = AddPointerChecksToReturns<params,
+        AddHandleChecksToReturns<params, returns_with_def>.returns_out>.returns_out;
+}
+
+class Etor<string Name, string Desc> {
+  string name = Name;
+  string desc = Desc;
+  string tagged_type;
+}
+
+class TaggedEtor<string Name, string Type, string Desc> : Etor<Name, Desc> {
+  let tagged_type = Type;
+}
+
+class Enum : APIObject {
+  // This refers to whether the enumerator descriptions specify a return
+  // type for functions where this enum may be used as an output type. If set,
+  // all Etor values must be TaggedEtor records
+  bit is_typed = 0;
+
+  list<Etor> etors = [];
+}
+
+class StructMember<string Type, string Name, string Desc> {
+  string type = Type;
+  string name = Name;
+  string desc = Desc;
+}
+
+defvar DefaultPropStructMembers =
+    [StructMember<prefix#"_structure_type_t", "stype",
+                  "type of this structure">,
+     StructMember<"void*", "pNext", "pointer to extension-specific structure">];
+
+class StructHasInheritedMembers<string BaseClass> {
+  bit ret = !or(!eq(BaseClass, prefix#"_base_properties_t"),
+                !eq(BaseClass, prefix#"_base_desc_t"));
+}
+
+class Struct : APIObject {
+  string base_class = "";
+  list<StructMember> members;
+  list<StructMember> all_members =
+      !if(StructHasInheritedMembers<base_class>.ret,
+          DefaultPropStructMembers, [])#members;
+}
+
+class Typedef : APIObject { string value; }
+
+class FptrTypedef : APIObject {
+  list<Param> params;
+  list<Return> returns;
+}
+
+class Macro : APIObject {
+  string value;
+
+  string condition;
+  string alt_value;
+}
+
+class Handle : APIObject;
diff --git a/offload/new-api/API/Common.td b/offload/new-api/API/Common.td
new file mode 100644
index 00000000000000..d293e4addfef8b
--- /dev/null
+++ b/offload/new-api/API/Common.td
@@ -0,0 +1,75 @@
+def : Macro {
+  let name = "OFFLOAD_APICALL";
+  let desc = "Calling convention for all API functions";
+  let condition = "defined(_WIN32)";
+  let value = "__cdecl";
+  let alt_value = "";
+}
+
+def : Macro {
+  let name = "OFFLOAD_APIEXPORT";
+  let desc = "Microsoft-specific dllexport storage-class attribute";
+  let condition = "defined(_WIN32)";
+  let value = "__declspec(dllexport)";
+  let alt_value = "";
+}
+
+def : Macro {
+  let name = "OFFLOAD_DLLEXPORT";
+  let desc = "Microsoft-specific dllexport storage-class attribute";
+  let condition = "defined(_WIN32)";
+  let value = "__declspec(dllexport)";
+}
+
+def : Macro {
+  let name = "OFFLOAD_DLLEXPORT";
+  let desc = "GCC-specific dllexport storage-class attribute";
+  let condition = "__GNUC__ >= 4";
+  let value = "__attribute__ ((visibility (\"default\")))";
+  let alt_value = "";
+}
+
+def : Typedef {
+  let name = "offload_bool_t";
+  let value = "uint8_t";
+  let desc = "compiler-independent type";
+}
+
+def : Handle {
+  let name = "offload_platform_handle_t";
+  let desc = "Handle of a platform instance";
+}
+
+def : Handle {
+  let name = "offload_device_handle_t";
+  let desc = "Handle of platform's device object";
+}
+
+def : Handle {
+  let name = "offload_context_handle_t";
+  let desc = "Handle of context object";
+}
+
+def : Enum {
+  let name = "offload_result_t";
+  let desc = "Defines Return/Error codes";
+  let etors =[
+    Etor<"SUCCESS", "Success">,
+    Etor<"ERROR_INVALID_VALUE", "Invalid Value">,
+    Etor<"ERROR_INVALID_PLATFORM", "Invalid platform">,
+    Etor<"ERROR_DEVICE_NOT_FOUND", "Device not found">,
+    Etor<"ERROR_INVALID_DEVICE", "Invalid device">,
+    Etor<"ERROR_DEVICE_LOST", "Device hung, reset, was removed, or driver update occurred">,
+    Etor<"ERROR_UNINITIALIZED", "plugin is not initialized or specific entry-point is not implemented">,
+    Etor<"ERROR_OUT_OF_RESOURCES", "Out of resources">,
+    Etor<"ERROR_UNSUPPORTED_VERSION", "[Validation] generic error code for unsupported versions">,
+    Etor<"ERROR_UNSUPPORTED_FEATURE", "[Validation] generic error code for unsupported features">,
+    Etor<"ERROR_INVALID_ARGUMENT", "[Validation] generic error code for invalid arguments">,
+    Etor<"ERROR_INVALID_NULL_HANDLE", "[Validation] handle argument is not valid">,
+    Etor<"ERROR_INVALID_NULL_POINTER", "[Validation] pointer argument may not be nullptr">,
+    Etor<"ERROR_INVALID_SIZE", "[Validation] invalid size or dimensions (e.g., must not be zero, or is out of bounds)">,
+    Etor<"ERROR_INVALID_ENUMERATION", "[Validation] enumerator argument is not valid">,
+    Etor<"ERROR_UNSUPPORTED_ENUMERATION", "[Validation] enumerator argument is not supported by the device">,
+    Etor<"ERROR_UNKNOWN", "Unknown or internal error">
+  ];
+}
\ No newline at end of file
diff --git a/offload/new-api/API/Device.td b/offload/new-api/API/Device.td
new file mode 100644
index 00000000000000..2a330d7ef7c4cb
--- /dev/null
+++ b/offload/new-api/API/Device.td
@@ -0,0 +1,101 @@
+//===-- Device.td - Device definitions for Offload ---------*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains Offload API definitions related to the Device handle
+//
+//===----------------------------------------------------------------------===//
+
+def : Enum {
+  let name = "offload_device_type_t";
+  let desc = "Supported device types";
+  let etors =[
+    Etor<"DEFAULT", "The default device type as preferred by the runtime">,
+    Etor<"ALL", "Devices of all types">,
+    Etor<"GPU", "GPU device type">,
+    Etor<"CPU", "CPU device type">,
+  ];
+}
+
+def : Enum {
+  let name = "offload_device_info_t";
+  let desc = "Supported device info";
+  let is_typed = 1;
+  let etors =[
+    TaggedEtor<"TYPE", "offload_device_type_t", "type of the device">,
+    TaggedEtor<"PLATFORM", "offload_platform_handle_t", "the platform associated with the device">,
+    TaggedEtor<"NAME", "char[]", "Device name">,
+    TaggedEtor<"VENDOR", "char[]", "Device vendor">,
+    TaggedEtor<"DRIVER_VERSION", "char[]", "Driver version">
+  ];
+}
+
+def : Function {
+  let name = "offloadDeviceGet";
+  let desc = "Retrieves devices within a platform";
+  let details = [
+    "Multiple calls to this function will return identical device handles, in the same order.",
+    "The number and order of handles returned from this function can be affected by environment variables that filter devices exposed through API.",
+    "The returned devices are taken a reference of and must be released with a subsequent call to olDeviceRelease.",
+    "The application may call this function from simultaneous threads, the implementation must be thread-safe"
+  ];
+  let params = [
+    Param<"offload_platform_handle_t", "hPlatform", "handle of the platform instance", PARAM_IN>,
+    Param<"offload_device_type_t", "DeviceType", "the type of the devices.", PARAM_IN>,
+    Param<"uint32_t", "NumEntries", "the number of devices to be added to phDevices."
+        "If phDevices is not NULL, then NumEntries should be greater than zero. Otherwise OFFLOAD_RESULT_ERROR_INVALID_SIZE"
+        "will be returned.", PARAM_IN>,
+    RangedParam<"offload_device_handle_t*", "phDevices", "array of handle of devices."
+        "If NumEntries is less than the number of devices available, then platform shall only retrieve that number of devices.", PARAM_OUT_OPTIONAL,
+        Range<"0", "NumEntries">>,
+    Param<"uint32_t*", "pNumDevices", "pointer to the number of devices."
+        "pNumDevices will be updated with the total number of devices available.", PARAM_OUT_OPTIONAL>
+  ];
+  let returns = [
+    Return<"OFFLOAD_RESULT_ERROR_INVALID_SIZE", [
+      "`NumEntries == 0 && phDevices != NULL`"
+    ]>,
+    Return<"OFFLOAD_RESULT_ERROR_INVALID_NULL_POINTER", [
+      "`NumEntries > 0 && phDevices == NULL`"
+    ]>,
+    Return<"OFFLOAD_RESULT_ERROR_INVALID_VALUE">
+  ];
+}
+
+def : Function {
+  let name = "offloadDeviceGetInfo";
+  let desc = "Retrieves various information about device";
+  let details = [
+    "The application may call this function from simultaneous threads.",
+    "The implementation of this function should be lock-free."
+  ];
+  let params = [
+    Param<"offload_device_handle_t", "hDevice", "handle of the device instance", PARAM_IN>,
+    Param<"offload_device_info_t", "propName", "type of the info to retrieve", PARAM_IN>,
+    Param<"size_t", "propSize", "the number of bytes pointed to by pPropValue.", PARAM_IN>,
+    TypeTaggedParam<"void*", "pPropValue", "array of bytes holding the info. If propSize is not equal to or greater than the real "
+                    "number of bytes needed to return the info then the OFFLOAD_RESULT_ERROR_INVALID_SIZE error is returned and "
+                    "pPropValue is not used.", PARAM_OUT_OPTIONAL, TypeInfo<"propName" , "propSize">>,
+    Param<"size_t*", "pPropSizeRet", "pointer to the actual size in bytes of the queried propName.", PARAM_OUT_OPTIONAL>
+  ];
+  let returns = [
+    Return<"OFFLOAD_RESULT_ERROR_UNSUPPORTED_ENUMERATION", [
+      "If `propName` is not supported by the adapter."
+    ]>,
+    Return<"OFFLOAD_RESULT_ERROR_INVALID_SIZE", [
+      "`propSize == 0 && pPropValue != NULL`",
+      "If `propSize` is less than the real number of bytes needed to return the info."
+    ]>,
+    Return<"OFFLOAD_RESULT_ERROR_INVALID_NULL_POINTER", [
+      "`propSize != 0 && pPropValue == NULL`",
+      "`pPropValue == NULL && pPropSizeRet == NULL`"
+    ]>,
+    Return<"OFFLOAD_RESULT_ERROR_INVALID_DEVICE">,
+    Return<"OFFLOAD_RESULT_ERROR_OUT_OF_RESOURCES">,
+    Return<"OFFLOAD_RESULT_ERROR_OUT_OF_HOST_MEMORY">
+  ];
+}
diff --git a/offload/new-api/API/OffloadAPI.td b/offload/new-api/API/OffloadAPI.td
new file mode 100644
index 00000000000000..8a0c3c40581223
--- /dev/null
+++ b/offload/new-api/API/OffloadAPI.td
@@ -0,0 +1,15 @@
+//===-- OffloadAPI.td - Root tablegen file for Offload -----*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// Always include this file first
+include "APIDefs.td"
+
+// Add API definition files here
+include "Common.td"
+include "Platform.td"
+include "Device.td"
diff --git a/offload/new-api/API/Platform.td b/offload/new-api/API/Platform.td
new file mode 100644
index 00000000000000..71af04bb831998
--- /dev/null
+++ b/offload/new-api/API/Platform.td
@@ -0,0 +1,94 @@
+//===-- Platform.td - Platform definitions for Offload -----*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains Offload API definitions related to the Platform handle
+//
+//===----------------------------------------------------------------------===//
+def : Function {
+  let name = "offloadPlatformGet";
+  let desc = "Retrieves all available platforms";
+  let details = [
+    "Multiple calls to this function will return identical platforms handles, in the same order.",
+  ];
+  let params = [
+    Param<"uint32_t", "NumEntries",
+      "The number of platforms to be added to phPlatforms. If phPlatforms is not NULL, then"
+      "NumEntries should be greater than zero, otherwise OFFLOAD_RESULT_ERROR_INVALID_SIZE"
+      "will be returned.", PARAM_IN>,
+    RangedParam<"offload_platform_handle_t*", "phPlatforms", 
+      "Array of handle of platforms. If NumEntries is"
+      "less than the number of platforms available, then offloadPlatformGet"
+      "shall only retrieve that number of platforms.",
+      PARAM_OUT_OPTIONAL, Range<"0", "NumEntries">>,
+    Param<"uint32_t*",
+      "pNumPlatforms", "returns the total number of platforms available.",
+      PARAM_OUT_OPTIONAL>
+  ];
+  let returns = [
+    Return<"OFFLOAD_RESULT_ERROR_INVALID_SIZE", [
+      "`NumEntries == 0 && phPlatforms != NULL`"
+    ]>
+  ];
+}
+
+def : Enum {
+  let name = "offload_platform_info_t";
+  let desc = "Supported platform info";
+  let is_typed = 1;
+  let etors = [
+    TaggedEtor<"NAME", "char[]", "The string denoting name of the platform. The size of the info needs to be dynamically queried.">,
+    TaggedEtor<"VENDOR_NAME", "char[]","The string denoting name of the vendor of the platform. The size of the info needs to be dynamically queried.">,
+    TaggedEtor<"VERSION", "char[]", "The string denoting the version of the platform. The size of the info needs to be dynamically queried.">,
+    TaggedEtor<"BACKEND", "offload_platform_backend_t", "The backend of the platform. Identifies the native backend adapter implementing this platform.">
+  ];
+}
+
+def : Enum {
+  let name = "offload_platform_backend_t";
+  let desc = "Identifies the native backend of the platform";
+  let etors =[
+    Etor<"UNKNOWN", "The backend is not recognized">,
+    Etor<"CUDA", "The backend is CUDA">,
+    Etor<"AMDGPU", "The backend is AMDGPU">,
+  ];
+}
+
+def : Function {
+  let name = "offloadPlatformGetInfo";
+  let desc = "Retrieves various information about platform";
+  let details = [
+    "The application may call this function from simultaneous threads.",
+    "The implementation of this function should be lock-free."
+  ];
+  let params = [
+    Param<"offload_platform_handle_t", "hPlatform", "handle of the platform", PARAM_IN>,
+    Param<"offload_platform_info_t", "propName", "type of the info to retrieve", PARAM_IN>,
+    Param<"size_t", "propSize", "the number of bytes pointed to by pPlatformInfo.", PARAM_IN>,
+    TypeTaggedParam<"void*...
[truncated]

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, glad to see something coming along here.

Generally I'd like a standalone interface into the plugins that can be targeted without the awful OpenMP interface, this seeems to be going in that direction so I'm happy about that.

I'm guessing right now this is mostly a demo and doesn't expose a functioning kernel launch? One downside to using this TableGen stuff is I have no clue what the headers look like after being generated.

@@ -349,6 +349,9 @@ add_subdirectory(tools)
# Build target agnostic offloading library.
add_subdirectory(src)

add_subdirectory(tools/offload-tblgen)
add_subdirectory(new-api)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should call this new-api.

offload/new-api/API/Common.td Outdated Show resolved Hide resolved
}

def : Handle {
let name = "offload_platform_handle_t";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is platform what we call a plugin currently and a device a subset of that? It's tough to think how we should handle this in what's exported, since I would say that ideally the user doesn't need to care about the "plugin" at all. However, that's not true for cases where some queue is only valid with CUDA and not AMDGPU for example. We could probably make it a quality of the device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea of this is that platform is roughly equivalent to a plugin (with a big caveat*).

In SYCL, the platforms are exposed to the user (see here), so we can't hide these completely from the interface. Obviously that's not important to OpenMP, so anything omp related built on top of this wouldn't expose them, but in general we need something in the new API.

*In UR we actually split the concept of a plugin (called an adapter in UR) from the platform. This is because the OpenCL adapter can contain multiple platforms (one for every valid OpenCL driver on the system). For other adapters it's just a 1:1 relationship and not important. I didn't want to introduce this complication at this point though, especially since we don't even have an OpenCL plugin.

}

def : Handle {
let name = "offload_context_handle_t";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding, I think offload_ in front of everything is too verbose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, by the time I was done it was feeling a little unwieldy but I didn't want to rewrite everything without agreement - it would be good to make a decision as a group on this. ol_ or offl_ might be better choices

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think there was some desire for every API function to accept some identifier information, for use with debug printing. I'm wondering if we should have some _ident or something suffix and #define the normal API entrypoints to be those with a nullptr argument.

}

def : Enum {
let name = "offload_result_t";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked a lot about error codes, where the returned value is actually an index into some table (zero is success / invalid index). The actual error code could be queried from that index or something. This is mostly because we want the library to provide more helpful error codes than generic "X went wrong" since we already do a lot of stuff like return Error::("Something called %s failed on device %d", name, device) instead of just returning failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely possible, but I have some concerns about that being the default path for returning errors.

With SYCL we programmatically deal with error codes, converting them to appropriate SYCL exceptions etc. See here for an example. Having errors just be strings makes this more difficult to handle, and liable to be unintentionally broken.

Could we have a specific error condition (e.g. RESULT_ERROR_CUSTOM) that represents a custom error state that can then be queried with a function like offloadGetLastCustomError? We have something similar in UR (UR_RESULT_ERROR_ADAPTER_SPECIFIC). Rather than using an index into a table of strings, we have a thread_local buffer for a single error message, and it's the caller's responsibility to query the contents of it after the failing API call (subsequent calls may overwrite it). This design avoids complications with the lifetime of the error strings.

Another possibility is some kind of logging callback where Offload can write out additional error/debug information for the language runtime to handle in whatever way is suitable. That would allow an enum to be used as the normal return code, while additional more descriptive information is optionally available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors are not strings, the errors are indices into a pair of {error_code, error_message} in a table. APIs like CUDA and HSA already have stuff like hsa_status_string(EC) to get the string error message. What we suggested was needing something like hsa_status_code(IDX).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, yeah that makes more sense, thanks!

Would the table contain one or more errors for the last API call or is the idea that the table can be queried for any previous API call that returned an error? If it's the latter then I'm still a bit unsure about how that would work in terms of lifetime; seems like they would just pile up indefinitely since you can never safely clear them. Maybe that's not a big deal in practice but it makes me a bit worried about long-running programs. That's why we made the equivalent in UR thread-local and mandated checking the error message before making another API call on the same thread, since the implementation can safely clear the existing message(s) when setting a new one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do pointers or something, null is no error. That might make it easier to implement the "table" as a thread safe data structure (Since otherwise an index could never be reused).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now implemented the errors as discussed in previous calls. Success is a nullptr, an error is a pointer to a struct with an error code and an optional string.


add_public_tablegen_target(OffloadHeaderGen)

add_llvm_library(offload_new SHARED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is offload_new and what's old? What is the expected way for this to be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense for this new interface to be offload with the existing OpenMP interface(s) staying in omptarget. But since the new API is nowhere near usable I wanted the name to indicate that and didn't have any better ideas, hence offload_new. I don't have strong feelings on this at all so happy to go with whatever others think.

offload/new-api/CMakeLists.txt Outdated Show resolved Hide resolved
@callumfare
Copy link
Contributor Author

@jhuber6

Thanks for the quick review, I'll try to address your other comments asap.

I'm guessing right now this is mostly a demo and doesn't expose a functioning kernel launch? One downside to using this TableGen stuff is I have no clue what the headers look like after being generated.

Indeed, this doesn't expose anything to do with kernels, queues, memory, etc. right now. Just basic device discovery stuff which is enough for a quick demo - see the information in the PR description. I think implementing something that exposes all that would be achievable quickly, but getting a polished interface will obviously take much longer. Either way I didn't want to stick too much in the initial PR.

I agree that not seeing the generated headers is unfortunate. In Unified Runtime we check in the generated files (and have a CI job that fails if the generated headers don't match the output of re-running the generation). All things aside I think I'd prefer that approach, but I understand that's not how any other LLVM project works so it might be difficult to get traction there.

Copy link

github-actions bot commented Sep 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jhuber6
Copy link
Contributor

jhuber6 commented Sep 12, 2024

I agree that not seeing the generated headers is unfortunate. In Unified Runtime we check in the generated files (and have a CI job that fails if the generated headers don't match the output of re-running the generation). All things aside I think I'd prefer that approach, but I understand that's not how any other LLVM project works so it might be difficult to get traction there.

We could probably do something similar, it's not like we're above checking in random blobs. For example, we have some random PDFs sitting around in the documentation directories.

I think in general this needs a test as well, and a clarification since as it stands this is basically a "new" runtime since it exposes the plugins separate from the OpenMP interface (which I like). Also, I'm thinking about the many issues we have with shared libraries, i.e. if someone wants to do OpenMP / Direct API stuff at the same time. (However I don't know how useful a use-case this really is).

We also need unit testing, so it would be get a head start on that.


struct offload_platform_handle_t_ {
std::unique_ptr<GenericPluginTy> Plugin;
std::vector<offload_device_handle_t_> Devices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be LLVM small vectors, since they have small object optimizations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, similar for anything std:: there's probably something like llvm::find_if that's easier to use. Can you also run clang-tidy on the whole file to make sure everything's formatted with the right names?

return ReturnValue("Unknown platform vendor");
case OFFLOAD_PLATFORM_INFO_VERSION: {
// TODO: Implement this
return ReturnValue("v0.0.0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably codify the version right at the start.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the version be in the .td file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added version information to the tablegen directly. I've just set it to 0.0.1 for now. As we approach a stable-ish API we can decide how to handle versioning, whether it's just tied to the LLVM version etc.

@tschuett
Copy link
Member

We have tblgen tests. Could you do something similar for Offload?
https://github.com/llvm/llvm-project/tree/main/llvm/test/TableGen

Could the tablgen emitter wrap everything into a namespace offload to avoid prefixes?

Copy link
Contributor

@jplehr jplehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting this up and moving it forward.
I looked at it briefly and will get to it later.

}

def : Function {
let name = "offloadDeviceGet";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably too early to bikeshed, but I think, function names should read more like getOffloadDevice.

Copy link
Contributor

@jhuber6 jhuber6 Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to be a C API, which usually is snake_case in my experience as well.

@JonChesterfield
Copy link
Collaborator

Generating lots of redundant C++, where the redundancies are required by the language, is a good thing. Tablegen has pros and cons but is at least well established.

If we check in the generated code, we essentially guarantee that people will edit the generated files, and not the tablegen, and that review will miss this. The single source of truth will diverge from the thing actually running. However, if we do not check in the generated source, it'll be impossible to tell from the github review what has changed.

I would suggest we generate the source from tablegen, and check it in, and also have a unit testing step which generates the source again and diffs it against the copy in the local filesystem, and fails if they differ. As long as CI is running that test we'll catch the divergence.

Also don't call it new. This would be the... fourth? plugin revision and I think most of them are called some variant on "new".

Is it viable to develop the tablegen without the source checked in, and once it's holding together, have it replace the existing source files? Thus there is no new/old, just a transition from handwritten C++ into generated C++ which does the same thing, and we can permute the existing source code to look more like the generated output under CI.

@jhuber6
Copy link
Contributor

jhuber6 commented Sep 18, 2024

Also don't call it new. This would be the... fourth? plugin revision and I think most of them are called some variant on "new".

Honestly my preferred solution would just be to start this as an entirely separate interface so we don't need to deal with breaking OpenMP stuff for existing customers. That's what I wanted from offload/ in the first place. i.e. redo the plugins as well.

@callumfare
Copy link
Contributor Author

I would suggest we generate the source from tablegen, and check it in, and also have a unit testing step which generates the source again and diffs it against the copy in the local filesystem, and fails if they differ. As long as CI is running that test we'll catch the divergence.

Agreed, I think this is a good way to balance things. I'll make sure to add a test/check target to this PR so we can immediately have it covered by CI.

Also don't call it new. This would be the... fourth? plugin revision and I think most of them are called some variant on "new".

I think it makes sense to just call this the Offload API while the existing stuff is the omptarget plugin interface (or similar), if there are no objections to that. Obviously in this PR the API doesn't come close to implementing a usable interface, but we can/should focus on getting the basics working (up to and including kernel execution) asap.

Is it viable to develop the tablegen without the source checked in, and once it's holding together, have it replace the existing source files? Thus there is no new/old, just a transition from handwritten C++ into generated C++ which does the same thing, and we can permute the existing source code to look more like the generated output under CI.

Swapping it out when ready makes sense to me, although I'm not sure how we get there. I see the current approach being

  • Implement the new API on top of the existing plugin interface. This is what this PR does. We can't simply change the existing plugins without breaking omptarget (or spending significant effort updating that code at the same time).
    • This also means for new functionality we will have to add it to the old interface, even in a hacky way, so the implementation has something to use. Seems like it would be extra effort on the old plugins.
  • When the new API reaches some usable critical mass, update the plugins to implement the new API directly. Then either rewrite the plugin interface (that libomptarget uses) on top of that, or just rewrite the libomptarget code to use it directly. I'm not familiar enough with the project to say whether that's a good idea.
    • This is kind of an all-or-nothing thing where we can't really swap things out incrementally.

Honestly my preferred solution would just be to start this as an entirely separate interface so we don't need to deal with breaking OpenMP stuff for existing customers. That's what I wanted from offload/ in the first place. i.e. redo the plugins as well.

This is absolutely our (Codeplay) preference, but I'd got the impression that we'd decided against this as a group. If we did this, we could create a plugins-nextgen/offload plugin (implementing the old interface on top of the new one), to allow us to optionally run OpenMP through the new API, but not breaking anything else. Then at some future point it could become the default path. I think this would be less of a headache than the approaches described above.

@jhuber6
Copy link
Contributor

jhuber6 commented Sep 18, 2024

This is absolutely our (Codeplay) preference, but I'd got the impression that we'd decided against this as a group.

As far as I recall, that was the preference of everyone at AMD and the only ones in favor of what we have now were Johannes and the people on the RFC that didn't contribute to libomptarget.

@callumfare
Copy link
Contributor Author

As far as I recall, that was the preference of everyone at AMD and the only ones in favor of what we have now were Johannes and the people on the RFC that didn't contribute to libomptarget.

I'm not against having that discussion again, before we start actually committing new code, if it makes thing easier for us. If we keep the current approach it seems important to understand how we actually transfer the existing code (both plugins and libomptarget runtime) to the new API without breaking OpenMP things.

@jhuber6
Copy link
Contributor

jhuber6 commented Sep 18, 2024

I'm not against having that discussion again, before we start actually committing new code, if it makes thing easier for us. If we keep the current approach it seems important to understand how we actually transfer the existing code (both plugins and libomptarget runtime) to the new API without breaking OpenMP things.

libomptarget is currently written in terms of the GenericPluginTy interface, and presumably this API can be as well. The main issue I have with that is there's a lot of OpenMP specific stuff in the plugin that prevents me from adequately fixing / modifying the interface. There's also the fact that modifying stuff upstream tends to conflict with AMD's fork (mostly our fault however).

@callumfare
Copy link
Contributor Author

libomptarget is currently written in terms of the GenericPluginTy interface, and presumably this API can be as well. The main issue I have with that is there's a lot of OpenMP specific stuff in the plugin that prevents me from adequately fixing / modifying the interface. There's also the fact that modifying stuff upstream tends to conflict with AMD's fork (mostly our fault however).

Ok, I was getting confused between the GenericPluginTy interface (which we can presumably change as much as is needed to enable the new API) and the plugin-specific bits of libomptarget (__tgt_rtl_*) which I'm assuming we can't change without breaking downstream projects.

@jhuber6
Copy link
Contributor

jhuber6 commented Sep 19, 2024

Ok, I was getting confused between the GenericPluginTy interface (which we can presumably change as much as is needed to enable the new API) and the plugin-specific bits of libomptarget (__tgt_rtl_*) which I'm assuming we can't change without breaking downstream projects.

The __tgt_rtl_* functions used to be a pseudo-API because the plugins were all .so files loaded via dlopen. I reworked that but never go around to modifying the functions themselves.

@jplehr
Copy link
Contributor

jplehr commented Sep 23, 2024

Agreed, I think this is a good way to balance things. I'll make sure to add a test/check target to this PR so we can immediately have it covered by CI.

I am working on enabling the whole offload project in precommit CI. However, I'm not there yet and have hit a few roadblocks along the way. Those are mostly due to the way how we use CMake and pick-up all sorts of OpenMP CMake variables.
So, I'm looking at this, but unsure about the timeline for now.

@callumfare
Copy link
Contributor Author

I've added unit tests. The LibomptUnitTests target now builds a offload.unittests executable. It optionally takes a --platform argument to specify the platform/plugin to use, and otherwise picks a sensible default (i.e. one with an available device).

In general it should be possible to fully parameterize the tests for every available platform/device but I've not gone that far at this stage - when the API is a little more fully formed I'm happy to go back and do that.

Incidentally, I disabled the existing NextgenPluginsTest.cpp unit test because it didn't even build and seems like it targets an older version of the plugin interface. Probably could just be deleted?

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I'm in favor of ol_ as a prefix, snake_case for the C API, and get_x as the verbage.

@@ -0,0 +1,25 @@
VERS1.0 {
global:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just put ol_ as the prefix and then use ol_*

Comment on lines 29 to 31
if (Count < 2) {
GTEST_SKIP() << "Only one device is available on this platform.";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, LLVM style shouldn't put braces around single lines.

Suggested change
if (Count < 2) {
GTEST_SKIP() << "Only one device is available on this platform.";
}
if (Count < 2)
GTEST_SKIP() << "Only one device is available on this platform.";

@callumfare
Copy link
Contributor Author

Generally I'm in favor of ol_ as a prefix, snake_case for the C API, and get_x as the verbage.

I'll update the prefix etc. Just wondering though do we want to prefer olCamelCase instead to better match the LLVM style rather than just do what omp does? IMO snake_case looks odd unless function parameters also use snake_case, and at that point it kind of pollutes the implementation code with that style

@jhuber6
Copy link
Contributor

jhuber6 commented Oct 30, 2024

Generally I'm in favor of ol_ as a prefix, snake_case for the C API, and get_x as the verbage.

I'll update the prefix etc. Just wondering though do we want to prefer olCamelCase instead to better match the LLVM style rather than just do what omp does? IMO snake_case looks odd unless function parameters also use snake_case, and at that point it kind of pollutes the implementation code with that style

Tough to say, pretty much up to personal preference. We could probably make the argument for camelCase because it's used in CUDA, Vulkan, OpenCL, and OpenGL while AFAIK snake_case is only in HSA.

@callumfare
Copy link
Contributor Author

I've gone for camelCase then, since it meant less refactoring anyway. I believe that all review feedback so far has been addressed now.

Comment on lines +76 to +78
std::cout << "No platform found with the name \"" << SelectedPlatform
<< "\". Choose from:" << Platforms << "\n";
std::exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly believe the runtime should never exit on its own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the offload.unittest executable rather than the runtime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duh, didn't see any test checks and didn't parse enough of the filename.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG for now, we can adjust as we move forward.

Comment on lines +76 to +78
std::cout << "No platform found with the name \"" << SelectedPlatform
<< "\". Choose from:" << Platforms << "\n";
std::exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duh, didn't see any test checks and didn't parse enough of the filename.

@shiltian
Copy link
Contributor

shiltian commented Nov 13, 2024

I'm afraid the code style of this entire stuff will be out of control. Looking at the code, there are at least three different code styles. I don't think it's a good idea to keep as it is.

we can adjust as we move forward.

IMHO, most of the time with this, it roughly means, this is it, and eventually we get so many half-baked stuff.

@jhuber6
Copy link
Contributor

jhuber6 commented Nov 13, 2024

I'm afraid the code style of this entire stuff will be out of control. Looking at the code, there are at least three different code styles. I don't think it's a good idea to keep as it is.

The public API and internal style can be different like with libomp, but it would definitely help to be consistent.

we can adjust as we move forward.

IMHO, most of the time with this, it roughly means, this is it, and eventually we get so many half-baked stuff.

To be fair I've deleted some of the half baked stuff in the past, but I get your point. Generally I'm less concerned overall since I've somewhat resigned myself to whatever the offload project is going to be. This one as-is can't do anything useful, but it's hard to track changes with such a big delta. We could probably make the argument that this could be opt-in or something.

@callumfare
Copy link
Contributor Author

Thanks for the review @jhuber6!

I'm afraid the code style of this entire stuff will be out of control. Looking at the code, there are at least three different code styles. I don't think it's a good idea to keep as it is.

The public API and internal style can be different like with libomp, but it would definitely help to be consistent.

I've used clang-tidy to keep things as consistent as possible but there are two distinct styles:

  • In the public API: ol_snake_case_t for types, olCamelCase for functions
  • In the implementation: Normal LLVM style.

I've kept the public API in that style since it's in line with most other C APIs. If anything the previous conversation was leaning towards making everything snake_case.

If there's any inconsistency beyond that then that's a mistake and I'm happy to correct it.

@shiltian We've previously discussed various aspects of style at the bi-weekly LLVM Offload calls and I think what we have now is as close to a consensus as we could get. I'm not against changing the style again to be purely LLVM style, but I'd want to discuss it as a group again at the next call (December 11th). If you'd like to join that call I'd be happy to have that conversation. In the meantime I would strongly prefer merging this as-is so progress isn't held up. The tablegen tooling means any style or design changes in the future should be fairly painless.

IMHO, most of the time with this, it roughly means, this is it, and eventually we get so many half-baked stuff.

I understand the concern, but we have many more contributions planned after this. I didn't want the scope of this initial PR to get out of control so have avoided implementing too much of the API initially. In the very short term we want to extend the API to expose all the existing functionality in the libomptarget plugins. Beyond that we want to extend functionality to enable SYCL on top of Offload.

We (Codeplay) plan on having multiple engineers working on this, once this PR is in we should be able to start ramping up effort. We've been attending the bi-weekly Offload meetings and discussing our plans there, and will continue to do so. We're happy to discuss any specific details during a future call.

@jhuber6 I don't have write access. If you're still happy with everything after the above comments would you mind landing the change?

@callumfare
Copy link
Contributor Author

@jhuber6 Gentle nudge on the above

@jhuber6 jhuber6 merged commit 8a2311c into llvm:main Nov 25, 2024
6 checks passed
Copy link

@callumfare Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 25, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building offload at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/10979

Here is the relevant piece of the build log for the reference
Step 5 (compile-openmp) failure: build (failure)
...
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/bin/opt: WARNING: failed to create target machine for 'nvptx64-nvidia-cuda': unable to get target for 'nvptx64-nvidia-cuda', see --version and --triple.
39.513 [24/28/2185] Packaging LLVM offloading binary libomptarget-nvptx-sm_72.bc.out
39.523 [23/28/2186] Embedding LLVM offloading binary in devicertl-nvptx-sm_70.o
39.531 [23/27/2187] Embedding LLVM offloading binary in devicertl-nvptx-sm_72.o
39.767 [23/26/2188] Optimizing LLVM bitcode libomptarget-nvptx-sm_75.bc
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/bin/opt: WARNING: failed to create target machine for 'nvptx64-nvidia-cuda': unable to get target for 'nvptx64-nvidia-cuda', see --version and --triple.
39.771 [22/26/2189] Packaging LLVM offloading binary libomptarget-nvptx-sm_75.bc.out
39.795 [21/26/2190] Embedding LLVM offloading binary in devicertl-nvptx-sm_75.o
39.868 [21/25/2191] Building CXX object offload/src/CMakeFiles/omptarget.dir/DeviceImage.cpp.o
40.176 [21/24/2192] Building CXX object offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/offload-tblgen.cpp.o
FAILED: offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/offload-tblgen.cpp.o 
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ --target=x86_64-unknown-linux-gnu -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -DOMPT_SUPPORT=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/llvm/include -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/include -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/offload-tblgen.cpp.o -MF offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/offload-tblgen.cpp.o.d -o offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/offload-tblgen.cpp.o -c /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/offload-tblgen.cpp
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/offload-tblgen.cpp:92:10: error: no matching function for call to 'TableGenMain'
   92 |   return TableGenMain(argv[0], &OffloadTableGenMain);
      |          ^~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/llvm/include/llvm/TableGen/Main.h:27:5: note: candidate function not viable: no known conversion from 'bool (*)(raw_ostream &, RecordKeeper &)' to 'std::function<TableGenMainFn>' (aka 'function<bool (raw_ostream &, const RecordKeeper &)>') for 2nd argument
   27 | int TableGenMain(const char *argv0,
      |     ^
   28 |                  std::function<TableGenMainFn> MainFn = nullptr);
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
40.228 [21/23/2193] Building CXX object offload/tools/kernelreplay/CMakeFiles/llvm-omp-kernel-replay.dir/llvm-omp-kernel-replay.cpp.o
40.364 [21/22/2194] Optimizing LLVM bitcode libomptarget-nvptx-sm_80.bc
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/bin/opt: WARNING: failed to create target machine for 'nvptx64-nvidia-cuda': unable to get target for 'nvptx64-nvidia-cuda', see --version and --triple.
40.372 [21/21/2195] Building CXX object offload/liboffload/CMakeFiles/LLVMOffload.dir/src/OffloadLib.cpp.o
40.594 [21/20/2196] Building CXX object offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o
FAILED: offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o 
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ --target=x86_64-unknown-linux-gnu -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -DOMPT_SUPPORT=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/llvm/include -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/include -I/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o -MF offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o.d -o offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o -c /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/EntryPointGen.cpp
In file included from /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/EntryPointGen.cpp:17:
In file included from /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/GenCommon.hpp:11:
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:81:25: error: no matching constructor for initialization of 'EnumValueRec'
   81 |       vals.emplace_back(EnumValueRec{Val});
      |                         ^           ~~~~~
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:64:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const EnumValueRec' for 1st argument
   64 | class EnumValueRec {
      |       ^~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:64:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'EnumValueRec' for 1st argument
   64 | class EnumValueRec {
      |       ^~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:66:12: note: candidate constructor not viable: 1st argument ('const llvm::Record *') would lose const qualifier
   66 |   explicit EnumValueRec(Record *rec) : rec(rec) {}
      |            ^            ~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:115:28: error: no matching conversion for functional-style cast from 'const llvm::Record *' to 'StructMemberRec'
  115 |       members.emplace_back(StructMemberRec(Member));
      |                            ^~~~~~~~~~~~~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:100:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const StructMemberRec' for 1st argument
  100 | class StructMemberRec {
      |       ^~~~~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:100:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'StructMemberRec' for 1st argument

@jhuber6
Copy link
Contributor

jhuber6 commented Nov 25, 2024

@callumfare ^

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 25, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-sles-build-only running on rocm-worker-hw-04-sles while building offload at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/11679

Here is the relevant piece of the build log for the reference
Step 5 (compile-openmp) failure: build (failure)
...
19.509 [25/28/1739] Embedding LLVM offloading binary in devicertl-nvptx-sm_35.o
19.536 [25/27/1740] Optimizing LLVM bitcode libomptarget-amdgpu-gfx1151.bc
19.541 [24/27/1741] Optimizing LLVM bitcode libomptarget-amdgpu-gfx1153.bc
19.544 [23/27/1742] Packaging LLVM offloading binary libomptarget-amdgpu-gfx1151.bc.out
19.548 [22/27/1743] Building CXX object offload/src/CMakeFiles/omptarget.dir/DeviceImage.cpp.o
19.548 [22/26/1744] Packaging LLVM offloading binary libomptarget-amdgpu-gfx1153.bc.out
19.563 [21/26/1745] Embedding LLVM offloading binary in devicertl-amdgpu-gfx1151.o
19.564 [21/25/1746] Embedding LLVM offloading binary in devicertl-amdgpu-gfx1153.o
19.767 [21/24/1747] Building CXX object offload/liboffload/CMakeFiles/LLVMOffload.dir/src/OffloadLib.cpp.o
19.836 [21/23/1748] Building CXX object offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/APIGen.cpp.o
FAILED: offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/APIGen.cpp.o 
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/./bin/clang++ --target=x86_64-unknown-linux-gnu -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -DOMPT_SUPPORT=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/include -I/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/include -Iopenmp/runtime/src -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/APIGen.cpp.o -MF offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/APIGen.cpp.o.d -o offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/APIGen.cpp.o -c /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/APIGen.cpp
In file included from /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/APIGen.cpp:19:
In file included from /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/GenCommon.hpp:11:
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:81:25: error: no matching constructor for initialization of 'EnumValueRec'
   81 |       vals.emplace_back(EnumValueRec{Val});
      |                         ^           ~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:64:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const EnumValueRec' for 1st argument
   64 | class EnumValueRec {
      |       ^~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:64:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'EnumValueRec' for 1st argument
   64 | class EnumValueRec {
      |       ^~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:66:12: note: candidate constructor not viable: 1st argument ('const llvm::Record *') would lose const qualifier
   66 |   explicit EnumValueRec(Record *rec) : rec(rec) {}
      |            ^            ~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:115:28: error: no matching conversion for functional-style cast from 'const llvm::Record *' to 'StructMemberRec'
  115 |       members.emplace_back(StructMemberRec(Member));
      |                            ^~~~~~~~~~~~~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:100:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const StructMemberRec' for 1st argument
  100 | class StructMemberRec {
      |       ^~~~~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:100:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'StructMemberRec' for 1st argument
  100 | class StructMemberRec {
      |       ^~~~~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:102:12: note: candidate constructor not viable: 1st argument ('const llvm::Record *') would lose const qualifier
  102 |   explicit StructMemberRec(Record *rec) : rec(rec) {}
      |            ^               ~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:133:18: error: assigning to 'BitsInit *' from 'const BitsInit *' discards qualifiers
  133 |     flags = rec->getValueAsBitsInit("flags");
      |             ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/APIGen.cpp:205:20: error: no matching constructor for initialization of 'MacroRec'
  205 |       ProcessMacro(MacroRec{R}, OS);
      |                    ^       ~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:29:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const MacroRec' for 1st argument
   29 | class MacroRec {
      |       ^~~~~~~~
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:29:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'MacroRec' for 1st argument
   29 | class MacroRec {

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 25, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building offload at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/9113

Here is the relevant piece of the build log for the reference
Step 5 (compile-openmp) failure: build (failure)
...
39.964 [21/25/2187] Optimizing LLVM bitcode libomptarget-nvptx-sm_75.bc
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/opt: WARNING: failed to create target machine for 'nvptx64-nvidia-cuda': unable to get target for 'nvptx64-nvidia-cuda', see --version and --triple.
39.970 [20/25/2188] Packaging LLVM offloading binary libomptarget-nvptx-sm_75.bc.out
39.995 [19/25/2189] Embedding LLVM offloading binary in devicertl-nvptx-sm_75.o
40.002 [19/24/2190] Building CXX object offload/src/CMakeFiles/omptarget.dir/DeviceImage.cpp.o
40.160 [19/23/2191] Optimizing LLVM bitcode libomptarget-nvptx-sm_80.bc
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/opt: WARNING: failed to create target machine for 'nvptx64-nvidia-cuda': unable to get target for 'nvptx64-nvidia-cuda', see --version and --triple.
40.164 [18/23/2192] Packaging LLVM offloading binary libomptarget-nvptx-sm_80.bc.out
40.183 [17/23/2193] Embedding LLVM offloading binary in devicertl-nvptx-sm_80.o
40.245 [17/22/2194] Building CXX object offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o
FAILED: offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o 
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang++ --target=x86_64-unknown-linux-gnu -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -DOMPT_SUPPORT=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/include -I/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/include -I/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o -MF offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o.d -o offload/tools/offload-tblgen/CMakeFiles/offload-tblgen.dir/EntryPointGen.cpp.o -c /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/EntryPointGen.cpp
In file included from /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/EntryPointGen.cpp:17:
In file included from /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/GenCommon.hpp:11:
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:81:25: error: no matching constructor for initialization of 'EnumValueRec'
   81 |       vals.emplace_back(EnumValueRec{Val});
      |                         ^           ~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:64:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const EnumValueRec' for 1st argument
   64 | class EnumValueRec {
      |       ^~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:64:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'EnumValueRec' for 1st argument
   64 | class EnumValueRec {
      |       ^~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:66:12: note: candidate constructor not viable: 1st argument ('const llvm::Record *') would lose const qualifier
   66 |   explicit EnumValueRec(Record *rec) : rec(rec) {}
      |            ^            ~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:115:28: error: no matching conversion for functional-style cast from 'const llvm::Record *' to 'StructMemberRec'
  115 |       members.emplace_back(StructMemberRec(Member));
      |                            ^~~~~~~~~~~~~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:100:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const StructMemberRec' for 1st argument
  100 | class StructMemberRec {
      |       ^~~~~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:100:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'StructMemberRec' for 1st argument
  100 | class StructMemberRec {
      |       ^~~~~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:102:12: note: candidate constructor not viable: 1st argument ('const llvm::Record *') would lose const qualifier
  102 |   explicit StructMemberRec(Record *rec) : rec(rec) {}
      |            ^               ~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:133:18: error: assigning to 'BitsInit *' from 'const BitsInit *' discards qualifiers
  133 |     flags = rec->getValueAsBitsInit("flags");
      |             ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/EntryPointGen.cpp:134:24: error: no matching constructor for initialization of 'FunctionRec'
  134 |     EmitValidationFunc(FunctionRec{R}, OS);
      |                        ^          ~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:192:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'const llvm::Record *' to 'const FunctionRec' for 1st argument
  192 | class FunctionRec {
      |       ^~~~~~~~~~~
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/tools/offload-tblgen/RecordTypes.hpp:192:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'const llvm::Record *' to 'FunctionRec' for 1st argument
  192 | class FunctionRec {

jhuber6 added a commit that referenced this pull request Nov 25, 2024
@jhuber6
Copy link
Contributor

jhuber6 commented Nov 25, 2024

Reverted for now, relanding is a pain on GH.

frasercrmck pushed a commit that referenced this pull request Nov 27, 2024
…mentation (#108413) (#117704)

Relands changes from #108413 - this was reverted due to build issues.
The problem was just that the `offload-tblgen` tool was behind recent
changes to tablegen that ensure `const` records. This has been fixed and
the PR is otherwise identical.

___

### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests 
```


### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
frasercrmck added a commit that referenced this pull request Nov 27, 2024
…PI implementation (#108413) (#117704)"

This reverts commit c979ec0.

This showed failures in the post-merge CI.
sommerlukas pushed a commit that referenced this pull request Nov 28, 2024
…plementation (#108413. #117704) (#117894)

Relands #117704, which relanded changes from #108413 - this was reverted
due to build issues. The new offload library did not build with
`LIBOMPTARGET_OMPT_SUPPORT` enabled, which was not picked up by
pre-merge testing.

The last commit contains the fix; everything else is otherwise identical
to the approved PR.
___

### New API

Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests 
```


### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
jplehr added a commit that referenced this pull request Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants