347 changes: 347 additions & 0 deletions sycl/doc/design/DeviceConfigFile.md
@@ -0,0 +1,347 @@
# Implementation Design for Device Configuration File
This design document describes the implementation of the DPC++ Device
Configuration File.

In summary, there are several scenarios where we need to know information about
a target at compile time, and providing that information is the main purpose of
this Device Configuration File. Examples are `any_device_has/all_devices_have`,
which rely on macros defined according to the optional features supported by
each target, and conditional AOT compilation based on the optional features
used in kernels and supported by targets.

## Requirements
We need a default Device Configuration File embedded in the compiler describing
the well known targets at the time of building the compiler. This embedded
knowledge must be extendable, since our AOT toolchain allows compiling for
targets not known at the time of building the compiler, as long as the
appropriate toolchain (AOT compiler and driver) supports such targets. In
other words, we need to provide a way for users to add entries for new targets or
update existing targets at application compile time.

An entry of the Device Configuration File should include:
- Name of the target. Target names should be spelled exactly as expected in
`-fsycl-targets`, since these are going to be used to implement validation of
supported targets.
- List of supported aspects.
- List of supported sub-group sizes.
- [Optional] `aot-toolchain` name/identifier describing the toolchain used to compile
for this target. This information is optional because we plan to implement an
auto-detection mechanism that is able to infer the `aot-toolchain` from the
target name for well known targets.
- [Optional] `aot-toolchain-%option_name` information to be passed to the
`aot-toolchain` command. This information is optional. For some targets, the
auto-detection mechanism might be able to infer values for this. One example of this
information would be `ocloc-device %device_id`.

The information provided in the Device Configuration File is required by
different tools and compiler modules:
- Compiler driver:
  - `any_device_has/all_devices_have` requires the compiler driver to read the
    config file and define the corresponding macros.
    [[DeviceAspectTraitDesign](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/DeviceAspectTraitDesign.md)]
  - The compiler driver requires `aot-toolchain` and `ocloc-device` to trigger the
    compilation for the required targets.
    [https://github.com/intel/llvm/pull/6775/files]
- `sycl-aspect-filter`:
  https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#aspect-filter-tool
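To make the driver requirement concrete, here is a minimal sketch, assuming the driver already has the target table and the list of names from `-fsycl-targets`. It computes the aspects listed for every target (an intersection, for `all_devices_have`) and for at least one target (a union, for `any_device_has`). The names `TargetAspects`, `allDevicesHave`, `anyDeviceHas` and `toMacroOptions` are hypothetical, and the `__SYCL_ALL_DEVICES_HAVE_<aspect>__` macro spelling is an assumption here; DeviceAspectTraitDesign is the authority on the actual macro names. The sketch also ignores `maySupportOtherAspects`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, trimmed-down table entry: only the aspect list matters here.
struct TargetAspects {
  std::vector<std::string> aspects;
};
using AspectTable = std::map<std::string, TargetAspects>;

// Aspects listed for every target (intersection). Targets missing from the
// table are skipped; the driver is assumed to have rejected them already.
std::set<std::string> allDevicesHave(const AspectTable &table,
                                     const std::vector<std::string> &targets) {
  std::set<std::string> result;
  bool first = true;
  for (const std::string &name : targets) {
    auto it = table.find(name);
    if (it == table.end())
      continue;
    std::set<std::string> current(it->second.aspects.begin(),
                                  it->second.aspects.end());
    if (first) {
      result = current;
      first = false;
      continue;
    }
    std::set<std::string> kept;
    for (const std::string &aspect : result)
      if (current.count(aspect))
        kept.insert(aspect);
    result = kept;
  }
  return result;
}

// Aspects listed for at least one target (union).
std::set<std::string> anyDeviceHas(const AspectTable &table,
                                   const std::vector<std::string> &targets) {
  std::set<std::string> result;
  for (const std::string &name : targets) {
    auto it = table.find(name);
    if (it != table.end())
      result.insert(it->second.aspects.begin(), it->second.aspects.end());
  }
  return result;
}

// Turns an aspect set into -D options, e.g. "-D__SYCL_ALL_DEVICES_HAVE_fp64__=1".
std::vector<std::string> toMacroOptions(const std::set<std::string> &aspects,
                                        const std::string &prefix) {
  std::vector<std::string> options;
  for (const std::string &aspect : aspects)
    options.push_back("-D" + prefix + aspect + "__=1");
  return options;
}
```

Only aspects known to be listed produce a macro in this sketch; how unlisted aspects are treated for targets with `maySupportOtherAspects` set is left to the actual design.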

Finally, overhead should be minimal. Particularly, users should not pay for what
they do not use. This motivates our decision to embed the default Device
Configuration File rather than releasing it as a separate file.

## High-Level Design
The default Device Configuration File is a `.td` file located in the compiler
source code. `.td` is the file extension for [LLVM
TableGen](https://llvm.org/docs/TableGen/). This default file will include all
the devices known by the developers at the time of the release. During the
build process, using a custom TableGen backend, we generate a `.inc` C++ file
containing a `std::map` with one key/value element for each entry in the `.td`
file. Using a map we can later update or add new elements if the user provides
new targets at application compile time. Finally, the tools and compiler
modules that need information about the targets can simply query the map to get
it.

Further information about TableGen can be found in [TableGenFundamentals](https://releases.llvm.org/1.9/docs/TableGenFundamentals.html).

### New `TableGen` backend
Note: This [guide](https://llvm.org/docs/TableGen/BackGuide.html) details how
to implement new TableGen backends. Also, the [Search
Indexes](https://llvm.org/docs/TableGen/BackEnds.html#search-indexes) backend
already does something very similar to what we seek. It generates a table that
provides a lookup function, but it cannot be extended with new entries. We can
use the _Search Indexes_ backend as inspiration for ours.

Our backend should generate a map where the key is the target name and the value
is an object of a custom class/struct including all the information required.
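The core of such a backend is printing a C++ map initializer from the collected records. The sketch below shows only that emission step, with a plain `TargetRecord` struct (hypothetical) standing in for TableGen records; a real backend would instead collect records with something like `RecordKeeper::getAllDerivedDefinitions("TargetInfo")` and read fields with `Record::getValueAsString`, `getValueAsListOfDefs`, `getValueAsBit` and `getValueAsListOfInts`. It assumes no string value needs C++ escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Plain stand-in for one TableGen `TargetInfo` record.
struct TargetRecord {
  std::string targetName;
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<int> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};

// Prints "{a, b, c}", optionally wrapping each element in double quotes.
template <typename T>
void emitList(std::ostream &os, const std::vector<T> &items, bool quote) {
  os << "{";
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i != 0)
      os << ", ";
    if (quote)
      os << '"' << items[i] << '"';
    else
      os << items[i];
  }
  os << "}";
}

// Emits the std::map initializer that the DeviceConfigFile header includes.
// Field order must match the declaration order of struct TargetInfo.
void emitTargetTable(const std::vector<TargetRecord> &records,
                     std::ostream &os) {
  os << "std::map<std::string, TargetInfo> TargetTable = {\n";
  for (std::size_t i = 0; i < records.size(); ++i) {
    const TargetRecord &r = records[i];
    if (i != 0)
      os << ",\n";
    os << "    {\"" << r.targetName << "\",\n     {";
    emitList(os, r.aspects, /*quote=*/true);
    os << ", " << (r.maySupportOtherAspects ? 1 : 0) << ", ";
    emitList(os, r.subGroupSizes, /*quote=*/false);
    os << ", \"" << r.aotToolchain << "\", \"" << r.aotToolchainOptions
       << "\"}}";
  }
  os << "};\n";
}
```

Passing a `std::ostringstream` lets the output be checked in a unit test before wiring the emitter into `llvm-tblgen`.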

First, we need to provide a file describing the `DynamicTable` class. An
example of this is `SearchableTable.td`, which describes the `GenericEnum` and
`GenericTable` classes for the `gen-searchable-tables` backend. The file
`llvm/include/llvm/TableGen/DynamicTable.td` should look like the one below:
```
//===- DynamicTable.td ----------------------------------*- tablegen -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines the key top-level classes needed to produce a reasonably
// generic dynamic table that can be updated at runtime. DynamicTable objects
// can be defined using the class in this file:
// 1. (Dynamic) Tables. By instantiating the DynamicTable
// class once, a table with the name of the instantiating def is generated and
// guarded by the GET_name_IMPL preprocessor guard.
//
//===----------------------------------------------------------------------===//
// Define a record derived from this class to generate a dynamic table. This
// table resembles a hash table with key-value pairs, and can be updated at
// runtime.
//
// The name of the record is used as the name of the generated C++ map
// holding the entries of the table.
class DynamicTable {
  // Name of a class. The table will have one entry for each record that
  // derives from that class.
  string FilterClass;

  // Name of the C++ struct/class type that holds table entries. The
  // declaration of this type is not generated automatically.
  string CppTypeName = FilterClass;

  // List of the names of fields of collected records that contain the data for
  // table entries, in the order that is used for initialization in C++.
  //
  // TableGen needs to know the type of the fields so that it can format
  // the initializers correctly.
  //
  // For each field of the table named xxx, TableGen will look for a field
  // named TypeOf_xxx and use that as a more detailed description of the
  // type of the field.
  //
  // Example:
  //
  //   class MyTableEntry {
  //     list<int> V;
  //     ...
  //   }
  //
  //   def MyTable : DynamicTable {
  //     let FilterClass = "MyTableEntry";
  //     let Fields = ["V", ...];
  //     string TypeOf_V = "list<int>";
  //   }
  list<string> Fields;
}
```

This file should be included (either directly or indirectly) in any other
`.td` file that uses the `DynamicTable` class.

The default device configuration `.td` file should look like the one below:
```
include "llvm/TableGen/DynamicTable.td"

// Aspect and all the aspects definitions could be outlined
// to another .td file that could be included into this file
class Aspect<string name> {
string Name = name;
}

def AspectCpu : Aspect<"cpu">;
def AspectGpu : Aspect<"gpu">;
def AspectAccelerator : Aspect<"accelerator">;
def AspectCustom : Aspect<"custom">;
def AspectFp16 : Aspect<"fp16">;
def AspectFp64 : Aspect<"fp64">;
def AspectImage : Aspect<"image">;
def AspectOnline_compiler : Aspect<"online_compiler">;
def AspectOnline_linker : Aspect<"online_linker">;
def AspectQueue_profiling : Aspect<"queue_profiling">;
def AspectUsm_device_allocations : Aspect<"usm_device_allocations">;
def AspectUsm_host_allocations : Aspect<"usm_host_allocations">;
def AspectUsm_shared_allocations : Aspect<"usm_shared_allocations">;
def AspectUsm_system_allocations : Aspect<"usm_system_allocations">;
def AspectExt_intel_pci_address : Aspect<"ext_intel_pci_address">;
def AspectExt_intel_gpu_eu_count : Aspect<"ext_intel_gpu_eu_count">;
def AspectExt_intel_gpu_eu_simd_width : Aspect<"ext_intel_gpu_eu_simd_width">;
def AspectExt_intel_gpu_slices : Aspect<"ext_intel_gpu_slices">;
def AspectExt_intel_gpu_subslices_per_slice : Aspect<"ext_intel_gpu_subslices_per_slice">;
def AspectExt_intel_gpu_eu_count_per_subslice : Aspect<"ext_intel_gpu_eu_count_per_subslice">;
def AspectExt_intel_max_mem_bandwidth : Aspect<"ext_intel_max_mem_bandwidth">;
def AspectExt_intel_mem_channel : Aspect<"ext_intel_mem_channel">;
def AspectUsm_atomic_host_allocations : Aspect<"usm_atomic_host_allocations">;
def AspectUsm_atomic_shared_allocations : Aspect<"usm_atomic_shared_allocations">;
def AspectAtomic64 : Aspect<"atomic64">;
def AspectExt_intel_device_info_uuid : Aspect<"ext_intel_device_info_uuid">;
def AspectExt_oneapi_srgb : Aspect<"ext_oneapi_srgb">;
def AspectExt_oneapi_native_assert : Aspect<"ext_oneapi_native_assert">;
def AspectHost_debuggable : Aspect<"host_debuggable">;
def AspectExt_intel_gpu_hw_threads_per_eu : Aspect<"ext_intel_gpu_hw_threads_per_eu">;
def AspectExt_oneapi_cuda_async_barrier : Aspect<"ext_oneapi_cuda_async_barrier">;
def AspectExt_oneapi_bfloat16_math_functions : Aspect<"ext_oneapi_bfloat16_math_functions">;
def AspectExt_intel_free_memory : Aspect<"ext_intel_free_memory">;
def AspectExt_intel_device_id : Aspect<"ext_intel_device_id">;
def AspectExt_intel_memory_clock_rate : Aspect<"ext_intel_memory_clock_rate">;
def AspectExt_intel_memory_bus_width : Aspect<"ext_intel_memory_bus_width">;
def AspectEmulated : Aspect<"emulated">;

def TargetTable : DynamicTable {
  let FilterClass = "TargetInfo";
  let Fields = ["TargetName", "aspects", "maySupportOtherAspects",
                "subGroupSizes", "aotToolchain", "aotToolchainOptions"];
  string TypeOf_aspects = "list<Aspect>";
  string TypeOf_subGroupSizes = "list<int>";
}

class TargetInfo<string tgtName, list<Aspect> aspectList, bit otherAspects,
                 list<int> listSubGroupSizes, string toolchain, string options> {
  string TargetName = tgtName;
  list<Aspect> aspects = aspectList;
  bit maySupportOtherAspects = otherAspects;
  list<int> subGroupSizes = listSubGroupSizes;
  string aotToolchain = toolchain;
  string aotToolchainOptions = options;
}

def : TargetInfo<"TargetA", [AspectCpu, AspectAtomic64],
0, [8, 16], "ocloc", "-device tgtA">;
def : TargetInfo<"TargetB", [AspectGpu, AspectFp16],
0, [8, 16], "ocloc", "-device tgtB">;
def : TargetInfo<"TargetC", [AspectEmulated, AspectImage],
0, [8, 32], "ocloc", "-device tgtC, -option2 val">;
```
Note: the existing backends we tested do not allow list-typed fields in records
such as `TargetInfo`. This is a limitation of those backends rather than of
TableGen itself, so our own backend can lift it, as shown in the initial
prototype implemented to drive the design.

The generated `.inc` file should look like the example below:
```c++
std::map<std::string, TargetInfo> TargetTable = {
{"TargetA",
{{"cpu", "atomic64"}, 0, {8, 16}, "ocloc", "-device tgtA"}},
{"TargetB",
{{"gpu", "fp16"}, 0, {8, 16}, "ocloc", "-device tgtB"}},
{"TargetC",
{{"emulated", "image"}, 0, {8, 32}, "ocloc", "-device tgtC, -option2 val"}}};
```

We also need a header file that includes the `.inc` file generated by the
TableGen backend. As with other backends, ours does not generate the definition
of `struct TargetInfo`. This keeps the backend implementation simple, and it lets
developers read the data structure directly to understand how to work with it,
so the struct is defined in this header file instead. This header file should
look like the code below:
```c++
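#include <map>
#include <string>
#include <vector>

namespace DeviceConfigFile {
// Field order must match the order of the initializers in the generated .inc
// file: aspects, maySupportOtherAspects, subGroupSizes, aotToolchain, options.
struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};

using TargetTable_t = std::map<std::string, TargetInfo>;

// Defines TargetTable (see the generated .inc example above). If this header is
// included in several translation units, the generated map should be declared
// `inline` (C++17) to avoid multiple definitions.
#include "device_config_file.inc"
} // namespace DeviceConfigFile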
```

Other modules can query the map to get the information like in the example
below:
```c++
auto it = DeviceConfigFile::TargetTable.find("TargetA");
if (it == DeviceConfigFile::TargetTable.end()) {
  // Target not found.
} else {
  const DeviceConfigFile::TargetInfo &info = it->second;
  auto aspects = info.aspects;
  auto maySupportOtherAspects = info.maySupportOtherAspects;
  auto subGroupSizes = info.subGroupSizes;
  // ...
}
```
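A query that consumers will likely need is whether a target supports a given aspect. The sketch below assumes (this is our reading, not something the design states) that `maySupportOtherAspects` means an aspect missing from the list is not known to be unsupported, so the answer becomes three-valued. `AspectSupport` and `querySupport` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

namespace DeviceConfigFile {
struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};
} // namespace DeviceConfigFile

enum class AspectSupport { Yes, No, Unknown };

AspectSupport
querySupport(const std::map<std::string, DeviceConfigFile::TargetInfo> &table,
             const std::string &target, const std::string &aspect) {
  auto it = table.find(target);
  if (it == table.end())
    return AspectSupport::Unknown; // Target not described in the config.
  const DeviceConfigFile::TargetInfo &info = it->second;
  if (std::find(info.aspects.begin(), info.aspects.end(), aspect) !=
      info.aspects.end())
    return AspectSupport::Yes;
  // Not listed: only a definite "no" if the list is declared complete.
  return info.maySupportOtherAspects ? AspectSupport::Unknown
                                     : AspectSupport::No;
}
```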

## Tools and Modules Interacting with Device Config File
This is a list of the tools and compiler modules that require using the file:
- The *compiler driver* needs the file to determine the set of legal values for
`-fsycl-targets`.
- The *compiler driver* needs the file to define macros for `any_device_has/all_devices_have`.
- *Clang* needs the file to emit diagnostics related to `-fsycl-fixed-targets`.
- `sycl-post-link` needs the file to filter kernels in device images when doing AOT
compilation.
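For the first item, determining the legal values of `-fsycl-targets` reduces to a key lookup in the table. The sketch below, a minimal illustration rather than the driver's actual option handling, splits the comma-separated option value and reports names without an entry; `findUnknownTargets` is a hypothetical name, and it is templated on the value type so it works with any `TargetInfo` definition.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Returns the entries of a comma-separated -fsycl-targets value that are not
// keys of the target table, in the order they appear.
template <typename Info>
std::vector<std::string>
findUnknownTargets(const std::map<std::string, Info> &table,
                   const std::string &targetsValue) {
  std::vector<std::string> unknown;
  std::stringstream ss(targetsValue);
  std::string name;
  while (std::getline(ss, name, ','))
    if (!name.empty() && table.find(name) == table.end())
      unknown.push_back(name);
  return unknown;
}
```

The driver would then emit one diagnostic per returned name.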

The following subsections describe the changes required in different parts of
the project in more detail.

### Changes to Build Infrastructure
We need the information about the targets in multiple tools and compiler
modules listed in [Requirements](#requirements). Thus, we need to make sure
that the `.inc` file is generated from the `.td` file in time for all the
consumers. The TableGen command to run is `llvm-tblgen
-gen-dynamic-tables -I /llvm-root/llvm/include/ input.td -o output.inc`.
Additionally, we need to set up the build dependencies so that this command
runs before any of the consumers need its output.

### Changes to the DPC++ Frontend

To allow users to add new targets, we provide a new flag:
`-fsycl-device-config-file=/path/to/file.yaml`. Users can pass a `.yaml` file
describing the targets to be added or updated. An example of such a `.yaml`
file is shown below.
```yaml
intel_gpu_skl:
  aspects: [aspect_name1, aspect_name2]
  may_support_other_aspects: true/false
  sub-group-sizes: [1, 2, 4, 8]
  aot-toolchain: ocloc
  aot-toolchain-options: -device skl
```
The frontend should parse the user-provided `.yaml` file and update the
map with the new information about targets. LLVM provides the
[YAML I/O](https://llvm.org/docs/YamlIO.html) library to parse `.yaml`
files. The driver should propagate this option to all the tools that require
the Device Configuration File (e.g. `sycl-post-link`) so that each of the
tools can update its map according to the user extensions described in the
`.yaml` file. Since driver options are not forwarded to tools automatically,
every tool that works with the configuration must be extended to accept this
option, and the driver must pass it explicitly.

As mentioned in [Requirements](#requirements), there is an auto-detection
mechanism for `aot-toolchain` and `aot-toolchain-options` that is able to
infer these from the target name. In the `.yaml` example shown above, the target
name is `intel_gpu_skl`. From that name, we can infer that `aot-toolchain` is
`ocloc`, because the name starts with `intel_gpu_`. We can also infer that
`aot-toolchain-options` should be `-device skl` by keeping what is left after
the `intel_gpu_` prefix.

#### Potential Issues/Limitations
- Multiple targets with the same name: the compiler emits a warning so that the
user is aware that multiple targets share the same name, and then processes
each entry in order, updating the map so that the latest entry found wins.
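The map update and the duplicate policy can be sketched as follows, assuming the `.yaml` file has already been parsed into (name, `TargetInfo`) pairs in file order. We also assume the warning applies only to names repeated within the user file, since overriding a default entry is an intended use. `applyUserEntries` and `UserEntries` are hypothetical names.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};

using TargetTable_t = std::map<std::string, TargetInfo>;
using UserEntries = std::vector<std::pair<std::string, TargetInfo>>;

// Applies user entries, in file order, on top of the default table: a known
// name is replaced and an unknown name is added. A name repeated in the user
// file produces a warning, and the last occurrence wins.
// Returns the number of warnings emitted.
unsigned applyUserEntries(TargetTable_t &table, const UserEntries &entries,
                          std::ostream &diag) {
  std::set<std::string> seen;
  unsigned warnings = 0;
  for (const auto &entry : entries) {
    if (!seen.insert(entry.first).second) {
      diag << "warning: target '" << entry.first
           << "' is described more than once; using the last entry\n";
      ++warnings;
    }
    table[entry.first] = entry.second;
  }
  return warnings;
}
```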

The auto-detection mechanism is a best effort to relieve users from specifying
`aot-toolchain` and `aot-toolchain-options` for well-known devices. However,
it has its own limitations and potential issues:
- Rules for target names: as of now, auto-detection is only available for Intel GPU
targets. All targets starting with `intel_gpu_` automatically get
`aot-toolchain=ocloc` and `aot-toolchain-options=-device <suffix>`, where `<suffix>`
is the part left after the `intel_gpu_` prefix.
- User specifies `aot-toolchain` and `aot-toolchain-options` for a target name
that can be auto-detected: user-specified information takes precedence over
auto-detected information.
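The two rules above fit in a few lines. This is a minimal sketch, assuming precedence is resolved per field (a user could set only `aot-toolchain-options` and still get the detected toolchain) and that an empty string means "not specified"; both are assumptions, and `AotSettings`, `autoDetect` and `resolveAot` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct AotSettings {
  std::string toolchain;
  std::string options;
};

// Infers AOT settings from a target name. Only the intel_gpu_ prefix is
// recognized; any other name (or the bare prefix) yields std::nullopt.
std::optional<AotSettings> autoDetect(const std::string &targetName) {
  const std::string prefix = "intel_gpu_";
  if (targetName.size() <= prefix.size() ||
      targetName.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt;
  return AotSettings{"ocloc", "-device " + targetName.substr(prefix.size())};
}

// User-specified values take precedence; empty strings mean "not specified".
AotSettings resolveAot(const std::string &targetName,
                       const std::string &userToolchain,
                       const std::string &userOptions) {
  AotSettings result{userToolchain, userOptions};
  if (std::optional<AotSettings> detected = autoDetect(targetName)) {
    if (result.toolchain.empty())
      result.toolchain = detected->toolchain;
    if (result.options.empty())
      result.options = detected->options;
  }
  return result;
}
```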

## Testing
There is a danger that the device configuration file will get out-of-sync with the
actual device capabilities. In order to prevent that, we need testing to validate
that the device config file does not go out-of-sync. There are two tests that we
should include:
- A test that compares the list of aspects known to SYCL RT (defined in `aspects.def`)
with the list of aspects defined in the `.td` file describing the default configuration.
This will be useful to detect new aspects added to SYCL RT that have not been added in
the `.td` file.
- A test that compares the aspects listed in the `.td` file with the aspects reported
via `device::has` for each device listed in the `.td` file. Both lists should match.
This test could copy the mechanism of the test for `any_device_has` that goes over each
item in `aspects.def` and tries to instantiate `any_device_has` with that enumerator.
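The first test needs no hardware: it is a set difference between two lists of names. The sketch below uses an X-macro (`SYCL_RT_ASPECTS`, hypothetical) only to mimic how a list like `aspects.def` is typically consumed; the real test would include `aspects.def` itself, and the `.td` side would come from the `Aspect` defs (for example, via the generated file). `runtimeAspects` and `missingFromTd` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Stand-in for the aspects known to the SYCL runtime.
#define SYCL_RT_ASPECTS(X) X(cpu) X(gpu) X(fp16) X(fp64) X(emulated)

std::vector<std::string> runtimeAspects() {
#define ASPECT_NAME(name) #name,
  return {SYCL_RT_ASPECTS(ASPECT_NAME)};
#undef ASPECT_NAME
}

// Returns runtime aspects with no matching Aspect<"..."> def in the .td file.
std::vector<std::string> missingFromTd(const std::vector<std::string> &rtAspects,
                                       const std::vector<std::string> &tdAspects) {
  std::set<std::string> td(tdAspects.begin(), tdAspects.end());
  std::vector<std::string> missing;
  for (const std::string &aspect : rtAspects)
    if (!td.count(aspect))
      missing.push_back(aspect);
  return missing;
}
```

A non-empty result would fail the test and name the aspects to add to the `.td` file.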

Neither test guarantees by itself that nothing went out-of-sync; that would
require running the second test on every target described in the `.td` file.
However, together they provide a mechanism to detect potential desyncs, and the
first test does not require any hardware, so it is also useful during
development when new aspects are added.

1 change: 1 addition & 0 deletions sycl/doc/index.rst
@@ -49,6 +49,7 @@ Design Documents for the oneAPI DPC++ Compiler
design/KernelFusionJIT
design/NonRelocatableDeviceCode
design/DeviceAspectTraitDesign
design/DeviceConfigFile
design/PropagateCompilerFlagsToRuntime
New OpenCL Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/opencl-extensions>
New SPIR-V Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/spirv-extensions>