-
Notifications
You must be signed in to change notification settings - Fork 802
[SYCL][Docs] Add design document for Device Config File #9371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1c20239
9151d5b
953ecba
a3649ef
d65b450
7926915
ff60ea4
776ca8f
19fea4d
29c91e7
783870b
dcbe1d0
fa9a391
d9e1926
02b8a50
5a55e59
d3da6ff
9efa9fb
6fb0c36
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,347 @@ | ||
| # Implementation Design for Device Configuration File | ||
| This design document describes the implementation of the DPC++ Device | ||
| Configuration File. | ||
|
|
||
| In summary, there several scenarios where we need to know information about a | ||
| target at compile-time, which is the main purpose of this Device Configuration | ||
| File. Examples are `any_device_has/all_devices_have` which defines macros | ||
| depending on the optional features supported by a target; or conditional AOT | ||
| compilation based on optional features used in kernels and supported by targets. | ||
|
|
||
| ## Requirements | ||
| We need a default Device Configuration File embedded in the compiler describing | ||
| the well known targets at the time of building the compiler. This embedded | ||
| knowledge must be extendable, since our AOT toolchain allows compiling for | ||
| targets not known at the time of building the compiler so long as the | ||
| appropriate toolchain --AOT compiler and driver-- support such targets. In | ||
| other words, we need to provide a way for users to add entries for new targets or | ||
| update existing targets at application compile time. | ||
|
|
||
| An entry of the Device Configuration File should include: | ||
| - Name of the target. Target names should be spelled exactly as expected in | ||
| `-fsycl-targets`, since these are going to be used to implement validation of | ||
| supported targets. | ||
| - List of supported aspects. | ||
| - List of supported sub-group sizes. | ||
| - [Optional] `aot-toolchain` name/identifier describing the toolchain used to compile | ||
| for this target. This information is optional because we plan to implement an | ||
| auto-detection mechanism that is able to infer the `aot-toolchain` from the | ||
| target name for well known targets. | ||
| - [Optional] `aot-toolchain-%option_name` information to be passed to the | ||
| `aot-toolchain` command. This information is optional. For some targets, the | ||
| auto-detection mechanism might be able to infer values for this. One example of this | ||
| information would be `ocloc-device %device_id`. | ||
|
|
||
| The information provided in the Device Configuration File is required from | ||
| different tools and compiler modules: | ||
| - Compiler driver: | ||
| - `any_device_has/all_devices_have` requires compiler driver to read the | ||
| config file and define corresponding macros. | ||
| [[DeviceAspectTraitDesign](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/DeviceAspectTraitDesign.md)] | ||
| - Compiler driver requires `aot-toolchain` and `ocloc-device` to trigger the | ||
| compilation for the required targets. | ||
| [https://github.com/intel/llvm/pull/6775/files] | ||
| - `sycl-aspect-filter`: | ||
| https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#aspect-filter-tool | ||
|
|
||
| Finally, overhead should be minimal. Particularly, users should not pay for what | ||
| they do not use. This motivates our decision to embed the default Device | ||
| Configuration File rather than releasing it as a separate file. | ||
|
|
||
| ## High-Level Design | ||
| The default Device Configuration File is a `.td` file located in the compiler | ||
| source code. `.td` is the file extension for [LLVM | ||
| TableGen](https://llvm.org/docs/TableGen/). This default file will include all | ||
| the devices known by the developers at the time of the release. During the | ||
| build process, using a custom TableGen backend, we generate a `.inc` C++ file | ||
| containing a `std::map` with one key/value element for each entry in the `.td` | ||
| file. Using a map we can later update or add new elements if the user provides | ||
| new targets at application compile time. Finally, the tools and compiler | ||
| modules that need information about the targets can simply query the map to get | ||
| it. | ||
|
|
||
| Further information about TableGen can be found in [TableGenFundamentals](https://releases.llvm.org/1.9/docs/TableGenFundamentals.html). | ||
|
|
||
| ### New `TableGen` backend | ||
| Note: This [guide](https://llvm.org/docs/TableGen/BackGuide.html) details how | ||
| to implement new TableGen backends. Also, the [Search | ||
| Indexes](https://llvm.org/docs/TableGen/BackEnds.html#search-indexes) backend | ||
| already does something very similar to what we seek. It generates a table that | ||
| provides a lookup function, but it cannot be extended with new entries. We can | ||
| use _Search Indexes_ backend as inspiration for ours. | ||
|
|
||
| Our backend should generate a map where the key is the target name and the value | ||
| is an object of a custom class/struct including all the information required. | ||
|
|
||
| Firstly, we need to provide a file describing the `DynamicTable` class. An | ||
| example for this is `SearchableTable.td`, which describes `GenericEnum`, and | ||
| `GenericTable` classes for `gen-searchable-tables` backend. File | ||
| `llvm/include/llvm/TableGen/DynamicTable.td` should look like the one below: | ||
| ``` | ||
| //===- DynamicTable.td ----------------------------------*- tablegen -*-===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| // | ||
| // This file defines the key top-level classes needed to produce a reasonably | ||
| // generic dynamic table that can be updated in runtime. DynamicTable objects | ||
| // can be defined using the class in this file: | ||
| // 1. (Dynamic) Tables. By instantiating the DynamicTable | ||
| // class once, a table with the name of the instantiating def is generated and | ||
| // guarded by the GET_name_IMPL preprocessor guard. | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| // Define a record derived from this class to generate a dynamic table. This | ||
| // table resembles a hashtable with a key-value pair, and can updated in runtime. | ||
| // | ||
| // The name of the record is used as the name of the global primary array of | ||
| // entries of the table in C++. | ||
| class DynamicTable { | ||
| // Name of a class. The table will have one entry for each record that | ||
| // derives from that class. | ||
| string FilterClass; | ||
|
|
||
| // Name of the C++ struct/class type that holds table entries. The | ||
| // declaration of this type is not generated automatically. | ||
| string CppTypeName = FilterClass; | ||
|
|
||
| // List of the names of fields of collected records that contain the data for | ||
| // table entries, in the order that is used for initialization in C++. | ||
| // | ||
| // TableGen needs to know the type of the fields so that it can format | ||
| // the initializers correctly. | ||
| // | ||
| // For each field of the table named xxx, TableGen will look for a field | ||
| // named TypeOf_xxx and use that as a more detailed description of the | ||
| // type of the field. | ||
|
|
||
| // class MyTableEntry { | ||
| // MyEnum V; | ||
| // ... | ||
| // } | ||
| // | ||
| // def MyTable : DynamicTable { | ||
| // let FilterClass = "MyTableEntry"; | ||
| // let Fields = ["V", ...]; | ||
| // string TypeOf_V = "list<int>"; | ||
| // } | ||
| list<string> Fields; | ||
| } | ||
| ``` | ||
|
|
||
| This file should be included --either directly or indirectly-- in any other | ||
| `.td` file that uses `DynamicTable` class. | ||
|
|
||
| The default device configuration `.td` file should look like the one below: | ||
| ``` | ||
| include "llvm/TableGen/DynamicTable.td" | ||
|
|
||
| // Aspect and all the aspects definitions could be outlined | ||
| // to another .td file that could be included into this file | ||
| class Aspect<string name> { | ||
| string Name = name; | ||
| } | ||
|
|
||
| def AspectCpu : Aspect<"cpu">; | ||
| def AspectGpu : Aspect<"gpu">; | ||
| def AspectAccelerator : Aspect<"accelerator">; | ||
| def AspectCustom : Aspect<"custom">; | ||
| def AspectFp16 : Aspect<"fp16">; | ||
| def AspectFp64 : Aspect<"fp64">; | ||
| def AspectImage : Aspect<"image">; | ||
| def AspectOnline_compiler : Aspect<"online_compiler">; | ||
| def AspectOnline_linker : Aspect<"online_linker">; | ||
| def AspectQueue_profiling : Aspect<"queue_profiling">; | ||
| def AspectUsm_device_allocations : Aspect<"usm_device_allocations">; | ||
| def AspectUsm_host_allocations : Aspect<"usm_host_allocations">; | ||
| def AspectUsm_shared_allocations : Aspect<"usm_shared_allocations">; | ||
| def AspectUsm_system_allocations : Aspect<"usm_system_allocations">; | ||
| def AspectExt_intel_pci_address : Aspect<"ext_intel_pci_address">; | ||
| def AspectExt_intel_gpu_eu_count : Aspect<"ext_intel_gpu_eu_count">; | ||
| def AspectExt_intel_gpu_eu_simd_width : Aspect<"ext_intel_gpu_eu_simd_width">; | ||
| def AspectExt_intel_gpu_slices : Aspect<"ext_intel_gpu_slices">; | ||
| def AspectExt_intel_gpu_subslices_per_slice : Aspect<"ext_intel_gpu_subslices_per_slice">; | ||
| def AspectExt_intel_gpu_eu_count_per_subslice : Aspect<"ext_intel_gpu_eu_count_per_subslice">; | ||
| def AspectExt_intel_max_mem_bandwidth : Aspect<"ext_intel_max_mem_bandwidth">; | ||
| def AspectExt_intel_mem_channel : Aspect<"ext_intel_mem_channel">; | ||
| def AspectUsm_atomic_host_allocations : Aspect<"usm_atomic_host_allocations">; | ||
| def AspectUsm_atomic_shared_allocations : Aspect<"usm_atomic_shared_allocations">; | ||
| def AspectAtomic64 : Aspect<"atomic64">; | ||
| def AspectExt_intel_device_info_uuid : Aspect<"ext_intel_device_info_uuid">; | ||
| def AspectExt_oneapi_srgb : Aspect<"ext_oneapi_srgb">; | ||
| def AspectExt_oneapi_native_assert : Aspect<"ext_oneapi_native_assert">; | ||
| def AspectHost_debuggable : Aspect<"host_debuggable">; | ||
| def AspectExt_intel_gpu_hw_threads_per_eu : Aspect<"ext_intel_gpu_hw_threads_per_eu">; | ||
| def AspectExt_oneapi_cuda_async_barrier : Aspect<"ext_oneapi_cuda_async_barrier">; | ||
| def AspectExt_oneapi_bfloat16_math_functions : Aspect<"ext_oneapi_bfloat16_math_functions">; | ||
| def AspectExt_intel_free_memory : Aspect<"ext_intel_free_memory">; | ||
| def AspectExt_intel_device_id : Aspect<"ext_intel_device_id">; | ||
| def AspectExt_intel_memory_clock_rate : Aspect<"ext_intel_memory_clock_rate">; | ||
| def AspectExt_intel_memory_bus_width : Aspect<"ext_intel_memory_bus_width">; | ||
| def AspectEmulated : Aspect<"emulated">; | ||
|
|
||
| def TargetTable : DynamicTable { | ||
| let FilterClass = "TargetInfo"; | ||
| let Fields = ["TargetName", "aspects", "maySupportOtherAspects", | ||
| "subGroupSizes", "aotToolchain", "aotToolchainOptions"]; | ||
| string TypeOf_aspects = "list<Aspect>"; | ||
| string TypeOf_subGroupSizes = "list<int>" | ||
| } | ||
|
|
||
| class TargetInfo <string tgtName, list<Aspect> aspectList, bit otherAspects, | ||
| list<int> listSubGroupSizes, string toolchain, string options> | ||
| { | ||
| list<Aspect> aspects = aspectList; | ||
| bits<1> maySupportOtherAspects = otherAspects; | ||
| list<int> subGroupSizes = listSubGroupSizes; | ||
| string aotToolchain = toolchain; | ||
| string aotToolchainOptions = options; | ||
| } | ||
|
|
||
| def : TargetInfo<"TargetA", [AspectCpu, AspectAtomic64], | ||
| 0, [8, 16], "ocloc", "-device tgtA">; | ||
| def : TargetInfo<"TargetB", [AspectGpu, AspectFp16], | ||
| 0, [8, 16], "ocloc", "-device tgtB">; | ||
| def : TargetInfo<"TargetC", [AspectEmulated, AspectImage], | ||
| 0, [8, 32], "ocloc", "-device tgtC, -option2 val">; | ||
| ``` | ||
| Note: backends tested don't allow lists within `TargetInfo` class. This is a | ||
| backend limitation, rather than a TableGen limitation. Thus, we should be able | ||
| to lift this limitation in our own backend, as shown in the initial prototype | ||
| implemented to drive the design. | ||
|
|
||
| The generated `.inc` file should look like the example below: | ||
| ```c++ | ||
| std::map<std::string, TargetInfo> TargetTable = { | ||
| {"TargetA", | ||
| {{"cpu", "atomic64"}, 0, {8, 16}, "ocloc", "-device tgtA"}}, | ||
| {"TargetB", | ||
| {{"gpu", "fp16"}, 0, {8, 16}, "ocloc", "-device tgtB"}}, | ||
| {"TargetC", | ||
| {{"emulated", "image"}, 0, {8, 32}, "ocloc", "-device tgtC, -option2 val"}}}; | ||
| ``` | ||
|
|
||
| We also need a header file that includes the `.inc` file generated by the | ||
| TableGen backend. Other backends don't generate the definition of `struct | ||
| TargetInfo`, and this seems a good idea to me: it simplifies the backend | ||
| implementation, and it is easier for developers to check the data structure | ||
| to understand how to work with it. The idea is simply to define the struct | ||
| in this header file. This header file should look like the code below: | ||
| ```c++ | ||
| namespace DeviceConfigFile { | ||
| struct TargetInfo { | ||
| bool maySupportOtherAspects; | ||
| std::vector<std::string> aspects; | ||
| std::vector<unsigned> subGroupSizes; | ||
| std::string aotToolchain; | ||
| std::string aotToolchainOptions; | ||
| }; | ||
|
|
||
| #include "device_config_file.inc" | ||
| using TargetTable_t = std::map<std::string, TargetInfo>; | ||
| }; // namespace DeviceConfigFile | ||
| ``` | ||
|
|
||
| Other modules can query the map to get the information like in the example | ||
| below: | ||
| ```c++ | ||
| DeviceConfigFile::TargetInfo info = DeviceConfigFile::targets.find("TargetA"); | ||
| if (info == DeviceConfigFile::targets.end()) { | ||
| /* Target not found */ | ||
| ... | ||
| } else { | ||
| auto aspects = info.aspects; | ||
| auto maySupportOtherAspects = info.maySupportOtherAspects; | ||
| auto subGroupSizes = info.subGroupSizes; | ||
| ... | ||
| } | ||
| ``` | ||
|
|
||
| ## Tools and Modules Interacting with Device Config File | ||
| This is a list of the tools and compiler modules that require using the file: | ||
| - The *compiler driver* needs the file to determine the set of legal values for | ||
| `-fsycl-targets`. | ||
| - The *compiler driver* needs the file to define macros for `any_device_has/all_devices_have`. | ||
| - *Clang* needs the file to emit diagnostics related to `-fsycl-fixed-targets.` | ||
| - `sycl-post-link` needs the file to filter kernels in device images when doing AOT | ||
| compilation. | ||
|
|
||
| Following, you can find the changes required in different parts of the project | ||
| in more detail. | ||
|
|
||
| ### Changes to Build Infrastructure | ||
| We need the information about the targets in multiple tools and compiler | ||
| modules listed in [Requirements](#Requirements). Thus, we need to make sure | ||
| that the generation of the `.inc` file out of the `.td` file is done in time | ||
| for all the consumers. The command we need to run for TableGen is `llvm-tblgen | ||
| -gen-dynamic-tables -I /llvm-root/llvm/include/ input.td -o output.inc`. | ||
| Additionally, we need to set dependencies adequately so that this command is | ||
| run before any of the consumers need it. | ||
|
|
||
| ### Changes to the DPC++ Frontend | ||
maarquitos14 marked this conversation as resolved.
Show resolved
Hide resolved
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This section is a bit unclear: I understand that clang driver may read the option, open a file and modify the map we have with default config, but what about other tools, like Will we copy-paste the code opening a file and modifying default config, or will we put into helper header (one which contains
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not super familiar with how different processes are invoked from the driver, so I was assuming here that options are propagated to all the processes. Thus, yes, my original idea was having the different processes to do the same handling. I think handling the map is easier, while modifying files is more error-prone.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Generally, driver options are not propagated down to tools automatically and each option is handled on case by case bases. Not all options are applicable for all tools, not all options have 1:1 equivalents, etc. A few examples here:
Then we should document that every tool which is supposed to work with the config, should be extended with this new option and it should be propagated from the driver to a tool. |
||
| To allow users to add new targets we provide a new flag: | ||
| `fsycl-device-config-file=/path/to/file.yaml`. Users can pass a `.yaml` file | ||
| describing the targets to be added/updated. An example of how such `.yaml` file | ||
| should look like is shown below. | ||
| ``` | ||
| intel_gpu_skl: | ||
| aspects: [aspect_name1, aspect_name2] | ||
| may_support_other_aspects: true/false | ||
| sub-group-sizes: [1, 2, 4, 8] | ||
| aot-toolchain: ocloc | ||
| aot-toolchain-options: -device skl | ||
| ``` | ||
| The frontend module should parse the user-provided `.yaml` file and update the | ||
| map with the new information about targets. LLVM provides | ||
| [YAML/IO](https://llvm.org/docs/YamlIO.html) library to easily parse `.yaml` | ||
| files. The driver should propagate this option to all the tools that require | ||
| the Device Configuration File (e.g. `sycl-post-link`) so that each of the | ||
| tools can modify the map according to the user extensions described in the | ||
| `.yaml` file. | ||
|
|
||
| As mentioned in [Requirements](#Requirements), there is an auto-detection | ||
| mechanism for `aot-toolchain` and `aot-toolchain-options` that is able to | ||
| infer these from the target name. In the `.yaml` example shown above the target | ||
| name is `intel_gpu_skl`. From that name, we can infer that `aot-toolchain` is | ||
| `ocloc` because the name starts with `intel_gpu`. Also, we can infer that it needs | ||
| `aot-toolchain-options` set to `-device skl` just by keeping what is left after the | ||
| prefix `intel_gpu`. | ||
|
|
||
| #### Potential Issues/Limitations | ||
| - Multiple targets with the same name: On the one hand, the compiler emits a | ||
| warning so that the user is aware that multiple targets share the same name. On | ||
| the other hand, it simply processes each new entry and updates the map with the | ||
| latest information found. | ||
maarquitos14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| The auto-detection mechanism is a best effort to relieve users from specifying | ||
| `aot-toolchain` and `aot-toolchain-options` from well known devices. However, | ||
| it has its own limitations and potential issues: | ||
| - Rules for target names: As of now, auto-detection is only available for Intel GPU | ||
| targets. All targets starting with `intel_gpu_` will automatically set | ||
| `aot-toolchain=ocloc` and `aot-toolchain-options=-device suffix` being suffix the part | ||
| left after `intel_gpu_` prefix. | ||
| - User specifies `aot-toolchain` and `aot-toolchain-options` for a target name | ||
| that can be auto-detected: user-specified information has precedence over auto-detected | ||
| information. | ||
|
|
||
| ## Testing | ||
| There is a danger that the device configuration file will get out-of-sync with the | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have a test already for We could do the same for default device config. However, I still don't think that such approach will allow us to be sure that nothing went out-of-sync, because it requires us to run the test on every different target we have in our config file. We most likely won't have all of them available to us at any time even for manual run of the test, not to say about automated launch of it. Not that we shouldn't run this test in our regular testing, but just highlighting that it won't automatically give us 100% coverage over all known targets. An idea for an extra test: we need a mechanism to compare list of aspects known to SYCL RT (
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This would be just to make sure that new aspects added to
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Right, that is a separate test, which is intended to at least highlight that default config maybe outdated, because it is not even aware of that new aspect. It is not 100% bulletproof, because it doesn't check that you actually used the aspect in description of any target. However, such test doesn't require any actual HW to be present on a machine and therefore could also be useful during development when new aspects are added. |
||
| actual device capabilities. In order to prevent that, we need testing to validate | ||
| that the device config file does not go out-of-sync. There are two tests that we | ||
| should include: | ||
| - A test that compares the list of aspects known to SYCL RT (defined in `aspects.def`) | ||
| with the list of aspects defined in the `.td` file describing the default configuration. | ||
| This will be useful to detect new aspects added to SYCL RT that have not been added in | ||
| the `.td` file. | ||
| - A test that compares the aspects listed in the `.td` file with the aspects reported | ||
| via `device::has` for each device listed in the `.td` file. Both lists should match. | ||
| This test could copy the mechanism of the test for `any_device_has` that goes over each | ||
| item in `aspects.def` and tries to instantiate `any_device_has` with that enumerator. | ||
|
|
||
| Neither of the tests provides guarantees that nothing went out-of-sync *per se*, we | ||
| would require running the second test in all the targets described in the `.td` file | ||
| for such guarantees, but at least provides the mechanism to detect potential desyncs. | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.