diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0995f5734261f..554a14a9eaae9 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -79,14 +79,16 @@ for more information. changes. See [Get Started Guide](sycl/doc/GetStartedGuide.md). - Prepare your patch - follow [LLVM coding standards](https://llvm.org/docs/CodingStandards.html) - - [clang-format](https://clang.llvm.org/docs/ClangFormat.html) and - [clang-tidy](https://clang.llvm.org/extra/clang-tidy/) tools can be integrated into your - workflow to ensure formatting and stylistic compliance of your changes. + - [clang-format](https://clang.llvm.org/docs/ClangFormat.html) and + [clang-tidy](https://clang.llvm.org/extra/clang-tidy/) tools can be + integrated into your workflow to ensure formatting and stylistic + compliance of your changes. - use ``` ./clang/tools/clang-format/git-clang-format `git merge-base origin/sycl HEAD` ``` - to check the format of your current changes against the `origin/sycl` branch. + to check the format of your current changes against the `origin/sycl` + branch. - `-f` to also correct unstaged changes - `--diff` to only print the diff without applying - Build the project and run all tests. @@ -125,5 +127,4 @@ Project maintainers merge pull requests using one of the following options: - [Create a merge commit] Used for LLVM pull-down PRs to preserve hashes of the commits pulled from the LLVM community repository - *Other names and brands may be claimed as the property of others. diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 98c49dc59bc7b..60550014fd38b 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -929,8 +929,9 @@ Release notes for commit c557eb740d55e828fcf74b28d2b686c928e45318. - The problem with calling inlined kernel from multiple TUs is fixed. - Fixed compiler warnings for Intel FPGA attributes on host compilation. - Fixed bug with passing values of `vec<#, half>` type to the kernel. 
-- Fixed buffer constructor which takes host data as shared_ptr. Now it increments - shared_ptr reference counter and reuses provided memory if possible. +- Fixed buffer constructor which takes host data as shared_ptr. Now it + increments shared_ptr reference counter and reuses provided memory if + possible. - Fixed a bug with nd_item.barrier not respecting fence_space flag ## Prerequisites @@ -1001,9 +1002,9 @@ Release notes for commit 64c0262c0f0b9e1b7b2e2dcef57542a3fe3bdb97. - Fixed code generation for 3-element boolean vectors. ## Prerequisites - - Experimental Intel(R) CPU Runtime for OpenCL(TM) Applications with SYCL support is - available now and recommended OpenCL CPU RT prerequisite for the SYCL - compiler. + - Experimental Intel(R) CPU Runtime for OpenCL(TM) Applications with SYCL + support is available now and recommended OpenCL CPU RT prerequisite for the + SYCL compiler. - The Intel(R) Graphics Compute Runtime for OpenCL(TM) version 19.25.13237 is recommended OpenCL GPU RT prerequisite for the SYCL compiler. @@ -1039,7 +1040,8 @@ d404d1c6767524c21b9c5d05f11b89510abc0ab9. - Memory attribute `intelfpga::max_concurrency` was renamed to `intelfpga::max_private_copies` to avoid name conflict with fresh added loop attribute -- Added support for const values and local accessors in `handler::set_arg` method. +- Added support for const values and local accessors in `handler::set_arg` + method. ## Bug Fixes - The new scheduler is implemented with the following bug fixes: @@ -1056,8 +1058,8 @@ d404d1c6767524c21b9c5d05f11b89510abc0ab9. specification. - Compiling multiple objects when using `-fsycl-link-targets` now creates proper final .spv binary. -- Fixed bug with crash in sampler destructor when sampler object is created using - enumerations. +- Fixed bug with crash in sampler destructor when sampler object is created + using enumerations. 
- Fixed `handler::set_arg`, so now it works correctly with kernels created using program constructor which takes `cl_program` or `program::build_with_source`. - Now `lgamma_r` builtin works correctly when application is built without @@ -1077,7 +1079,6 @@ d404d1c6767524c21b9c5d05f11b89510abc0ab9. OpenCL handles allocated inside SYCL(e.g. `cl_command_queue`) are not released. - # May'19 release notes ## New Features diff --git a/sycl/doc/CompilerAndRuntimeDesign.md b/sycl/doc/CompilerAndRuntimeDesign.md index 1db56ca0db6bd..41d7bb6846663 100644 --- a/sycl/doc/CompilerAndRuntimeDesign.md +++ b/sycl/doc/CompilerAndRuntimeDesign.md @@ -102,17 +102,17 @@ pointers to the device memory. As there is no way in OpenCL to pass structures with pointers inside as kernel arguments all memory objects shared between host and device must be passed to the kernel as raw pointers. SYCL also has a special mechanism for passing kernel arguments from host to -the device. In OpenCL kernel arguments are set by calling `clSetKernelArg` function -for each kernel argument, meanwhile in SYCL all the kernel arguments are fields of -"SYCL kernel function" which can be defined as a lambda function or a named function -object and passed as an argument to SYCL function for invoking kernels (such as -`parallel_for` or `single_task`). For example, in the previous code snippet above -`accessor` `A` is one such captured kernel argument. +the device. In OpenCL kernel arguments are set by calling `clSetKernelArg` +function for each kernel argument, meanwhile in SYCL all the kernel arguments +are fields of "SYCL kernel function" which can be defined as a lambda function +or a named function object and passed as an argument to SYCL function for +invoking kernels (such as `parallel_for` or `single_task`). For example, in the +previous code snippet above `accessor` `A` is one such captured kernel argument. 
To facilitate the mapping of SYCL kernel data members to OpenCL -kernel arguments and overcome OpenCL limitations we added the generation of an OpenCL -kernel function inside the compiler. An OpenCL kernel function contains the -body of the SYCL kernel function, receives OpenCL-like parameters and +kernel arguments and overcome OpenCL limitations we added the generation of an +OpenCL kernel function inside the compiler. An OpenCL kernel function contains +the body of the SYCL kernel function, receives OpenCL-like parameters and additionally does some manipulation to initialize SYCL kernel data members with these parameters. In some pseudo code the OpenCL kernel function for the previous code snippet above looks like this: @@ -141,7 +141,8 @@ __kernel KernelName(global int* a) { ``` -OpenCL kernel function is generated by the compiler inside the Sema using AST nodes. +OpenCL kernel function is generated by the compiler inside the Sema using AST +nodes. ### SYCL support in the driver @@ -215,12 +216,13 @@ option mechanism, similar to OpenMP. `-Xsycl-target-backend= "arg1 arg2 ..."` -For example, to support offload to Gen9/vISA3.3, the following options would be used: +For example, to support offload to Gen9/vISA3.3, the following options would be +used: `-fsycl -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xsycl-target-backend "-device skl"` -The driver passes the `-device skl` parameter directly to the Gen device backend compiler -without parsing it. +The driver passes the `-device skl` parameter directly to the Gen device backend +compiler without parsing it. **TBD:** Having multiple code forms for the same target in the fat binary might mean invoking device compiler multiple times. Multiple invocations are not @@ -361,27 +363,28 @@ is to allow users to save re-compile time when making changes that only affect their host code. In the case where device image generation takes a long time (e.g. FPGA), this savings can be significant. 
-For example, if the user separated source code into four files: dev_a.cpp, dev_b.cpp, -host_a.cpp and host_b.cpp where only dev_a.cpp and dev_b.cpp contain device code, -they can divide the compilation process into three steps: +For example, if the user separated source code into four files: dev_a.cpp, +dev_b.cpp, host_a.cpp and host_b.cpp where only dev_a.cpp and dev_b.cpp contain +device code, they can divide the compilation process into three steps: 1. Device link: dev_a.cpp dev_b.cpp -> dev_image.o (contain device image) 2. Host Compile (c): host_a.cpp -> host_a.o; host_b.cpp -> host_b.o 3. Linking: dev_image.o host_a.o host_b.o -> executable Step 1 can take hours for some targets. But if the user wish to recompile after -modifying only host_a.cpp and host_b.cpp, they can simply run steps 2 and 3 without -rerunning the expensive step 1. +modifying only host_a.cpp and host_b.cpp, they can simply run steps 2 and 3 +without rerunning the expensive step 1. -The compiler is responsible for verifying that the user provided all the relevant -files to the device link step. There are 2 cases that have to be checked: +The compiler is responsible for verifying that the user provided all the +relevant files to the device link step. There are 2 cases that have to be +checked: 1. Missing symbols referenced by the kernels present in the device link step (e.g. functions called by or global variables used by the known kernels). 2. Missing kernels. -Case 1 can be identified in the device binary generation stage (step 1) by scanning -the known kernels. Case 2 must be verified by the driver by checking for newly -introduced kernels in the final link stage (step 3). +Case 1 can be identified in the device binary generation stage (step 1) by +scanning the known kernels. Case 2 must be verified by the driver by checking +for newly introduced kernels in the final link stage (step 3). The llvm-no-spir-kernel tool was introduced to facilitate checking for case 2 in the driver. 
It detects if a module includes kernels and is invoked as follows: @@ -438,24 +441,40 @@ unit) #### CUDA support -The driver supports compilation to NVPTX when the `nvptx64-nvidia-cuda-sycldevice` is passed to `-fsycl-targets`. +The driver supports compilation to NVPTX when the +`nvptx64-nvidia-cuda-sycldevice` is passed to `-fsycl-targets`. -Unlike other AOT targets, the bitcode module linked from intermediate compiled objects never goes through SPIR-V. Instead it is passed directly in bitcode form down to the NVPTX Back End. All produced bitcode depends on two libraries, `libdevice.bc` (provided by the CUDA SDK) and `libspirv-nvptx64--nvidiacl.bc` (built by the libclc project). +Unlike other AOT targets, the bitcode module linked from intermediate compiled +objects never goes through SPIR-V. Instead it is passed directly in bitcode form +down to the NVPTX Back End. All produced bitcode depends on two libraries, +`libdevice.bc` (provided by the CUDA SDK) and `libspirv-nvptx64--nvidiacl.bc` +(built by the libclc project). -During the device linking step (device linker box in the [Separate Compilation and Linking](#separate-compilation-and-linking) illustration), llvm bitcode objects for the CUDA target are linked together alongside `libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc`, compiled to PTX using the NVPTX backend, and assembled into a cubin using the `ptxas` tool (part of the CUDA SDK). The PTX file and cubin are assembled together using `fatbinary` to produce a CUDA fatbin. The CUDA fatbin is then passed to the offload wrapper tool. +During the device linking step (device linker box in the +[Separate Compilation and Linking](#separate-compilation-and-linking) +illustration), llvm bitcode objects for the CUDA target are linked together +alongside `libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc`, compiled to PTX +using the NVPTX backend, and assembled into a cubin using the `ptxas` tool (part +of the CUDA SDK). 
The PTX file and cubin are assembled together using +`fatbinary` to produce a CUDA fatbin. The CUDA fatbin is then passed to the +offload wrapper tool. ##### Checking if the compiler is targeting NVPTX -When the SYCL compiler is in device mode and targeting the NVPTX backend, compiler defines the macro `__SYCL_NVPTX__`. -This macro can safely be used to enable NVPTX specific code path in SYCL kernels. +When the SYCL compiler is in device mode and targeting the NVPTX backend, +compiler defines the macro `__SYCL_NVPTX__`. +This macro can safely be used to enable NVPTX specific code path in SYCL +kernels. *Note: this macro is only define during the device compilation phase.* ##### NVPTX Builtins -When the SYCL compiler is in device mode and targeting the NVPTX backend, the compiler exposes NVPTX builtins supported by clang. +When the SYCL compiler is in device mode and targeting the NVPTX backend, the +compiler exposes NVPTX builtins supported by clang. -*Note: this enable NVPTX specific features which cannot be supported by other targets or the host.* +*Note: this enable NVPTX specific features which cannot be supported by other +targets or the host.* Example: ```cpp @@ -472,16 +491,24 @@ double my_min(double x, double y) { ##### Local memory support -In CUDA, users can only allocate one chunk of host allocated shared memory (which maps to SYCL's local accessors). -This chunk of memory is allocated as an array `extern __shared__ [];` which LLVM represents as an external global symbol to the CUDA shared memory address space. -The NVPTX backend then lowers this into a `.extern .shared .align 4 .b8` PTX instruction. +In CUDA, users can only allocate one chunk of host allocated shared memory +(which maps to SYCL's local accessors). This chunk of memory is allocated as an +array `extern __shared__ [];` which LLVM represents as an external +global symbol to the CUDA shared memory address space. 
The NVPTX backend then +lowers this into a `.extern .shared .align 4 .b8` PTX instruction. -In SYCL, users can allocate multiple local accessors and pass them as kernel parameters. When the SYCL frontend lowers the SYCL kernel invocation into an OpenCL compliant kernel entry, it lowers local accessors into a pointer to OpenCL local memory (CUDA shared memory) but this is not legal for CUDA kernels. +In SYCL, users can allocate multiple local accessors and pass them as kernel +parameters. When the SYCL frontend lowers the SYCL kernel invocation into an +OpenCL compliant kernel entry, it lowers local accessors into a pointer to +OpenCL local memory (CUDA shared memory) but this is not legal for CUDA kernels. -To legalize the SYCL lowering for CUDA, a SYCL for CUDA specific pass will do the following: +To legalize the SYCL lowering for CUDA, a SYCL for CUDA specific pass will do +the following: - Create a global symbol to the CUDA shared memory address space -- Transform all pointers to CUDA shared memory into a 32 bit integer representing the offset in bytes to use with the global symbol -- Replace all uses of the transformed pointers by the address to global symbol offset by the value of the integer passed as parameter +- Transform all pointers to CUDA shared memory into a 32 bit integer + representing the offset in bytes to use with the global symbol +- Replace all uses of the transformed pointers by the address to global symbol + offset by the value of the integer passed as parameter As an example, the following kernel: ``` @@ -490,6 +517,7 @@ define void @SYCL_generated_kernel(i64 addrspace(3)* nocapture %local_ptr, i32 % %1 = load i64, i64 addrspace(3)* %local_ptr2 } ``` + Is transformed into this kernel when targeting CUDA: ``` @SYCL_generated_kernel.shared_mem = external dso_local local_unnamed_addr addrspace(3) global [0 x i8], align 4 @@ -502,7 +530,10 @@ define void @SYCL_generated_kernel(i32 %local_ptr_offset, i32 %arg, i32 %local_p } ``` -On the runtime 
side, when setting local memory arguments, the CUDA PI implementation will internally set the argument as the offset with respect to the accumulated size of used local memory. This approach preserves the exisiting PI interface. +On the runtime side, when setting local memory arguments, the CUDA PI +implementation will internally set the argument as the offset with respect to +the accumulated size of used local memory. This approach preserves the existing +PI interface. ### Integration with SPIR-V format @@ -537,8 +568,8 @@ Translation from LLVM IR to SPIR-V for special types is also supported, but such LLVM IR must comply to some special requirements. Unfortunately there is no canonical form of special built-in types and operations in LLVM IR, moreover we can't re-use existing representation generated by OpenCL C front-end -compiler. For instance here is how `OpGroupAsyncCopy` operation looks in LLVM IR -produced by OpenCL C front-end compiler. +compiler. For instance here is how `OpGroupAsyncCopy` operation looks in LLVM +IR produced by OpenCL C front-end compiler. ```LLVM @_Z21async_work_group_copyPU3AS3fPU3AS1Kfjj(float addrspace(3)*, float addrspace(1)*, i32, i32) diff --git a/sycl/doc/GetStartedGuide.md b/sycl/doc/GetStartedGuide.md index 24ede38429933..7fcb91aaf7826 100644 --- a/sycl/doc/GetStartedGuide.md +++ b/sycl/doc/GetStartedGuide.md @@ -130,8 +130,8 @@ To enable support for CUDA devices, follow the instructions for the Linux DPC++ toolchain, but add the `--cuda` flag to `configure.py` Enabling this flag requires an installation of -[CUDA 10.1](https://developer.nvidia.com/cuda-10.1-download-archive-update2) on the system, -refer to +[CUDA 10.1](https://developer.nvidia.com/cuda-10.1-download-archive-update2) on +the system, refer to [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html). Currently, the only combination tested is Ubuntu 18.04 with CUDA 10.2 using @@ -145,17 +145,18 @@ above. 
The DPC++ toolchain support on CUDA platforms is still in an experimental phase. Currently, the DPC++ toolchain relies on having a recent OpenCL implementation on the system in order to link applications to the DPC++ runtime. -The OpenCL implementation is not used at runtime if only the CUDA backend is +The OpenCL implementation is not used at runtime if only the CUDA backend is used in the application, but must be installed. The OpenCL implementation provided by the CUDA SDK is OpenCL 1.2, which is too old to link with the DPC++ runtime and lacks some symbols. -We recommend installing the low level CPU runtime, following the instructions +We recommend installing the low level CPU runtime, following the instructions in the next section. -Instead of installing the low level CPU runtime, it is possible to build and -install the [Khronos ICD loader](https://github.com/KhronosGroup/OpenCL-ICD-Loader), +Instead of installing the low level CPU runtime, it is possible to build and +install the +[Khronos ICD loader](https://github.com/KhronosGroup/OpenCL-ICD-Loader), which contains all the symbols required. ### Install low level runtime @@ -276,7 +277,7 @@ python %DPCPP_HOME%\llvm\buildbot\check.py If no OpenCL GPU/CPU runtimes are available, the corresponding tests are skipped. -If CUDA support has been built, it is tested only if there are CUDA devices +If CUDA support has been built, it is tested only if there are CUDA devices available. #### Run Khronos\* SYCL\* conformance test suite (optional) @@ -411,7 +412,7 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \ This `simple-sycl-app.exe` application doesn't specify SYCL device for execution, so SYCL runtime will use `default_selector` logic to select one of accelerators available in the system or SYCL host device. 
-In this case, the behaviour of the `default_selector` can be altered +In this case, the behaviour of the `default_selector` can be altered using the `SYCL_BE` environment variable, setting `PI_CUDA` forces the usage of the CUDA backend (if available), `PI_OPENCL` will force the usage of the OpenCL backend. @@ -543,5 +544,4 @@ class CUDASelector : public cl::sycl::device_selector { - SYCL\* 1.2.1 specification: [www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf](https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf) - \*Other names and brands may be claimed as the property of others. diff --git a/sycl/doc/PluginInterface.md b/sycl/doc/PluginInterface.md index 4d13f882650e6..99c2626b89d47 100644 --- a/sycl/doc/PluginInterface.md +++ b/sycl/doc/PluginInterface.md @@ -51,12 +51,12 @@ searched at locations in env LD_LIBRARY_PATH on Linux and env PATH on Windows. installation directory by using DT_RPATH on Linux. Similar functionality can be achieved on Windows using SetDllDirectory. This will help avoiding extra setting of LD_LIBRARY_PATH.) -To avoid any issues with read-only access, an environment variable SYCL_PI_CONFIG -can be set to point to the configuration file which lists the Plugin names. The -enviroment variable if set overrides the predetermined location's config file. -These Plugins are then be searched in LD_LIBRARY_PATH locations. -It is the developer's responsibility to include the plugin names from the -predetermined location's config file to enable discovery of all plugins. +To avoid any issues with read-only access, an environment variable +SYCL_PI_CONFIG can be set to point to the configuration file which lists the +Plugin names. The environment variable if set overrides the predetermined +location's config file. These Plugins are then searched in LD_LIBRARY_PATH +locations. It is the developer's responsibility to include the plugin names from +the predetermined location's config file to enable discovery of all plugins. 
(TBD - Extend to support search in DT_RPATH as above.) In the current implementation the plugin names are hardcoded in the library. Configuration file or env SYCL_PI_CONFIG is currently not being considered. @@ -72,11 +72,12 @@ SYCL_PI_TRACE=-1 lists all PI Traces above and more debug messages. #### Plugin binary interface Plugins should implement all the Interface APIs required for the PI Version -it supports. There is [pi.def](../include/CL/sycl/detail/pi.def)/ +it supports. There is [pi.def](../include/CL/sycl/detail/pi.def)/ [pi.h](../include/CL/sycl/detail/pi.h) file listing all PI API names that can be called by the specific version of Plugin Interface. -It exports a function - "piPluginInit" that returns the plugin details and function pointer -table containing the list of pointers to implemented Interface Functions defined in pi.h. +It exports a function - "piPluginInit" that returns the plugin details and +function pointer table containing the list of pointers to implemented Interface +Functions defined in pi.h. In the future, this document will list the minimum set of Interface APIs to be supported by Plugins. This will also require adding functionality to SYCL Runtime to work with such limited functionality plugins. @@ -86,7 +87,7 @@ be picked up by the DPC++ runtime for offload.) #### Binding a Plugin The DPC++ Runtime loads all discovered Plugins and tries to bind them by calling -piPluginInit API for each loaded Plugin. The Plugins return the information of +piPluginInit API for each loaded Plugin. The Plugins return the information of supported PI version and the list of implemented PI API Function pointers. (TBD - Use the PI API Version information and check for compatibility. Extend to support version compatibility checks without loading the library. @@ -97,8 +98,8 @@ The PI API calls are later forwarded using this information. A plugin is said to "bind" after this process completes with no errors. 
During device selection, the user can prefer selection of a device from a specific Plugin or Backend using the env SYCL_BE. The correspondence between -a plugin and a SYCL_BE value is currently hardcoded in the runtime. -( TBD: Make this a part of configuration file). +a plugin and a SYCL_BE value is currently hardcoded in the runtime. +( TBD: Make this a part of configuration file). Eg: SYCL_BE=PI_OPENCL corresponds to OpenCL Plugin. #### OpenCL plugin @@ -109,13 +110,15 @@ OpenCL implementations. They can be installed either in the standard Khronos ICD-compatible way (e.g. listed in files under /etc/OpenCL/vendors on Linux) or not, and the OpenCL plugin can hook up with both. -TBD - implement and describe the nested OpenCL implementation discovery process performed by -the OpenCL plugin +TBD - implement and describe the nested OpenCL implementation discovery process +performed by the OpenCL plugin ### Device enumeration by plugins -Devices from all bound plugins are queried and listed as and when required, eg: during device selection in device_selector. +Devices from all bound plugins are queried and listed as and when required, eg: +during device selection in device_selector. The trace shows the PI API calls made when using SYCL_PI_TRACE=-1. -(TBD - Add the trace to list all available devices when plugins are successfully bound.) +(TBD - Add the trace to list all available devices when plugins are successfully +bound.) ### Plugin Unloading The plugins not chosen to be connected to should be unloaded. @@ -130,8 +133,8 @@ able to operate on the corresponding device. The core API further breaks down into - **OpenCL-based** APIs which have OpenCL origin and semantics - **Extension** APIs which don't have counterparts in the OpenCL -- **Interoperability API** which allows interoperability with underlying runtimes -such as OpenCL. +- **Interoperability API** which allows interoperability with underlying +runtimes such as OpenCL. 
See [pi.h](../include/CL/sycl/detail/pi.h) header for the full list and descriptions of PI APIs. diff --git a/sycl/doc/PreprocessorMacros.md b/sycl/doc/PreprocessorMacros.md index 3e522830fd5cd..cea826120f737 100644 --- a/sycl/doc/PreprocessorMacros.md +++ b/sycl/doc/PreprocessorMacros.md @@ -4,17 +4,17 @@ This file describes macros that have effect on SYCL compiler and run-time. ### RESTRICT_WRITE_ACCESS_TO_CONSTANT_PTR -The spec assumes that the SYCL implementation does address space deduction. However, -for our implementation, the deduction is performed in the middle end, where it's -hard to provide user friendly diagnositcs. -Due to these problems writing to raw pointers obtained from `constant_ptr` is not -diagnosed now. +The spec assumes that the SYCL implementation does address space deduction. +However, for our implementation, the deduction is performed in the middle end, +where it's hard to provide user friendly diagnostics. +Due to these problems writing to raw pointers obtained from `constant_ptr` is +not diagnosed now. The user can enable diagnostics upon writing to such pointers via enabling the `RESTRICT_WRITE_ACCESS_TO_CONSTANT_PTR` macro. This allows `constant_ptr` to use constant pointers as underlying pointer types. Thus, conversions from `constant_ptr` to raw pointers will return constant pointers and writing to const pointers will be diagnosed by the front-end. -This behavior is not following the SYCL spec since `constant_ptr` conversions to the -underlying pointer types return pointers without any additional qualifiers so -it's disabled by default. +This behavior is not following the SYCL spec since `constant_ptr` conversions to +the underlying pointer types return pointers without any additional qualifiers +so it's disabled by default. diff --git a/sycl/doc/UsersManual.md b/sycl/doc/UsersManual.md index ec761ffaf793a..8a0ac1be8c725 100644 --- a/sycl/doc/UsersManual.md +++ b/sycl/doc/UsersManual.md @@ -14,8 +14,8 @@ your application. 
**`-fsycl-targets=`** - A comma separated list of triples to specify the device target(s) to generate - code for. This option is only valid when used with `-fsycl`. + A comma separated list of triples to specify the device target(s) to + generate code for. This option is only valid when used with `-fsycl`. ### Target toolchain options. @@ -112,15 +112,17 @@ your application. **`-fintelfpga`** - Perform ahead of time compilation for Intel FPGA, which relies on the external tool `aoc` - being available in the `PATH`. + Perform ahead of time compilation for Intel FPGA, which relies on the + external tool `aoc` being available in the `PATH`. - This option is roughly equivalent to `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -g -MMD -lOpenCL`. + This option is roughly equivalent to + `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice -g -MMD -lOpenCL`. It is incompatible with `-fsycl-targets=...`; if ahead of time compilation is needed for multiple backends (e.g. Intel FPGA, Intel GPU, etc.), the - alternative form based on `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice` - should be used instead. + alternative form based on + `-fsycl-targets=spir64_fpga-unknown-unknown-sycldevice` should be used + instead. **`-fsycl-link=`** diff --git a/sycl/doc/extensions/README.md b/sycl/doc/extensions/README.md index 30554bf35a3f3..bcafe351f1f57 100644 --- a/sycl/doc/extensions/README.md +++ b/sycl/doc/extensions/README.md @@ -1,3 +1,4 @@ # Extensions -This is where documents can be found that propose extensions to the SYCL specification. +This is where documents can be found that propose extensions to the SYCL +specification.