@@ -394,6 +394,51 @@ llvm-no-spir-kernel host.bc
394394
395395It returns 0 if no kernels are present and 1 otherwise. 
396396
397+ #### Device code split 
398+ 
399+ Putting all device code into a single SPIR-V module does not work well in the 
400+ following cases: 
401+ 1. There are thousands of kernels defined and only a small part of them is used at 
402+ run-time. Having them all in one SPIR-V module significantly increases JIT time. 
403+ 2. Device code can be specialized for different devices. For example, kernels 
404+ that are supposed to be executed only on FPGA can use extensions available for 
405+ FPGA only. This will cause JIT compilation failure on other devices even if this 
406+ particular kernel is never called on them. 
407+ 
408+ To resolve these problems the compiler can split a single module into smaller 
409+ ones. The following features are supported: 
410+ * Emitting a separate module for each source (translation unit) 
411+ * Emitting a separate module for each kernel 
412+ 
413+ The current approach is: 
414+ * Generate special metadata with a translation unit ID for each kernel in the SYCL 
415+ front-end. This ID is used to group kernels on a per-translation-unit basis 
416+ * Link all device LLVM modules using llvm-link 
417+ * Perform split on a fully linked module 
418+ * Generate a symbol table (list of kernels) for each produced device module for 
419+ proper module selection at run-time 
420+ * Perform SPIR-V translation and AOT compilation (if requested) on each produced 
421+ module separately 
422+ * Add information about the contained kernels to the wrapper object for each device 
423+ image 
424+ 
425+ Device code splitting process: 
426+  
427+ 
428+ The "split" box is implemented as functionality of the dedicated tool 
429+ `sycl-post-link`. The tool runs a set of LLVM passes to split the input module and 
430+ generates a symbol table (list of kernels) for each produced device module. 
431+ 
432+ To enable device code split, a special option must be passed to the clang 
433+ driver: 
434+ 
435+ `-fsycl-device-code-split=<value>` 
436+ 
437+ There are three possible values for this option: 
438+ * `per_source` - enables emitting a separate module for each source (translation 
439+ unit) 
440+ * `per_kernel` - enables emitting a separate module for each kernel 
441+ * `off` - disables device code split 
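
The effect of the three option values on kernel grouping can be modelled with a small Python sketch. This is only an illustration under stated assumptions: the real split is performed by `sycl-post-link` on LLVM IR, while the function `split_kernels`, the kernel names, and the translation unit IDs below are hypothetical and exist only for this example. Each returned list stands for the symbol table (list of kernels) of one produced device module.

```python
from collections import defaultdict


def split_kernels(kernels, mode="off"):
    """Model how kernels are grouped into device modules.

    kernels: list of (kernel_name, translation_unit_id) pairs (hypothetical input)
    mode:    "per_source", "per_kernel", or "off"
    Returns one list of kernel names (a symbol table) per produced module.
    """
    if mode == "off":
        # No split: every kernel stays in a single module.
        return [[name for name, _ in kernels]]
    if mode == "per_kernel":
        # One module per kernel.
        return [[name] for name, _ in kernels]
    if mode == "per_source":
        # Group kernels by the translation unit ID attached by the front-end.
        groups = defaultdict(list)
        for name, tu_id in kernels:
            groups[tu_id].append(name)
        return list(groups.values())
    raise ValueError(f"unknown split mode: {mode}")


# Hypothetical kernels coming from two translation units.
kernels = [("kernel_a", "tu1"), ("kernel_b", "tu1"), ("kernel_c", "tu2")]
per_source_tables = split_kernels(kernels, "per_source")
per_kernel_tables = split_kernels(kernels, "per_kernel")
unsplit_tables = split_kernels(kernels, "off")
```

In this model `per_source` yields two symbol tables (one per translation unit), `per_kernel` yields three, and `off` yields a single table containing all kernels.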
397442
398443### Integration with SPIR-V format 
399444