Update raising.md

Pangoraw · web-flow · commit 4624977cbc19 · 2025-12-09T10:52:48.000+01:00
diff --git a/docs/src/tutorials/raising.md b/docs/src/tutorials/raising.md
@@ -2,7 +2,16 @@
 
 ## Raising GPU Kernels
 
-<!-- TODO: write this section -->
+Kernel raising refers to Reactant's ability to transform a program written in a GPU kernel style, that is, kernel functions that are evaluated over a grid of blocks and threads and whose operations are done at the scalar level. The transformation raises the program to a tensor-style function (in the StableHLO dialect) where operations are broadcast.
+
+This transformation enables several features:
+
+ - Running the raised compute kernel on hardware that the original kernel was not designed for (_e.g._ running a CUDA kernel on a TPU).
+ - Enabling further optimizations: since the raised kernel is now indistinguishable from the rest of the program, it can be optimized together with it. For example, two sequential kernel launches, where one operates on the result of the other, can be fused if both are raised, resulting in a single kernel launch in the final optimized StableHLO program.
+ - Enabling automatic differentiation: Reactant does not currently support automatic differentiation of GPU kernels, but raising a kernel lets Enzyme differentiate the raised version. For this to work, use the `raise_first` compilation option to make sure kernels are raised before Enzyme performs automatic differentiation on the program.
+
+!!! note
+    Not all classes of kernels can currently be raised to StableHLO. If raising your kernel fails with an error, please open an issue on [the Reactant.jl repository](https://github.com/EnzymeAD/Reactant.jl/issues/new?labels=raising).
 
 ## Raising Scalar Loops to Tensor IR