diff --git a/docs/tutorial/reduction.rst b/docs/tutorial/reduction.rst
index 45ba478f47..83ada10962 100644
--- a/docs/tutorial/reduction.rst
+++ b/docs/tutorial/reduction.rst
@@ -30,7 +30,7 @@ Implementing reductions on GPUs requires a basic understanding of the :doc:`/und
 
 Synchronizing parallel threads of execution across a GPU is crucial for correctness as the partial results can't be synchronized before they manifest. Synchronizing all the threads running on a GPU at any given time is possible, however, it is a costly and intricate operation. If synchronization is not absolutely necessary, map the parallel algorithm so that multiprocessors and blocks can make independent progress and need not sync frequently.
 
-There are ten reduction implementations in the `rocm-examples `_, which are described in the following sections.
+There are ten reduction implementations in the `rocm-examples `_, which are described in the following sections.
 
 Naive shared reduction
 ----------------------