ROCm · neon60 · Oct 4, 2024 · Sep 26, 2024
@@ -30,7 +30,7 @@ Implementing reductions on GPUs requires a basic understanding of the :doc:`/und
 
 Synchronizing parallel threads of execution across a GPU is crucial for correctness as the partial results can't be synchronized before they manifest. Synchronizing all the threads running on a GPU at any given time is possible, however, it is a costly and intricate operation. If synchronization is not absolutely necessary, map the parallel algorithm so that multiprocessors and blocks can make independent progress and need not sync frequently.
 
-There are ten reduction implementations in the `rocm-examples <https://github.com/StreamHPC/rocm-examples/tree/reduction-tutorial/Tutorials/Reduction/include/Reduction>`_, which are described in the following sections. 
+There are ten reduction implementations in the `rocm-examples <https://github.com/ROCm/rocm-examples/tree/develop/Tutorials/reduction/include/Reduction>`_, which are described in the following sections. 
 
 Naive shared reduction
 ----------------------