Only commutative reductions can be parallelized #6609

Elarnon · 2022-02-08T18:34:26Z

Because parallelization changes the order of computation within a reduction, parallelizing an associative but non-commutative reduction can produce incorrect, non-deterministic results, just as reordering it can.

For instance, Halide currently accepts the following code but generates non-deterministic output on GPU. On CPU with .parallel(r.x), OpenMP correctly rejects the generated code, stating that the #pragma omp atomic is invalid for the same reason.

#include <stdio.h>

#include "Halide.h"

using namespace Halide;

int main(int argc, char **argv) {
        Halide::Func A("A"), B("B");
        Halide::Var i("i");

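        A(i) = i;
        // Pure definition of B, then an update that overwrites it for each r.x.
        // "Last write wins" is associative but not commutative, so the result
        // depends on the order in which r.x is visited (1023 when run serially).
        B() = -1;
        Halide::RDom r(0, 1024);
        B() = A(r.x);

        A.compute_root();
        // Parallelizing the update over r.x leaves that order unspecified.
        B.update().atomic().gpu_blocks(r.x);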

        B.compile_jit(get_host_target().with_feature(Target::CUDA));
        Halide::Buffer<int32_t> b = B.realize();
        printf("%d\n", b());

        return 0;
}
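
The order dependence can also be seen without Halide. The following is a minimal sketch in plain C++, not the code Halide generates: the helpers `last_write`, `add`, `fold`, `ascending`, `descending` and `check_order_sensitivity` are hypothetical names introduced only for illustration, and visiting the values in reverse stands in for one of the many orders a parallel schedule might take. It folds the same 1024 values with the "last write wins" step from the example above and with ordinary addition.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// "Last write wins": the combine step of B() = A(r.x) in the example above.
// Associative (either grouping yields the rightmost operand), not commutative.
inline int last_write(int /*acc*/, int value) { return value; }

// Ordinary addition: associative and commutative.
inline int add(int acc, int value) { return acc + value; }

// Hypothetical helper: fold `values` left to right with `op`, starting at `init`.
template <typename Op>
int fold(const std::vector<int> &values, int init, Op op) {
    return std::accumulate(values.begin(), values.end(), init, op);
}

// Hypothetical helper: 0..n-1 in ascending order, like A(r.x) over RDom(0, n).
inline std::vector<int> ascending(int n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}

// Hypothetical helper: the same values in reverse, standing in for one
// possible visiting order under a parallel schedule.
inline std::vector<int> descending(int n) {
    std::vector<int> v = ascending(n);
    std::reverse(v.begin(), v.end());
    return v;
}

// The non-commutative fold changes with the visiting order; the sum does not.
inline void check_order_sensitivity() {
    const std::vector<int> fwd = ascending(1024);
    const std::vector<int> rev = descending(1024);
    assert(fold(fwd, -1, last_write) != fold(rev, -1, last_write));
    assert(fold(fwd, 0, add) == fold(rev, 0, add));
}
```

Calling `check_order_sensitivity()` from a `main` passes both assertions: the serial fold with `last_write` yields 1023 in ascending order and 0 in descending order, while the sum is the same either way.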

abadams · 2022-02-11T19:26:07Z

I believe this is correct. Thanks for the fix.

steven-johnson requested review from halidebuildbots and abadams February 9, 2022 17:22

abadams approved these changes Feb 11, 2022

steven-johnson merged commit 38032e8 into halide:master Feb 14, 2022

Elarnon deleted the patch-2 branch February 15, 2022 09:13
