Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only commutative reductions can be parallelized #6609

Merged
merged 1 commit into from
Feb 14, 2022

Conversation

Elarnon
Copy link
Contributor

@Elarnon Elarnon commented Feb 8, 2022

Because parallelization changes the order of computation within the reduction, parallelizing associative but non-commutative reductions can result in (non-deterministically) incorrect results in the same way reordering them can.

For instance Halide currently accepts the following code, but generates non-deterministic outputs on GPU. On CPU with .parallel(r.x), OpenMP rejects the generated code (correctly) stating that the #pragma omp atomic is invalid for the same reasons.

#include <stdio.h>

#include "Halide.h"

using namespace Halide;

int main(int argc, char **argv) {
        Halide::Func A("A"), B("B");
        Halide::Var i("i");

        A(i) = i;
        B() = -1;
        Halide::RDom r(0, 1024);
        B() = A(r.x);

        A.compute_root();
        B.update().atomic().gpu_blocks(r.x);

        B.compile_jit(get_host_target().with_feature(Target::CUDA));
        Halide::Buffer<int32_t> b = B.realize();
        printf("%d\n", b());

        return 0;
}

Because parallelization changes the order of computation within the reduction, parallelizing associative but non-commutative reductions can result in (non-deterministically) incorrect results in the same way `reorder`ing them can.

For instance Halide currently accepts the following code, but generates non-deterministic outputs on GPU.  On CPU with `.parallel(r.x)`, OpenMP rejects the generated code (correctly) stating that the `#pragma omp atomic` is invalid for the same reasons.

```c++
#include <stdio.h>

#include "Halide.h"

using namespace Halide;

int main(int argc, char **argv) {
        Halide::Func A("A"), B("B");
        Halide::Var i("i");

        A(i) = i;
        B() = -1;
        Halide::RDom r(0, 1024);
        B() = A(r.x);

        A.compute_root();
        B.update().atomic().gpu_blocks(r.x);

        B.compile_jit(get_host_target().with_feature(Target::CUDA));
        Halide::Buffer<int32_t> b = B.realize();
        printf("%d\n", b());

        return 0;
}
```
@abadams
Copy link
Member

abadams commented Feb 11, 2022

I believe this is correct. Thanks for the fix.

@steven-johnson steven-johnson merged commit 38032e8 into halide:master Feb 14, 2022
@Elarnon Elarnon deleted the patch-2 branch February 15, 2022 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants