adding reduce to Ops.jl #840

Open
wants to merge 2 commits into base: main

Conversation

Contributor

@tharittk tharittk commented Mar 3, 2025

I deliberately did not test non-commutative operators: the results will not match Julia's reduce. From discussions I have read, this is expected, i.e., reduce should take a commutative operator; otherwise the result is non-deterministic, especially on parallel systems.
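To illustrate the point (sketched in Python rather than Julia, as a self-contained example): with an operator that is associative but not commutative, such as string concatenation, the result depends on the order in which a parallel backend combines partial results.

```python
from functools import reduce

data = ["a", "b", "c", "d"]
concat = lambda x, y: x + y  # associative but NOT commutative

# Sequential left fold: the order is fixed, so the result is deterministic.
sequential = reduce(concat, data)  # "abcd"

# A parallel backend that assumes commutativity is free to combine
# partial results in any order. Simulate one such schedule: each
# "worker" reduces a chunk, and the partials arrive in reversed order.
chunks = [data[:2], data[2:]]                    # two workers
partials = [reduce(concat, c) for c in chunks]   # ["ab", "cd"]
reordered = reduce(concat, reversed(partials))   # "cdab"

print(sequential)  # abcd
print(reordered)   # cdab -- differs, so the result is schedule-dependent
```

With a commutative operator (like `+` on numbers) both schedules agree, which is why the tests are restricted to commutative operators.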

Member

@giordano giordano left a comment


I deliberately did not test on non-commutative operators—it will fail when tested against Julia's reduce.

Does that mean that the reduce method introduced here has different semantics than Julia's reduce? That'd be unideal, as it can lead to hard-to-track bugs when people write generic code and Reactant suddenly changes its meaning (similar for example to #755)

@mofeing
Collaborator

mofeing commented Mar 3, 2025

one thing to note is that methods in Ops should replicate the semantics of StableHLO (and possibly other MLIR dialects), not Julia.

the adaptation from StableHLO to Julia semantics is done in the method specializations outside the Ops module.

@giordano
Member

giordano commented Mar 3, 2025

one thing to note is that methods in Ops should replicate the semantics of StableHLO (and possibly other MLIR dialects), not Julia.

Ah, yes, that's a good point. This is not extending Base.reduce, so that's fine (although the same name is slightly confusing 😅).

@mofeing
Collaborator

mofeing commented Mar 3, 2025

... (although the same name is slightly confusing 😅).

yep 😅, but it's because it wraps the stablehlo.reduce op. as a rule of thumb, you should not import Ops but call its functions explicitly, like Ops.reduce, to avoid confusion.

@@ -2313,4 +2313,50 @@ Produces a [`Reactant.MLIR.Dialects.sdy.sharding_constraint`](@ref) operation wi
end
end

@noinline function reduce(
Collaborator


Can you follow up with a PR to update the mapreduce impl to call this function?

Copy link
Contributor Author


Sure, I'll do that

@avik-pal
Collaborator

avik-pal commented Mar 4, 2025

the CUDA reduce tests are failing

@tharittk
Contributor Author

tharittk commented Mar 4, 2025

Based on my initial investigation, it seems to be about how init_values is handled. On the CPU (via Reactant.set_default_backend("cpu")) there is no issue.

But on the GPU, if init_values is the identity of the operator, e.g. 1 for * and 0 for +, the result is correct. Otherwise, it looks like the GPU applies init_values multiple times.

For example, if we start with
A = [1 3; 2 4;;; 5 7; 6 8;;; 9 11; 10 12]

with init_values = 2, and dim = [3] then
we have [(15+2) (21+2); (18+2) (24+2)] = [17 23; 20 26] -- both the CPU and GPU versions agree

with init_values = 2, and dim = [1, 3], then
we should have [(33+2) (45+2)] = [35 47] -- init_values applied only once (this is also what Julia's reduce() gives),
but with stablehlo on the GPU we get [(17+20) (23+26)] = [37 49] -- effectively init_values is applied twice

I am not sure if this is a semantic issue or something else.

What I think happens is that StableHLO broadcasts init_values at the beginning and then reduces, while Julia's reduce performs the reduction first and folds init_values into the result once at the end.
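That hypothesis can be checked against the numbers above. The following NumPy sketch is a model of the suspected failure mode (not the actual StableHLO lowering); the array matches the Julia A above, with A[:, :, k] as the stacked 2x2 slices. Seeding the dim-3 reduction with init and then reducing dim 1 over the already-seeded partials reproduces the GPU output exactly:

```python
import numpy as np

# The 2x2x3 array from the example above: Julia's A[:, :, k] slices.
slices = [np.array([[1, 3], [2, 4]]),
          np.array([[5, 7], [6, 8]]),
          np.array([[9, 11], [10, 12]])]
A = np.stack(slices, axis=2)   # shape (2, 2, 3)
init = 2

# Julia-like semantics: reduce over dims (1, 3), fold init in once.
julia_like = A.sum(axis=(0, 2)) + init
print(julia_like)              # [35 47]  i.e. [33+2, 45+2]

# Model of the reported GPU behavior: the dim-3 reduction is seeded
# with init first, and the dim-1 reduction then combines those
# already-seeded partials, so init ends up counted twice per element.
stage1 = A.sum(axis=2) + init  # [[17 23], [20 26]]
gpu_like = stage1.sum(axis=0)
print(gpu_like)                # [37 49]  i.e. [17+20, 23+26]
```

Both printed vectors match the values reported in the comment above, which at least supports the "init applied once per reduced dimension" interpretation.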
