Conversation

@avik-pal avik-pal commented Feb 13, 2025

Strangely enough, dot_general is giving me incorrect results when using sharding. Every other operator I tested with the exact same sharding setup gives correct results.

Once the JLL builds, I will test it out on a TPU pod to verify this isn't some weird behavior originating from --xla_force_host_platform_device_count=8
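
For local testing, a minimal sketch of that setup (assuming, as is usual for XLA, that the flag is picked up from the XLA_FLAGS environment variable set before Reactant is loaded; this snippet is illustrative, not part of the PR):

# Hypothetical local-CPU setup: pretend there are 8 host devices so the
# sharding path can be exercised without a TPU pod. Must be set before
# Reactant is loaded.
ENV["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

using Reactant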

This should also unblock PRONTOLab/GB-25#8 (comment)

@avik-pal avik-pal mentioned this pull request Feb 14, 2025
@avik-pal (Collaborator, Author):

Forgot to export some names for macOS. Will fix in the next JLL.

@avik-pal (Collaborator, Author):

An even simpler sharding case that gives incorrect results:

# Currently an extremely simple test
using Reactant, Test

const addressable_devices = Reactant.addressable_devices()

mesh = Sharding.Mesh(reshape(collect(Int64, 0:3), (2, 2)), ("data", "model"))

# samples_sharding = Sharding.NamedSharding(mesh, (nothing, "data"))
w1_sharding = Sharding.NamedSharding(mesh, ("model", nothing))
# w2_sharding = Sharding.NamedSharding(mesh, ("data", nothing))

# samples = reshape(collect(Float32, 1:84), 7, 12)
w1 = reshape(collect(Float32, 1:4), 2, 2)
w2 = reshape(collect(Float32, 1:4), 2, 2)

w1_ra = Reactant.to_rarray(w1; sharding=w1_sharding)
w2_ra = Reactant.to_rarray(w2; sharding=w1_sharding)

@code_xla *(w2_ra, w1_ra)

# @jit *(w2_ra, w1_ra)
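
A minimal sketch of the corresponding correctness check (hypothetical, not part of the PR; it assumes the usual Array conversion of a ConcreteRArray back to a host array):

# Reference result on plain Julia arrays.
w_ref = w2 * w1

# Hypothetical check: materialize the sharded result and compare. With the
# dot_general sharding bug present, this comparison is expected to fail.
res = @jit *(w2_ra, w1_ra)
@test Array(res) ≈ w_ref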

@avik-pal (Collaborator, Author):

julia> @jit fn_test2(x_ra)
2025-02-14 10:32:46.155582: I external/xla/xla/service/spmd/shardy/shardy_xla_pass.cc:306] Using Shardy for XLA SPMD propagation.
2025-02-14 10:32:46.229661: I external/xla/xla/hlo/utils/hlo_sharding_util.cc:3063] There is no registered layout_canonicalization_callback.
4×4 ConcreteRArray{Float32, 2, 8, Reactant.Sharding.ShardInfo{Reactant.Sharding.NamedSharding{2, 8, Tuple{Nothing, Nothing}, 2}, NTuple{8, Tuple{UnitRange{Int64}, UnitRange{Int64}}}}}:
  2.0   8.0  11.0  17.0
  8.0  14.0  17.0  23.0
 11.0  17.0  20.0  26.0
 17.0  23.0  26.0  32.0

julia> fn_test2(x)
4×4 Matrix{Float32}:
  2.0   7.0  12.0  17.0
  7.0  12.0  17.0  22.0
 12.0  17.0  22.0  27.0
 17.0  22.0  27.0  32.0

julia> @code_xla fn_test2(x_ra)
2025-02-14 10:33:00.715410: I external/xla/xla/service/spmd/shardy/shardy_xla_pass.cc:306] Using Shardy for XLA SPMD propagation.
2025-02-14 10:33:00.788563: I external/xla/xla/hlo/utils/hlo_sharding_util.cc:3063] There is no registered layout_canonicalization_callback.
HloModule reactant_fn_test2, is_scheduled=true, entry_computation_layout={(f32[2,1]{1,0})->f32[4,4]{1,0}}, num_partitions=8

%fused_computation (param_0.2: f32[4,4], param_1: f32[4,4]) -> f32[4,4] {
  %param_0.2 = f32[4,4]{1,0} parameter(0)
  %param_1 = f32[4,4]{1,0} parameter(1)
  %add.2 = f32[4,4]{1,0} add(f32[4,4]{1,0} %param_0.2, f32[4,4]{1,0} %param_1), metadata={op_name="add" source_file="/home/avik-pal/reactant/Reactant.jl/src/Ops.jl" source_line=266}
  %transpose.6 = f32[4,4]{0,1} transpose(f32[4,4]{1,0} %add.2), dimensions={1,0}, metadata={op_name="transpose.4"}
  ROOT %copy.4 = f32[4,4]{1,0} copy(f32[4,4]{0,1} %transpose.6), metadata={op_name="transpose.4"}
}

ENTRY %main.0_spmd (param: f32[2,1]) -> f32[4,4] {
  %param = f32[2,1]{1,0} parameter(0), sharding={devices=[2,4]<=[4,2]T(1,0)}, metadata={op_name="Arg_0.1"}
  %bitcast = f32[1,2]{0,1} bitcast(f32[2,1]{1,0} %param), metadata={op_name="Arg_0.1"}
  %bitcast.2 = f32[2,1]{0,1} bitcast(f32[2,1]{1,0} %param), sharding={devices=[2,4]<=[4,2]T(1,0)}, metadata={op_name="Arg_0.1"}
  %all-gather = f32[1,4]{0,1} all-gather(f32[1,2]{0,1} %bitcast), channel_id=1, replica_groups=[4,2]<=[8], dimensions={1}, use_global_device_ids=true, metadata={op_name="add" source_file="/home/avik-pal/reactant/Reactant.jl/src/Ops.jl" source_line=266}
  %all-gather.2 = f32[2,4]{0,1} all-gather(f32[2,1]{0,1} %bitcast.2), channel_id=3, replica_groups=[2,4]<=[4,2]T(1,0), dimensions={1}, use_global_device_ids=true, metadata={op_name="add" source_file="/home/avik-pal/reactant/Reactant.jl/src/Ops.jl" source_line=266}
  %bitcast.1 = f32[1,4]{1,0} bitcast(f32[1,4]{0,1} %all-gather), metadata={op_name="add" source_file="/home/avik-pal/reactant/Reactant.jl/src/Ops.jl" source_line=266}
  %copy.2 = f32[2,4]{1,0} copy(f32[2,4]{0,1} %all-gather.2), metadata={op_name="add" source_file="/home/avik-pal/reactant/Reactant.jl/src/Ops.jl" source_line=266}
  %all-gather.1 = f32[4,4]{1,0} all-gather(f32[1,4]{1,0} %bitcast.1), channel_id=2, replica_groups=[2,4]<=[4,2]T(1,0), dimensions={0}, use_global_device_ids=true, metadata={op_name="add" source_file="/home/avik-pal/reactant/Reactant.jl/src/Ops.jl" source_line=266}
  %all-gather.3 = f32[4,4]{1,0} all-gather(f32[2,4]{1,0} %copy.2), channel_id=4, replica_groups=[4,2]<=[8], dimensions={0}, use_global_device_ids=true, metadata={op_name="add" source_file="/home/avik-pal/reactant/Reactant.jl/src/Ops.jl" source_line=266}
  ROOT %transpose_copy_fusion = f32[4,4]{1,0} fusion(f32[4,4]{1,0} %all-gather.1, f32[4,4]{1,0} %all-gather.3), kind=kLoop, calls=%fused_computation, metadata={op_name="transpose.4"}
}

@avik-pal (Collaborator, Author):

using Reactant

foo(x) = x .+ x'

x = reshape(collect(Float32, 1:4), 2, 2)

x_ra = Reactant.to_rarray(
    x;
    sharding=Sharding.NamedSharding(
        Sharding.Mesh(reshape(collect(Int64, 0:3), (2, 2)), ("data", "model")),
        ("data", nothing),
    ),
)

@code_xla foo(x_ra)

@jit foo(x_ra)
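
A minimal sketch of the check for this reproducer (hypothetical, assuming the sharded result converts back with Array):

using Test  # only needed for @test

# Compare the sharded and unsharded results. With the transpose/add sharding
# bug shown above, these are expected to differ.
res_sharded = @jit foo(x_ra)
@test Array(res_sharded) ≈ foo(x)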

Comment on lines +168 to +172
tmp = Reactant.ConcreteRArray(
    ones(sharding_and_shape.shape); sharding=LazySharding(sharding_and_shape.sharding)
)
_, exec, _, _, _ = Reactant.Compiler.compile_xla(internal_simple_op, (tmp,))
return XLA.CondensedOpSharding(only(XLA.get_parameter_shardings(exec)))
@avik-pal (Collaborator, Author):

This is not the most ideal solution, but it is guaranteed to be correct. After GB I will see if there is a nicer way to do this.
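
To spell the trick out, a hypothetical restatement of the snippet above (the names trivial_op and resolved_op_sharding are placeholders, not Reactant API): compile a trivial function whose only argument carries the requested lazy sharding, then read back the sharding XLA actually assigned to that parameter.

# Hypothetical restatement; trivial_op and resolved_op_sharding are placeholder
# names, not part of this PR.
trivial_op(x) = x .+ 1

function resolved_op_sharding(shape, sharding)
    # Dummy array carrying the requested (lazy) sharding.
    dummy = Reactant.ConcreteRArray(ones(shape); sharding=LazySharding(sharding))
    # Compile the trivial op so XLA resolves the parameter sharding...
    _, exec, _, _, _ = Reactant.Compiler.compile_xla(trivial_op, (dummy,))
    # ...and read it back from the compiled executable.
    return XLA.CondensedOpSharding(only(XLA.get_parameter_shardings(exec)))
end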

@avik-pal (Collaborator, Author):

Tests pass locally. We need a new JLL before CI is green.

@avik-pal avik-pal marked this pull request as ready for review February 16, 2025 17:42
@avik-pal avik-pal requested a review from wsmoses February 16, 2025 20:49
@wsmoses wsmoses merged commit 20f7a3c into main Feb 16, 2025
35 of 39 checks passed
@wsmoses wsmoses deleted the ap/wider_support_sharding branch February 16, 2025 22:39