
fix: multi-device execution and sharding [take III] #713


Merged: 17 commits merged into main from ap/fixes on Feb 11, 2025

Conversation

avik-pal (Collaborator) commented Feb 8, 2025

  • Fixes for multi-GPU runs
  • Tests
  • API to load output shardings from XLA
  • JLL Changes
  • Adds @code_mhlo, which prints the MLIR module in the MHLO dialect (see the sketch after this list)
  • Remove forced replication of outputs
    • remove linear results sharding info
  • use OpSharding to regenerate the outputs
  • ConcreteRNumber can now be replicated
  • JLL bump
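
For concreteness, a rough sketch of how these pieces might fit together. The mesh and sharding constructors, the sharding keyword, and the device slicing below are assumptions for illustration, not APIs confirmed by this PR:

using Reactant

# Hypothetical sketch: shard a 12x4x16 input across a 1x2 device mesh.
# Sharding.Mesh, Sharding.NamedSharding, and the sharding keyword are assumed
# names for illustration; check the Reactant sharding docs for the real API.
mesh = Reactant.Sharding.Mesh(reshape(Reactant.devices()[1:2], 1, 2), (:x, :y))
shard = Reactant.Sharding.NamedSharding(mesh, (:x, nothing, :y))

fn_test1(x) = (sum(x .+ x), x .+ 1)

x = Reactant.to_rarray(rand(Float32, 12, 4, 16); sharding=shard)

# Print the MLIR module in the MHLO dialect (the new @code_mhlo), then
# compile and run the function on the sharded input.
Reactant.@code_mhlo fn_test1(x)
compiled = Reactant.@compile fn_test1(x)
compiled(x)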

avik-pal (Collaborator, Author) commented Feb 8, 2025

Don't merge; some more fixes are needed.

avik-pal (Collaborator, Author) commented Feb 8, 2025

Beautiful error on TPUs:

ERROR: INTERNAL: RET_CHECK failure (third_party/tensorflow/compiler/xla/service/spmd/spmd_partitioner.cc:5245) !HasReplicatedSharding(hlo->sharding()) || CanSideEffectingHaveReplicatedSharding(hlo) side-effect HLO cannot have a replicated sharding: %custom-call.8 = f32[12,2]{1,0} custom-call(f32[12,2]{1,0} %sine.1), custom_call_target="xla.sdy.FuncResultSharding", custom_call_has_side_effect=true, sharding={replicated}, frontend_attributes={xla.sdy.sharding="#sdy.sharding_per_value<[<@mesh, [{}, {}], replicated={\"data\", \"model\"}>]>"}, metadata={op_name="custom-call.9"}

@avik-pal avik-pal changed the title fix: use different API fix: mutli-device execution and sharding [take III] Feb 8, 2025
@avik-pal avik-pal marked this pull request as draft February 8, 2025 22:15
@avik-pal avik-pal changed the title fix: mutli-device execution and sharding [take III] fix: multi-device execution and sharding [take III] Feb 8, 2025
@avik-pal avik-pal linked an issue Feb 9, 2025 that may be closed by this pull request
avik-pal (Collaborator, Author) commented Feb 9, 2025

Beautiful error on TPUs:

ERROR: INTERNAL: RET_CHECK failure (third_party/tensorflow/compiler/xla/service/spmd/spmd_partitioner.cc:5245) !HasReplicatedSharding(hlo->sharding()) || CanSideEffectingHaveReplicatedSharding(hlo) side-effect HLO cannot have a replicated sharding: %custom-call.8 = f32[12,2]{1,0} custom-call(f32[12,2]{1,0} %sine.1), custom_call_target="xla.sdy.FuncResultSharding", custom_call_has_side_effect=true, sharding={replicated}, frontend_attributes={xla.sdy.sharding="#sdy.sharding_per_value<[<@mesh, [{}, {}], replicated={\"data\", \"model\"}>]>"}, metadata={op_name="custom-call.9"}

Just to note: this only happens on TPUs; the same code works fine on GPUs.

@avik-pal avik-pal changed the base branch from main to ap/jll February 10, 2025 03:51
avik-pal (Collaborator, Author) commented:

Remaining changes are only on the Julia side, so hopefully no more JLL builds are needed.

Base automatically changed from ap/jll to main February 10, 2025 04:06
@avik-pal avik-pal force-pushed the ap/fixes branch 2 times, most recently from 965da33 to dfb63c0, February 10, 2025 16:54
@avik-pal avik-pal marked this pull request as ready for review February 11, 2025 02:29
avik-pal (Collaborator, Author) commented:

This is now ready; we just need the JLL build to go through. Once that is done, I will test on TPUs.

wsmoses (Member) commented Feb 11, 2025

Very minor, but technically, if we want to be correct: HLO is the low-level post-XLA IR, while MHLO (a.k.a. MLIR HLO) and now StableHLO are what we have before that.

Re: the naming of code_hlo and code_mhlo, lol.

We don't need to fix it here, though.

avik-pal (Collaborator, Author) commented Feb 11, 2025

> Very minor, but technically, if we want to be correct: HLO is the low-level post-XLA IR, while MHLO (a.k.a. MLIR HLO) and now StableHLO are what we have before that.

@code_mhlo actually still prints the MHLO, not the XLA IR. But it is useful, since all the shardy ops are expanded into custom calls:

module @reactant_fn_test1 attributes {mhlo.frontend_attributes = {xla.sdy.meshes = "{mesh = #sdy.mesh<[\\\22x\\\22=1, \\\22y\\\22=2]>}"}, mhlo.num_partitions = 2 : i64, mhlo.num_replicas = 1 : i64} {
  func.func @main(%arg0: tensor<12x4x16xf32> {mhlo.frontend_attributes = {xla.sdy.sharding = "#sdy.sharding<@mesh, [{\\\22x\\\22}, {}, {\\\22y\\\22}]>"}, mhlo.sharding = "{devices=[1,1,2]<=[2]}"}) -> (tensor<f32>, tensor<12x4x1xf32>, tensor<12x4x16xf32> {mhlo.sharding = "{devices=[1,1,2]<=[2]}"}, tensor<12x4x16xf32>) {
    %0 = mhlo.constant dense<1.000000e+00> : tensor<f32>
    %1 = "mhlo.broadcast_in_dim"(%0) <{broadcast_dimensions = dense<> : tensor<0xi64>}> : (tensor<f32>) -> tensor<16x4x12xf32>
    %2 = mhlo.constant dense<0.000000e+00> : tensor<f32>
    %3 = "mhlo.transpose"(%arg0) <{permutation = dense<[2, 1, 0]> : tensor<3xi64>}> : (tensor<12x4x16xf32>) -> tensor<16x4x12xf32>
    %4 = mhlo.add %3, %3 : tensor<16x4x12xf32>
    %5 = mhlo.add %3, %1 : tensor<16x4x12xf32>
    %6 = mhlo.multiply %5, %4 : tensor<16x4x12xf32>
    %7 = mhlo.reduce(%4 init: %2) applies mhlo.add across dimensions = [0, 1, 2] : (tensor<16x4x12xf32>, tensor<f32>) -> tensor<f32>
    %8 = mhlo.reduce(%4 init: %2) applies mhlo.add across dimensions = [0] : (tensor<16x4x12xf32>, tensor<f32>) -> tensor<4x12xf32>
    %9 = "mhlo.transpose"(%8) <{permutation = dense<[1, 0]> : tensor<2xi64>}> : (tensor<4x12xf32>) -> tensor<12x4xf32>
    %10 = mhlo.reshape %9 : (tensor<12x4xf32>) -> tensor<12x4x1xf32>
    %11 = "mhlo.transpose"(%5) <{permutation = dense<[2, 1, 0]> : tensor<3xi64>}> : (tensor<16x4x12xf32>) -> tensor<12x4x16xf32>
    %12 = "mhlo.transpose"(%6) <{permutation = dense<[2, 1, 0]> : tensor<3xi64>}> : (tensor<16x4x12xf32>) -> tensor<12x4x16xf32>
    %13 = mhlo.custom_call @xla.sdy.FuncResultSharding(%11) {has_side_effect = true, mhlo.frontend_attributes = {xla.sdy.sharding = "#sdy.sharding_per_value<[<@mesh, [{\\\22x\\\22}, {}, {\\\22y\\\22}]>]>"}} : (tensor<12x4x16xf32>) -> tensor<12x4x16xf32>
    return %7, %10, %13, %12 : tensor<f32>, tensor<12x4x1xf32>, tensor<12x4x16xf32>, tensor<12x4x16xf32>
  }
}

We should definitely rename code_hlo to return the actual HLO module at some point, though I need to check how to get that without dumping it to a file.
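
For reference, a minimal usage sketch of how the two printers sit side by side (assumed usage; fn_test1 and the sharded input x are placeholders along the lines of the module above):

# Assumed usage, for illustration only.
Reactant.@code_hlo fn_test1(x)   # pre-XLA module; shardy shardings still appear as sdy attributes
Reactant.@code_mhlo fn_test1(x)  # MHLO module; shardy ops expanded into custom calls, as printed above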

avik-pal (Collaborator, Author) commented:

This is now ready to go!

@wsmoses wsmoses merged commit 90b0d1d into main Feb 11, 2025
36 of 39 checks passed
@wsmoses wsmoses deleted the ap/fixes branch February 11, 2025 20:41
Linked issue that may be closed by this pull request: shardy functions not visible on macos