Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@batch with ARM #88

Open
efaulhaber opened this issue Aug 6, 2022 · 7 comments · May be fixed by #131
Open

@batch with ARM #88

efaulhaber opened this issue Aug 6, 2022 · 7 comments · May be fixed by #131

Comments

@efaulhaber
Copy link

efaulhaber commented Aug 6, 2022

While experimenting with @batch on an M1 Macbook, I noticed that there is no performance improvement when using Julia x86 with Rosetta. With the new Julia 1.8.0-rc3 for macOS ARM, however, the same code runs more than 3x faster with @batch.
However, when using @batch on some loops, I get this error, while it works perfectly fine on other loops:

ERROR: cfunction: closures are not supported on this platform
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Polyester/BI7zH/src/batch.jl:25 [inlined]
  [2] batch_closure
    @ ~/.julia/packages/Polyester/BI7zH/src/batch.jl:18 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/Polyester/BI7zH/src/batch.jl:127 [inlined]
  [4] _batch_no_reserve
    @ ~/.julia/packages/Polyester/BI7zH/src/batch.jl:111 [inlined]
  [5] batch
    @ ~/.julia/packages/Polyester/BI7zH/src/batch.jl:318 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/Polyester/BI7zH/src/closure.jl:373 [inlined]

Is that fixable in Polyester.jl? Or is there some workaround?

@chriselrod
Copy link
Member

This is a platform specific problem. Those loops should work on Rosetta.

But Polyester implements a workaround (which is why many do work): avoid cfunction closures by not closing in over anything, passing used objects as arguments instead.

So the fix would be to find out why this is failing for some loops.

@efaulhaber
Copy link
Author

Thanks for the quick reply.
I experimented a bit and found a workaround and some interesting behaviour with UnPack.jl.

My initial function looked like this (pseudocoded):

@unpack some, arrays = some_datastructure

@batch for i in something
    # Do something with some and arrays
end

This works once I replaced the "do something" code with a function barrier, unpacking the datastructure in the new function:

@batch for i in something
    do_something(some_datastructure)
end

@inline function do_something(some_datastructure)
    @unpack some, arrays = some_datastructure

    # Do something with some and arrays
end

Now, I tried to avoid the additional function by just unpacking inside the loop:

@batch for i in something
    @unpack some, arrays = some_datastructure

    # Do something with some and arrays
end

Interestingly, I now get an UndefVarError: some not defined, which does not happen without using @batch. Is this behaviour expected somehow?

@chriselrod
Copy link
Member

Is this behaviour expected somehow?

It's a bug, obviously.

But as @batch is a macro, it's expected that giving it different expressions could lead to different results.
If the macro has a bug, then making the expression simpler is likely to avoid it.

You can use @macroexpand @batch to see the transformed code.
Adding print statements to src/closure.jl can also help you figure out what is going wrong, and then make a PR to fix the problem.

@daviehh
Copy link

daviehh commented Feb 16, 2023

Also ran into this issue, looks like it may not just be the code @batch generates? Not exactly "minimum", but a small example showing the behavior:

using Polyester

function apply_fx(r, fx, x)
    @batch for i in eachindex(r, x)
        r[i] = fx(x[i])
    end
end

x = rand(10)
r = similar(x)
apply_fx(r, sin, x) # this works

struct foo
    a
end

(p::foo)(x) = x + p.a

p = foo(1)

@batch for i in eachindex(r, x)
    r[i] = p(x[i])
end
# ^^^  this also works

apply_fx(r, p, x)  # <- this errors w/ ERROR: cfunction: closures are not supported on this platform

maybe like it's due to

q = if Base.issingletontype(F)

where for sin type Base.issingletontype(F) is true but not for p above? but it's a bit strange that

@batch for i in eachindex(r, x)
    r[i] = p(x[i])
end

does not error..

@chriselrod chriselrod changed the title @batch with macOS ARM @batch with ARM Feb 19, 2023
@chriselrod
Copy link
Member

Note that this example isn't mac-specific

julia> using Polyester

julia> function apply_fx(r, fx, x)
           @batch for i in eachindex(r, x)
               r[i] = fx(x[i])
           end
       end
apply_fx (generic function with 1 method)

julia> x = rand(10)
10-element Vector{Float64}:
 0.03132866340686413
 0.6027004038817649
 0.6456645117379025
 0.4063888285618241
 0.4856265149123564
 0.1628396868191755
 0.6747107409909057
 0.2654926160007658
 0.43564136962596123
 0.14818287906086158

julia> r = similar(x)
10-element Vector{Float64}:
 0.0
 1.390654747103734e-309
 1.390654747103734e-309
 0.0
 1.39066128553932e-309
 1.390654747101046e-309
 0.0
 1.39066128553932e-309
 1.39066128558122e-309
 0.0

julia> apply_fx(r, sin, x) # this works

julia> struct foo
           a
       end

julia> (::foo)(x) = x + p.a

julia> p = foo(1)
foo(1)

julia> @batch for i in eachindex(r, x)
           r[i] = p(x[i])
       end

julia> # ^^^  this also works
       
       apply_fx(r, p, x)
ERROR: cfunction: closures are not supported on this platform

julia> versioninfo()
Julia Version 1.10.0-DEV.501
Commit 2a9f4412a0 (2023-02-05 19:46 UTC)
Platform Info:
  OS: Linux (aarch64-unknown-linux-gnu)
  CPU: 8 × unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)

Anyway, this should generally be fixable via storing and loading the closure objects.
If someone else wants to take a stab at it, I could describe the approach.
Otherwise, I'll get around to fixing this eventually.

@sloede
Copy link

sloede commented May 13, 2023

@chriselrod any chance you have found a solution for this issue yet 😬? I'm asking since we had another User stumbling on this issue in trixi-framework/Trixi.jl#1457. Given the increase in memory usage of Julia with each new release (see the post on the #ttfx channel in the Julia Slack), a Robust and fast multithreading implementation in Julia is thus becoming more and more important...

@chriselrod
Copy link
Member

Polyester tries hard to avoid creating closures by passing in everything that's needed as an argument. If you have examples, you can see why it fails and fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants