multi-threaded (@threads) dotcall/broadcast? #19777
I don't think this is something that should go into Base itself. The beauty of Simon's approach is that it is general and works for distributed arrays, GPU arrays, and native arrays. I concede that it might feel a bit unnatural for native arrays, and I wouldn't be opposed to adding a macro that transforms dot calls into threaded broadcasts.
Regardless of the name of the macro, it would be nice to have something that just involved a decorator and didn't involve re-allocating or wrapping all of your arrays in some other type. This is the big distinction between threads and GPUs or distributed memory: with threads, you don't need to decide in advance to put your data on a GPU or in another process.
I'd suggest having a macro similar to `@fastmath` that replaces broadcast calls with multi-threaded versions.
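As a rough illustration of what such a macro could lower to, here is a minimal threaded kernel for the in-place unary case (the name `tbroadcast!` and the same-shape restriction are assumptions of this sketch, not an existing API):

```julia
# Sketch: a threaded in-place "broadcast" for the unary, same-shape case.
# A hypothetical macro could rewrite `dest .= f.(src)` into a loop like this.
function tbroadcast!(f, dest::AbstractArray, src::AbstractArray)
    axes(dest) == axes(src) || throw(DimensionMismatch("arrays must have matching axes"))
    Threads.@threads for i in eachindex(dest, src)
        @inbounds dest[i] = f(src[i])
    end
    return dest
end

x = collect(1.0:8.0)
y = similar(x)
tbroadcast!(abs2, y, x)
```

This assumes `f` is thread-safe, exactly as the proposal in this issue does.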
Would it be too crazy for broadcast to be implicitly parallel once threading is stable?
Could this get a 1.0 milestone? At least some kind of macro would be very useful, if not implicit parallelism (or a mixture: default implicit parallelism, which can be overridden with a macro). Since broadcasting is such a cool feature, this would really complete the story.
cc @lkuper
If anyone wants to tackle this (developing this outside of Base at first is probably a good idea), my roadmap/ideas would be:
For other inspiration, take a look at parallel collections in Scala http://docs.scala-lang.org/overviews/parallel-collections/overview.html (which were the inspiration for Java 8).
Just as an FYI: I'm already getting good speedups out of this.
I like the idea of letting the user choose.
Thank you.
It would be interesting if the heuristic could be applied to arbitrary loops via some macro as well.
@ChrisRackauckas, parallelizing arbitrary loops is precisely what `Threads.@threads` does. @vchuravy, I'm not sure I like the idea of a special array type, vs. just a macro.
I was asking if there could be a way to apply whatever implicit parallelism heuristic to a loop. Essentially a macro for "multithread this if the size of the array is greater than x", or whatever is involved in the heuristic, with tweakable options. Then broadcast would essentially just apply that with the defaults. Your proposal just has a macro, but I'm wondering if implicit parallelism can be added as well. Otherwise I could see applications wanting a bunch of conditionals to check whether multithreading should be run. That last part depends on the overhead of multithreading (which I found to be measurable in many small problems, but the benchmarks may be mixed up with #15276 issues).
@ChrisRackauckas, it's not the size of the array, but the expense of the loop iterations that matters. There's also the issue of load balancing if the loop iterations have unequal cost. I agree that you want to automate this (both deciding how many threads to use and how to load balance) to the extent possible. My understanding is that Cilk (and subsequently OpenMP) mostly solved this issue. Anyway, I see that as orthogonal to this issue.
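One rough sketch of the tweakable-threshold idea discussed above (the names `TBCAST_CUTOFF` and `maybe_threaded_map!` are hypothetical, and the cutoff value is an arbitrary illustration, not a measured number):

```julia
# Sketch of a size heuristic: thread only when the work is large enough
# to amortize scheduling overhead. A Ref makes the cutoff tunable at runtime.
const TBCAST_CUTOFF = Ref(2^12)

function maybe_threaded_map!(f, dest::AbstractArray, src::AbstractArray)
    if length(dest) >= TBCAST_CUTOFF[] && Threads.nthreads() > 1
        Threads.@threads for i in eachindex(dest, src)
            @inbounds dest[i] = f(src[i])
        end
    else
        # Small problem (or single thread): plain serial loop, no task overhead.
        @inbounds for i in eachindex(dest, src)
            dest[i] = f(src[i])
        end
    end
    return dest
end

y = maybe_threaded_map!(sqrt, zeros(4), [1.0, 4.0, 9.0, 16.0])
```

As the reply above notes, iteration cost matters as much as array size, so a single length cutoff is only a crude proxy.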
Ref. #18278 (comment) |
Related #1802 (or is this a duplicate?) |
@stevengj, there is a lot to consider in deciding whether or not to multithread a loop.
It would be nice to initially have a simple-case version of the macro as explained in the first post, for cases like:

```julia
function threadedcos(x::AbstractArray)
    out = similar(x)
    Threads.@threads for i in eachindex(x)
        out[i] = cos(x[i])
    end
    return out
end
```
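Generalizing the `threadedcos` pattern to an arbitrary function is straightforward; a sketch (the name `tmap` is hypothetical, and `Base.promote_op` is an internal detail used here to pick the output element type):

```julia
# Sketch: threaded out-of-place map over an array, generalizing threadedcos.
function tmap(f, x::AbstractArray)
    out = similar(x, Base.promote_op(f, eltype(x)))
    Threads.@threads for i in eachindex(x)
        @inbounds out[i] = f(x[i])
    end
    return out
end

z = tmap(cos, zeros(16))
```

With this, `threadedcos(x)` is just `tmap(cos, x)`.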
Using GPUArrays you can automatically accelerate broadcast.
I wonder if it wouldn't be possible to have some sort of option to switch out which function is used for `.` broadcasting.
Ref. the discussion around #16285 (comment). Best! |
Do note that this should not be the default or a global setting, at least not before we require every function to be thread-safe. No guarantee on execution order is a much weaker requirement than thread safety.
It should probably be noted here, for those looking for this feature, that such a macro has been implemented in Strided.jl.
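For reference, usage along the lines of the Strided.jl documentation looks roughly like the following (assumes the third-party package is installed; check its README for the exact, current API):

```julia
using Strided  # third-party package providing the @strided macro

A = randn(1000, 1000)
B = similar(A)
# The macro wraps the arrays so the broadcast runs over strided views,
# which Strided.jl can evaluate blockwise and multithreaded.
@strided B .= (A .+ A') ./ 2
```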
How hard would this be to do? It would be really nice to get a ~10x speedup with a macro for lots of the easy cases. |
It would be nice to be able to put `@threads` in front of a dot call, e.g. `@threads X .= f.(Y)`, and have it call a multi-threaded version of `broadcast` (which would assume `f` is thread-safe).
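A minimal sketch of what such a macro could look like for just this one pattern (the name `@tbroadcast` is hypothetical; a real implementation would have to handle the full range of broadcast expressions, multiple arguments, and fusion):

```julia
# Hypothetical macro: rewrites the single pattern `dest .= f.(src)`
# into an explicitly threaded loop. Anything else is an error.
macro tbroadcast(ex)
    if ex isa Expr && ex.head === Symbol(".=") &&
            ex.args[2] isa Expr && ex.args[2].head === Symbol(".")
        dest = esc(ex.args[1])            # X in `X .= f.(Y)`
        f    = esc(ex.args[2].args[1])    # f
        src  = esc(ex.args[2].args[2].args[1])  # Y (first dot-call argument)
        return quote
            let d = $dest, s = $src
                Threads.@threads for i in eachindex(d, s)
                    @inbounds d[i] = ($f)(s[i])
                end
                d
            end
        end
    end
    return :(error("@tbroadcast only supports the pattern `dest .= f.(src)`"))
end

X = zeros(8)
Y = collect(1.0:8.0)
@tbroadcast X .= abs2.(Y)
```

Like the proposal itself, this simply assumes `f` is thread-safe and makes no guarantee about execution order.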