
Consistent API for embarrassingly parallel routines between levels of parallelism #17887

Open
ChrisRackauckas opened this issue Aug 8, 2016 · 11 comments
Labels: multithreading (Base.Threads and related functionality), parallelism (Parallel or distributed computation)


@ChrisRackauckas
Member

It seems like it would be natural for @threads loops to allow for a reduction parameter, matching what's done for @parallel. In fact, it seems natural enough that the documentation has to make a specific mention that there isn't one. I propose that it be pretty much the same as @parallel, except over threads.
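In today's Julia such a threaded reduction can be written by hand; the sketch below (the function name `threaded_sum` and the chunking policy are illustrative assumptions, not the proposed macro) shows roughly what a reduction parameter on `@threads` would generate:

```julia
# A hand-rolled threaded sum: split the index range into one chunk per
# thread, reduce each chunk on its own task, then combine the partials.
function threaded_sum(xs::AbstractVector)
    isempty(xs) && return zero(eltype(xs))
    nchunks = min(Threads.nthreads(), length(xs))
    chunks = Iterators.partition(eachindex(xs), cld(length(xs), nchunks))
    # Reduce each chunk concurrently; fetch and combine the partial sums.
    tasks = [Threads.@spawn sum(view(xs, c)) for c in chunks]
    return sum(fetch, tasks)
end

threaded_sum(1:100)  # == 5050
```

A reduction-aware `@threads (+) for ...` could lower to essentially this pattern, just as `@parallel (+) for ...` does across workers.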

@amitmurthy
Contributor

There is also a need for a higher level API that works across process and threads. Just thinking out loud here:

  • @threads parallelizes using threads
  • @parallel parallelizes using workers across nodes
  • @parfor is a new macro that first splits the range over workers and within each worker further uses threads. The split is dependent on the number of workers and the number of threads in each worker. For nprocs()==1, @parfor is equivalent to @threads. For Threads.nthreads()==1 on the master and workers, it is equivalent to @parallel.

User code will only ever use @parfor and it will leverage both workers and threads as the case may be.
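The outer splitting step such a macro would need can be sketched as follows (the name `worker_chunks` and the even-split policy are assumptions for illustration; each worker would then subdivide its chunk across its threads):

```julia
# Split a range into `nworkers` near-equal contiguous chunks,
# distributing the remainder one extra element at a time.
function worker_chunks(r::UnitRange, nworkers::Int)
    len, rem = divrem(length(r), nworkers)
    chunks = UnitRange{Int}[]
    lo = first(r)
    for w in 1:nworkers
        hi = lo + len - 1 + (w <= rem ? 1 : 0)  # first `rem` chunks get one extra
        push!(chunks, lo:hi)
        lo = hi + 1
    end
    return chunks
end

worker_chunks(1:10, 3)  # == [1:4, 5:7, 8:10]
```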

@kshyatt kshyatt added the multithreading label Aug 8, 2016
@ChrisRackauckas
Member Author

ChrisRackauckas commented Aug 8, 2016

That would be amazing: a simple abstraction over both threading and multiprocessing.

Maybe this should be expanded to be about standardized tooling for embarrassingly parallel routines. For multiprocessing we have @parallel and pmap. Would a pbroadcast be reasonable as well (that would require some kind of memory sharing like a SharedArray, though)? In the same sense, we have @threads. I think it would be helpful to have tmap and tbroadcast (it would make #1802 easy for one to implement on their own even if it wasn't the standard base way). And then, as mentioned, have @parfor plus map and broadcast variants that are smart enough to split evenly across workers and threads as you describe, using the threading and multiprocessing constructs. Having all of these with similar APIs would make it easy to move between the levels.

For naming, I think that instead of @parfor, the top user-facing macro which builds off of both should be called @parallel. It's more intuitive. It would make sense for @parallel and p to mean this nicely abstracted parallelism, @threads and t to mean thread level, and @workers w (or @multiprocess and m) to mean multi-process level.

@ChrisRackauckas ChrisRackauckas changed the title Reductions for @threads Consistent API for embarrassingly parallel routines between levels of parallelism Aug 8, 2016
@StefanKarpinski
Member

StefanKarpinski commented Aug 9, 2016

It would be an annoying deprecation, but I would propose that this is a much better naming scheme:

  • @threaded: parallelizes using threads
  • @distributed: distributes across worker nodes
  • @parallel: threaded and distributed

But I lost an argument about calling distributed stuff "parallel" a long time ago, and now I'm not sure it would be worth going through the multi-version deprecation and renaming this would require.

@Sacha0
Member

Sacha0 commented Aug 9, 2016

Do I understand correctly that the distinction is between threads and processes rather than threads and nodes? If so, alongside 'thread' might some form of the word 'process' be more accurate than 'distribute', processes not necessarily being distributed across nodes? Forgive my ignorance. Best!

@ChrisRackauckas
Member Author

Yes, it's more of a distinction between threads and processes. You can have multiple independent processes running on the same computer (or node), so it's not necessarily what is usually meant by distributed (although it can do distributed).

But the word "process" wouldn't be smart if we want to extend the map and broadcast functions to each level, and use a naming scheme like I proposed (appending one character in front of map and broadcast). For example, would pmap be parallel map or process map?

@eschnett
Contributor

eschnett commented Aug 9, 2016

In the far future (say, a year from now), threading will work out of the box and will be efficient. I assume people will then basically want to use threading all the time when they are using distributed computing, e.g. to handle latencies. Thus the case "distributed, but not threaded" doesn't seem terribly important -- it is important now, but probably won't be in the future.

This would then lead to people using threaded and parallel, but in practice never using distributed.

I'd thus suggest going for threaded and distributed, where distributed implies threaded whenever threading is enabled.

@oxinabox
Contributor

oxinabox commented Aug 15, 2016

Beyond bikeshedding (I personally like @threaded, @distributed, and @parallelized), the implementation of this is fairly simple.
It is fairly easy to turn the current implementation of @threads for into a mapreduce.
And from there it is just the cascading.

Things that are needed to make it easier

  • a way to set the number of threads in a worker process via addprocs (I don't think we can do this right now without some hacks around ssh; it might be nicer to make the number of threads an argument to the julia program, defaulting to the env var JULIA_NUM_THREADS). (Setting env. variables like JULIA_NUM_THREADS for remote workers #18074)
  • default to 1 process per machine, with many threads.
  • a way to know how many threads a process has (possibly just an alias for remotecall_fetch(()->ENV["JULIA_NUM_THREADS"], pid), possibly just something that is saved when addprocs is done).

The last point is needed because we probably want to support asymmetric clusters, at least in terms of the number of processors (if not in terms of speed). I know my normal cluster is 12-core + 12-core + 4-core,
and my old cluster of lab machines was 4+4+4+4+8+16.
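For the last point, one possible sketch (the dictionary approach is an assumption for illustration, not an existing API) is to ask each worker directly:

```julia
using Distributed

# Record how many threads each worker process has, so a scheduler can
# split work unevenly across an asymmetric cluster.
thread_counts = Dict(pid => remotecall_fetch(Threads.nthreads, pid)
                     for pid in procs())
```

With no extra workers, `procs()` is just `[1]` and the dictionary maps the master pid to its own thread count.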

@oschulz
Contributor

oschulz commented Nov 17, 2016

Thus the case "distributed, but not threaded" doesn't seem terribly important -- it is important now, but probably won't be in the future.

I don't think that's true in all cases: sure, many user applications will just want stuff to be run in parallel using both multiple hosts and multiple threads. But more complex applications will sometimes need more control over what is done via threads and what is done distributed: for example, data partitioning/placement may have to be taken into account. Or a complex algorithm may choose to distribute an outer loop (one that is not sensitive to latency), but run an inner loop (e.g. a latency-sensitive one) on threads, with several layers of code in between.
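The nested pattern described above can be sketched with the existing building blocks (the function name `work` is a placeholder assumption; with real remote workers it would need to be defined with @everywhere):

```julia
using Distributed

# Placeholder for the real per-element computation.
work(chunk_id, i) = chunk_id * i

# Outer loop distributed across worker processes via pmap;
# inner loop run on each worker's threads.
results = pmap(1:3) do chunk_id
    vals = Vector{Int}(undef, 1000)
    Threads.@threads for i in 1:1000
        vals[i] = work(chunk_id, i)  # each iteration writes its own slot: race-free
    end
    sum(vals)
end
```

On a single process `pmap` simply runs locally, so the same code covers both the "distributed, but not threaded" and the fully nested case.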

@robsmith11
Contributor

robsmith11 commented Dec 12, 2019

Now that Threads has matured a bit, has there been any more thought to supporting functionality similar to Distributed's?

For example, I (perhaps naively) am surprised not to see an equivalent Threads function for Distributed's pmap. I've been using this simple function, which seems to work well enough for my purposes:

function tmap(f, xs::AbstractArray)
    # Build a generator only to infer the output element type,
    # the same way `map` does.
    g = Base.Generator(f, xs)
    et = Base.@default_eltype(g)
    a = Array{et}(undef, length(xs))
    # Apply `f` elementwise across the available threads.
    Threads.@threads for i in 1:length(xs)
        a[i] = f(xs[i])
    end
    a
end

@tkf
Member

tkf commented Jan 9, 2020

FYI, Transducers.jl supports "two-level" parallelism as of v0.4.11; i.e., each worker process uses multiple threads for executing reduce. It then gives us a superset of pmap that can be fused with arbitrary stateless processing like filtering and flattening. I think this already gives us a uniform API for (not so embarrassingly) parallel (and sequential) computations executed on different backends (Base.Threads and Distributed).

See also:

@ViralBShah ViralBShah added the parallelism label Jul 3, 2020
@ViralBShah
Member

Seems like this issue is still relevant.
