
Support dynamically add/set worker threads. #16134

Open
josefsachsconning opened this issue Apr 30, 2016 · 25 comments
Labels
multithreading (Base.Threads and related functionality), speculative (whether the change will be implemented is speculative)

Comments

@josefsachsconning
Contributor

Said Sachs:
Is there any prospect for setting jl_n_threads from within Julia? That would be more convenient than setting JULIA_NUM_THREADS before starting Julia.

Said Kelman:
Sounds like a reasonable feature request to me, to have a Threads.getnum and Threads.setnum (or equivalent names) API.

@yuyichao yuyichao changed the title "Provide get/set functionality for jl_n_threads" to "Support dynamically add/set worker threads." Apr 30, 2016
@yuyichao yuyichao added the speculative Whether the change will be implemented is speculative label Apr 30, 2016
@tkelman tkelman added the multithreading Base.Threads and related functionality label Apr 30, 2016
@kpamnany
Contributor

kpamnany commented May 3, 2016

High performance fork-join constructs (broadcast, reduce-barrier) typically require some setup that is dependent on the number of threads. Adding/removing threads would require teardown and recreation, and while this is certainly not impossible, I haven't run into a situation where this would be helpful. Can you describe such a situation?

@josefsachsconning
Contributor Author

I have 2 use cases where this would be helpful.

  1. I have some applications where I want a maximum of 1 thread, and other applications where I want as many threads as I have cores. It would be much more convenient for the logic implementing this choice to be in the Julia code, rather than in a wrapper script that sets the JULIA_NUM_THREADS environment variable and then invokes the Julia script.
  2. When I benchmark the effect of multithreading with varying numbers of threads, I would prefer to do it in a single Julia process rather than a different Julia process for each number of threads that I am testing.
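Until the thread count can be changed at run time, one workaround for the benchmarking case is to start Julia once with the maximum number of threads and vary how many tasks share the work. This is a minimal sketch, assuming Julia 1.3 or later for `Threads.@spawn`; `chunked_sum` is a hypothetical helper, and capping the number of tasks only approximates running with fewer threads, because the scheduler still decides which threads those tasks run on.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: sum `xs` with `k` tasks, one per contiguous chunk.
# With k <= nthreads(), at most k chunks are summed concurrently, which
# approximates running the computation with k threads.
function chunked_sum(xs::AbstractVector{<:Real}, k::Integer)
    k >= 1 || throw(ArgumentError("k must be at least 1"))
    n = length(xs)
    chunk = cld(n, k)                  # ceiling division: every index is covered
    tasks = map(1:k) do i
        lo = (i - 1) * chunk + 1
        hi = min(i * chunk, n)
        # Empty chunks (possible when k > n) contribute zero.
        @spawn(lo <= hi ? sum(view(xs, lo:hi)) : zero(eltype(xs)))
    end
    return sum(fetch.(tasks))
end

# Time the same workload with 1, 2, ..., nthreads() tasks in one process.
xs = rand(10^6)
chunked_sum(xs, 1)                     # warm-up call so compilation is not timed
for k in 1:nthreads()
    t = @elapsed chunked_sum(xs, k)
    println("k = $k: ", round(t; digits = 5), " s")
end
```

Start Julia with as many threads as you want to test up to; the loop then reports one timing per task count.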

@StefanKarpinski
Member

Any reason this can't be done by allocating space for all potential threads up to the number of physical cores and then allowing the number of those actually used to be ramped up and down dynamically?

@kpamnany
Contributor

kpamnany commented May 4, 2016

Any reason this can't be done by allocating space for all potential threads up to the number of physical cores and then allowing the number of those actually used to be ramped up and down dynamically?

It would probably be too costly this way (remember KNL), but like I said, it isn't that hard to do, at least in the threading infrastructure.

I'm not sure about the GC or anything in the runtime. @yuyichao, @vtjnash: can we realloc jl_all_heaps and jl_all_task_states? Would it be hard to do?

@yuyichao
Contributor

yuyichao commented May 4, 2016

The runtime shouldn't have too much trouble if the scheduler can support this (what does it mean if a worker thread requests its own deletion?). jl_all_heaps should be removed (merged into jl_tls_states_t), and then all access to jl_all_task_states (which should mostly be in slow paths like GC and signal handling anyway) needs to be protected with a lock.
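The locking requirement can be illustrated with a Julia-level analogue; this is not the runtime's C code. Once a per-thread table can grow, `push!` may reallocate its buffer, so every reader has to take the same lock as the writers. `WorkerState`, `ThreadRegistry`, `register!` and `snapshot` are hypothetical names.

```julia
# Julia-level analogue only; the real per-thread table lives in the C runtime.
struct WorkerState
    tid::Int                          # thread that registered this worker
end

struct ThreadRegistry
    lock::ReentrantLock
    states::Vector{WorkerState}
end
ThreadRegistry() = ThreadRegistry(ReentrantLock(), WorkerState[])

# Writer: may grow (and therefore reallocate) `states`, hence the lock.
function register!(reg::ThreadRegistry)
    lock(reg.lock) do
        push!(reg.states, WorkerState(Threads.threadid()))
        return length(reg.states)     # index assigned to the new worker
    end
end

# Slow-path reader (think GC or signal handling): copy under the same lock
# so it never races with a concurrent reallocation.
snapshot(reg::ThreadRegistry) = lock(() -> copy(reg.states), reg.lock)

reg = ThreadRegistry()
ids = fetch.([Threads.@spawn register!(reg) for _ in 1:8])
```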

@vchuravy
Member

vchuravy commented Sep 2, 2016

@vtjnash mentioned to me that the ability to dynamically launch threads and set up their ptls state might also enable calling (asynchronously) into Julia from C/C++ in a thread-safe manner.

@orenbenkiki

A potential use case for this: Running multiple cooperating Julia HPC applications on a multi-core server.

For example: If an application runs by itself, it can use all the CPUs. If a second application is launched, the "polite" thing to do would be for the 1st to halve its number of threads (at the nearest "convenient point", i.e. some task end), and for the 2nd application to only request half the number of CPUs. When the 1st application ends, the 2nd application can increase its number of threads to take over all the CPUs.

In general, assume:

  • All the applications that "matter" are run in Julia, and use the same run-time, that coordinates between them to adjust the threads and schedule tasks.
  • The applications are implemented as a DAG of "not-too-long" tasks.
  • Whenever an application starts (or ends), the run-time adjusts the (target) number of threads in the remaining applications.
  • Adding threads allows immediately scheduling a task on each.
  • Whenever a task ends, the run-time decides whether to schedule a new one on this thread, or to terminate the thread if the current number of threads is above the target number of threads.
  • When a thread is terminated, the run-time can launch a new thread in another "more deserving" application.
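The scheduling policy in the list above can be approximated at the task level in current Julia. This is a hypothetical sketch: `ElasticPool`, `spawn_workers!`, `try_retire!` and `worker_loop` are illustrative names, and the "workers" are tasks running on a fixed set of threads rather than OS threads that are created and destroyed. Workers check the target only between jobs, so no job is interrupted.

```julia
using Base.Threads: Atomic, atomic_add!, atomic_sub!, atomic_cas!, @spawn

# Hypothetical elastic pool. `target` is the desired number of workers and
# `active` the number currently running; workers are tasks, not OS threads.
struct ElasticPool
    jobs::Channel{Any}
    target::Atomic{Int}
    active::Atomic{Int}
end
ElasticPool(target::Int) =
    ElasticPool(Channel{Any}(Inf), Atomic{Int}(target), Atomic{Int}(0))

# Retire the calling worker if the pool is above target. The CAS loop makes
# sure concurrent workers never retire past the target.
function try_retire!(pool::ElasticPool)
    while true
        cur = pool.active[]
        cur <= pool.target[] && return false
        # atomic_cas! returns the previous value; equal to `cur` means success.
        atomic_cas!(pool.active, cur, cur - 1) == cur && return true
    end
end

# Worker: run jobs one at a time and check the target after each one.
function worker_loop(pool::ElasticPool)
    for job in pool.jobs              # blocks until a job arrives or the channel closes
        job()
        try_retire!(pool) && return :retired
    end
    atomic_sub!(pool.active, 1)       # channel closed and drained
    return :drained
end

# Analogue of spawning more threads: start `n` extra workers.
function spawn_workers!(pool::ElasticPool, n::Int)
    atomic_add!(pool.active, n)
    return [@spawn worker_loop(pool) for _ in 1:n]
end

# Usage: start 4 workers, lower the target to 2, then queue 100 jobs.
pool = ElasticPool(4)
workers = spawn_workers!(pool, 4)
completed = Atomic{Int}(0)
pool.target[] = 2
for _ in 1:100
    put!(pool.jobs, () -> atomic_add!(completed, 1))
end
close(pool.jobs)
outcomes = fetch.(workers)
```

Two workers retire after their next completed job and the other two drain the queue, so every job still runs.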

Such a system would maximize CPU utilization, while providing "fairness" between multiple applications. Both are desired features in a server used by multiple users. In fact, these properties are desired even when the server is used by a single user, but the workflow invokes different Julia applications for different processing steps, possibly in parallel.

This is not to say that the above behavior should be built into Julia (though that would be very nice, at least as an option ;-). Rather, the point is that if someone wanted to implement such a run-time, the ability to dynamically adjust the number of threads would be required or at least very helpful.

Currently the only way to implement such a system would be for each application to spawn the maximal number of threads and "pause" the excess threads. I'm not certain how to effectively "pause" a thread in Julia (and be able to later wake it up).

From the OS perspective, the "good old way" would be to send the process a STOP signal, but I'm not certain how Julia would respond to stopping just one thread of a multi-threaded application. Another option would be for the process to use a pthread_mutex, but I'm not certain this is accessible from within a single Julia thread without impacting the other threads.
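At the task level, "pausing" can be sketched with a `Threads.Condition`: a parked task blocks in `wait`, which hands its thread back to the scheduler instead of spinning. This is a minimal sketch; `Gate`, `pause!`, `resume!` and `checkpoint` are hypothetical names, and it pauses tasks, not the underlying OS threads.

```julia
# Hypothetical gate that parks worker tasks while paused.
mutable struct Gate
    cond::Threads.Condition
    paused::Bool
end
Gate() = Gate(Threads.Condition(), false)

function pause!(g::Gate)
    lock(g.cond)
    try
        g.paused = true
    finally
        unlock(g.cond)
    end
end

function resume!(g::Gate)
    lock(g.cond)
    try
        g.paused = false
        notify(g.cond)                # wake every task parked in `checkpoint`
    finally
        unlock(g.cond)
    end
end

# Workers call this between jobs; it returns at once unless the gate is paused.
function checkpoint(g::Gate)
    lock(g.cond)
    try
        while g.paused                # re-check after every wake-up
            wait(g.cond)              # releases the lock while parked
        end
    finally
        unlock(g.cond)
    end
end
```

A worker would call `checkpoint(g)` between jobs; `pause!(g)` then parks every worker at its next checkpoint, and `resume!(g)` wakes them all.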

Given that launching an application is a "heavy" operation, the cost of starting/stopping threads at such times is not an excessive overhead. Assuming it is possible to "pause" a thread, the question becomes whether the extra cost of maintaining some "paused" threads for the whole duration of the application's execution is higher than the overhead of occasionally starting/stopping threads.

The simplest API would be exit_current_thread() and spawn_more_threads(n). This avoids the issue of one thread killing another in mid-work.

@tknopp
Contributor

tknopp commented Jun 27, 2020

I am quoting @mbauman from #32770, which was closed instead of this issue:

Is not the point here that there are program structures that demand either pre-emptive (software) threads or hardware threads and would otherwise deadlock? Whether or not we call that @spawn, do we want to support this more directly (beyond just @assert nthreads() > 1)?

Yep, it's exactly this point. In a UI application I want to run something in the background and do not want to think about how many threads my application was started with. I do not want to tell my users "you need to start Julia with this special set of arguments, otherwise the program will deadlock".

@BioTurboNick
Contributor

BioTurboNick commented Aug 17, 2020

Another use case: Easier use of packages that take advantage of multithreading by non-experts.

If I produce a package or script that takes advantage of multithreading, the user has to:

  • Know what environment variables are
  • Know how to set an environment variable
  • Know where to set the threads option (in the case of Atom/VSCode)
  • Know how many threads their CPU can run

For multiprocessing, this is very easy:

using Distributed # addprocs, @everywhere and rmprocs come from the Distributed standard library
currentworkers = addprocs(exeflags="--project") # creates a number of processes that fits the user's computer
@everywhere using MyPackage # loads code into all the processes
# work
rmprocs(currentworkers) # destroys the created processes

We should be able to do the same thing for multithreading, no?

@PallHaraldsson
Contributor

Except for Julia 1.x LTS (which will soon be dropped, with 1.6 as the next LTS), you do not need to use environment variables as of Julia 1.5; use the -t startup option instead (e.g. -t auto).
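With the -t flag, one way to compare thread counts from a single driver script is to launch child Julia processes. This is a minimal sketch, assuming Julia 1.5 or later; `child_nthreads` is a hypothetical helper, and `Base.julia_cmd()` returns the command for the currently running Julia executable.

```julia
# Hypothetical helper: start a fresh Julia with `nt` threads and ask it how
# many threads it got.
function child_nthreads(nt::Integer)
    cmd = `$(Base.julia_cmd()) --startup-file=no -t $nt -e 'print(Threads.nthreads())'`
    return parse(Int, read(cmd, String))
end

for nt in (1, 2, 4)
    println("-t $nt => ", child_nthreads(nt), " threads")
end
```

Each child pays Julia's full startup and compilation cost, which is part of why several commenters want to change the thread count inside a single session instead.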

@BioTurboNick
Contributor

@PallHaraldsson - true, but even a command-line argument is potentially opaque to naive users. They're given lots of ways to start the REPL automatically, and figuring out how to change that is a pain point when they set themselves up.

@dlfivefifty
Contributor

As an example, I've never tried using the multithreading support, mostly because I can't be bothered to figure out how to set the number of threads in VSCode. Even if I figured this out, the first thing I'd want to do is play around with performance testing with different numbers of threads. I'm sure there are other ways to do this than restarting Julia, but it seems burdensome to figure them out, so I think there's a good argument for supporting this from the "discovery" aspect alone.

@DilumAluthge
Member

DilumAluthge commented Aug 17, 2020

I think it would be fine to allow the user to e.g. add new threads in the REPL.

What I don't think is fine is allowing any Julia package to modify the number of threads. If e.g. I start my Julia session with 4 threads, I want it to remain at exactly 4 threads. I had a reason for picking that number. I don't want to have to worry that one or more of the packages that I use will modify this number of threads. For example, what if I install and use three packages, and all three of them have different opinions on how many threads I should have? How many threads do I end up with? Which package gets to have the final say?

The problem is: I can't think of a way to implement this functionality such that it can be used by the user in the REPL but cannot be used inside a package.

@tknopp
Contributor

tknopp commented Aug 18, 2020

Another use case: trying to determine the speedup as a function of the number of threads within a single Julia program, which is currently impossible.

@BioTurboNick
Contributor

Maybe what's needed isn't an ability to create threads per se, but to request that X worker threads be available for use, which Julia can guarantee up to Sys.CPU_THREADS or the maximum set in the argument/environment variable?

It would then look to the code like it only has X worker threads, but under the hood it can have >= X worker threads.

Though I imagine there would be potential drawbacks in a multiprocessing scenario where someone might want to ensure the child processes each use only one thread?
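The "request X workers" idea can be approximated today by capping concurrency with a semaphore. This is a minimal sketch: `limited_map` is a hypothetical helper built on `Base.Semaphore`, and it limits how many calls run at once at the task level; it does not create or destroy threads.

```julia
# Hypothetical helper: apply `f` to each element of `xs` with at most `k`
# calls running at once, however many threads Julia was started with.
function limited_map(f, xs, k::Integer)
    sem = Base.Semaphore(k)
    tasks = map(xs) do x
        Threads.@spawn begin
            Base.acquire(sem)         # wait for one of the k slots
            try
                f(x)
            finally
                Base.release(sem)     # free the slot even if f throws
            end
        end
    end
    return fetch.(tasks)
end

squares = limited_map(x -> x^2, 1:10, 2)
```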

@dlfivefifty
Contributor

What I don't think is fine is allowing any Julia package to modify the number of threads. If e.g. I start my Julia session with 4 threads, I want it to remain at exactly 4 threads

I disagree: I don't tend to worry how many threads MS Office starts, and users of BLAS/LAPACK tend not to worry about how many threads those start either. Why should Julia packages be different? If it's an issue of "competing threads", then one should follow the BLAS/LAPACK approach: packages should give users the ability to explicitly specify the number of threads.
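For comparison, the BLAS approach looks like this from Julia: the library exposes a setter and leaves the choice to the caller. This is a minimal sketch using the LinearAlgebra standard library, assuming Julia 1.6 or later for `BLAS.get_num_threads`; the thread count chosen here is only an example.

```julia
using LinearAlgebra

# BLAS manages its own thread pool; the caller, not the library, picks its size.
previous = BLAS.get_num_threads()
BLAS.set_num_threads(1)               # e.g. one BLAS thread per process on a shared node
during = BLAS.get_num_threads()
A = rand(200, 200)
B = A * A                             # this product now uses a single BLAS thread
BLAS.set_num_threads(previous)        # restore the earlier setting
```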

@DilumAluthge
Member

I don't tend to worry how many threads MS Office starts

I don't run multiple processes of Microsoft Office on multiple nodes across an HPC cluster.

I do run multiple Julia processes per node on multiple nodes. If I have a set of SLURM allocations across a set of nodes, and each SLURM allocation gives me e.g. four cores, and I start one Julia worker process per SLURM allocation, then if I make the decision that each Julia process has four threads, then Julia should respect this decision.

@dlfivefifty
Contributor

You probably also don't run Julia UI packages on multiple nodes across an HPC cluster... I don't think what is / is not possible in Julia should be dictated by just your use case: packages should be allowed to do what they want to do, and if they use too many threads, then you can either not use the package or make a PR that allows you to control the number of threads.

@DilumAluthge
Member

DilumAluthge commented Aug 18, 2020

then you can either not use the package

This is easier said than done. Certainly I can avoid using the package as a top-level dependency. But it could still end up as an indirect (recursive) dependency.

make a PR that allows you to control the number of threads

Again, how does this work if the package is an indirect (recursive) dependency? I would now have to add this package as a direct dependency just to be able to control the number of threads?

I don't think what is / is not possible in Julia should be dictated by just your use case

I would argue that running Julia in high-performance scientific computing settings (e.g. HPC clusters, often with job schedulers, in a distributed setting) is a core use case of the language, and thus should be taken into consideration when modifying the language.

One of the big goals of multithreading in Julia is to be composable - package A doesn't need to know anything about the threading behavior of package B, but both package A and package B can use multithreading, and Julia will make everything compose together nicely.

In my opinion, if package A and package B can independently create and destroy threads, this goes against the idea that "package A doesn't need to know anything about the threading behavior of package B".

@dlfivefifty
Contributor

If Package A is a UI package whose threads are not doing heavy lifting, then why should package B care how many threads it creates, any more than it should care whether the user also has Microsoft Office open?

If Package A is designed for HPC, it should probably behave like it does currently, and not create threads on its own.

@dlfivefifty
Contributor

PS Package A can always call some C code and create as many threads as it pleases... so removing this functionality from Julia doesn't actually help you...

@MilesCranmer
Member

MilesCranmer commented Jun 20, 2021

FWIW I got around this limitation with a heavy-handed approach by using multiple Distributed workers, then hacking it to copy definitions from the running session to each process (i.e., I don't expect or need the user to run @everywhere on any of their code). Code here: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/master/src/Configure.jl. First, dynamically allocate extra processes with addprocs, then you need to manually activate those in the current environment via:

@everywhere procs begin
    Base.MainInclude.eval(quote
        using Pkg
        Pkg.activate($$project_path)
        using SymbolicRegression # name of module
    end)
end

(which can conveniently be done inside a function of your module)

The very difficult part is that I need to manually copy the definitions of externally defined functions (e.g., an operator the user has defined and passes in as an argument) to the new processes. I have this function to do it:

function copy_definition_to_workers(op, procs, options::Options)
    name = nameof(op)
    src_ms = methods(op).ms # Thanks https://discourse.julialang.org/t/easy-way-to-send-custom-function-to-distributed-workers/22118/2
    @everywhere procs @eval function $name end #defines functions
    for m in src_ms
        @everywhere procs @eval $m #defines methods
    end
end

Overkill but it works! So I can call my function via:

SymbolicRegression.EquationSearch(..., procs=4)

and it will occupy 4 cores, even though the user ran without multithreading and without multiprocessing turned on. In the same running session, you can then call:

SymbolicRegression.EquationSearch(..., procs=8)

and it will now occupy 8 cores instead, and dynamically close the workers it allocated at the end.
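The allocate, run, and close pattern described above can be wrapped in a small helper so the workers are always removed, even on error. This is a minimal sketch using the Distributed standard library; `with_workers` is a hypothetical name, and any user-defined functions that `f` sends to the workers still need to be defined there first (the @everywhere / copy_definition_to_workers step above).

```julia
using Distributed

# Hypothetical helper: add `n` worker processes, pass their ids to `f`, and
# always remove them afterwards, even if `f` throws.
function with_workers(f, n::Integer; kwargs...)
    ws = addprocs(n; kwargs...)
    try
        return f(ws)
    finally
        rmprocs(ws)
    end
end

# Usage: square 1:10 on two temporary workers, restricted to that pool.
remote_squares = with_workers(2) do ws
    pmap(x -> x^2, WorkerPool(ws), 1:10)
end
```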

@Roger-luo
Contributor

Roger-luo commented Sep 20, 2021

I just had another use case: when I build a Julia module into a binary via PackageCompiler, I won't be able to set the number of threads from this binary anymore, unless I write some C code or set the environment variable. It would be much more convenient if we could set threads dynamically.

@KristofferC
Member

I won't be able to set the number of threads from this binary anymore, unless I write some C code or set the environment variable. it would make it much more convenient if we can set threads dynamically

You could also set jl_options.nthreads I think.

@Roger-luo
Contributor

You could also set jl_options.nthreads I think.

@KristofferC thanks! This is in C code though, right? I currently do this from C in this way indeed, but it would be nice if we didn't need to touch C at all...
