Support dynamically add/set worker threads. #16134
High performance fork-join constructs (broadcast, reduce-barrier) typically require some setup that is dependent on the number of threads. Adding/removing threads would require teardown and recreation, and while this is certainly not impossible, I haven't run into a situation where this would be helpful. Can you describe such a situation? |
I have 2 use cases where this would be helpful. |
Any reason this can't be done by allocating space for all potential threads up to the number of physical cores and then allowing the number of those actually used to be ramped up and down dynamically? |
It would probably be too costly this way (remember KNL), but like I said, it isn't that hard to do. At least in the threading infrastructure; I'm not sure about the GC or anything else in the runtime. @yuyichao, @vtjnash: can we realloc |
The runtime shouldn't have too much trouble if the scheduler can support this (what does it mean if a worker thread requests itself to be deleted?). |
@vtjnash mentioned to me that the ability to dynamically launch threads and set up ptls state might enable calling (asynchronously) into Julia from C/C++ in a thread-safe manner. |
A potential use case for this: Running multiple cooperating Julia HPC applications on a multi-core server. For example: If an application runs by itself, it can use all the CPUs. If a second application is launched, the "polite" thing to do would be for the 1st to halve its number of threads (at the nearest "convenient point", i.e. some task end), and for the 2nd application to only request half the number of CPUs. When the 1st application ends, the 2nd application can increase its number of threads to take over all the CPUs. In general, assume:
Such a system would maximize CPU utilization while providing "fairness" between multiple applications. Both are desired features in a server used by multiple users. In fact, these properties are desired even when the server is used by a single user, but the workflow invokes different Julia applications for different processing steps, possibly in parallel.

This is not to say that the above behavior should be built into Julia (though that would be very nice, at least as an option ;-). Rather, the point is that if someone wanted to implement such a run-time, the ability to dynamically adjust the number of threads would be required, or at least very helpful.

Currently the only way to implement such a system would be for each application to spawn the maximal number of threads and "pause" the excess threads. I'm not certain how to effectively "pause" a thread in Julia (and be able to later wake it up). From the OS perspective, the "good old way" would be to send that process a STOP signal, but I'm not certain how Julia would respond to stopping just one thread of a multi-threaded application. Another option would be for the process to use a pthread_mutex, but I'm not certain this is accessible from within a single Julia thread without impacting the other threads.

Given that launching an application is a "heavy" operation, the cost of starting/stopping threads at such times is not an excessive overhead. Assuming it is possible to "pause" a thread, the question becomes whether the extra cost of maintaining some "paused" threads for the total duration of the application execution is higher than the overhead of occasionally starting/stopping threads. The simplest API would be |
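For what it's worth, something like "pausing" can be approximated today at the task level rather than the OS-thread level: a worker task blocked on an empty Channel consumes essentially no CPU until work is fed to it. A minimal sketch under that assumption (all the names here are illustrative, not an existing API):

```julia
using Base.Threads

# Sketch: spawn the maximal number of worker tasks up front. A task
# blocked iterating an empty Channel is effectively "paused" and wakes
# up only when a job is put!() into the channel.
work    = Channel{Int}(Inf)
results = Channel{Int}(Inf)

workers = map(1:Threads.nthreads()) do _
    Threads.@spawn for job in work   # blocks ("pauses") while `work` is empty
        put!(results, job * 2)
    end
end

foreach(j -> put!(work, j), 1:4)
close(work)                          # lets the worker loops terminate
foreach(wait, workers)
close(results)

out = sort(collect(results))         # [2, 4, 6, 8]
```

This only parks tasks, of course; the underlying OS threads still exist for the life of the process, which is exactly the limitation being discussed.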
I am quoting @mbauman from #32770, which was closed in favor of this issue:
Yep, it's exactly this point. In a UI application I want to run something in the background and do not want to think about how many threads my application was started with. I do not want to tell my users "you need to start up Julia with this special number of arguments, otherwise the program will deadlock". |
Another use case: Easier use of packages that take advantage of multithreading by non-experts. If I produce a package or script that takes advantage of multithreading, the user has to:
For multiprocessing, this is very easy:
We should be able to do the same thing for multithreading, no? |
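For comparison, the multiprocessing path really is that easy today, and fully dynamic, via the Distributed standard library (a quick sketch):

```julia
using Distributed

addprocs(2)                    # grow the worker pool at runtime
@assert nworkers() == 2

squares = pmap(x -> x^2, 1:4)  # work is farmed out to the new workers

rmprocs(workers())             # and shrink the pool again, just as easily
@assert nworkers() == 1        # only the master process remains
```

Nothing analogous to `addprocs` exists for threads: `Threads.nthreads()` is fixed when the session starts.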
Except on the Julia 1.x LTS (which will soon be dropped, with 1.6 becoming the next LTS), you do not need to use environment variables: as of Julia 1.5 there is the -t startup option (e.g. -t auto). |
@PallHaraldsson - true, but even a command line argument is potentially opaque to naive users. They're given lots of ways to automatically start the REPL, and figuring out how to change that is a pain point when trying to set themselves up. |
As an example, I've never tried using the multithreading support, mostly because I can't be bothered to figure out how to set the number of threads in VSCode. Even if I figured this out, the first thing I'd want to do is play around with performance testing with different numbers of threads. I'm sure there are other ways to do this than restarting Julia, but it seems burdensome to figure out, so I think there's a good argument from just the "discovery" aspect for supporting this. |
I think it would be fine to allow the user to e.g. add new threads in the REPL. What I don't think is fine is allowing any Julia package to modify the number of threads. If e.g. I start my Julia session with 4 threads, I want it to remain at exactly 4 threads; I had a reason for picking that number. I don't want to have to worry that one or more of the packages I use will modify the number of threads. For example, what if I install and use three packages, and all three of them have different opinions on how many threads I should have? How many threads do I end up with? Which package gets the final say? The problem is: I can't think of a way to implement this functionality such that it can be used by the user in the REPL but cannot be used inside a package. |
Another use case: Try to determine the speedup as a function of threads within a single Julia program -> currently impossible. |
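The current workaround for such a scan is to launch a fresh child process per thread count, since `Threads.nthreads()` is fixed for the life of a session. A sketch (the `print` payload is a stand-in for a real timed benchmark):

```julia
# Sketch: probe different thread counts by spawning child Julia
# processes, because the count cannot change within one session.
counts = map([1, 2, 4]) do n
    cmd = `$(Base.julia_cmd()) -t $n -e "print(Threads.nthreads())"`
    parse(Int, read(cmd, String))
end
# Each child really ran with the requested thread count, but any state
# built up in the parent session has to be reconstructed in every child.
```

The per-process startup and recompilation cost is exactly why doing this within a single program would be preferable.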
Maybe what's needed isn't an ability to create threads per se, but to request X worker threads be available for use, which Julia can guarantee will be available up to sys.CPU_THREADS or the maximum set in the argument/environment variable? It would then look to the code like it only has X worker threads, but under the hood it can have >= X worker threads. Though I imagine there would be potential drawbacks in a multiprocessing scenario where someone might want to ensure the child processes each use only one thread? |
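That "allocate the maximum, expose only X" idea can be approximated in user space today with a counting semaphore. A minimal sketch, where `max_active` plays the role of the dynamically adjustable X (the names are illustrative, not a proposed API):

```julia
using Base.Threads

# Sketch: all nthreads() OS threads exist, but a semaphore caps how many
# tasks do work concurrently, so the code behaves as if it had
# `max_active` workers.
max_active = 2
sem = Base.Semaphore(max_active)

results = Vector{Int}(undef, 8)
@sync for i in 1:8
    Threads.@spawn begin
        Base.acquire(sem)        # at most `max_active` tasks pass this point
        try
            results[i] = i^2     # each task writes its own slot: no race
        finally
            Base.release(sem)
        end
    end
end
```

Raising or lowering `max_active` between batches would ramp effective parallelism up and down, though the idle OS threads still occupy whatever per-thread resources the runtime reserves.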
I disagree: I don't tend to worry how many threads MS Office starts, and users of BLAS/LAPACK tend not to worry about how many threads they start either. Why should Julia packages be different? If it's an issue of "competing threads", then one should follow BLAS/LAPACK, and the packages should give users the ability to explicitly specify the number of threads. |
I don't run multiple processes of Microsoft Office on multiple nodes across an HPC cluster. I do run multiple Julia processes per node on multiple nodes. If I have a set of SLURM allocations across a set of nodes, and each SLURM allocation gives me e.g. four cores, and I start one Julia worker process per SLURM allocation, then if I make the decision that each Julia process has four threads, then Julia should respect this decision. |
You probably also don't run Julia UI packages on multiple nodes across an HPC cluster... I don't think what is / is not possible in Julia should be dictated by just your use case: packages should be allowed to do what they want to do, and if they use too many threads, then you can either not use the package or make a PR that lets you control the number of threads. |
This is easier said than done. Certainly I can avoid using the package as a top-level dependency. But it could still end up as an indirect (recursive) dependency.
Again, how does this work if the package is an indirect (recursive) dependency? I would now have to add this package as a direct dependency just to be able to control the number of threads?
I would argue that running Julia in high-performance scientific computing settings (e.g. HPC clusters, often with job schedulers, in a distributed setting) is a core use case of the language, and thus should be taken into consideration when modifying the language. One of the big goals of multithreading in Julia is to be composable - package A doesn't need to know anything about the threading behavior of package B, but both package A and package B can use multithreading, and Julia will make everything compose together nicely. In my opinion, if package A and package B can independently create and destroy threads, this goes against the idea that "package A doesn't need to know anything about the threading behavior of package B". |
If Package A is a UI package, whose threads are not doing heavy lifting, then why should package B care how many threads it creates, anymore than it should care if the user also has Microsoft Office open? If Package A is designed for HPC, it should probably behave like it does currently, and not create threads on its own. |
PS Package A can always call some C code and create as many threads as it pleases... so removing this functionality from Julia doesn't actually help you... |
FWIW I got around this limitation with a heavy-handed approach, by using multiple processes:

```julia
@everywhere procs begin
    Base.MainInclude.eval(quote
        using Pkg
        Pkg.activate($$project_path)
        using SymbolicRegression  # name of module
    end)
end
```

(which can conveniently be done inside a function of your module). The very difficult part is that I need to manually copy the definitions of externally defined functions (e.g., a user passes in an operator they have defined as an argument) to the new processes. I have this function to do it:

```julia
function copy_definition_to_workers(op, procs, options::Options)
    name = nameof(op)
    # Thanks https://discourse.julialang.org/t/easy-way-to-send-custom-function-to-distributed-workers/22118/2
    src_ms = methods(op).ms
    @everywhere procs @eval function $name end  # defines the function
    for m in src_ms
        @everywhere procs @eval $m  # defines the methods
    end
end
```

Overkill, but it works! So I can call my function via:

```julia
SymbolicRegression.EquationSearch(..., procs=4)
```

and it will occupy 4 cores, even though the user started Julia without multithreading and without multiprocessing turned on. In the same running session, you can then call:

```julia
SymbolicRegression.EquationSearch(..., procs=8)
```

and it will now occupy 8 cores instead, dynamically closing the workers it allocated at the end. |
I just had another use case: when I build a Julia module into a binary via PackageCompiler, I can no longer set the number of threads from that binary unless I write some C code or set the environment variable. It would be much more convenient if we could set threads dynamically. |
You could also set |
@KristofferC thanks! This is in C code though, right? I currently do this from C in this way indeed, but it would be nice if we didn't need to touch C at all... |
Said Sachs:
Is there any prospect for setting jl_n_threads from within Julia? That would be more convenient than setting JULIA_NUM_THREADS before starting Julia.
Said Kelman:
Sounds like a reasonable feature request to me, to have a Threads.getnum and Threads.setnum (or equivalent names) API.