This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
partr thread support for openblas #32786
Comments
We now have algorithms in DifferentialEquations.jl which utilize simultaneous implicit methods to enhance the parallelizability of small stiff ODEs and DAEs (i.e. <= 20 ODEs). Right now we'll just document that the user should probably set the number of BLAS threads to 1, but once this PR is in, this algorithm can serve as a very good test case / showcase of why PARTR mixed into BLAS is useful.
This is a fairly straightforward project for someone who doesn't mind diving in and seeing how it was done in FFTW. I will certainly try it out if nobody gives it a shot in a few weeks.
In the long run, it would be good if partr had a documented C API for spawn/wait, which would give us a lot more flexibility in integrating it with external libraries like this.
Do you think this is something that will require changes to OpenBLAS upstream and/or compiling OpenBLAS with specific options? Just checking from a packager perspective.
Yes, we probably have to work with OpenBLAS upstream.
I'm also implementing the FFTW strategy of a pluggable threading backend for Blosc (Blosc/c-blosc2#81). I think we can make a strong argument to upstream developers that their libraries should use this kind of strategy where possible, because it allows easy composability not only with Julia's partr, but also with Intel's TBB and other threading schedulers. It also seems possible to do this with minimal patches in cases where they have already implemented their own threading.
I think it's attractive to implement this as a runtime option, in addition to existing threading options rather than instead of them, as I did for FFTW and Blosc. That is, we add a single

    exec_blas(num, queue) {
        if (threads_callback) {
            // pass work to the callback function
            return;
        }
        // parallelize normally
    }

This has three advantages:
Regarding the
I'm not sure why the "other work" can't simply be added to the queue of parallel tasks, and let the runtime worry about load-balancing.
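For readers following along, here is a minimal sketch in C of the runtime-callback idea from the exec_blas pseudocode above. Every name in it (work_item, threads_callback_fn, set_threads_callback, exec_blas_sketch, and the two example helpers) is a hypothetical stand-in rather than OpenBLAS's actual types or API, and the built-in path just runs the work serially where a real backend would parallelize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry in a BLAS work queue. */
typedef struct work_item {
    void (*routine)(void *arg);  /* the unit of work */
    void *arg;                   /* its argument */
} work_item;

/* Hypothetical callback type: it receives the whole batch and must have
 * run every item by the time it returns. */
typedef void (*threads_callback_fn)(work_item *queue, size_t num, void *user_data);

static threads_callback_fn threads_callback = NULL;
static void *threads_callback_data = NULL;

/* Hypothetical setter, called once by the host runtime (e.g. Julia).
 * Passing NULL restores the built-in path. */
void set_threads_callback(threads_callback_fn cb, void *user_data) {
    threads_callback = cb;
    threads_callback_data = user_data;
}

/* Placeholder for the library's own threading backend; serial here. */
static void run_builtin(work_item *queue, size_t num) {
    for (size_t i = 0; i < num; i++) {
        queue[i].routine(queue[i].arg);
    }
}

/* Simplified dispatch in the shape of the pseudocode above. */
int exec_blas_sketch(size_t num, work_item *queue) {
    assert(num == 0 || queue != NULL);
    if (threads_callback) {
        /* pass work to the callback function */
        threads_callback(queue, num, threads_callback_data);
        return 0;
    }
    /* parallelize normally */
    run_builtin(queue, num);
    return 0;
}

/* Example work routine: increments the int that arg points to. */
void add_one(void *arg) {
    *(int *)arg += 1;
}

/* Example callback: runs each item on the calling thread and adds the
 * batch size to the size_t that user_data points to. A real scheduler
 * would spawn one task per item and wait for all of them. */
void counting_callback(work_item *queue, size_t num, void *user_data) {
    for (size_t i = 0; i < num; i++) {
        queue[i].routine(queue[i].arg);
    }
    *(size_t *)user_data += num;
}
```

Because the callback is checked at run time, the same binary keeps its existing threading when no callback is installed, which matches the "in addition to, rather than instead of" point above.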
I posted a very early draft of the requisite changes at OpenMathLib/OpenBLAS#2255
Actually, I thought of an even easier way to implement
Removing milestone since this certainly wasn't release blocking for 1.3, and it won't be for 1.4 or 1.x either.
I'm confused. I thought that now that we switched to a time-based release schedule with 1.x releases, nothing is release-blocking, so should all the remaining issues then be removed from the 1.4 milestone as well?
Friendly bump on this one. New AMD processors have a ton of threads, but I can't take much advantage of PARTR until it works nicely with OpenBLAS, since my loops all have various LAPACK calls in them (and I also have standalone LAPACK calls outside of loops that ought to still use all threads).
Increasingly, a lot of libraries in Yggdrasil BB are using OpenMP, and many of them call BLAS. I suspect that we are increasingly going to see multi-threading clashes between Julia threads, pthreaded libraries (OpenBLAS), and OpenMP. The fewer of these we can use the better! I also learnt that if MKL enters the picture, it brings in yet another threading library: TBB.
cc @kpamnany
Here are some notes from digging into the openblas codebase (with @stevengj) to enable partr threading support.
exec_blas is called by all the routines. The code pattern followed is setting up the work queue and calling exec_blas to do all the work through an openmp pragma.
exec_blas_async functions.
The easiest way may be to modify the openmp threading backend, which seems amenable to something like the fftw partr backend. To start with, we should ignore lapack threading. We could probably just implement an exec_blas_async fallback that calls exec_blas (and make exec_blas_async_wait a no-op).
All of this should work on windows too, although going through the openmp build route may need some work on the makefiles.
The patch to FFTW should be indicative of something similar to be done for the openblas build.
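As a rough illustration of the fallback suggested above (an async entry point that simply calls the synchronous one, with the wait as a no-op), here is a minimal sketch in C. The work_item type, the linked-list layout, the done flag, and the _sketch function names are all assumptions made for this example; they are simplified stand-ins, not the real OpenBLAS signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue entry; items are chained via next. */
typedef struct work_item {
    void (*routine)(void *arg);  /* the unit of work */
    void *arg;                   /* its argument */
    struct work_item *next;      /* next item in the batch, or NULL */
    int done;                    /* set to 1 once the routine has run */
} work_item;

/* Stand-in for the synchronous entry point: runs every item in order.
 * A real backend would hand these to OpenMP or another scheduler. */
int exec_blas_sketch(work_item *head) {
    for (work_item *it = head; it != NULL; it = it->next) {
        assert(it->routine != NULL);
        it->routine(it->arg);
        it->done = 1;
    }
    return 0;
}

/* Fallback "async" submit: do all the work immediately, so every item
 * has already finished by the time this returns. */
int exec_blas_async_sketch(work_item *head) {
    return exec_blas_sketch(head);
}

/* Fallback wait: nothing can still be in flight, so this is a no-op. */
int exec_blas_async_wait_sketch(work_item *head) {
    (void)head;
    return 0;
}

/* Example work routine: increments the int that arg points to. */
void add_one(void *arg) {
    *(int *)arg += 1;
}
```

The trade-off in this sketch is that callers lose any overlap between submitting work and doing something else before the wait, which is the simplification the notes accept by ignoring lapack threading to start with.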