partr thread support #105

Merged
merged 4 commits into from
Sep 5, 2019

Conversation

stevengj
Member

@stevengj stevengj commented Jul 30, 2019

Support partr threads (JuliaLang/julia#31398) on the latest Julia master branch via FFTW/fftw3#175.

When the partr backend is in use, FFTW.jl by default sets the number of FFTW "threads" to 4*nthreads: we want to spawn more tasks than we have threads, to help with load balancing if other work is running.
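As a rough illustration of the default described above, the same oversubscription can be requested by hand, and reverted for code that should stay single-threaded. This is only a sketch: it assumes `FFTW.set_num_threads` is the knob being adjusted, and the array size is arbitrary.

```julia
using FFTW

# Number of Julia threads in this session (set via JULIA_NUM_THREADS at startup).
nt = Threads.nthreads()

# Mirror the default described in the PR: 4 FFTW "threads" (tasks) per Julia thread.
FFTW.set_num_threads(4 * nt)

x = rand(ComplexF64, 512, 512)
p_multi = plan_fft(x)      # plan created under the 4*nt setting
y_multi = p_multi * x

# Opt out for later plans: the setting applies to plans created after the call.
FFTW.set_num_threads(1)
p_single = plan_fft(x)
y_single = p_single * x
```

Both plans compute the same transform; only the amount of task-level parallelism requested at planning time differs.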

It also enables the thread-safe FFTW planner (which puts a mutex lock around plan creation). In the longer run, it would be better to do the locking on the Julia side, since presumably using a Julia lock would allow other Julia tasks to wake up. Closes #66.
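The Julia-side locking mentioned above might look roughly like the following. This is a sketch, not what the PR implements: `PLAN_LOCK` and `locked_plan_fft` are hypothetical names introduced here for illustration, and the code assumes a Julia version that provides `Threads.@spawn`.

```julia
using FFTW

# Hypothetical global lock guarding plan creation (not part of FFTW.jl's API).
const PLAN_LOCK = ReentrantLock()

# Hypothetical helper: create a plan while holding a Julia lock, so that a task
# waiting on the lock yields to the scheduler instead of blocking a thread.
function locked_plan_fft(x; kwargs...)
    lock(PLAN_LOCK) do
        plan_fft(x; kwargs...)
    end
end

x = rand(ComplexF64, 256)
plans = Vector{Any}(undef, 4)

# Several tasks request plans concurrently; plan creation itself is serialized.
@sync for i in 1:4
    Threads.@spawn plans[i] = locked_plan_fft(x)
end
```

Executing the resulting plans does not need the lock; only plan creation is serialized here.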

To get the full benefit of threading, you should precompute the FFT plan via p = plan_fft(array) [or p = plan_fft(array, flags=FFTW.MEASURE) or p = plan_fft(array, flags=FFTW.PATIENT) if you want it to self-optimize the plan], rather than calling fft(array) directly.
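A minimal sketch of the plan-then-apply pattern described above. The array size is an arbitrary choice, and the plan is created before the data is filled in because planning with `FFTW.MEASURE` or `FFTW.PATIENT` may overwrite the input array.

```julia
using FFTW
using LinearAlgebra: mul!

# Allocate the working array first and plan on it before filling in real data.
x = zeros(ComplexF64, 4096)
p = plan_fft(x; flags=FFTW.MEASURE)

# Now fill in the data and apply the precomputed plan.
x .= rand(ComplexF64, 4096)
y = p * x

# The same plan can be reused many times, here writing into a preallocated output.
y2 = similar(y)
mul!(y2, p, x)
```

The planning cost is paid once, so the benefit grows with the number of transforms of the same size and type.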

cc @vtjnash, who tested this at JuliaCon. I think I included all of the fixes we did on your machine.

@stevengj
Member Author

Hmm, AppVeyor is crashing on x64 with:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x1ed37c3b -- VMOVAPD_LD at C:\projects\fftw-jl\deps\usr\bin\libfftw3-3.dll (unknown line)
in expression starting at C:\projects\fftw-jl\test\runtests.jl:10
VMOVAPD_LD at C:\projects\fftw-jl\deps\usr\bin\libfftw3-3.dll (unknown line)
hc2cfdftv_20 at C:\projects\fftw-jl\deps\usr\bin\libfftw3-3.dll (unknown line)
apply_extra_iter at C:\projects\fftw-jl\deps\usr\bin\libfftw3-3.dll (unknown line)
apply_dit_dft at C:\projects\fftw-jl\deps\usr\bin\libfftw3-3.dll (unknown line)
.text at C:\projects\fftw-jl\deps\usr\bin\libfftw3-3.dll (unknown line)
unsafe_execute! at C:\projects\fftw-jl\src\fft.jl:407 [inlined]
* at C:\projects\fftw-jl\src\fft.jl:729

The AppVeyor tests are single-threaded (JULIA_NUM_THREADS should be 1, the default), so they are not using any of the new partr code in this PR.

I can only speculate that something went wrong in the x64 BinaryBuilder cross-compile with the latest build tools…

@ViralBShah
Member

ViralBShah commented Aug 4, 2019

Is fixing #66 (thread-safe planner) relevant for this PR?

@stevengj
Member Author

@ViralBShah, this PR fixes #66, which seems like a good idea for using FFTW in an environment with more pervasive threading but is not strictly required for partr support.

@stevengj
Member Author

stevengj commented Sep 5, 2019

Switched to the new build from JuliaPackaging/Yggdrasil#53, which uses different compiler versions, thanks to @staticfloat. Let's see if that helps.

@staticfloat
Collaborator

Sorry, I had a typo in the build.jl I gave you. Fixed now.

…t__) and explicitly call set_num_threads(1) for no-threads check since multithreaded Julia now uses multiple FFTW threads by default
@stevengj stevengj merged commit 527d076 into master Sep 5, 2019
@stevengj stevengj deleted the partr branch September 5, 2019 13:09
@antoine-levitt antoine-levitt mentioned this pull request Jan 24, 2020
galenlynch added a commit to galenlynch/DSP.jl that referenced this pull request May 24, 2020
Convolutions in DSP currently rely on FFTW.jl, and a recent change in FFTW.jl
(JuliaMath/FFTW.jl#105) has introduced a large performance regression in `conv`
whenever Julia is started with more than one thread. Since v1, FFTW.jl uses
multi-threaded FFTW transformations by default whenever Julia has more than one
thread. This new default causes small FFT problems to run much more slowly and
use much more memory. Since the overlap-save method of `conv` in DSP breaks a
convolution into small convolutions, and therefore performs a large number of small FFTW
transformations, this change can cause convolutions to be slower by two orders
of magnitude, and similarly use two orders of magnitude more memory. While
FFTW.jl does not provide an explicit way to set the number of threads used by an
FFTW plan without changing a global variable, generating the plans with the
planning flag set to `FFTW.PATIENT` (instead of the default `MEASURE`) allows
the planner to consider changing the number of threads. Adding this flag to the
plans generated by the overlap-save convolution method seems to rescue the
performance regression on multi-threaded instances of Julia.

Fixes JuliaDSP#399
Also see JuliaMath/FFTW.jl#121
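The workaround described in the commit message above can be sketched as follows. The block length of 256 is an arbitrary stand-in for one overlap-save block, not a value taken from DSP.jl, and the comment about thread selection restates the commit message's claim rather than something this snippet verifies.

```julia
using FFTW

# Small real buffer standing in for one overlap-save block (size is an assumption).
buf = zeros(Float64, 256)

# Per the commit message, PATIENT planning lets the planner consider using
# fewer threads than the global setting for small problems.
p = plan_rfft(buf; flags=FFTW.PATIENT)

# Fill the buffer after planning, since PATIENT planning may overwrite it.
buf .= rand(256)
spec = p * buf   # real-to-complex transform: 256 ÷ 2 + 1 = 129 output bins
```

An alternative is to call `FFTW.set_num_threads(1)` before planning the small transforms, at the cost of changing the global setting that the commit message mentions.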
Development

Successfully merging this pull request may close these issues.

FFTW planner is not thread safe.
3 participants