Workaround for performance regression introduced by FFTW
Convolutions in DSP currently rely on FFTW.jl, and a recent change in FFTW.jl
(JuliaMath/FFTW.jl#105) introduced a large performance regression in `conv`
whenever Julia is started with more than one thread. Since v1, FFTW.jl uses
multi-threaded FFTW transforms by default whenever Julia has more than one
thread. This new default makes small FFT problems run much more slowly and use
much more memory. Because the overlap-save method of `conv` in DSP breaks a
convolution into many small convolutions, and therefore performs a large number
of small FFTW transforms, this change can make convolutions slower by two
orders of magnitude, with a similar increase in memory use. FFTW.jl does not
provide an explicit way to set the number of threads used by an FFTW plan
without changing a global setting, but generating the plans with the planning
flag set to `FFTW.PATIENT` (instead of the default `FFTW.ESTIMATE`) allows the
planner to consider changing the number of threads. Adding this flag to the
plans generated by the overlap-save convolution method seems to resolve the
performance regression on multi-threaded instances of Julia.
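
As a rough illustration of the workaround described above (not the code in
`src/dspbase.jl`), the sketch below builds the same real-input plan twice, once
with FFTW.jl's default planner flag and once with `FFTW.PATIENT`. The buffer
length of 256 and the variable names are assumptions made for the example.

```julia
using FFTW

# Hypothetical buffer length, chosen only for illustration; the overlap-save
# code in DSP picks its own FFT sizes.
tdbuff = zeros(Float64, 256)

# Plan built with FFTW.jl's default planner flag.
p_default = plan_rfft(tdbuff)

# Same transform planned with FFTW.PATIENT, which lets the planner search more
# widely (planning takes longer and may overwrite the buffer passed in).
p_patient = plan_rfft(tdbuff; flags = FFTW.PATIENT)

# Both plans compute the same transform; only the planning effort, and possibly
# the strategy the planner settles on, differs.
x = rand(256)
@assert p_default * x ≈ p_patient * x
```

Timing the two plans with Julia started on more than one thread would be one
way to check whether the regression appears on a given machine.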

Fixes JuliaDSP#339
Also see JuliaMath/FFTW.jl#121
galenlynch committed May 24, 2020
1 parent f53fe27 commit 2253e37
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions src/dspbase.jl
@@ -301,17 +301,19 @@ unnormalized.
     bufsize = ntuple(i -> i == 1 ? nffts[i] >> 1 + 1 : nffts[i], N)
     fdbuff = similar(u, Complex{T}, NTuple{N, Int}(bufsize))

-    p = plan_rfft(tdbuff)
-    ip = plan_brfft(fdbuff, nffts[1])
+    # PATIENT flag needed if Julia has more than one thread (See #339)
+    p = plan_rfft(tdbuff, flags = FFTW.PATIENT)
+    ip = plan_brfft(fdbuff, nffts[1], flags = FFTW.PATIENT)

     tdbuff, fdbuff, p, ip
 end

 @inline function os_prepare_conv(u::AbstractArray{<:Complex}, nffts)
     buff = similar(u, nffts)

-    p = plan_fft!(buff)
-    ip = plan_bfft!(buff)
+    # PATIENT flag needed if Julia has more than one thread (See #339)
+    p = plan_fft!(buff, flags = FFTW.PATIENT)
+    ip = plan_bfft!(buff, flags = FFTW.PATIENT)

     buff, buff, p, ip # Only one buffer for complex
 end
