Inefficient pyfftw implementation: order of magnitude speedups possible! #27
Comments
Here are some more updates on profiling.
In the plot below, I show the runtime for reconstructing various numbers of holograms using different functions within The default in shampoo thus far has been Question for @jkentwallace and @LaurentRDC: is there any reason not to use one of these multithreading options by default? If not, I'll work on migrating shampoo's |
Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :) In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now. |
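A minimal sketch of what multithreaded transforms could look like, assuming the `pyfftw.interfaces.numpy_fft` drop-in interface and its `threads` keyword. The helper name `fft2_threaded` and the thread count are illustrative only and are not taken from shampoo; the code falls back to NumPy when pyFFTW is not installed.

```python
import os

import numpy as np

try:
    # pyFFTW's NumPy-compatible interface accepts a `threads` keyword.
    from pyfftw.interfaces import cache, numpy_fft
    cache.enable()  # keep recently used FFTW plans alive between calls
    HAVE_PYFFTW = True
except ImportError:
    HAVE_PYFFTW = False

# Assumption for illustration: use every logical CPU the OS reports.
N_THREADS = os.cpu_count() or 1


def fft2_threaded(array):
    """2D FFT that uses FFTW threads when pyFFTW is available."""
    if HAVE_PYFFTW:
        return numpy_fft.fft2(array, threads=N_THREADS)
    return np.fft.fft2(array)  # single-threaded fallback


# Stand-in for a hologram; real data would come from a file.
hologram = np.random.default_rng(0).random((256, 256))
spectrum = fft2_threaded(hologram)
```

Whether more threads actually help depends on the array size and the machine, so it is worth timing a few thread counts before picking a default.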
Gents,
These are great suggestions. I would say full steam ahead.
Also, I've mentioned this in passing before, but I think there is a
discrete Fourier transform method that can also improve speed.
I'll futz with it, and try to provide a good example.
Just wanna say it feels like we're getting there guys - thanks to your hard
work.
-Kent
On Thu, Nov 24, 2016 at 1:52 PM Laurent P. René de Cotret < ***@***.***> wrote:
Since FFTW is written in C, multithreading should yield a performance
boost (unlike Python). I don't see why not :)
In the GUI, I'm reconstructing on a separate core right now, and it would
be easy to extend that to many cores. Since each core has more than one
thread, it would be a good idea to implement multithreading now.
|
I'm thinking I'll make |
Anyone installing Anaconda is equipped with pyFFTW, so there's at least one easy way to get it. Through conda there's no need for a C compiler, so I think making it a dependency is reasonable. |
Laurent and Brett, I spent some time yesterday refreshing my memory on some discrete Fourier transform calculations, and I've got a few results to show. This method allows you to reconstruct only the part of the Fourier plane you're interested in. It also has benefits for reducing the effect of aliasing, which we really haven't talked about. The current band-aid we use is apodization, but this results in reduced resolution. I'll provide an example of this for Monday's meeting, and we can discuss. |
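The method itself is not spelled out in this thread. One common way to evaluate a discrete Fourier transform over only part of the Fourier plane is to write the 2D DFT as two matrix products and build the matrices for just the frequencies of interest. The sketch below assumes that approach; the function name `partial_dft2` and the chosen index ranges are hypothetical.

```python
import numpy as np


def partial_dft2(image, rows, cols):
    """Evaluate the 2D DFT of `image` only at the requested frequency indices.

    rows: frequency indices along axis 0
    cols: frequency indices along axis 1
    """
    n_rows, n_cols = image.shape
    y = np.arange(n_rows)
    x = np.arange(n_cols)
    # DFT kernels restricted to the requested frequencies.
    kernel_rows = np.exp(-2j * np.pi * np.outer(rows, y) / n_rows)  # (len(rows), n_rows)
    kernel_cols = np.exp(-2j * np.pi * np.outer(x, cols) / n_cols)  # (n_cols, len(cols))
    return kernel_rows @ image @ kernel_cols


# Stand-in image and an arbitrary patch of the Fourier plane.
image = np.random.default_rng(1).random((64, 48))
rows = np.arange(10, 20)
cols = np.arange(5, 12)
patch = partial_dft2(image, rows, cols)
```

For integer indices this reproduces the corresponding block of a full FFT. The cost scales with the size of the requested patch, so it only pays off when that patch is small compared with the full plane.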
@jkentwallace @LaurentRDC: I've begun doing some profiling to see where the bottlenecks are. In this test I'm reconstructing the USAF test target example file at one propagation distance:
49% of the runtime is spent within the pyfftw module's Fourier transform, which confirms that secondary optimizations, like doing arithmetic on masked arrays, probably matter less than figuring out how to do FFTs more efficiently. Reading up on pyfftw, it looks like there's a very specific way to use it that allows for the speedups that I expected.
In a simple comparison with scipy's FFT, my current implementation with pyfftw is almost an order of magnitude slower for reconstructing a single hologram; I'm going to need to spend some time understanding this issue. If you'd like to see how much quicker shampoo could be going simply by switching to scipy's FFT temporarily, comment out this block of imports and replace it with only the scipy import line.
The next slowest step is the phase unwrapping, which is being done in C wrapped in Python with this algorithm: reference, source code.