
Adding cupyx.scipy.signal.fftconvolve #3828

Merged: 14 commits merged into cupy:master on Dec 21, 2020

Conversation

@coderforlife (Contributor)

This adds cupyx.scipy.signal.fftconvolve along with support for method='fft' in cupyx.scipy.signal.convolve and cupyx.scipy.signal.correlate. This function fully supports n-dimensional data.

This PR does not include adjustments to choose_conv_method or _fftconv_faster to intelligently choose between the two methods; that will be done in a future PR.
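
For reference, a minimal usage sketch of the interface described above (array sizes and contents are illustrative only, not from the PR):

import cupy
import cupyx.scipy.signal as signal

in1 = cupy.random.rand(1024, 1024)   # illustrative n-dimensional input
in2 = cupy.random.rand(32, 32)       # smaller kernel

# The new function, mirroring scipy.signal.fftconvolve
out_fft = signal.fftconvolve(in1, in2, mode='same')

# The same computation through the generic entry points via method='fft'
out_conv = signal.convolve(in1, in2, mode='same', method='fft')
out_corr = signal.correlate(in1, in2, mode='same', method='fft')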

@leofang (Member) left a comment:

Could you post benchmark results? Thanks!

def _freq_domain_conv(in1, in2, axes, shape, calc_fast_len=False):
    # See scipy's documentation in scipy.signal.signaltools
    # TODO: cupyx.scipy.fftpack.get_fft_plan may be useful, however:
    #     * only complex-to-complex planning is possible

Member:

This is not true. We've supported all kinds of planning in get_fft_plan.

Contributor Author:

I wrote this function a few months ago; maybe back then it only worked for complex-to-complex, or I was misusing it. I will remove that comment.

Member:

Ah yes, I think the R2C/C2R N-d plan was also supported only recently. Though I think those plans are harder to reuse by nature, and perhaps my plan cache (#3730) would serve better than explicitly managing plans here.
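
For context, a minimal sketch of what explicit planning with cupyx.scipy.fftpack.get_fft_plan could look like (the value_type and shapes here are assumptions for illustration; whether this beats the plan cache proposed in #3730 is the open question above):

import cupy
from cupyx.scipy import fftpack

a = cupy.random.rand(256, 256)

# Build a real-to-complex plan up front and reuse it as a context manager
# around the FFT call; illustrative only, not the PR's implementation.
plan = fftpack.get_fft_plan(a, value_type='R2C')
with plan:
    out = cupy.fft.rfftn(a)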

@coderforlife (Contributor Author)

This is run on a Titan V GPU and an Intel Xeon Gold 5122 (3.6 GHz).

Some benchmarks (in seconds) are as follows for in1.size == 16777216 and in2.size == 8192. Columns:

  1. numpy.convolve
  2. scipy.signal.convolve with method='direct'
  3. scipy.signal.convolve with method='fft'
  4. cupy.convolve
  5. cupyx.scipy.signal.convolve with method='direct'
  6. cupyx.scipy.signal.convolve with method='fft'
mode      1        2        3        4           5        6
'full'    21.16    21.17    1.601    0.009914    0.2568   0.009545
'valid'   25.96    25.95    1.591    0.009872    0.2533   0.009501
'same'    25.98    25.98    1.590    0.009782    0.2567   0.009492

Clearly the FFT methods are faster, and cupy.convolve is using FFTs. The new FFT convolution is only mildly faster than the current cupy.convolve FFT method (about 3-4%, so not much).

On the small side with in1.size == 8192 and in2.size == 2048:

mode      1           2           3            4            5            6
'full'    0.002368    0.002339    0.0002659    0.0007699    0.0003114    0.001072
'valid'   0.001397    0.001395    0.0002823    0.0007692    0.0003703    0.001084
'same'    0.002211    0.002186    0.0002787    0.0007601    0.0003113    0.001064

For numpy/scipy the FFT method is still faster, but in cupyx.scipy the direct method is actually faster at this size (taking about 1/3 of the time). cupy.convolve is still using FFT here (since its choice is based only on data types) and is a bit faster than the new FFT method, but not by much. In any case the direct method is still the best. Note: I have noticed that about 0.0010 s is the absolute minimum for FFT in the new method - basically the fixed overhead.
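
A rough sketch of a timing harness that could produce numbers like the ones above (this helper is illustrative and not the actual script used; GPU timings need explicit synchronization):

import time
import cupy
import cupyx.scipy.signal as signal

def time_gpu(func, *args, n_repeat=10, **kwargs):
    func(*args, **kwargs)                   # warm up kernels and FFT plans
    cupy.cuda.Stream.null.synchronize()
    start = time.perf_counter()
    for _ in range(n_repeat):
        func(*args, **kwargs)
    cupy.cuda.Stream.null.synchronize()     # wait for all GPU work to finish
    return (time.perf_counter() - start) / n_repeat

in1 = cupy.random.rand(16777216)
in2 = cupy.random.rand(8192)
for mode in ('full', 'valid', 'same'):
    print(mode, time_gpu(signal.convolve, in1, in2, mode=mode, method='fft'))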

Review thread on cupyx/scipy/signal/_signaltools_core.py (several other comments were marked outdated/resolved):
    return axes


def _freq_domain_conv(in1, in2, axes, shape, calc_fast_len=False):

Member:

calc_fast_len is always True.

Contributor Author:

However, I have also written up oaconvolve, which does pass calc_fast_len as False. I could remove the argument for now and add it back when oaconvolve is added.
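
To illustrate what the flag controls, here is a hedged sketch of how a calc_fast_len switch typically behaves (scipy.fft.next_fast_len stands in for the fast-length lookup; the actual helper in this PR may differ in its details):

import scipy.fft

def fft_shape(full_shape, calc_fast_len=True):
    if calc_fast_len:
        # Round each transform length up to a fast (5-smooth) size for speed.
        return [scipy.fft.next_fast_len(n) for n in full_shape]
    # Callers like oaconvolve choose exact block sizes, so use them as-is.
    return list(full_shape)

print(fft_shape([16777216 + 8192 - 1]))                    # padded up to a fast length
print(fft_shape([8192 + 2048 - 1], calc_fast_len=False))   # used unchanged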

@asi1024 (Member) commented Sep 17, 2020

Jenkins, test this please.

@chainer-ci (Member)

Jenkins CI test (for commit add37ef, target branch master) failed with status FAILURE.

@asi1024 (Member) commented Sep 17, 2020

@coderforlife Could you check the Jenkins failure?

@coderforlife (Contributor Author)

It looks like there are 2 tests that fail, both when using float32, and the actual and desired outputs just barely miss the tolerances. I will relax the tolerances for float32 for this test.

@kmaehashi added the st:awaiting-author (Awaiting response from author) label on Sep 29, 2020
@asi1024 (Member) commented Oct 20, 2020

@coderforlife Any updates?

@asi1024 (Member) commented Dec 18, 2020

Jenkins, test this please.

@chainer-ci (Member)

Jenkins CI test (for commit bb4d644, target branch master) failed with status FAILURE.

@coderforlife (Contributor Author)

I forgot which values were off and by how much, and the Jenkins info was removed because it was from too long ago. I was able to use the newest results, so hopefully it will pass this one last time. Thanks!

@leofang (Member) commented Dec 19, 2020

> I forgot which values were off and by how much, and the Jenkins info was removed because it was from too long ago. I was able to use the newest results, so hopefully it will pass this one last time. Thanks!

Hi @coderforlife, in case you don't know already, or if needed, it is now possible to set tolerance per dtype: #4269.

@coderforlife (Contributor Author)

I should reset the tolerances so they are stricter for float64. Don't run the code through Jenkins yet; I will update soon.

@coderforlife (Contributor Author)

It should be better now! Can we run it on Jenkins? (It needs different tolerances than my machine in some cases, so it may fail even though it works on my own machine.) Thanks!

@leofang (Member) commented Dec 19, 2020

Happy to help! (I haven't read the changes as I am not in the right condition.)

Jenkins, test this please

@coderforlife (Contributor Author)

One question about the next step, when I add oaconvolve: cupy.core.internal._normalize_axis_indices only accepts tuples as sequences. Is there any reason not to also accept lists, and possibly any other iterables? All that would be required is to change the elif condition on line 424 to something like not isinstance(axes, (tuple, list)) or not isinstance(axes, collections.abc.Iterable), or possibly something a bit more complex like:

else:
    try:
        iter(axes)
    except TypeError:
        axes = axes,

@chainer-ci (Member)

Jenkins CI test (for commit 9d7bc1e, target branch master) succeeded!

@asi1024 (Member) commented Dec 21, 2020

> Is there any reason not to also accept lists, and possibly any other iterables?

It is because numpy does not support non-tuple axes. More specifically, for a 2-dim numpy.ndarray x, numpy.sum(x, axis=[1, 2]) and numpy.sum(x, axis=range(2)) raise TypeError.
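
A quick demonstration of the behavior described here (exact behavior may vary across NumPy versions; tuples are the documented form for multi-axis reductions):

import numpy as np

x = np.arange(6).reshape(2, 3)
print(np.sum(x, axis=(0, 1)))       # tuple of ints: supported, prints 15
try:
    print(np.sum(x, axis=[0, 1]))   # list: rejected on the versions discussed here
except TypeError as exc:
    print('TypeError:', exc)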

@asi1024 (Member) commented Dec 21, 2020

LGTM!

@asi1024 merged commit 666d68a into cupy:master on Dec 21, 2020
@asi1024 added this to the v9.0.0b1 milestone on Dec 21, 2020
@asi1024 removed the st:awaiting-author (Awaiting response from author) label on Dec 21, 2020
@coderforlife (Contributor Author)

> It is because numpy does not support non-tuple axes. More specifically, for a 2-dim numpy.ndarray x, numpy.sum(x, axis=[1, 2]) and numpy.sum(x, axis=range(2)) raise TypeError.

Okay! Thanks, I fixed the oaconvolve code itself to generate tuples for axes.
