Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Power of 3 Speed up on neon #139

Open
ianfrankS opened this issue Apr 30, 2024 · 2 comments
Open

Power of 3 Speed up on neon #139

ianfrankS opened this issue Apr 30, 2024 · 2 comments

Comments

@ianfrankS
Copy link

In most FFT libraries (Including RustFFT when using non-AVX code), power-of-two FFT sizes are the fastest, and users see a steep falloff in performance when using non-power-of-two sizes. Thankfully, RustFFT using AVX acceleration is not quite as restrictive:

Any FFT whose size is of the form 2^n * 3^m can be considered the “fastest” in RustFFT.

Is there anyway to have this kind of speed-up for neon? The powers of 2 are quite restrictive.

Thank you!

@HEnquist
Copy link
Contributor

HEnquist commented May 9, 2024

What lengths do you use? Have you measured the actual speed at those lengths compared to say the next larger power of two? I often use lengths that are 2^n * 3 (so with m=1) and those are nearly as fast as 2^n.

Implementing the 3^m bits for neon would of course be possible, but it's a substantial amount of work. Afaik everyone working on this library does it in their free time.

@ejmahler
Copy link
Owner

ejmahler commented May 9, 2024

Non power of two FFT performance is absolutely one of my goals for this library.

In master (but not in any release yet), the scalar code path has a new struct called RadixN, which effectively makes power of 3, power of 5, and power of 7 first class citizens just like they are in the AVX code path.

It could be ported to the neon code path - that was my intent when I wrote it, and the only reason I haven't is that A: I'm not sure about some of the API decisions yet, and B: the free time issue already mentioned. That would give you the same 2^n*3^m optimal performance that AVX gets.

As @HEnquist said, the latest release does a pretty good job of handling any set of small mixed factors. It's worth giving your ideal size (especially if its 2^n*3^m) a try and seeing if it meet your needs, even if it isn't the fastest the library has to offer.

And if you're willing to use master instead of a release, some changes we've made result in 3*2^n being competitive with any power of two on all code paths, so you could try that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants