
Real FFT? #28

Open
Boscop opened this issue Dec 29, 2020 · 41 comments

@Boscop

Boscop commented Dec 29, 2020

Thanks for this great crate!
It would be great to have a faster way of doing a real-to-complex fft (such as from time to frequency domain), which is a very common use case.
It would approximately halve the computation time compared to doing it via a complex-to-complex fft.
https://www.dspguide.com/ch12/1.htm
For my use case I need to replace another fft crate (which uses cmake), because I want to cross-compile to macOS, and I only need the real-to-complex fft for pow2 sizes.

@HEnquist
Contributor

I needed the same thing and ended up making this: https://crates.io/crates/realfft

It's a simple wrapper around RustFFT that does real-to-complex fft, and complex-to-real ifft. And yes, it runs about twice as fast.

@Polochon-street

Thanks for this great crate as well 😄

Just want to mention that this was maybe up next after this AVX stabilization (awelkie/RustFFT#4 (comment)), crossing fingers.

In any case, I'm using realFFT for my project as well, just thought it'd be great to have this directly in the RustFFT crate :)

@ejmahler
Owner

ejmahler commented Jan 4, 2021

Once 5.0 is out, I plan to turn my attention to RustDCT for a little while. I don't think I'll be implementing much AVX stuff there, but I at least want to get it compiling with 5.0, and using the same scratch space setup that rustfft uses.

After that, I can start looking into real-only FFTs.

What sort of API exactly do you need for a real-only FFT? What's the application? Do you need access to complex numbers on one side of the FFT, but not the other? Or is it real-only on both sides?

@HEnquist
Contributor

HEnquist commented Jan 4, 2021

I think these two are by far the most common use cases:

  • Transform to get spectrum of real valued data. Needs real-to-complex fft.
  • Transform real valued data, modify the spectrum, and transform back to real-valued data. Needs real-to-complex fft and complex-to-real ifft.

For even lengths, it's possible to use a complex-to-complex (i)fft of half the length, but I haven't found any such trick for odd lengths. That means that odd-length real fft will most likely be much more work.

The existing implementations in other libraries do things in a few ways.
Let's start with a 6 element long real vector: [x0r, x1r, x2r, x3r, x4r, x5r]

The simple approach of just converting it to complex becomes: [(x0r, 0), (x1r, 0), (x2r, 0), (x3r, 0), (x4r, 0), (x5r, 0)]

The FFT of x becomes: [(X0r, X0i), (X1r, X1i), (X2r, X2i), (X3r, X3i), (X4r, X4i), (X5r, X5i)]

Because x was real, this becomes a bit simpler: [(X0r, 0), (X1r, X1i), (X2r, X2i), (X3r, 0), (X2r, -X2i), (X1r, -X1i)]

The last two values are redundant, so let's remove them: [(X0r, 0), (X1r, X1i), (X2r, X2i), (X3r, 0)]

Starting from here, I have seen these different solutions:

  1. Return this complex vector, [(X0r, 0), (X1r, X1i), (X2r, X2i), (X3r, 0)]. Advantage: easy to understand, can be directly multiplied with another spectrum (for convolution etc). Downside: Stores two zeros, means output is longer than input.
  2. Put X3r in the place of X0i: [(X0r, X3r), (X1r, X1i), (X2r, X2i)]. Advantage: same length as input (can be used for in-place transforms). Downside: can't be directly multiplied, can be confusing.
  3. (worst one!) Just squeeze out the zeros: [(X0r, X1r), (X1i, X2r), (X2i, X3r)]. Advantage: same length as input (can be used for in-place transforms). Downside: can't be directly multiplied, highly confusing.

I think in-place real-to-complex/complex-to-real transforms are a bad idea, as it means the complex values of the spectrum have to be split and stored as separate real values in the input vector. In RealFFT I went with option 1, and only out-of-place transforms.
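The symmetry being exploited here is easy to verify with a naive O(n²) DFT. A minimal, dependency-free sketch (using bare `(re, im)` tuples rather than a complex type; `naive_dft_real` is a hypothetical helper, not part of any crate) checking that for real input, bins 0 and n/2 are purely real and X[n-k] = conj(X[k]):

```rust
// Naive O(n^2) DFT of real input, returning (re, im) pairs.
// For illustration only: it demonstrates the conjugate symmetry X[n-k] = conj(X[k]).
fn naive_dft_real(input: &[f64]) -> Vec<(f64, f64)> {
    let n = input.len();
    (0..n)
        .map(|k| {
            let mut re = 0.0;
            let mut im = 0.0;
            for (j, &x) in input.iter().enumerate() {
                let angle = -2.0 * std::f64::consts::PI * (j * k) as f64 / n as f64;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re, im)
        })
        .collect()
}

fn main() {
    let x = [1.0, 2.0, -0.5, 3.0, 0.25, -1.0];
    let n = x.len();
    let spectrum = naive_dft_real(&x);
    // Bins 0 and n/2 are purely real...
    assert!(spectrum[0].1.abs() < 1e-9);
    assert!(spectrum[n / 2].1.abs() < 1e-9);
    // ...and X[n - k] mirrors X[k] with negated imaginary part,
    // so only the first n/2 + 1 bins carry information.
    for k in 1..n / 2 {
        assert!((spectrum[n - k].0 - spectrum[k].0).abs() < 1e-9);
        assert!((spectrum[n - k].1 + spectrum[k].1).abs() < 1e-9);
    }
    println!("conjugate symmetry holds");
}
```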

As a starting point I would propose to add the same functionality as RealFFT has:

  • Only even lengths
  • Only out-of-place transforms
  • "Easy" data order
  • Change the direction enum to a transform type enum with variants C2CForward, C2CInverse, R2CForward, C2RInverse

@ejmahler
Owner

ejmahler commented Jan 4, 2021

There's a #4: FFTW's R2HC format

@HEnquist
Contributor

HEnquist commented Jan 4, 2021

Oh, didn't see that one. Not sure which one is worst then, my #3 or the R2HC format...

@Boscop
Author

Boscop commented Jan 5, 2021

I've been using FFTS (The Fastest Fourier Transform in the South, which is faster than FFTW) until now; specifically, I've been using the 1d real function for real-to-complex FFT:
https://github.com/linkotec/ffts/blob/2c8da4877588e288ff4cd550f14bec2dc7bf668c/include/ffts.h#L96

The output of a real-to-complex transform is N/2+1 complex numbers,
where the redundant outputs have been omitted.

@ejmahler @HEnquist See also this: https://www.smcm.iqfr.csic.es/docs/intel/mkl/mkl_manual/fft/fft_PackedFormats.htm
There are 3 ways to pack the complex output in-place into the input slice, to avoid allocations:
CCS, Pack and Perm.
CCS uses 2 more elements ([1] and [n+1], which are 0) so it can't be used to write the output back into the input slice.
Pack and Perm use exactly as many elements as the input, but if you compare the indices at which the output for an input ends up, Pack is easier to misuse, because the (R, I) pairs are not at even indices.
IMO, Perm makes the most sense because it uses exactly as many elements as the input and the i-th complex output is at the index 2*i, with the only exception that R[n/2] is stored at index 1.
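The Perm layout described above can be sketched in a few lines. This assumes an even length n and follows the linked MKL description; `pack_perm` and `unpack_perm` are hypothetical helper names, not an existing API:

```rust
// Pack the non-redundant half-complex spectrum (n/2 + 1 bins, as (re, im)
// pairs) into n reals using the MKL "Perm" layout for even n:
// out[0] = Re[0], out[1] = Re[n/2], out[2k] = Re[k], out[2k+1] = Im[k].
fn pack_perm(spectrum: &[(f64, f64)]) -> Vec<f64> {
    let n = (spectrum.len() - 1) * 2; // even FFT length
    let mut out = vec![0.0; n];
    out[0] = spectrum[0].0;     // DC bin (its imaginary part is zero)
    out[1] = spectrum[n / 2].0; // Nyquist bin (its imaginary part is zero)
    for k in 1..n / 2 {
        out[2 * k] = spectrum[k].0;
        out[2 * k + 1] = spectrum[k].1;
    }
    out
}

// Inverse: recover the n/2 + 1 (re, im) pairs from the packed layout.
fn unpack_perm(packed: &[f64]) -> Vec<(f64, f64)> {
    let n = packed.len();
    let mut spectrum = vec![(0.0, 0.0); n / 2 + 1];
    spectrum[0] = (packed[0], 0.0);
    spectrum[n / 2] = (packed[1], 0.0);
    for k in 1..n / 2 {
        spectrum[k] = (packed[2 * k], packed[2 * k + 1]);
    }
    spectrum
}

fn main() {
    // Half-complex spectrum of some length-6 real signal.
    let spectrum = vec![(4.0, 0.0), (1.0, -2.0), (0.5, 3.0), (-1.0, 0.0)];
    let packed = pack_perm(&spectrum);
    assert_eq!(packed.len(), 6); // same length as the real input
    assert_eq!(unpack_perm(&packed), spectrum);
}
```

The upside Boscop mentions is visible here: except for the Nyquist real part landing at index 1, bin k sits at indices 2k and 2k+1, so the layout matches the in-memory layout of a complex slice.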

@HEnquist
Contributor

HEnquist commented Jan 5, 2021

I can just mention that the RealFFT crate is now updated to use RustFFT 5.0.

@Cocalus

Cocalus commented Jan 7, 2021

For my use case I convert real to complex, then take the magnitude of the first half of the resulting output.

        // Switch to Complex input
        for i in 0..FFT_LEN {
            self.0[i].re = input[i];
            self.0[i].im = 0.0;
        }
        FFT_FULL.process_with_scratch(&mut self.0[..], &mut self.1[..]);

        // Calculate the magnitude
        for i in 0..FFT_LEN/2 {
            input[i] = self.0[i].norm();
        }

Where Self is

/// FFT from the `rustfft` crate
pub struct RsFft(Box<[Complex32; FFT_LEN]>, Box<[Complex32; FFT_LEN]>);

When I last benchmarked with rustfft 4.0, IPP (via the ipp-sys crate) was much faster; for that I used the functions ippsFFTFwd_RToPerm_32f and then ippsMagnitude_32fc. But it complicated the build a lot, so I didn't use it by default. I suspect that if RustFFT had an equivalent to ippsFFTFwd_RToPerm_32f, it would probably now be fast enough for me to drop IPP.

@HEnquist
Contributor

HEnquist commented Jan 7, 2021

The main thing I dislike about the in-place solutions is that they all offer very poor ergonomics, since they need to store complex values in a real vector. The advantages of the Complex type are lost.
@Cocalus what lengths are you typically using? Can they be odd? How much faster was ipp?

@ejmahler
Owner

ejmahler commented Jan 7, 2021 via email

@Cocalus

Cocalus commented Jan 7, 2021

I was using 4096; I think IPP only supports powers of two. I retested my case of real-to-magnitudes at 4096. In my benchmark IPP 2019 is about 5.5x faster than RustFFT 5.0. With 10000 iterations and only initializing the FFTs once (plan_fft_forward + ippsFFTInit_R_32f), rustfft took 175ms and IPP took 31ms.

RustFFT has to move things into and out of the real buffer, on top of running at twice the effective FFT size. So adding a real API should probably fix a decent chunk of that difference. The IPP magnitude calculation is also vectorized, which probably helps. I could probably upload the hacked-together benchmark if someone wants to poke at it, but I think you need to register an email with Intel to download IPP.

@HEnquist
Contributor

HEnquist commented Jan 7, 2021

Have you looked at the RealFFT crate? At 4096 the fft should be about twice as fast, and you can skip the conversion to complex.

@ejmahler
Owner

ejmahler commented Jan 7, 2021

To explain my earlier comment in more detail:

The core values of Rust are performance, safety, and productivity. The ideal real FFT API would have all three.

I think we're all uncomfortable with the "keep the data inplace and do some weird trick to pack the results" output formats because it would heavily sacrifice productivity.

I think it also hurts safety. Users will have to write loops with random array access to pull data out of the packed formats - and in my experience, it's easy to write loops like that correctly, but hard to write them in a way that avoids bounds checks. Many users will feel tempted to write things like get_unchecked().

So we're talking about doing the half-complex output instead - but I think that also hurts productivity and safety, just less so. Users won't have to remember a bizarre layout, but as I think about it, I think we're asking too much of the average RustFFT user if we ask them to remember "we omit the last n/2 - 1 elements of the array; if you want them, iterate from element 1 to element n/2 - 1 in reverse order and negate them all".

So why not make the default output be the full complex array? Do the reversal and negation ourselves, and return a size-N array instead of size-N/2+1, so that the caller doesn't have to remember the exact specifics of which subsection of the array to reverse.
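The reversal-and-negation described above can be written once as a convenience. A minimal sketch, assuming even lengths and using bare `(re, im)` tuples to stay dependency-free (`expand_half_complex` is a hypothetical name, not an existing API):

```rust
// Expand the n/2 + 1 non-redundant bins of an even-length real FFT into the
// full length-n spectrum, using the mirroring rule X[n - k] = conj(X[k]).
fn expand_half_complex(half: &[(f64, f64)], n: usize) -> Vec<(f64, f64)> {
    assert_eq!(n % 2, 0, "this sketch only handles even lengths");
    assert_eq!(half.len(), n / 2 + 1);
    let mut full = half.to_vec();
    // Append bins n/2 + 1 .. n - 1, which mirror bins n/2 - 1 down to 1.
    for k in (1..n / 2).rev() {
        let (re, im) = half[k];
        full.push((re, -im));
    }
    full
}

fn main() {
    // Half-complex spectrum of some length-6 real signal (bins 0 and 3 are real).
    let half = vec![(1.0, 0.0), (2.0, 3.0), (4.0, 5.0), (6.0, 0.0)];
    let full = expand_half_complex(&half, 6);
    assert_eq!(
        full,
        vec![(1.0, 0.0), (2.0, 3.0), (4.0, 5.0), (6.0, 0.0), (4.0, -5.0), (2.0, -3.0)]
    );
}
```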

Once we have that idea, my next thought is, how often do callers need the output in an array? In the case of RustDCT, you compute the DCT2 with an inner real-only FFT. Once you have the full complex output, you do some extra processing anyways, and the final output is just real numbers. So you never actually store the complex output, you only keep it around long enough to compute the real output, and then throw away the complex output.

@HEnquist
Contributor

HEnquist commented Jan 7, 2021

I on the other hand definitely prefer the half-complex format. My main use case is processing of audio data, doing stuff like convolution. Then I have a real-valued signal and a real-valued impulse response that I want to convolve. So I transform them both, multiply the complex results, and transform back. Using the half-complex format means I have half as many coefficients to multiply and store.

I think that most people who know they could use a Real2Complex fft in their application already have a good understanding of how ffts work, and have no problem with the half-complex format (as long as it doesn't use any weird packing tricks, of course). But it would make sense to provide some convenient way of filling in the second half of the data when someone needs the full result.

Another thought: if the Real2Complex fft returns the full complex result, what should the Complex2Real ifft do? Ignore half of the data? Or assert that it's conjugate symmetric? Neither seems very attractive...

@ejmahler
Owner

ejmahler commented Jan 7, 2021

Yeah, the point about convolution makes sense.

@ejmahler
Owner

ejmahler commented Jan 7, 2021

Although - an implicit part of the Complex-To-Real, even with the redundant elements omitted, is that the imaginary part of the first and last value is 0. Presumably if you do any sort of convolution involving multiplying with a complex number, your input to C2R is going to have a nonzero imaginary component. How does that affect the output?

@HEnquist
Contributor

HEnquist commented Jan 7, 2021

Doing convolution that way only works when both signals are real. Then they both have zeros in those places of their transforms, and the zeros remain after the element-wise complex multiplication of the two transforms.
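This claim can be checked directly: the element-wise product of two conjugate-symmetric spectra is itself conjugate-symmetric, so the purely real DC and Nyquist bins stay purely real. A small sketch with hand-built symmetric spectra (`multiply_spectra` is a hypothetical helper name):

```rust
// Element-wise complex multiply of two spectra stored as (re, im) pairs.
fn multiply_spectra(a: &[(f64, f64)], b: &[(f64, f64)]) -> Vec<(f64, f64)> {
    a.iter()
        .zip(b)
        .map(|(&(ar, ai), &(br, bi))| (ar * br - ai * bi, ar * bi + ai * br))
        .collect()
}

fn main() {
    // Two conjugate-symmetric length-6 spectra, as produced by real-input FFTs:
    // bins 0 and 3 are purely real, and bins 5, 4 are conjugates of bins 1, 2.
    let a = [(2.0, 0.0), (1.0, 1.0), (0.5, -2.0), (3.0, 0.0), (0.5, 2.0), (1.0, -1.0)];
    let b = [(1.0, 0.0), (2.0, -1.0), (1.5, 0.5), (-1.0, 0.0), (1.5, -0.5), (2.0, 1.0)];
    let c = multiply_spectra(&a, &b);
    // The DC and Nyquist bins are still purely real...
    assert_eq!(c[0].1, 0.0);
    assert_eq!(c[3].1, 0.0);
    // ...and the product is still conjugate-symmetric, so the half-complex
    // format still suffices and the inverse transform is still real-valued.
    for k in 1..3 {
        assert_eq!(c[6 - k].0, c[k].0);
        assert_eq!(c[6 - k].1, -c[k].1);
    }
}
```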

@ejmahler
Owner

ejmahler commented Jan 7, 2021

Would it ever make sense to do a R2C where the inner FFT is inverse? Or only forward?

Would it ever make sense to do a C2R where the inner FFT is forward? Or only inverse?

@HEnquist
Contributor

HEnquist commented Jan 8, 2021

I think only forward R2C and inverse C2R are useful in practice. I have never seen the other two (fftw doesn't have them for example) and I can't think of any reason for ever using them.

@ejmahler ejmahler mentioned this issue Jan 8, 2021
@ejmahler
Owner

ejmahler commented Jan 12, 2021

I did some investigation on why it's so hard to find non-even real FFT algorithms:

In order to compute a real FFT, you basically have to apply Mixed Radix, or rader's, or bluestein's. But like, a "Real FFT" version of those algorithms. You have to just derive a version of the algorithm that manually cuts out the wasted work.

Size 2 is a special case, because it takes advantage of the fact that you can compute 2 real FFTs in parallel by computing a single c2c DFT, with one set of real data in the reals and the other in the imaginaries. You can completely recover the separate outputs of the two FFTs with an O(n) postprocess. So the algorithm in the PR and elsewhere is basically merging a MixedRadix(2, n/2) step (for the 2 parallel FFTs) with the parallel-FFT postprocess.
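The "two real FFTs from one complex FFT" trick described above can be sketched with a naive O(n²) DFT standing in for the c2c FFT. This is a dependency-free illustration under the stated symmetry identities; `naive_dft` and `two_real_dfts` are hypothetical names, and a real implementation would of course use an optimized FFT:

```rust
use std::f64::consts::PI;

// Naive O(n^2) complex DFT on (re, im) pairs, for illustration only.
fn naive_dft(input: &[(f64, f64)]) -> Vec<(f64, f64)> {
    let n = input.len();
    (0..n)
        .map(|k| {
            let mut acc = (0.0, 0.0);
            for (j, &(re, im)) in input.iter().enumerate() {
                let angle = -2.0 * PI * (j * k) as f64 / n as f64;
                let (c, s) = (angle.cos(), angle.sin());
                acc.0 += re * c - im * s;
                acc.1 += re * s + im * c;
            }
            acc
        })
        .collect()
}

// Compute the DFTs of two real signals with ONE complex DFT: pack x into the
// reals and y into the imaginaries, transform, then separate using
// X[k] = (Z[k] + conj(Z[n-k])) / 2 and Y[k] = (Z[k] - conj(Z[n-k])) / (2i).
fn two_real_dfts(x: &[f64], y: &[f64]) -> (Vec<(f64, f64)>, Vec<(f64, f64)>) {
    let n = x.len();
    let packed: Vec<(f64, f64)> = x.iter().zip(y).map(|(&a, &b)| (a, b)).collect();
    let z = naive_dft(&packed);
    let mut xf = Vec::with_capacity(n);
    let mut yf = Vec::with_capacity(n);
    for k in 0..n {
        let (zr, zi) = z[k];
        let (mr, mi) = z[(n - k) % n]; // Z[n - k], with Z[n] wrapping to Z[0]
        xf.push(((zr + mr) * 0.5, (zi - mi) * 0.5));
        yf.push(((zi + mi) * 0.5, (mr - zr) * 0.5));
    }
    (xf, yf)
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [0.5, -1.0, 2.0, 0.0];
    let (xf, yf) = two_real_dfts(&x, &y);
    // Check against direct DFTs of each signal on its own.
    let xd = naive_dft(&x.iter().map(|&v| (v, 0.0)).collect::<Vec<_>>());
    let yd = naive_dft(&y.iter().map(|&v| (v, 0.0)).collect::<Vec<_>>());
    for k in 0..x.len() {
        assert!((xf[k].0 - xd[k].0).abs() < 1e-9 && (xf[k].1 - xd[k].1).abs() < 1e-9);
        assert!((yf[k].0 - yd[k].0).abs() < 1e-9 && (yf[k].1 - yd[k].1).abs() < 1e-9);
    }
    println!("both spectra recovered from one DFT");
}
```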

For any non-even number, we'd need a mixed-radix algorithm, Rader's, Bluestein's, butterflies, planning, the works.

Having to switch back and forth between the real array and the "n / 2 + 1" array seems like a nightmare, but apparently a real-only FFT is trivially convertible to a transform called a "Discrete Hartley Transform", which is real-only, has all the same algorithms, etc, but with the advantage that all the outputs are the same size.

So my current thinking is to make a new planner specifically for real FFTs, and put all this stuff in there. It can have a FFT planner internally if we find that it can speed things up to use an actual FFT, but otherwise, it'll stay off in its own little Discrete Hartley World.
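For reference, the DHT mentioned above is simple to state: H[k] = Σⱼ x[j]·cas(2πjk/n), where cas(t) = cos(t) + sin(t). It maps n reals to n reals of the same length, works for any n including odd, and is its own inverse up to a factor of n. A naive O(n²) sketch (not the fast algorithm being discussed):

```rust
use std::f64::consts::PI;

// Naive O(n^2) Discrete Hartley Transform:
// H[k] = sum_j x[j] * cas(2*pi*j*k/n), with cas(t) = cos(t) + sin(t).
// Real input, real output, same length, defined for any n.
fn naive_dht(input: &[f64]) -> Vec<f64> {
    let n = input.len();
    (0..n)
        .map(|k| {
            input
                .iter()
                .enumerate()
                .map(|(j, &x)| {
                    let t = 2.0 * PI * (j * k) as f64 / n as f64;
                    x * (t.cos() + t.sin())
                })
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    // The DHT is an involution up to scaling: DHT(DHT(x)) = n * x,
    // for odd lengths too.
    let x = [1.0, 2.0, -0.5, 3.0, 0.25];
    let round_trip = naive_dht(&naive_dht(&x));
    for (orig, rt) in x.iter().zip(&round_trip) {
        assert!((rt / x.len() as f64 - orig).abs() < 1e-9);
    }
    println!("DHT round trip recovers the input");
}
```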

@ejmahler
Owner

That's not a final plan obviously, but seeing how many different classes are going to be involved, I think it shouldn't be a part of the main planner.

Given that we're going to almost double the code size, we might even want to split it into a separate crate entirely.

@HEnquist
Contributor

This explains a lot, thanks!
I'm guessing a DHT would not be able to directly reuse very much of the code in RustFFT(?). Then it makes more sense to put it in a separate crate, RustDHT perhaps?
Then the question is what to do about the Real FFTs. Another separate crate on top of RustDHT?
This would then be an argument for keeping also the R2C stuff separate from RustFFT.

Building the DHT will be a pretty long-term project. What to do in the meanwhile? One easy fix that I think is still better than nothing is to provide R2C/C2R for odd lengths by just using a C2C of the same length. This would at least mean there is a single interface that works for all lengths, even if there is no speed advantage for the odd ones. I could add this to RealFFT for now, and include the optimizations from #50.

@ejmahler
Owner

And even if it's not faster, there would be added convenience of not having to keep track of the redundant part of the complex output

@HEnquist
Contributor

Yes, the convenience itself makes it worth having. Are you adding this to RustFFT now-ish? If yes, I won't do anything more to RealFFT, just mark it as deprecated once this is done.

@ejmahler
Owner

It will be a while; I recommend porting it to realfft.

@Boscop
Author

Boscop commented Jan 14, 2021

Maybe it makes sense to take inspiration from FFTS? https://github.com/linkotec/ffts
It's faster than FFTW.
The ffts_init_1d_real function does exactly what we need:
https://github.com/linkotec/ffts/blob/2c8da4877588e288ff4cd550f14bec2dc7bf668c/include/ffts.h#L96

@HEnquist
Contributor

From what I can tell, it's using the same trick as us: https://github.com/linkotec/ffts/blob/2c8da4877588e288ff4cd550f14bec2dc7bf668c/src/ffts_real.c#L407
I don't see anything that would help with odd length real ffts.

The documentation of FFTS is pretty thin, but I think its main algorithms are only for powers of two. For anything else it uses Bluestein's. If that is true, it might be faster for powers of two, and in the (quite few) special cases where the fastest solution is Bluestein's. For everything else it will most likely be quite a bit slower.

@HEnquist
Contributor

I made a big rewrite of RealFFT: HEnquist/realfft#7
It now behaves a lot more like RustFFT, with a planner that reuses FFTs. And wherever it makes sense, it uses the same function names as RustFFT. It also includes the optimizations from #50.
Finally it also supports odd lengths, by just using a complex fft of the same length and hiding the conversions.

@ejmahler
Owner

I dug into FFTS, and it's even worse: If you pass in an odd size, it crashes on unaligned SSE loads. If you disable SSE, it computes the wrong output.

So yeah, they just assume your input is even without asserting it.

@HEnquist
Contributor

Well that explains it. I was confused when I couldn't find any check for if the length is even. I thought I just didn't see it, didn't expect it to be missing...

@Boscop
Author

Boscop commented Jan 15, 2021

Who uses odd FFT anyway?

@ejmahler
Owner

If you're computing an even-sized FFT that isn't a power of two, then it must have at least some odd prime factors. In that case, the library you're using will need to handle odd sizes internally in order to handle the odd prime factors of your size.

@HEnquist
Contributor

The new RealFFT is now published on crates.io as version 1.0.0!

@ejmahler
Owner

ejmahler commented Jan 24, 2021

Update: Ever since mentioning the discrete hartley transform, I've been working on deriving a variant of the six-step mixed radix FFT algorithm for the DHT. As far as I know, this algorithm is original research, and it was pretty tough to derive:

// Computes a DHT via the six-step mixed-radix algorithm
fn compute_dht_mixedradix(buffer: &mut [f32], width: usize, height: usize) {
    let len = buffer.len();
    let mut scratch = vec![0.0; len];
    
    assert_eq!(len, width * height);

    // Step 1: Transpose the width x height array to height x width
    transpose::transpose(buffer, &mut scratch, width, height);

    // Step 2: Compute DHTs of size `height` down the rows of our transposed array
    for chunk in scratch.chunks_exact_mut(height) {
        compute_dht(chunk);
    }

    // Step 3: Apply twiddle factors
    for k in 0..height {
        // we need -k % height, but k is unsigned, so do it without actual negatives
        let k_rev = subtract_mod(0, k, height);
        for i in 0..width {
            // -i % radix, but i is unsigned, so do it without actual negatives
            let i_bot = subtract_mod(0, i, width);
            let top_twiddle = compute_dft_twiddle_inverse(i * k, len);
            let bot_twiddle = compute_dft_twiddle_inverse(i_bot * k, len);

            // Instead of just multiplying a single input value with a single complex number like we do in the DFT,
            // we need to combine 4 numbers, determined by mirroring the input number across the horizontal and vertical axes of the array
            let top_fwd = scratch[i*height + k];
            let top_rev = scratch[i*height + k_rev];
            let bot_fwd = scratch[i_bot*height + k];
            let bot_rev = scratch[i_bot*height + k_rev];

            // Since we're overwriting data that our mirrored input values will need when they compute their own twiddles,
            // we currently can't apply twiddles in place. An obvious optimization here is to compute all 4 values at once and write them all out at once.
            // That would cut the number of flops by 75%, and would let us do this in place.
            buffer[i*height + k] = 0.5 * (
                  top_fwd * top_twiddle.re
                - top_fwd * top_twiddle.im
                + top_rev * top_twiddle.re
                + top_rev * top_twiddle.im
                + bot_fwd * bot_twiddle.re
                + bot_fwd * bot_twiddle.im
                - bot_rev * bot_twiddle.re
                + bot_rev * bot_twiddle.im
            );
        }
    }

    // Step 4: Transpose the height x width array back to width x height
    transpose::transpose(buffer, &mut scratch, height, width);

    // Step 5: Compute DHTs of size `width` down the rows of the array
    for chunk in scratch.chunks_exact_mut(width) {
        compute_dht(chunk);
    }

    // Step 6: Transpose the width x height array to height x width one last time
    transpose::transpose(&scratch, buffer, width, height);
}

Pairing this with some trivial preprocessing or postprocessing, we can use this to implement R2C and C2R:

// Computes a DFT of real-only input by converting the problem to a DHT
fn compute_r2c_via_dht(input: &mut [f32], output: &mut [Complex<f32>]) {
    assert_eq!(output.len(), input.len() / 2 + 1);

    compute_dht(input);

    output[0] = Complex::from(input[0]);

    for (k, output_cell) in output.iter_mut().enumerate().skip(1) {
        *output_cell = Complex {
            re: (input[input.len() - k] + input[k]) * 0.5,
            im: (input[input.len() - k] - input[k]) * 0.5,
        }
    }
}

// Computes a DFT of real-only output by converting the problem to a DHT
fn compute_c2r_via_dht(input: &[Complex<f32>], output: &mut [f32]) {
    assert_eq!(input.len(), output.len() / 2 + 1);

    output[0] = input[0].re;

    for (k, input_cell) in input.iter().enumerate().skip(1) {
        output[k]                = input_cell.re + input_cell.im;
        output[output.len() - k] = input_cell.re - input_cell.im;
    }

    compute_dht(output);
}

The hardest part was applying twiddle factors: Instead of just a pairwise multiplication like in the FFT mixed radix algorithm, you have to compute data from 4 different points that are mirrors of each other across the 2D array.

I find this interesting, because it's sort of a reflection of the postprocessing you do to convert a R2C to a DHT. In both cases, you're looping over the front of the array and the back of the array simultaneously, and meeting in the middle.

Except I didn't derive DHT mixed radix in terms of the conversion to a DFT - I derived it entirely in terms of the O(n^2) definition of the DHT. So even though the reflection doesn't show up in the O(n^2) definition of the DHT, that reflection still seems like an inherent part of the transform. The fact that the reversal shows up both in the mixed radix algorithm and in the conversion to R2C reminds me of a fractal, where you zoom in and end up seeing copies of the original fractal, and it gives you a sense that that shape is very important to the computation of that fractal for some reason.

Fun stuff to think about.

I know variants of Rader's algorithm and Bluestein's algorithm have already been derived, and I THINK Good-Thomas has as well, so with those 3 we have everything we need to compute every DHT size.

So next steps are, before committing to using the DHT, I also want to see if I can use the Discrete Cosine Transform Type 2 as the internal transform, since that also has a pretty trivial conversion to a DFT. I'm not sure how long that research will take, but it's entirely possible that a pure mixed radix algorithm like this for the DCT isn't a good idea, in which case it may become clear pretty quickly.

After that, I'll start a new crate called RustDHT and start building it. Finally, I'll make a crate called "RustFFT-Real" which depends on RustDHT and encapsulates it all, so that the DHT is just an implementation detail of the real-valued FFT, rather than an API-guaranteed part of the process.

Finally, once RustFFT-Real is released, I can convert RustDCT to use RustDHT instead of RustFFT. Long term, I'm considering making RustFFT a wrapper crate, that pulls in RustFFT-Complex, RustFFT-Real, RustDHT, RustDCT, the same way that num does with num-complex, num-traits etc. So that the most convenient thing is to just depend on rustFFT, but someone who wants to improve compile times can depend on only what they need.

@HEnquist
Contributor

This is really good work!
I think the num approach is a very good idea. In the end RustFFT-Real will have many options to choose from for the inner transform. What is faster, a wrapped avx-accelerated complex fft, or something dht-based? And what if we don't have avx? That will be fun to sort out. I'm guessing that a fixed-logic planner will become too complicated when all possibilities are included. An estimating one is probably easier to get right.

@ejmahler
Owner

Update: I've been chugging away at this over the last few weeks. I have a repo for RustDHT, although it isn't published to crates.io yet: https://github.com/ejmahler/RustDHT

There's no planner or bluestein's or raders equivalent yet, I've been focusing on getting the performance for small prime factors up to where it should be.

Prime butterflies can be computed with more or less the exact same algorithm as they are in an FFT, with a manual factorization of the naive DHT matrix. Radix4 works just like RustFFT's Radix4, although the twiddle factor function is more complicated.

MixedRadix in RustDHT ends up being almost the exact same speed as MixedRadix in RustFFT. My theory is that in the transposes, the compiler can copy both elements of a complex number with the same instruction, so it can optimize away half the instructions involved in a complex transpose. But you can't optimize away half the instructions of a real-valued transpose, so the real one ends up with the same instruction count as the complex one -- and thus takes the same amount of time.

So I'm designing the library with the notion that we won't be doing MixedRadix, ever. That also rules out PFA, because the main value of PFA is that it's mixed radix without the twiddle factors.

@jesnor

jesnor commented Mar 1, 2021

Agreed that these would be great additions to RustFFT. I'm also using FFT for audio processing (a VST plugin), which uses 1D real-to-complex and complex-to-real transforms. I did some simple benchmarks, and RustFFT c2c is about 1.5x-2x slower than FFTW r2c and c2r for 4096 single-precision values; for double precision the difference is even bigger, with RustFFT about 3x slower.

For now I'm using RealFFT which is as fast as FFTW for single precision, and even faster for double precision, so big thanks for implementing that!

(All benchmarks run on an AVX2 CPU, of course.)

@Boscop
Author

Boscop commented May 5, 2021

Btw @jesnor, I just published my bindings to FFTS (the Fastest Fourier Transform in the South, which is faster than FFTW) here:
https://github.com/Boscop/ffts-rs
It is very fast, and computing an FFT doesn't allocate: https://github.com/Boscop/ffts-rs/blob/master/src/lib.rs#L59
(Note that the ffts crate on crates.io is not my crate, and that one allocates a Vec for each FFT computation: https://docs.rs/ffts/0.1.3/ffts/struct.FFTSPlan.html#method.execute. I had created my ffts-rs crate in 2016 but hadn't published it, so someone else also wrote bindings and published them to crates.io, but I always use my bindings because they don't allocate.
Also, the one on crates.io uses https://github.com/anthonix/ffts, whereas I use https://github.com/linkotec/ffts, which is usually ahead of the former and supports MSVC.)

@ejmahler
Owner

ejmahler commented May 6, 2021

How does the FFTS real-only FFT compare to the RealFFT crate?

@HEnquist
Contributor

HEnquist commented May 6, 2021

I also wonder about non-powers-of-two in FFTS. Since it uses Bluestein's for those, I would expect it to be far slower there than both FFTW and RustFFT.
