Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzz testing #110

Open
Shnatsel opened this issue Jan 29, 2023 · 12 comments
Open

Fuzz testing #110

Shnatsel opened this issue Jan 29, 2023 · 12 comments

Comments

@Shnatsel
Copy link

Since rustfft uses unsafe code for SIMD, it would be great to fuzz-test it with various inputs.

A good introduction to fuzz testing in Rust can be found here: https://rust-fuzz.github.io/book/

@Shnatsel
Copy link
Author

I can make a skeleton implementation of this, but since I'm familiar with fuzzing but not FFT, I will likely need help from someone who understands FFT.

The motivation is using rustfft in Symphonia, which is exposed to untrusted input and so must be resilient against maliciously chosen FFT parameters.

@WalterSmuts
Copy link
Contributor

I'm happy to be FFT POC. Some initial thoughts are:

  • FFT is a pure function and should always return the same output for the same input
  • All combinations of input values should produce distinct output values.
  • The FFT operations and IFFT operations are inverses. One fuzz test could be to assert this fact. Keep in mind the constant factor that rustfft introduces.
  • There are other fft libraries that you can use to compare outputs.
  • Floating points numbers result in slight differences depending on the exact order of operations. Be wary of floating point equality checks.

@Shnatsel
Copy link
Author

  • The FFT operations and IFFT operations are inverses. One fuzz test could be to assert this fact. Keep in mind the constant factor that rustfft introduces.
  • There are other fft libraries that you can use to compare outputs.

That is indeed very useful! Comparing outputs and roundtripping data are both very useful fuzz testing techniques.

Symphonia uses rustfft in a rather limited way, so I'll start there, and expand on that if I have the time.

@HEnquist
Copy link
Contributor

There isn't really any need of bringing in another fft library. RustFFT has a built-in reference implementation here: https://github.com/ejmahler/RustFFT/blob/master/src/algorithm/dft.rs
It's basically just the DFT definition. That makes it quite slow, but on the other hand it's so simple it can hardly fail.

I would focus on testing with "strange" input. There are already good tests for normal float values (using random values), but nothing that tries to use the special values like NaN, positive/negative infinity, and subnormals. There are also no tests that passes input, output or scratch of the wrong length.

@Shnatsel
Copy link
Author

I've added a basic fuzzing harness for Symphonia's FFT wrapper around rustfft: pdeljanov/Symphonia#182

It's just cargo fuzz init plus filling in a small template.

Keep in mind the constant factor that rustfft introduces.

How can I account for it, in code? I only know the general principles of FFT, and haven't implemented it myself.

@ejmahler
Copy link
Owner

ejmahler commented Jan 30, 2023

How can I account for it, in code? I only know the general principles of FFT, and haven't implemented it myself.

The FFT is 100% invertible, but with a wrinkle: There's a scaling factor in the naive algorithm that some implementations add extra code to remove from the output, and some don't. rustFFT doesn't.

Thus, after every FFT pass, an extra scaling factor is introduced into the output, equal to sqrt(n). So if you do a forward, followed by an inverse, it scales each element by sqrt(n) * sqrt(n) = n.

So if you're comparing RustFFT's output of a single pass with the output of another library, you may need to multiply each element of RustFFT's output by 1/sqrt(n), depending on how the other library is implemented. And if you're checking invertibility by verifying that you get the same output when doing a forward followed by an inverse, you need to multiply each element of the final output by 1/n before doing the comparison.

@Shnatsel
Copy link
Author

I've opened #111 with an initial stab at roundtrip fuzzing.

@Shnatsel
Copy link
Author

Shnatsel commented Feb 3, 2023

Okay, so roundtrip fuzzing isn't working out due to numeric precision issues with floats.

I see that the exact algorithm implementation chosen depends on both the selected instruction set and the algorithm choice. Except AVX is documented to be a black box and you can't currently select an algorithm?

I was hoping to make a fuzz target for every combination of algorithm and instruction set, but the AVX algorithm choice being a black box gets in the way. Using the regular planner sounds like a bad option because that way some inputs won't be covered for some algorithms.

Is the restriction on AVX being a black box still relevant? If so, perhaps we could work around it by exposing some structures that you don't want to commit to in the public API yet only under #[cfg(fuzzing)]?

@Shnatsel
Copy link
Author

Shnatsel commented Feb 3, 2023

I was hoping to make a fuzz target for every combination of algorithm and instruction set,

Actually scratch that. What we should probably do is run all of them and compare the outputs.

Do any of the algorithms have any constraints on the kind of input they can reasonably handle? I don't want to run into precision issues again like I did with roundtrips.

@Shnatsel
Copy link
Author

Shnatsel commented Feb 4, 2023

Also I see that some scalar algorithms accept width and height parameters and I'm not sure how to derive those in the fuzzer

@HEnquist
Copy link
Contributor

HEnquist commented Feb 4, 2023

I was hoping to make a fuzz target for every combination of algorithm and instruction set,

Many algorithms use other ffts internally. These are normally selected by the planner. If you bypass the planner, you need to also select algorithm for the inner ffts (which may also use inner ffts and so on). The result is a very (very!) large number of possible combination!

Do any of the algorithms have any constraints on the kind of input they can reasonably handle? I don't want to run into precision issues again like I did with roundtrips.

Most of the FFT algorithms have pretty strict rules for what lengths they can handle. Radix4 only does powers of two for example. The numerical precision should be quite similar, but I would expect the ones that go through more calculation steps (such as Bluesteins and Raders) to produce a bit more noise than say Radix4.

Also I see that some scalar algorithms accept width and height parameters and I'm not sure how to derive those in the fuzzer

These are the sizes of the inner ffts. The planner derives them by factoring the length into primes, then combining them back together to produce two factors of that are reasonably similar. For example a mixed radix with width 6 and height 7 use inner ffts of length 6 and 7 to perform a length 42 FFT. Here 3x14 would also work, but this would likely be slower. If you don't use the planner, you also need to select the algorithms for the two inners. The length 7 could then be a butterfly, or either Bluesteins or Raders. The 6 could be a butterfly or a 3x2 mixed radix. With longer lengths, the number of options grows quicky.

@Shnatsel
Copy link
Author

Shnatsel commented Feb 4, 2023

I've added the initial fuzzing harness in #112. It's grouped by instruction sets and uses the planners, which should cover the typical usage scenario.

If you bypass the planner, you need to also select algorithm for the inner ffts (which may also use inner ffts and so on). The result is a very (very!) large number of possible combination!

Fuzzers are reasonably good at dealing with combinatorial explosions, so I suppose a good way to test this thoroughly is to let the fuzzer choose the parameters. I've written a test harness that auto-selects parameters here, you can see how it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants