Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: add fuzz test support #44551

Closed
katiehockman opened this issue Feb 23, 2021 · 54 comments
Closed

testing: add fuzz test support #44551

katiehockman opened this issue Feb 23, 2021 · 54 comments

Comments

@katiehockman
Copy link
Contributor

katiehockman commented Feb 23, 2021

This proposal is to add fuzz test support to Go. This will add a new testing.F type, support FuzzFoo functions in _test.go files, and add new go command behavior.

A design draft has already been published and iterated on based on feedback from the Go community. This is the next step to propose that this design draft become a language feature.

This feature will be considered experimental in Go 1.18, and the API will not be covered by the Go 1 compatibility promise yet. The functionality that goes into this release is expected to have bugs and be missing features, but should serve as a proof-of-concept for which Go developers can experiment and provide feedback. Since this will be an experimental feature to start, we also expect there will be room for growth for the mutator and fuzzing engine in future Go releases.

Below are the parts of the design draft which will not make it into 1.18, and will be considered later on:

  • support for fuzzing with -race and -msan
  • support for fuzzing with -keepfuzzing
  • deduplication of similar crashes caused by different mutations, which would be a prerequisite to implementing -keepfuzzing (to reduce noise)
  • allowing special options while fuzzing (e.g. maximum input size)
  • dictionary support
  • customizable coverage instrumentation while fuzzing (e.g. to only instrument certain packages or files)
  • custom generators for the mutator
  • structured fuzzing support for struct and non-primitive types
  • [Stretch goal for 1.18] structured fuzzing support for primitive types other than []byte (e.g. string, int, float64)

Edit: This previously said 1.17, but it was not merged to master to make it into 1.17, so this has been updated to say "1.18" instead.

@katiehockman katiehockman added Proposal fuzz Issues related to native fuzzing support labels Feb 23, 2021
@katiehockman katiehockman added this to the Proposal milestone Feb 23, 2021
@katiehockman katiehockman self-assigned this Feb 23, 2021
@robpike
Copy link
Contributor

robpike commented Feb 23, 2021

How can we keep the fuzzing corpora small, and the fuzzing process efficient and on point? Although fuzzing is good at finding certain classes of bugs, it is very expensive in CPU and storage and cost/benefit ratio remains unclear. I worry about wasting energy and filling up git repos with testdata noise. Is there a way to make the testing focused and result-driven? Also, will we wake up one day and find fuzzing is a mandatory part of all builds? That would be terrifying.

Apologies if I'm late to the party; I did not comment on the original proposal.

@FiloSottile
Copy link
Contributor

FiloSottile commented Feb 23, 2021

How can we keep the fuzzing corpora small, and the fuzzing process efficient and on point? Although fuzzing is good at finding certain classes of bugs, it is very expensive in CPU and storage and cost/benefit ratio remains unclear.

The track record of go-fuzz provides pretty amazing evidence that fuzzing is good at finding bugs that humans had not found. In my experience, just a few CPU minutes of fuzzing can be extremely effective at the margin. There aren't a lot of other ways to turn fixed amounts of CPUs (as opposed to dynamic, runtime checks) into bugs caught.

I worry about wasting energy and filling up git repos with testdata noise. Is there a way to make the testing focused and result-driven?

Yeah, filling repositories with random-looking stuff is definitely undesirable. To be clear, in this design the mutator-generated corpus is not in testdata, we agree that it would be way too noisy to check in.

Samples are only placed in testdata when they resulted in a test failure, and then it's up to the developer to check them in as a regression test, or turn them into a more full-fledged unit test.

Also, will we wake up one day and find fuzzing is a mandatory part of all builds? That would be terrifying.

I'm not sure what you mean with being a part of all builds. If you mean running as part of go build, definitely not.

The expensive, open-ended part of fuzzing won't even be a default part of go test: without the -fuzz flag only seed and testdata corpora will be tested, and the mutator won't run. (That would make tests non-deterministic, aside from resources concerns.)

@robpike
Copy link
Contributor

robpike commented Feb 24, 2021

Thanks, Filo. Your point about how we store stuff is important, and I missed that.

I understand the value of fuzzing, and my grumpy old man concerns are not an attempt to reduce that. (I will say that some of the bugs it finds are not worth fixing, and certainly not the cost of finding, although it's very hard to make that case.) But I have three general concerns:

  1. A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

  2. The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.

  3. Its adoption as part of the process. This is where the proposal to incorporate it needs attention. The technical process and the culture must make it clear that (1) and (2) mean that it cannot be part of the standard build process.

Go build will surely not run it always, but organizations will. Being automated, companies will require it, just as they require zero lint errors even as we protest that lint is not perfect.

I am just expressing caution, care, and an eye on the big picture here. I am not saying we shouldn't do this, just that we should do it with all considerations in mind.

@everestmz
Copy link

This is awesome - really excited to see native Go fuzzing in the proposal stage now.

Hopefully it's not too early to make a suggestion for the actual implementation of this, but it would be super useful if the testing.F type was an interface, rather than a struct like testing.T/B are.

I think fuzzing is in sort of a unique spot among the supported testing types in that there's no one correct way to fuzz test your code. Having the ability to build a type that implements the testing.F interface would make it simple to swap out fuzzing backends, enabling the community to take advantage of all the awesome open source work out there by building integrations to things like AFL++.

There's already been interest in something like this from the original Reddit thread mentioned on the proposal document (like this comment here), and I think it would be a missed opportunity to not allow developers to hook these functions into property-based testing tools (like Gopter) to get some minimal testing done at the go test stage, while allowing the same test to later be fuzzed to discover deeper bugs.

There are probably details I've missed here and I'd like to hear some other opinions on this, but given all the potential use cases I think it's worth some discussion.

@FiloSottile
Copy link
Contributor

  1. A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

Agreed, I like to say that fuzzing finds bugs at the margin. It's why we are interested in it as the Security team: bugs caught at the margin are ones that don't make it into production to become vulnerabilities. Just like it finds bugs that are often not found in other ways, it can't find bugs that other ways can.

We'll do our best to communicate to developers in the coming months about what fuzzing can (and can't) do for them, and we'd like to hear it if that content doesn't push in the right direction.

  1. The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.
  2. Its adoption as part of the process. This is where the proposal to incorporate it needs attention. The technical process and the culture must make it clear that (1) and (2) mean that it cannot be part of the standard build process.

Indeed, this is the hardest part: creating a culture in the ecosystem that makes the practice successful. We want fuzzing to be part of the development—not build or security—process: make a change to the relevant code, run go test -fuzz with the locally cached corpus that can already hit most code branches, find a bug quickly, debug-fix-verify with go test, turn the crasher into a unit test or check it in as a regression test.

There is probably also space for fuzzing in CI, like OSS-Fuzz demonstrated and so that the corpus can be centralized, but since it's non-deterministic it can't meaningfully block a build. Hopefully the CLI choices, like not having a fixed fuzzing time, nudge in that direction.

Likewise, I think writing will be the best tool we have to shape usage, and welcome feedback on future drafts!

@bcmills
Copy link
Contributor

bcmills commented Feb 24, 2021

@everestmz

it would be super useful if the testing.F type was an interface, rather than a struct like testing.T/B are.

Why would such an interface type need to be defined in the testing package itself? Keeping testing.F a concrete type allows for future methods to be added and documented, and packages that want to abstract over different F-like implementations can already defined their own interface for the specific methods they need.

@JAicewizard
Copy link

turn the crasher into a unit test or check it in as a regression test.

Absolute wild wild dream, but... wouldn't it be amazing if there would be an option to transform bugs found with fuzzing into unit test cases automatically? It could be as simple as copying the function, give is a unique name, and replace testData := {source of random data} with testData := {the data that makes the test fail}.

This would make it very clear that fuzzing is generating data that make your test fail, and in itself not a way to test your code. Its a random edg-case generator, not a validator.

@bcmills
Copy link
Contributor

bcmills commented Feb 24, 2021

The flow of control in the design doc seems a bit off to me. The Fuzz method in particular carries a lot of type-state invariants:

  • “Only one call to Fuzz is allowed per fuzz target, and any subsequent calls will panic.”
  • and, Add “cannot be invoked after or within the Fuzz function.”

Making Fuzz a method seems to invite user code to try to perform setup before the call to Fuzz and/or cleanup afterward. (Is that the intent?)

But then I wonder if there is a cleaner alternative, especially given the (*T).Cleanup methods added in #32111. The restriction to a single Fuzz invocation per test makes me think that it functions more like a constructor-function than a method call on an existing object. Would it make sense to make it a constructor method on a *T, returning an explicit object on which Add could then be called? That would make it clearer that the fuzz function should not call Add, since the user would have to go out of their way to even have the object with the Add method in scope at that point.

I'm imagining something like:

package testing

func (*T) NewFuzzer(ff interface{}) *F
func (*F) Add(args interface{})

That would make the example in the doc more like:

func TestFuzzMarshalFoo(t *testing.T) {
	f := t.NewFuzzer(func(t *testing.T, a string, num *big.Int) {
		t.Parallel() // seed corpus tests can run in parallel
		if num.Sign() <= 0 {
			t.Skip() // only test positive numbers
		}
		val, err := MarshalFoo(a, num)
		if err != nil {
			t.Skip()
		}
		if val == nil {
			t.Fatal("MarshalFoo: val == nil, err == nil")
		}
		a2, num2, err := UnmarshalFoo(val)
		if err != nil {
			t.Fatalf("failed to unmarshal valid Foo: %v", err)
		}
		if a2 == nil || num2 == nil {
			t.Error("UnmarshalFoo: a==nil, num==nil, err==nil")
		}
		if a2 != a || !num2.Equal(num) {
			t.Error("UnmarshalFoo does not match the provided input")
		}
	})

	// Seed the initial corpus
	f.Add("cat", big.NewInt(1341))
	f.Add("!mouse", big.NewInt(0))
}

If the -fuzz flag is set, the additional fuzzing would occur after TestFuzzMarshalFoo returns but before its Cleanup callbacks are executed.

@katiehockman
Copy link
Contributor Author

katiehockman commented Feb 24, 2021

@JAicewizard

wouldn't it be amazing if there would be an option to transform bugs found with fuzzing into unit test cases automatically?

There were discussions about code generation in some of the first designs, but it was eventually decided that it was a complexity without much gain, especially since we can gain the same benefits in a simpler way. With this design, new crashes are put into testdata and will automatically run as a regression test on the next go test run (even when -fuzz isn't used).

And if someone would prefer to put this crash into the code instead of using the one in testdata, they have a couple options:

  1. They can use f.Add with the existing fuzz target (which works basically the same way as it would if it were in testdata, its just in the code instead)
  2. They can copy-paste the Fuzz target and make slight alterations to the code to make it a unit test. We intentionally designed fuzz targets to look very similar to unit tests, which allows fuzz targets to be easily translated into unit tests, but also allows unit tests to be easily translated into fuzz targets.

@josharian
Copy link
Contributor

Fuzz functions will change over time, as all code does. If/when they change, the crashes in testdata may no longer be effectively testing against regressions. It seems to me that crashes need to be reified in regular tests.

That said, a trained corpus (not just crashes) can make fuzzing much more effective at finding new issues. I understand not wanting to pollute a repo with a corpus, but I think we should design an intended way for people to keep and share corpora. I have used git submodules in the past reasonably hapiily, but that's git-specific (and submodules are generally frustrating to work with). I'm not sure what the answer is here, but I think at least trying for an answer is an important part of the fuzzing big picture.

@dbrumley
Copy link

A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

I think part of the attraction is not just it finds bugs, but also how it integrates with developer workflow. The nice thing about fuzzing is that each bug has a witness input. That input, and the ability to debug with your own tools, makes fuzzing-as-a-process quite nice.

That being said, definitely no silver bullets here.

The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.

This also relates to the proposal "This type of testing works best when able to run more mutations quickly, rather than fewer mutations intelligently.".

AFL/libfuzzer really focus on iteration speed, but there are other approaches like symbolic execution which are slow, but generate new coverage (more or less) by definition on each iteration. Less fast million monkeys; more like applied model checking using heavy-weight SMT/SAT solvers. Most research, and some products, pair the two together. I know for a fact Mayhem and Mechanical Phish did this in the cyber grand challenge, for instance.

The great thing about the proposal is the interface and build process is standardized. I view this proposal as relevant beyond fuzzing to the broader set of dynamic analysis tools.

@everestmz
Copy link

@bcmills

Why would such an interface type need to be defined in the testing package itself?

Putting the interface directly into the testing package gives any codebase that uses the default Go fuzzing tooling significant flexibility when it comes to changing up the testing methodology.

In the world of C++ fuzzing, the fuzz interface has been de-facto standardized on the LLVMFuzzerTestOneInput function, originally created by libFuzzer. Because this is just a function that expects a value, every other C++ fuzzing tool works with the same interface, allowing users to take a codebase full of fuzz functions and use any of the available open-source tools with no code changes.

This ability has been put to great use by tools like OSS-fuzz/clusterfuzz, which allow users to write one fuzz test and have it attacked by 3 different fuzzing engines without any more developer effort.

We constantly see evidence (like this example) of new fuzzing techniques uncovering bugs in code that had previously been thoroughly fuzzed. Making testing.F a concrete type locks down any method that uses it to the standard Go fuzzing tooling, and trying out a different tool would require significant developer effort on any codebase with a healthy collection of fuzz tests.

It would feel like an oversight to hamstring Go fuzzing right out of the gate by limiting available testing methodologies to the default - the fuzzing community consistently develops more and more effective techniques and algorithms, and it would be a shame to not take advantage of that.

Keeping testing.F a concrete type allows for future methods to be added and documented

I might be misunderstanding you here, but how would turning testing.F into an interface preclude adding more functions in the future?

@rsc
Copy link
Contributor

rsc commented Feb 24, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@katiehockman
Copy link
Contributor Author

@everestmz

I might be misunderstanding you here, but how would turning testing.F into an interface preclude adding more functions in the future?

I think what @bcmills is saying here is that if we define testing.F as an interface, the interface is basically locked as soon as it's no longer experimental. Otherwise, if we added a new function, then any types that don't update will no longer implement the interface, and will break backwards compatibility. If testing.F is a concrete type, that's not a concern.

I am almost positive that we will come up with more ways to extend the testing.F type in the future, e.g. for providing options while fuzzing (which is still a future consideration). Making it an interface greatly limits flexibility for growth.

It would feel like an oversight to hamstring Go fuzzing right out of the gate by limiting available testing methodologies to the default - the fuzzing community consistently develops more and more effective techniques and algorithms, and it would be a shame to not take advantage of that.

This is a fair point. What do you think about supporting different modes with go build to allow running with different fuzzing engines. For example, go build -libFuzzer which could instrument a go binary with what is needed for running with libFuzzer, and provide a binary that can be run directly with libFuzzer. I realize that is still somewhat limiting, as we would need to provide different modes for the instrumentations that are supported, but it's something that may be prove useful.

@katiehockman
Copy link
Contributor Author

@bcmills
re: func (*T) NewFuzzer(ff interface{}) *F

I can see your point that it has some type-state invariants. However, I'm not as concerned by that. A few things:

The initial designs looked like this: f.Fuzz(func(f *testing.F, b []byte) {...}). A big reason why we decided to change it to f.Fuzz(func(t *testing.T, b []byte) {...}) was in an attempt to make the scope of the fuzz function clearer. Because the fuzz function takes a *testing.T type, it should be clearer that it's basically a standalone unit test, and that you shouldn't be calling *testing.F functions within it (since you have a *testing.T)

I also think that if someone calls f.Add inside of f.Fuzz, then they are confused by what f.Add is doing. I believe that if we document these functions really well, that it should be apparent that seeding the corpus while fuzzing doesn't really make sense, but instead is something you do beforehand.

I also think go vet should be able to help with this.

The main reason I like f.Fuzz instead of f.NewFuzzer is that f.Fuzz is clearly an action. f.Fuzz says "Now start fuzzing this function that I've provided". I find it a bit more confusing that your proposition could call f.NewFuzzer, creating a testing.F type, then do nothing else on it. It just has to be assumed that it will start fuzzing at the end of the function, which isn't obvious to the reader. It also requires that you must define the function that is being fuzzed before you can seed the corpus, even though the actual code would first seed the corpus, then run the function. Basically requiring that the "pre-work" must be written after the actual fuzz function, which feels awkward.

Making Fuzz a method seems to invite user code to try to perform setup before the call to Fuzz and/or cleanup afterward. (Is that the intent?)

Yes and no. Pre-work should be done before calling f.Fuzz (ie. before fuzzing). But if there is cleanup work that needs to be done, then they should call f.Cleanup before f.Fuzz, or t.Cleanup inside the fuzz function, depending on what they want to happen. For example, someone could do this:

func FuzzFoo(f *testing.F) {
	f.Add([]byte{0})
	f.Cleanup(func() { f.Log("done!") })
	f.Fuzz(func(t *testing.T, b []byte) {
		t.Cleanup(func() { t.Logf("inner") })
	})
}

@zeebo
Copy link
Contributor

zeebo commented Feb 25, 2021

It doesn't seem necessary to me to have testing.F be an interface to support other implementations because the concrete type does not contain or expose any details about how the fuzzing is happening. For example, it seems to me like the testing package could grow some API that is expected to be called in testing.Main (or something) that can control exactly how all of the testing.F values will later behave.

In other words, it doesn't seem necessary to do the virtual dispatch on the Add and Fuzz methods directly as long as the concrete type internally is able to do the virtual dispatch instead.

@bcmills
Copy link
Contributor

bcmills commented Feb 25, 2021

This feature will be considered experimental in Go 1.17, and the API will not be covered by the Go 1 compatibility promise yet.

In that case, could we gate it behind a build tag or GOEXPERIMENT setting?
(See also the (lengthy!) discussion in #34409.)

@cristian-ilies-vasile
Copy link

@FiloSottile
"We'll do our best to communicate to developers in the coming months about what fuzzing can (and can't) do for them"

Well, I recommend the on line great book: The Fuzzing Book (Tools and Techniques for Generating Software Tests)
https://www.fuzzingbook.org/

@katiehockman
Copy link
Contributor Author

katiehockman commented Feb 25, 2021

@bcmills

In that case, could we gate it behind a build tag or GOEXPERIMENT setting?
(See also the (lengthy!) discussion in #34409.)

@jayconrod and I discussed this, and ended up deciding that there wasn't much to gain from gating it. Writing a fuzz target is already opt-in, since a developer would need to choose to write a target into their own source code.
And since these targets are in _test.go files, even if a module you depend on decides to add a fuzz target to one of their packages, that wouldn't be included in the build or impact your testing, unless you did something like go test ./....

It's not out of the question if we feel that not gating the feature could cause problems, but we couldn't come up with a compelling need for it. LMK if we are missing something here.

@jayconrod
Copy link
Contributor

About testing.F being an interface or not:

In our experimental implementation, testing.F doesn't have much logic on its own. It serves as an entry point into the internal/fuzz package, which contains most of the logic. We're trying to keep internal/fuzz only loosely coupled with testing. In the future, we may want to "promote" that package out of internal/, making its API available for uses outside of testing.

@bcmills
Copy link
Contributor

bcmills commented Feb 25, 2021

@katiehockman, ideally a Go user ought to be able to run go test all and get back a meaningful result about the correctness and consistency of the transitive dependencies of their module. (That is especially important when moving to a new, untested platform, or when upgrading one or more dependencies in order to check for accidental regressions or accidental reliance on undocumented behavior.)

If the fuzzing API is stable as of Go 1.17, then users who opt in to fuzzing can guard some of their _test.go files with //go:build go1.17, and not break go test all for their users when they eventually upgrade to (say) Go 1.18 or Go 1.19. However, if the fuzzing API changes between Go 1.17 and Go 1.18, then go test as of Go 1.18 may be completely unable to even build the test for the package.

So I would prefer that we put the unstable parts of the API behind an explicit build tag — perhaps go117fuzz or unstable — so that early adopters won't break go test all for their downstream consumers in a future release.

@katiehockman
Copy link
Contributor Author

katiehockman commented Mar 1, 2021

@bcmills That's a good point about go test all. Let's plan to add a build tag then. go117fuzz sounds good, since it's more specific.

@rsc
Copy link
Contributor

rsc commented Apr 1, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@karupanerura
Copy link

What is the difference between this proposal and testing/quick?

@dchapes
Copy link

dchapes commented May 20, 2021

What is the difference between this proposal and testing/quick?

I expect you've never actually heard of or looked up the term fuzzing/fuzz-testing; you should start by reading something like: https://en.wikipedia.org/wiki/Fuzz_testing

@katiehockman
Copy link
Contributor Author

The code has been merged to master, and will be landing in Go 1.18. 🎉

@leonklingele
Copy link
Contributor

leonklingele commented Dec 15, 2021

How to continue fuzzing after a crasher was found? The call to t.Errorf shouldn't stop the fuzzing process.

package main_test

import (
	"errors"
	"testing"
)

func Double(i int) (int, error) {
	if i == 42 {
		return 0, errors.New("42")
	}

	return i * 2, nil
}

func FuzzDouble(f *testing.F) {
	f.Add(1)

	f.Fuzz(func(t *testing.T, i int) {
		_, err := Double(i)
		if err != nil {
			t.Errorf("failed to double %d: %v", i, err)
		}
	})
}
$ go version
go version devel go1.18-becaeea119 Wed Dec 15 12:23:36 2021 +0100 darwin/amd64

$ go test -v -fuzz .
=== FUZZ  FuzzDouble
fuzz: elapsed: 0s, gathering baseline coverage: 0/1 completed
fuzz: elapsed: 0s, gathering baseline coverage: 1/1 completed, now fuzzing with 8 workers
fuzz: elapsed: 2s, execs: 42 (28/sec), new interesting: 0 (total: 0)
--- FAIL: FuzzDouble (1.95s)
    --- FAIL: FuzzDouble (0.00s)
        main_test.go:22: failed to double 42: 42

    Failing input written to testdata/fuzz/FuzzDouble/901ae842a79a0767746cec1401e69655708099ce968b7f89b4dea9ed92a85603
    To re-run:
    go test -run=FuzzDouble/901ae842a79a0767746cec1401e69655708099ce968b7f89b4dea9ed92a85603
FAIL
exit status 1
FAIL	fuzztest	1.955s

@jmank88
Copy link

jmank88 commented Dec 15, 2021

-keepfuzzing
Keep running the fuzz test if a crasher is found. (default false)

@leonklingele
Copy link
Contributor

Well, that option unfortunately doesn't exist in becaeea119 / go1.18beta1 :/

@leonklingele
Copy link
Contributor

#48127 mentions what happened to that option.

@katiehockman
Copy link
Contributor Author

@leonklingele this issue has been closed, so please file a new issue if you experience any problems. For questions, I encourage either filing an issue (e.g. to ask for additional documentation), or ask on the #fuzzing channel in the Gophers slack.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests