x/fuzzdata: new repository for fuzzing corpus data #31215
Comments
LGTM. I suspect bringing the full oss-fuzz corpus back into the repo might bloat it too much, but we definitely need a place for a seed corpus, so this sounds good.
Related issue: #14304.
Oh, how about x/fuzzdata? I suspect x/corpus does not imply fuzzing to everyone. (Bikeshed!)
Will the repository be used only for fuzzing?
For reasons of clarity, I personally like
I was also going to suggest
That is the current intent of the suggestion in this issue.
If it's only for a seed corpus, then the seed corpus may belong better in the main repo. We do want to use it in unit tests too! See dvyukov/go-fuzz#218 (comment) for more explanation.
To clarify, I mean only the seed corpus, i.e. a small number of high-quality inputs without significant churn (no more than what happens with code). For a fuzzer-generated corpus we do need a separate repo.
Part of what is tricky is there are really 2 things running in parallel, I think:
For 2., it seems important that golang.org/x/fuzzdata gets created so that the stdlib and x subrepos can be a pilot of a decent-sized project that stores a decent-sized corpus with some churn in a repo. I think that is important even under the split corpus idea in dvyukov/go-fuzz#218 (comment), which I commented on there yesterday (probably too enthusiastically, sorry).
Hi @katiehockman, I had a question for you about the generated corpus under the current draft design for first class fuzzing, which in turn might have some implications for what to do with this issue. The draft includes:
and also:
It makes sense that the details of what is inside the generated corpus can effectively be opaque to ordinary users, but the generated corpus itself is fairly valuable, and often worth keeping around and building up over time.
Given that the generated corpus is valuable, it can be fairly useful to allow more direct control of its location (to allow for easier storage in a separate repo, cloud storage, or a shared filesystem, or easier integration with the diversity of CI systems in use or purpose-built fuzzing systems like OSS-Fuzz).
The prior first class fuzzing proposal went through a couple of iterations on corpus behavior, but the behavior in the fzgo prototype of that proposal ultimately defaulted to reading from testdata (e.g., for a seed corpus), wrote to GOPATH/pkg/fuzz by default as a local cache for the generated corpus, but also allowed more direct control of the generated corpus via an optional GOCACHE certainly can be made to work in different scenarios, but what are your thoughts around a more convenient flag or fuzz-specific env variable to allow more direct control of the location of the generated corpus?
Thanks, and exciting to see the progress you have been making! 🚀
I’d like to second the notion that corpuses are valuable and should be (able to be) retained and shared. It’s not just computational resources to generate them. Some corpuses I’ve developed alongside the code, so they contain good coverage already for alternative implementations. And other corpuses are seeded manually, which can make a dramatic difference to fuzzing effectiveness.
Agreed. Even though it defaults to
But to your point about making this customizable, I've gone ahead and added that to the open issues section of the design draft. It's something that we'll likely be considering in the future; it is just unlikely to make it into the first experimental release: https://go.googlesource.com/proposal/+/master/design/draft-fuzzing.md
Summary
In #30719 and #30979, dvyukov/go-fuzz compatible fuzz functions were landed in golang.org/x, including in golang.org/x/image/tiff/fuzz.go.
The follow-up item in this issue is to add a new golang.org/x/corpus repository to hold an example fuzzing corpus for portions of the standard library and x subrepos. This is to help with the exploration requested by the core Go team in discussion of the #19109 proposal to "make fuzzing a first class citizen".
The name for the new corpus repo alternatively could be golang.org/x/fuzz or something else; some additional naming discussion is below.
Note that this issue is, at least currently, intended to be solely about creating the repository itself; it does not cover checking in any corresponding fuzzing corpus (which is likely to be a follow-up issue).
Background
See the "Background" section of #30719 or #19109 (comment).
Additional Details
As part of the #19109 proposal discussion, there were multiple comments/requests from the core Go team asking to develop a better understanding of how a fuzzing corpus looks and behaves when it resides in a repository. For example, this March 2017 request in #19109 (comment) from Russ:
In the March 2017 proposal document in #19109 (comment), @dvyukov proposed:
golang.org/x/fuzz seems to be a very reasonable name. Two alternative possible names:
golang.org/x/corpus
golang.org/x/fuzzcorpus
Personally, I think x/corpus or x/fuzzcorpus might be more evocative than x/fuzz (e.g., someone might incorrectly think x/fuzz is where the fuzzing implementation or fuzzing functions live), but I do not have a strong opinion on the name and suspect others will have stronger opinions. Until there is additional feedback on the name, the rest of this comment will use the term golang.org/x/corpus.
Populating the corpus
The initial seeding of the corpus can likely come from https://github.com/dvyukov/go-fuzz-corpus for a given Fuzz function. If that happens, then right now, there would be two corpus directories populated: go-fuzz-corpus/png/corpus and go-fuzz-corpus/tiff/corpus.
In parallel, multiple people are making progress on integrating dvyukov/go-fuzz into oss-fuzz in google/oss-fuzz#36, google/oss-fuzz#2188, dvyukov/go-fuzz#213 and elsewhere. Most likely, it will make sense to periodically update the golang.org/x/corpus repo with the output from oss-fuzz (which otherwise by default is stored in a Google Cloud Storage bucket).
However, I think the exact mechanism for populating and updating the golang.org/x/corpus repo can be discussed outside of this particular issue. (Reason: In general, it seems more tractable to make progress if things are broken down into more manageable discrete chunks of work, especially to break out into separate steps the things that must be done by someone on the core Go team vs. those that could be done by someone from the broader community. This issue is focused on the creation of the repo, which presumably must be done by someone on the core Go team, whereas populating the corpus can be done by a greater range of people from the broader community in consultation with the core Go team.)
I am of course happy to discuss anything here, and happy to be corrected if any of the above is different than how people would like to proceed.
CC @dvyukov @josharian @FiloSottile @bradfitz @acln0