[dev.fuzz] internal/fuzz: mutator should generate valid UTF-8 for strings #46874
Comments
IIUC, one of the eventual goals is to be able to fuzz structs. If so, I'm not sure either idea works as well for higher-order types, where the field type is already a Perhaps there can be a method hung off of |
I like this idea. We could also have ASCIIString. I think it would be convenient to have such a mechanism, for exactly the reasons you list, but I'm wary of making it the default. Despite my comment about the readability of non-utf8 string examples, consuming arbitrary bytes DID turn up a bug that wouldn't have been found with valid characters alone (#46855).
Edit: I already dislike this last suggestion, at least as I manifested it. Fuzzing, being testing, should not add struct tags to non-test objects. |
Or, we could start by having the mutator automatically balance exploration and exploitation based on which kind of string (e.g. all valid UTF-8 runes, a mix of valid and invalid UTF-8 runes, binary data) yields the best results, in terms of defects discovered, on that specific target. So, for example, if the fuzzer detects that all-valid UTF-8 strings unearth more defects for that target, it will progressively bias the mutator to generate more all-valid UTF-8 strings, and fewer of the other types of strings. The benefit of this approach is that it requires no changes to the code being fuzzed and no additional knobs (although it may be slightly slower). Later on, if needed, we could easily add a way to disable this auto-tuning by passing an explicit configuration as suggested by @dsnet. |
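To make the auto-tuning idea concrete, here is a minimal sketch of one possible policy (epsilon-greedy selection over string kinds). Nothing here exists in internal/fuzz: `kind`, `tuner`, `pick`, and `record` are hypothetical names, and "interesting" stands in for whatever signal the fuzzer would use (a new defect, new coverage).

```go
package main

import (
	"fmt"
	"math/rand"
)

// kind is a hypothetical label for the flavor of string the mutator emits.
type kind int

const (
	validUTF8 kind = iota // only valid UTF-8 runes
	mixedUTF8             // valid and invalid sequences mixed
	binaryData            // arbitrary bytes
	numKinds
)

// tuner tracks how often each kind produced an interesting input.
type tuner struct {
	tries   [numKinds]int
	finds   [numKinds]int
	epsilon float64 // probability of exploring a random kind
	rng     *rand.Rand
}

func newTuner(epsilon float64, seed int64) *tuner {
	return &tuner{epsilon: epsilon, rng: rand.New(rand.NewSource(seed))}
}

// pick explores with probability epsilon; otherwise it exploits the kind
// with the best smoothed success rate (finds+1)/(tries+2).
func (t *tuner) pick() kind {
	if t.rng.Float64() < t.epsilon {
		return kind(t.rng.Intn(int(numKinds)))
	}
	best, bestScore := kind(0), -1.0
	for k := kind(0); k < numKinds; k++ {
		score := float64(t.finds[k]+1) / float64(t.tries[k]+2)
		if score > bestScore {
			best, bestScore = k, score
		}
	}
	return best
}

// record stores the outcome of one fuzz iteration for the given kind.
func (t *tuner) record(k kind, interesting bool) {
	t.tries[k]++
	if interesting {
		t.finds[k]++
	}
}

func main() {
	// Simulated target (an assumption for the demo): valid UTF-8 inputs are
	// interesting 30% of the time, the other kinds 5% of the time.
	rate := [numKinds]float64{0.30, 0.05, 0.05}
	t := newTuner(0.1, 1)
	for i := 0; i < 5000; i++ {
		k := t.pick()
		t.record(k, t.rng.Float64() < rate[k])
	}
	fmt.Println("tries per kind:", t.tries)
}
```

The nonzero epsilon keeps sampling the less successful kinds, so arbitrary-byte inputs are never dropped entirely.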
Essentially this is the concept behind some of the advanced fuzzing strategies discussed in #46507. Implementing input prioritization methods that focus on inputs which produce more coverage (among other characteristics) naturally biases towards mutations that produce inputs the program understands, without actually having to alter the mutator at all. That said, there have been some discussions previously about biasing certain mutators, e.g. prioritizing mutators that reduce the size of inputs over those that increase it. In a similar vein, we may want to bias the mutators for strings to produce inputs with valid UTF-8 over invalid UTF-8. I'd like to do some evaluation with a string-based target to see how much of an effect on coverage this produces. |
I don't think we should do this. There is no property in the language which states that strings must be UTF-8, and we shouldn't special case this for fuzzing. Rob made an important point:
This stuck out to me because it demonstrates one of the biggest benefits of fuzzing: the mutator is able to generate inputs to A few general thoughts:
|
Is there anything else to discuss on this issue, or are we okay with closing it? |
Let's close. I think we're in agreement that UTF-8 by default is not a good idea. Custom mutators or defined types are appealing, but let's open an issue for those when we have a design. |
Currently, we use the same mutation engine for `string` and `[]byte`. This tends to generate a lot of invalid UTF-8 strings that aren't usable for many use cases. While invalid UTF-8 is likely to turn up many shallow parser bugs, it may make the mutator less effective at finding more subtle, deeper bugs.

We should have an option to make the mutator only generate UTF-8. Some ideas:
- `UTF8String` defined type. A fuzz function that accepts that as a parameter would only get valid UTF-8 strings.
- `string` parameters. A function could request `[]byte` for random bytes, and that can still be converted to `string`.

cc @golang/fuzzing @findleyr