-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switching to a magic number-based collection generator? #90
Comments
Interesting! I wasn't aware of that recommendation. Yeah I think it would be worth doing. The driver would need to be updated in order to report what "mode" it was in. We do have the |
So now that I understand Driver & DriverMode a bit: I think the way this would make most sense would be as an alternative Driver, like FuzzByteSliceDriver; that’d always run in Direct mode because it would really make no sense to have it run in Forced mode. However, doing so would mean that we should hoist quite a lot more functions into Does that make sense to you? If not, I guess a DriverMode::DirectFuzz could work too; but TBH I still don’t really feel that good about DriverMode already, and adding one that would not even really be on the same axis as the current Direct and Forced would feel bad 😅 |
An example API for that would be: trait Driver {
// ...
type SequenceIterator<'a, F, T>: Iterator<Item = T>;
type SequenceDriver: Driver;
/// Note that SequenceIterator is supposed to panic if it is dropped without having been polled to completion first
fn gen_sequence<'a, F, T>(&'a mut self, allowed_lens: Range, gen: F) -> SequenceIterator<'a, F, T>
where
F: FnMut(&mut Self::SequenceDriver) -> T;
} WDYT? |
That seems pretty complicated to me... Why not have a function to indicate to the driver that we're starting/ending a list/item and the driver can decide what to do with that information. |
Hmm… so the thing is, in order to support magic number-based collection generation, we'd need to make sure the driver is the one actually defining the length. This being said, something like this would probably work: pub struct SequenceToken;
trait Driver {
// ...
fn start_sequence(&mut self, allowed_lens: Range) -> Option<SequenceToken>;
/// Returns true iff the sequence has not completed yet
fn next_sequence_item(&mut self, id: &SequenceToken) -> Option<bool>;
} Where the usage would be expected to be like: let s = d.start_sequence(len_range)?;
while d.next_sequence_item(&s)? {
gen_item(&mut d)?;
} This would work too, but then the API could be misused by buggy generators much more easily, eg. by not handling nesting properly, or by parsing an element before calling Also, it would probably actually make the implementation of each Driver more complex, as eg. the RngDriver would need to manually remember information like how many items remain in each sequence of the current sequence stack, without being able to rely on the call stack to handle that. However, it would definitely be an alternative API to the one I suggested above! It's just for these reasons that it's not the one I suggested at first :) |
Oh and I actually didn't think of that when writing the message above (which stays correct), but: even more importantly, the API above does not actually work. Indeed, in order to have a magic-number-based collection generator, we need the FuzzByteSliceDriver (thereafter fuzzdriver) to be able to decide how much of the data each child is allowed to consume. For instance, assume input like the following, with an item parser that needs two integers:
Then we want to have the fuzzer fail to produce the input, so that it learns that it needs two integers between MAGIC_NEXT and MAGIC_END. And if we use a This is especially important when nesting collections, as then the magic number must be different, and a sub-collection could otherwise way too easily eat the "next item" from the super-collection. Does that make sense to you? (Also, well, for the person who writes a Generator the first API just seems nicer to me, and people will more often write Generators than Drivers, but I guess that's just a subjective thing) |
Oh. I think I may have just figured out an API to rule them all? Though it'd still be quite complicated: /// This type's Drop implementation should panic if get_item did not return None yet
trait CollectionDriver {
type ItemDriver<'a>: Driver;
/// Returns None once there are no longer any elements in the collection
fn get_item<'a>(&'a mut self) -> Option<ItemDriver<'a>>;
}
trait Driver {
// ...
type CollectionDriver<'a>: CollectionDriver;
/// Note that SequenceIterator is supposed to panic if it is dropped without having been polled to completion first
fn start_collection<'a>(&'a mut self, allowed_lens: Range) -> Self::CollectionDriver<'a>;
} This is close to my first idea above; but instead of using Iterator and a (I've also renamed sequence to collection because it matches "collection of distinct types" better I think) With this scheme other ideas like making TypeGenerator actually use a Collection for everything it generates would become possible, though I'm not sure how worth it this would be (more structure… but not necessarily better fuzzing? I guess it depends on where there is variable-length stuff, basically anything variable-length should probably be in a collection… but then it could also make sense to skip MAGIC_NEXT when the child itself is fixed-length… well lots of things to think about here) Oh and by the way we may also need the ItemDriver to not necessarily be the same as the parent driver, because of the need for escaping magic numbers so literally every output stays representable in the input. |
Hey!
I've just recently had a look at the TypeGenerator for Vec, and noticed that the collection generators seem to use a tag-length-value format.
Such a format is very quickcheck-friendly (because all inputs are valid), but it is not at all fuzzer-friendly, because a change of one in the length of one struct is going to completely mess with the whole structure of the rest of the input.
Google recommends a magic-number-based parser, as per https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#magic-separator : instead of having a length, have a separator used to go to the next input.
With this scheme, quickcheck is much less happy (because it needs to somehow figure out this magic number), but fuzzers are much happier (because they can easily change the length of the contents of one cell without messing with the rest).
So I'm thinking, do you think it'd make sense to change the generators of cargo-bolero so that they would use a magic-number scheme? The only issue I can see would be with the quickcheck-based backend, which I think could instead always use mutate so it doesn't have the issue that most inputs would have a single element in the collection (unless the magic number magically appeared)
The text was updated successfully, but these errors were encountered: