Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: deduplicate fuzz inputs #3521

Closed
mds1 opened this issue Oct 20, 2022 · 11 comments · Fixed by #7428
Closed

feat: deduplicate fuzz inputs #3521

mds1 opened this issue Oct 20, 2022 · 11 comments · Fixed by #7428
Assignees
Labels
A-testing Area: testing C-forge Command: forge T-feature Type: feature T-perf Type: performance

Comments

@mds1
Copy link
Collaborator

mds1 commented Oct 20, 2022

Component

Forge

Describe the feature you would like

Currently foundry's fuzz inputs are quite wasteful. Here's a spreadsheet with some data from a new project that simply logs the fuzzer generated values, i.e. dict inputs are minimal so this is a worst case scenario. You can see with uints only 29% of values are unique, and 31% of values are the edge biased values.

Given that only 29% of values are unique, this means 71k of the 100k runs were wasted execution.

For a fuzz test with n inputs, we should consider tracking each unique combination of inputs used to avoid re-using the same inputs. A few consideration:

  • If we naively re-generate inputs when finding a duplicate, this might not be any better than just increasing the number of runs (performance-wise)
  • A more performant approach may be to modify the strategies. I think each input of a fuzz test is generated independently, so we don't know if we have duplicate inputs until all values are chosen. We may be able to modify the strategies such that all required inputs are generated in the same strategy

Additional context

No response

@mds1 mds1 added the T-feature Type: feature label Oct 20, 2022
@gakonst gakonst added this to Foundry Oct 20, 2022
@gakonst gakonst moved this to Todo in Foundry Oct 20, 2022
@rkrasiuk rkrasiuk added A-testing Area: testing C-forge Command: forge labels Oct 21, 2022
@mds1
Copy link
Collaborator Author

mds1 commented Feb 27, 2023

I think the best solution here is to remove the explicit edge_weight strategy used to choose edge values, and instead pre-populate the dict with all edge cases values. Specifically the dict would always be populated with:

  • 0 +/-3
  • +/-3 around every type(uintN).max
  • +/-3 around every type(intN).max
  • +/-3 around every type(intN).min

@wirew0lf I think the telegram you mentioned you may open a PR with a fix for this, should I assign to you?

@wirew0lf
Copy link

I think the best solution here is to remove the explicit edge_weight strategy used to choose edge values, and instead pre-populate the dict with all edge cases values. Specifically the dict would always be populated with:

* 0 +/-3

* +/-3 around every `type(uintN).max`

* +/-3 around every `type(intN).max`

* +/-3 around every `type(intN).min`

@wirew0lf I think the telegram you mentioned you may open a PR with a fix for this, should I assign to you?

Yeah please, assign it to me.

@mds1
Copy link
Collaborator Author

mds1 commented Feb 28, 2023

Done, thank you!

@wirew0lf
Copy link

Hey, sorry for not taking care of this earlier, I've had a crazy amount of work. I recently started trying to address this, currently I've got the approach @mds1 mentioned by completely removing the edge_weight strategy which now makes it way more random. Now need to take care of pre-populating the dictionary.

However, I've been looking into other alternatives because I don't find this to be the best possible approach for a fuzzer. I know this has been probably already discussed but features such as re-utilizing the corpus of previous runs as echidna or other binary fuzzers like AFL do, would be something that take the fuzzing/invariant testing to the next level.

However, I don't think this would be possible while the values are generated on each run. Wouldn't a better approach be pre-generating a dictionary with as many runs as configured in the foundry.toml and then just use consequent values from the dict on each run?

This would also allow to implement some other cool stuff later on, like bounding values or defining value exceptions like with vm.assume without wasting a run of the fuzzer.

@mds1
Copy link
Collaborator Author

mds1 commented Apr 24, 2023

Sorry can you expand on your suggested features? I'm not sure I totally follow. Right now I believe forge collects PUSH bytes and storage values from runs, adds them to the dictionary, and uses them on subsequent runs

@wirew0lf
Copy link

wirew0lf commented May 8, 2023

Sorry can you expand on your suggested features? I'm not sure I totally follow. Right now I believe forge collects PUSH bytes and storage values from runs, adds them to the dictionary, and uses them on subsequent runs

Most traditional binary fuzzers like AFL or even new smart contract fuzzers like echidna use the corpus from previous runs to fuzz, this is different from using push bytes and storage. The corpus is basically a set of interesting inputs used for fuzzing.

So for example, the fuzzer randomly generates a value for a call and this value increases coverage, reaching code that was not reached before by any of the other runs, we want to save this input in the corpus, so we can keep re-using this input in future runs, this has been topic of several research specially in the are of binary fuzzing. I'm by no means an expert in this subject matter but I think using completely random values is not the best approach for fuzzing, even if you include the storage and push bytes. In my opinion, saving and re-using the corpus while measuring coverage after each call on the run to look for new corpus would basically take the invariant testing feature to another whole new level.

The problem is, right now we generate the values for the fuzzer and invariant testing based on the strategy for the given type, without knowledge of previous runs. This not only leads to the duplicate input issue, which can be easily solved by moving the edge strategy to the dictionary along with the push and storage values. But also prevents us from developing more advanced features like the one I described above.

So what I suggest is coming up with a new architecture for the fuzzer that allows us to keep track of previous runs and log coverage of the run. This would allow us to use the corpus from previous runs or even use a value from the same run that we found increases coverage.

@mds1 let me know what you think about this, if you find it interesting I can open a separate feature request. I can also push the changes to the integer strategy to get rid of the edge_weight issue on the meanwhile.

@mds1
Copy link
Collaborator Author

mds1 commented May 8, 2023

Ah ok, IIUC you are basically describing coverage guided fuzzing. This is useful and something that would be great to have. However:

  1. The added logic/computation for each fuzz run means each individual fuzz runs now takes longer, so you may find more bugs per number of fuzz runs with coverage guidance, but that doesn't mean you find more bugs per unit of time. Therefore it's not always obvious whether the tradeoff of coverage guided fuzzing is worth it—sometimes it may be, other times it may not be
  2. Coverage currently doesn't work with the optimizer or via-ir because solc's source maps aren't always correct/useful with those build settings. So currently forge can't rely on exclusively using coverage guided fuzzing anyway

As a result I think this feature (#3521 (comment)) is still worth implementing right now, as it will be a lot less work than coverage guided fuzzing, and should still provide a significant benefit.

@ultrasecreth
Copy link

Not sure if this is the right place, but fuzz tests with a single bool output still run 256 times (or whatever the conf is on foundry.toml), I think it should auto limit itself to 2 times

@mds1
Copy link
Collaborator Author

mds1 commented Aug 24, 2023

@ultrasecreth I'd suggest creating a separate issue since this issue's scope is a different problem. But if you're only fuzzing over a bool you should just have 2 concrete tests instead.

I think the feature you are asking for: is if the maximum possible of input permutations for a fuzz test is less than the number of fuzz runs, don't fuzz, and instead run all permutations

@mds1
Copy link
Collaborator Author

mds1 commented Jul 24, 2024

@grandizzy Can you remind me why #7428 closes this issue? I suspect the default behavior with no fixtures would still result in many duplicate inputs since I don't think we've fundamentally changed how inputs are generated?

@grandizzy
Copy link
Collaborator

@grandizzy Can you remind me why #7428 closes this issue? I suspect the default behavior with no fixtures would still result in many duplicate inputs since I don't think we've fundamentally changed how inputs are generated?

yeah, is because in #7428 the default behaviour with no fixtures defined was changed to generate random values instead edge cases, hence reducing dupes

@jenpaff jenpaff moved this from Todo to Completed in Foundry Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Area: testing C-forge Command: forge T-feature Type: feature T-perf Type: performance
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants