Filter Duplicate Input Execution #2771

riesentoaster · 2024-12-15T15:20:48Z

Some mutators report MutationResult::Mutated, even if nothing actually changes about the input. HashMutator is a wrapper around other mutators that hashes inputs pre- and post-mutation to ensure MutationResult::Mutated is only reported if something actually changed.

This may be worth using on slow targets, where the hashing is quicker than the unnecessary additional executions of the target for previously tried inputs.
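As a rough illustration of the idea described above, the following is a minimal sketch of a hash-checking wrapper. The `Mutator` trait, `MutationResult` enum, `HashCheckedMutator`, and `ZeroFirstByte` are simplified, hypothetical stand-ins defined here for illustration; they are not the LibAFL types or the code from this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for a mutation result (assumption, not the LibAFL type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum MutationResult {
    Mutated,
    Skipped,
}

/// Simplified stand-in for a mutator trait (hypothetical signature).
pub trait Mutator<I> {
    fn mutate(&mut self, input: &mut I) -> MutationResult;
}

/// Wraps another mutator and downgrades `Mutated` to `Skipped`
/// when the input hashes to the same value before and after mutation.
pub struct HashCheckedMutator<M> {
    inner: M,
}

impl<M> HashCheckedMutator<M> {
    pub fn new(inner: M) -> Self {
        Self { inner }
    }
}

/// Hashes any `Hash` value with the standard library's default hasher.
fn hash_of<I: Hash>(input: &I) -> u64 {
    let mut hasher = DefaultHasher::new();
    input.hash(&mut hasher);
    hasher.finish()
}

impl<I: Hash, M: Mutator<I>> Mutator<I> for HashCheckedMutator<M> {
    fn mutate(&mut self, input: &mut I) -> MutationResult {
        let before = hash_of(&*input);
        let result = self.inner.mutate(&mut *input);
        // Only trust `Mutated` if the hash actually changed.
        if result == MutationResult::Mutated && hash_of(&*input) == before {
            MutationResult::Skipped
        } else {
            result
        }
    }
}

/// Toy mutator that always claims `Mutated` but only zeroes the first byte.
pub struct ZeroFirstByte;

impl Mutator<Vec<u8>> for ZeroFirstByte {
    fn mutate(&mut self, input: &mut Vec<u8>) -> MutationResult {
        if let Some(byte) = input.first_mut() {
            *byte = 0;
        }
        MutationResult::Mutated
    }
}
```

Wrapping `ZeroFirstByte` this way reports `Skipped` for an input that already starts with a zero byte. A hash collision between two different inputs would also be reported as `Skipped`, which is a trade-off of comparing hashes instead of full inputs.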

domenukk · 2024-12-15T15:42:01Z

HashFilterMutator?

Or even HashMutationFilter

domenukk · 2024-12-15T15:44:04Z

As I stated in the discussion thread, I think a method for rejecting inputs that were already tried would be more useful (but I don't know your use case, so..)
Maybe using a Bloomfilter on the executor, or similar..

riesentoaster · 2024-12-15T22:00:43Z

> As I stated in the discussion thread, I think a method for rejecting inputs that were already tried would be more useful (but I don't know your use case, so..)

I'm targeting the TCP/IP stack of an OS, so each execution takes on the order of 1s, although most of that is spent in wait states (hence previous work like overcommit). Even so, the added runtime of this would be nothing compared to the execution, so this felt like an easy win.

> Maybe using a Bloomfilter on the executor, or similar..

Something like this would definitely further improve the situation. Do you suggest creating a wrapping executor that returns either ExitKind::Ok or a new ExitKind::Skipped if the input was previously evaluated? This seems like a bodged-on solution as well though, since observers/feedbacks still run — we probably don't even want to call the executor in such cases.

Tracing this back, it seems most appropriate in the stage? But that seems not that generic. So maybe in Fuzzer (or rather its Evaluator impl)?

I'm also not sure if there's an opportunity here to combine this somehow with CentralizedLauncher?

domenukk · 2024-12-16T02:51:19Z

I think it could simply wrap an executor, yeah. And have an extra observation that's "skipped": if it's true, the testcase isn't interesting. Should be easy enough to do.

We can still merge this PR as well, but the feedback should be renamed IMHO.
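The wrapping-executor idea discussed above could look roughly like the sketch below. Everything here is an assumption made for illustration: `BloomFilter` is a small hand-rolled filter, and `Executor`, `ExitKind`, `SkippingExecutor`, and `CountingExecutor` are simplified stand-ins rather than the LibAFL types or the code in this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter over a fixed-size bit vector (illustrative only).
pub struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: u64, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes }
    }

    fn hash_with_salt<T: Hash>(item: &T, salt: u8) -> u64 {
        let mut hasher = DefaultHasher::new();
        salt.hash(&mut hasher);
        item.hash(&mut hasher);
        hasher.finish()
    }

    /// Sets the bits for `item`; returns `true` if all of them were already
    /// set, i.e. the item was probably seen before (false positives possible).
    pub fn check_and_insert<T: Hash>(&mut self, item: &T) -> bool {
        let a = Self::hash_with_salt(item, 0);
        let b = Self::hash_with_salt(item, 1) | 1; // odd stride for double hashing
        let mut seen = true;
        for i in 0..self.num_hashes {
            let idx = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            let (word, bit) = ((idx / 64) as usize, idx % 64);
            if (self.bits[word] & (1u64 << bit)) == 0 {
                seen = false;
                self.bits[word] |= 1u64 << bit;
            }
        }
        seen
    }
}

/// Simplified stand-ins for an executor API (hypothetical signatures).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ExitKind {
    Ok,
    Crash,
}

pub trait Executor<I> {
    fn run_target(&mut self, input: &I) -> ExitKind;
}

/// Wraps an executor and skips inputs the Bloom filter has probably seen.
pub struct SkippingExecutor<E> {
    inner: E,
    seen: BloomFilter,
    /// The "skipped" observation for the most recent call.
    pub last_skipped: bool,
}

impl<E> SkippingExecutor<E> {
    pub fn new(inner: E, num_bits: u64, num_hashes: u64) -> Self {
        Self { inner, seen: BloomFilter::new(num_bits, num_hashes), last_skipped: false }
    }

    pub fn inner(&self) -> &E {
        &self.inner
    }
}

impl<I: Hash, E: Executor<I>> Executor<I> for SkippingExecutor<E> {
    fn run_target(&mut self, input: &I) -> ExitKind {
        self.last_skipped = self.seen.check_and_insert(input);
        if self.last_skipped {
            return ExitKind::Ok; // the target is not run for probable duplicates
        }
        self.inner.run_target(input)
    }
}

/// Toy executor that counts runs and pretends empty inputs crash.
pub struct CountingExecutor {
    pub runs: u64,
}

impl Executor<Vec<u8>> for CountingExecutor {
    fn run_target(&mut self, input: &Vec<u8>) -> ExitKind {
        self.runs += 1;
        if input.is_empty() { ExitKind::Crash } else { ExitKind::Ok }
    }
}
```

In this sketch a feedback would read `last_skipped` to decide that the testcase is not interesting. Because a Bloom filter can report false positives, a small fraction of genuinely new inputs would be skipped as well.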

riesentoaster · 2024-12-17T16:25:52Z

How about something like this?

riesentoaster · 2024-12-17T16:49:10Z

I'll do some performance comparisons later today. Initial runs suggest that adding even a 10µs sleep to the harness reduces the performance penalty to <5%.

I might also see how many duplicate inputs actually appear. But for now I feel like for slow targets this very well might be worth using.

riesentoaster · 2024-12-18T00:09:58Z

Alright, some performance tests. Running against the libfuzzer_libpng example fuzzer:

- Without the bloom filter, I'm getting a throughput of ~100k/s.
- With the bloom filter, I get ~85k/s.
- The rate of duplicate vs. new inputs increases over time: after 1min and 2min it was 0.6%, after 3min it was 1%, after 4min 2%, after 5min 4.4%, after ~7min it reached 10%, and after ~13min 40%. At this point I assume most inputs are going to be duplicates.

All these numbers obviously depend on the exact fuzzers:

- When the corpus count reaches a plateau, duplicate inputs become increasingly likely.
- When the number of possible mutations is small, duplicate inputs are more likely.
- If the execution time of the target is larger, the added runtime may be less than the runtime saved from not executing an input twice.
- The bloom filter requires quite a bit of memory, so if that is your limitation, not using it and instead spawning additional instances may be worth it.
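The trade-off in the points above can be put into a back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement: it assumes the whole drop from ~100k/s to ~85k/s is filter overhead (measured while duplicates were still rare), that a skipped duplicate costs only that overhead, and that an unfiltered input costs one full execution. Under this model the filter pays off once the duplicate rate `d` satisfies `t_filter + (1 - d) * t_exec < t_exec`, i.e. `d > t_filter / t_exec`. The function names are made up for illustration.

```rust
/// Average time per input in microseconds for a given throughput.
pub fn per_input_micros(inputs_per_sec: f64) -> f64 {
    1e6 / inputs_per_sec
}

/// Duplicate rate above which filtering is a net win in the simple model:
/// filtered cost   = t_filter + (1 - d) * t_exec
/// unfiltered cost = t_exec
/// so filtering wins when d > t_filter / t_exec.
pub fn break_even_duplicate_rate(t_exec_us: f64, t_filter_us: f64) -> f64 {
    t_filter_us / t_exec_us
}

/// Filter overhead per input, estimated from throughput with and without it.
pub fn filter_overhead_micros(without_per_sec: f64, with_per_sec: f64) -> f64 {
    per_input_micros(with_per_sec) - per_input_micros(without_per_sec)
}
```

With the libpng numbers from this comment, the overhead comes out at roughly 1.76µs per input on a 10µs execution, so the break-even duplicate rate is about 17.6%. For a target that takes around 1s per execution, the same overhead gives a break-even rate below 0.001%, which matches the intuition that slow targets benefit most. Hashing cost grows with input size, so the overhead figure will not carry over exactly to other targets.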

Overall, I feel like this may be worth having in the library.

Btw: There is no easy way of adding metadata to the state such that it is printed by monitors, right? Otherwise, calculating the number/rate of duplicates may be an interesting addition.

domenukk · 2024-12-18T00:19:10Z

There is an easy way, using UserStats see... however other things do UserStats. For example the stability in the calibration stage.

Cargo.toml

fuzzers/baby/baby_fuzzer_custom_executor/Cargo.toml

riesentoaster · 2024-12-24T15:59:05Z

Alright, I've implemented some of that. I have changed the following stages to use the new evaluate_filtered:

- GenStage
- PowerMutationalStage
- StdMutationalStage
- MultiMutationalStage
- SimpleConcolicMutationalStage

Notable unchanged stages (please check them yourself, I know only very little about the many different stages):

- StdTMinMutationalStage (it calls evaluate_execution directly, and I'm generally unsure if this should call the filtered or unfiltered method)
- CalibrationStage needs to call the unfiltered version, right?

domenukk · 2024-12-25T08:51:25Z

libafl/src/fuzzer/mod.rs

+    /// This is achieved by hashing each input and using a bloom filter to differentiate inputs.
+    ///
+    /// Use this implementation if hashing each input is very fast compared to executing potential duplicate inputs.
+    pub fn with_bloom_input_filter(


I would add a with_input_filter method and then you just provide a BloomInputFilter there (or we keep this one as extra constructor)

I've added a generic version and changed both new and with_bloom_input_filter to use this one.

libafl/src/mutators/hash.rs

libafl/src/stages/mutational.rs

libafl/src/stages/power.rs

libafl/src/stages/tuneable.rs

domenukk · 2024-12-25T08:57:02Z

Now it looks very good! Just left some minor nitpicks.
Yes, tmin and calibration need to execute unfiltered.

Thank you and merry christmas!

tokatoka

merry christmas 🎅

tokatoka · 2024-12-25T09:02:22Z

libafl/src/fuzzer/mod.rs

+}
+
+#[cfg(feature = "std")]
+impl<I: Hash> InputFilter<I> for BloomInputFilter {


Is this I: Hash bound necessary? If not, can you delete it?
(Always keep the constraints minimal.)

Yes, the bloom filter checks presence based on the hash value of the input.
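To illustrate why the bound sits on the impl and not on the trait, here is a minimal sketch. The trait shape and the method name `should_execute` are assumptions made for this example, and `SeenHashFilter` uses an exact `HashSet` of hashes as a stand-in for the Bloom filter; `Opaque` is a hypothetical input type that does not implement `Hash`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Assumed shape of the filter trait: no bound on `I` at the trait level.
pub trait InputFilter<I> {
    fn should_execute(&mut self, input: &I) -> bool;
}

/// Accepts every input, so it needs no bound on `I` at all.
pub struct NopInputFilter;

impl<I> InputFilter<I> for NopInputFilter {
    fn should_execute(&mut self, _input: &I) -> bool {
        true
    }
}

/// Stand-in for the Bloom-based filter: remembers the hashes of seen inputs.
#[derive(Default)]
pub struct SeenHashFilter {
    seen: HashSet<u64>,
}

impl<I: Hash> InputFilter<I> for SeenHashFilter {
    fn should_execute(&mut self, input: &I) -> bool {
        let mut hasher = DefaultHasher::new();
        input.hash(&mut hasher); // this call is what requires `I: Hash`
        // `insert` returns true only if the hash was not present yet.
        self.seen.insert(hasher.finish())
    }
}

/// Example input type without a `Hash` impl; only the no-op filter accepts it.
pub struct Opaque(pub f64);
```

Keeping the bound on the hashing impl alone means that fuzzers using a no-op filter are not forced to make their input type hashable.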

riesentoaster · 2024-12-28T00:29:41Z

Thank you for the quick responses! I hope you had relaxing holidays anyways :D

The remaining CI issues seem unrelated(?)

riesentoaster and others added 17 commits December 6, 2024 17:02

fixing empty multipart name

aefb8e3

fixing clippy

a98c981

Merge branch 'main' into main

7acf5a3

New rules for the contributing (AFLplusplus#2752)

2da6dc5

* Rules * more * aa

Improve Flexibility of DumpToDiskStage (AFLplusplus#2753)

1e571a0

* fixing empty multipart name * fixing clippy * improve flexibility of DumpToDiskStage * adding note to MIGRATION.md

No Use* from stages (AFLplusplus#2745)

e1d0b92

* no from stage * fixer * doc fix * how was this working???? * more fixes * delete more * rq * cargo-fuzz * m * aa

Update CONTRIBUTING.md MIGRATION.md (AFLplusplus#2762)

c842eda

No Uses* from fuzzer (AFLplusplus#2761)

31d9b56

* go * fixing stuf * hello from windows * more * lolg * lolf * fix * a --------- Co-authored-by: Your Name <you@example.com>

Remove useless cfgs (AFLplusplus#2764)

c9eb2a8

Link libresolv on all Apple OSs (AFLplusplus#2767)

93b64f9

Somewhat ugly CI fix... (AFLplusplus#2768)

294d2f1

* Maybe fix CI * does this help? * Very dirty 'fix'

Add HashMutator

bab9890

Fix docs

71fc1c6

Merge branch 'main' into add-label-mutationresult

a2fa10c

Fix docs again

30e1db4

introducing bloom filter

025a56a

riesentoaster added 2 commits December 17, 2024 23:08

fix tests

63b9ac9

Merge branch 'main' into add-label-mutationresult

92c3f08

domenukk reviewed Dec 18, 2024

Cargo.toml

domenukk reviewed Dec 18, 2024

fuzzers/baby/baby_fuzzer_custom_executor/Cargo.toml

riesentoaster added 6 commits December 24, 2024 17:04

Revert changes to global Cargo.toml

28b5c4a

Hide std-dependent dependency behind std feature

60e188f

Fix example fuzzer

68041b9

Rename constructor for filtered fuzzer

e3f530e

Reorder generics alphabetically

db994a4

Rename HashingMutator, add note to MutationResult about filtered fuzzers

d2dc266

riesentoaster requested review from tokatoka and domenukk December 24, 2024 16:42