@yash-atreya (Contributor) commented Sep 23, 2025

Motivation

towards #8898 (#11842)

Solution

  • Replaces CorpusManager with WorkerCorpus.
  • WorkerCorpus is the corpus used by the parallel worker threads.
  • Each WorkerCorpus has an id: u32; the master worker has id = 0.
  • The WorkerCorpus instances share their in_memory_corpus with each other via the file system, using a star pattern with the master worker (id = 0) at the center.
  • The corpus_dir is now organized as:
corpus_dir/
    worker0/        # master
        sync/
        corpus/
    worker1/
        sync/
        corpus/
    worker2/
        sync/
        corpus/
  • Each non-master worker exports its corpus to worker0/sync/; see fn export in 089f30b.
  • The master worker distributes its worker0/corpus entries (which, once synced, include entries from all workers) to each worker's sync/ directory; see fn distribute in d4200e4.
  • Each worker then pulls the new corpus entries from its corpus_dir/workerId/sync dir into corpus_dir/workerId/corpus and updates its history_map; see fn calibrate in 488d09d. Concretely, calibrate replays the tx sequences found in the worker's sync/ dir and admits an entry (updating the history_map) only if it produces new coverage for that particular worker.
  • The pub fn sync introduced in e9d8d3c orchestrates all of the above; a sketch of the export/distribute flow follows this list.
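To make the star pattern concrete, here is a minimal sketch of the export and distribute halves of the flow, assuming byte-blob corpus entries and the directory layout above. The names WorkerCorpus, export, and distribute mirror the PR's description; the field types, file-naming scheme, and worker_dir helper are illustrative assumptions, not the actual implementation.

```rust
use std::fs;
use std::path::PathBuf;

struct WorkerCorpus {
    id: u32,
    corpus_dir: PathBuf,               // root containing worker0/, worker1/, ...
    in_memory_corpus: Vec<Vec<u8>>,    // entries as opaque byte blobs (assumption)
    new_entry_indices: Vec<usize>,     // entries added since the last sync
}

impl WorkerCorpus {
    fn worker_dir(&self, id: u32) -> PathBuf {
        self.corpus_dir.join(format!("worker{id}"))
    }

    /// Non-master workers copy their new entries into worker0/sync/.
    fn export(&mut self) -> std::io::Result<()> {
        let master_sync = self.worker_dir(0).join("sync");
        for &i in &self.new_entry_indices {
            // Hypothetical naming scheme; the real implementation may differ.
            let name = format!("worker{}-{i}.bin", self.id);
            fs::write(master_sync.join(name), &self.in_memory_corpus[i])?;
        }
        self.new_entry_indices.clear();
        Ok(())
    }

    /// The master copies its corpus/ (which, once synced, includes entries
    /// from all workers) into every other worker's sync/ directory.
    fn distribute(&self, num_workers: u32) -> std::io::Result<()> {
        let master_corpus = self.worker_dir(0).join("corpus");
        for id in 1..num_workers {
            let sync = self.worker_dir(id).join("sync");
            for entry in fs::read_dir(&master_corpus)? {
                let path = entry?.path();
                fs::copy(&path, sync.join(path.file_name().unwrap()))?;
            }
        }
        Ok(())
    }
}
```

Each worker would then run calibrate over its own sync/ directory, replaying sequences and keeping only those that add coverage.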

Note: This PR does not parallelize the fuzz runs themselves; it only prepares for that. Opened for initial feedback on the approach.

PR Checklist

  • Added Tests
  • Added Documentation
  • Breaking changes

@yash-atreya changed the title from "[wip] feat(evm): SharedCorpus for multiple worker threads" to "[wip] feat(fuzz): SharedCorpus for multiple worker threads" Sep 23, 2025
@yash-atreya changed the title from "[wip] feat(fuzz): SharedCorpus for multiple worker threads" to "[wip] feat(fuzz): WorkerCorpus for multiple worker threads" Sep 25, 2025
Comment on lines +388 to +390
// Track in-memory corpus changes to update MasterWorker on sync
let new_index = self.in_memory_corpus.len();
self.new_entry_indices.push(new_index);
Contributor:
I think this is fine, but it may result in some corpus entries never getting synced, e.g. after a crash or a Ctrl+C and restart. If you persist the last-synced timestamp, a worker can recover from restarts by checking for entries written after that timestamp but before a sync could occur; a sketch of that recovery check follows.
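A rough sketch of the suggested recovery mechanism, assuming a hypothetical per-worker last_sync marker file and the directory layout from the PR description; none of this is part of the PR as written.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Returns entries written after the persisted last-sync timestamp, so a
/// restarted worker can re-export anything that missed a sync.
fn unsynced_entries(worker_dir: &Path) -> std::io::Result<Vec<PathBuf>> {
    let marker = worker_dir.join("last_sync"); // hypothetical marker file
    let last_sync = fs::metadata(&marker)
        .and_then(|m| m.modified())
        .unwrap_or(SystemTime::UNIX_EPOCH); // never synced: everything is new
    let mut pending = Vec::new();
    for entry in fs::read_dir(worker_dir.join("corpus"))? {
        let entry = entry?;
        if entry.metadata()?.modified()? > last_sync {
            pending.push(entry.path());
        }
    }
    Ok(pending)
}

/// Touch the marker after a successful sync so its mtime records the sync.
fn record_sync(worker_dir: &Path) -> std::io::Result<()> {
    fs::write(worker_dir.join("last_sync"), b"")
}
```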

@yash-atreya changed the title from "[wip] feat(fuzz): WorkerCorpus for multiple worker threads" to "feat(fuzz): WorkerCorpus for multiple worker threads" Sep 29, 2025
@yash-atreya self-assigned this Sep 29, 2025
@yash-atreya moved this to Ready For Review in Foundry Sep 29, 2025
@yash-atreya added this to the v1.5.0 milestone Sep 29, 2025
@DaniPopes (Member) commented:
please merge master, this is still using old CI runners

Comment on lines +235 to +236
'corpus_replay: for entry in std::fs::read_dir(corpus_dir)? {
let path = entry?.path();
Member:
Do we continually write to the corpus directory? This is very expensive: we not only iterate a directory and read the files, but (if gzip is enabled) we also decompress over and over, potentially the same file each time. It feels like the corpus should be in-memory by default, and we should only write at the end.

@0xalpharush (Contributor) commented Oct 3, 2025:
This only happens at startup. Entries are held in memory as long as they stay under a configurable limit and are then flushed to disk (and compressed, if compression is enabled).

Your point does still stand elsewhere, though. IIUC, workers share compressed corpus entries, so they potentially decompress the same files repeatedly. Moving compression to the very end of the run would resolve this; a rough sketch of that follows.
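A sketch of the suggested fix: share raw entries during the run and gzip exactly once at shutdown. This assumes the flate2 crate; the function name and file layout are illustrative, not the PR's actual API.

```rust
use flate2::{write::GzEncoder, Compression};
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Compress every raw corpus entry in `dir` once, at the end of the run,
/// instead of decompressing/recompressing entries during each sync.
fn compress_corpus_at_shutdown(dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().is_some_and(|e| e == "gz") {
            continue; // already compressed
        }
        let raw = fs::read(&path)?;
        // Note: with_extension replaces any existing extension; fine for a sketch.
        let gz_path = path.with_extension("gz");
        let mut encoder = GzEncoder::new(File::create(&gz_path)?, Compression::default());
        encoder.write_all(&raw)?;
        encoder.finish()?;
        fs::remove_file(&path)?; // keep only the compressed copy
    }
    Ok(())
}
```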

@jenpaff modified the milestones: v1.5.0, v1.6.0 Oct 30, 2025
if !tx_seq.is_empty() {
let mut new_coverage_on_sync = false;
for tx in &tx_seq {
if can_replay_tx(tx) {
Contributor:
Checking the entire sequence before executing it may make sense: currently, new_coverage_on_sync can be set even though a later tx in the sequence contains an invalid selector. Ideally, we could just drop those txs, see if the sequence still replays, and retain the modified, valid corpus entry; see the sketch below.
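A minimal sketch of this suggestion, with hypothetical stand-ins for the PR's tx type and can_replay_tx helper: prune the sequence up front, then replay the pruned sequence and keep the entry only if it still yields new coverage.

```rust
// Hypothetical stand-ins; the real Tx type and replay check differ.
struct Tx {
    selector: [u8; 4],
}

fn can_replay_tx(tx: &Tx) -> bool {
    // e.g. the target function's selector still exists after a contract change
    tx.selector != [0u8; 4]
}

/// Validate the whole sequence before execution: drop txs that can no longer
/// be replayed instead of aborting mid-sequence after coverage was recorded.
fn prune_sequence(tx_seq: Vec<Tx>) -> Vec<Tx> {
    tx_seq.into_iter().filter(can_replay_tx).collect()
}
```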

@0xalpharush (Contributor) commented:

Overall this looks pretty good to me. I left a few comments, and some earlier ones are still outstanding. I would recommend adding a config option to turn syncing off, as well as a way to make it occur more or less frequently; a sketch of such a config follows.
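A sketch of the suggested config surface; the struct, field names, and defaults are assumptions, not Foundry's actual fuzz configuration.

```rust
use std::time::Duration;

/// Hypothetical knobs for cross-worker corpus syncing.
struct CorpusSyncConfig {
    /// Disable cross-worker corpus syncing entirely.
    enabled: bool,
    /// How often each worker runs the export/distribute/calibrate cycle.
    interval: Duration,
}

impl Default for CorpusSyncConfig {
    fn default() -> Self {
        Self { enabled: true, interval: Duration::from_secs(30) }
    }
}
```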

@DaniPopes (Member) commented:

Merged into #12713.

@DaniPopes closed this Dec 2, 2025
github-project-automation bot moved this from Ready For Review to Done in Foundry Dec 2, 2025
@DaniPopes deleted the yash/shared-corpus branch December 2, 2025 17:32