Restructure Fuzzer Seed Data #2

Merged: 1 commit, May 8, 2024

@DaveLak DaveLak commented May 8, 2024

This change must be merged at the same time as gitpython-developers/GitPython#1913, the corresponding GitPython PR that updates the OSS-Fuzz scripts. Merging one without the other will break the builds.

The most significant change introduced here is replacing the .zip files containing seed corpora with sub-directories for each fuzz target containing an uncompressed corpus. This change makes it easier to update each corpus on a per-input-blob basis, as well as making the content of each corpus easier to inspect manually.
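As a rough illustration of the restructuring described above, the following sketch unpacks per-target seed corpus archives into one sub-directory per fuzz target. It is not the script used for this PR. The `_seed_corpus.zip` suffix follows the OSS-Fuzz naming convention, while the function name `unpack_seed_corpora` and the directory layout are assumptions made for the example.

```python
import zipfile
from pathlib import Path


def unpack_seed_corpora(
    zip_dir: Path, out_dir: Path, suffix: str = "_seed_corpus.zip"
) -> dict[str, int]:
    """Extract each `<target><suffix>` archive into `out_dir/<target>/`.

    Hypothetical helper: returns a mapping of fuzz target name to the
    number of archive members written.
    """
    counts: dict[str, int] = {}
    for archive in sorted(zip_dir.glob(f"*{suffix}")):
        # Derive the fuzz target name by stripping the archive suffix.
        target = archive.name[: -len(suffix)]
        target_dir = out_dir / target
        target_dir.mkdir(parents=True, exist_ok=True)

        written = 0
        with zipfile.ZipFile(archive) as zf:
            for member in zf.infolist():
                if member.is_dir():
                    continue
                # Keep only the base name so each input blob lands directly
                # in the target's directory. Members that share a base name
                # would overwrite each other.
                name = Path(member.filename).name
                (target_dir / name).write_bytes(zf.read(member))
                written += 1
        counts[target] = written
    return counts
```

With the inputs stored as individual files, adding, removing, or inspecting a single input becomes an ordinary file operation that shows up clearly in a diff.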

The inputs added in this commit are taken directly from the corpora backups generated by ClusterFuzz/OSS-Fuzz and can (and should) likely be reduced to their minimal sets in a follow-up PR.

Other changes:

  • Renamed the dict sub-directory to dictionaries for clarity.
  • Updated the README to better document what this repository contains. The README could still be improved, for example with instructions for contributing and generating new inputs, but for now it should be good enough (I hope). The "Dictionaries" section was migrated from the GitPython fuzzing README.

DaveLak added a commit to DaveLak/GitPython that referenced this pull request May 8, 2024
This change is required to support the changes to the seed data repo
structure introduced in:
gitpython-developers/qa-assets#2

This moves most of the seed-data-related build steps into the OSS-Fuzz
Docker image build via `container-environment-bootstrap.sh`. This
includes moving the dictionaries into that repo.

The fuzzing/README.md here should be updated in a follow-up with a link
to the qa-assets repo (and probably some context setting about corpora
in general) but I have opted to defer that as I think the functionality
added by the seed data improvements is valuable as is and shouldn't be
blocked by documentation writer's block.
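
The contents of `container-environment-bootstrap.sh` are not shown in this thread, so the following is only a sketch of what a seed-data build step could look like once the corpora are stored uncompressed. It assumes that each sub-directory of a corpus root is named after a fuzz target and packages it as `<target>_seed_corpus.zip`, the archive name OSS-Fuzz looks for next to a fuzz target. The function name `pack_seed_corpora` and the directory layout are hypothetical.

```python
import zipfile
from pathlib import Path


def pack_seed_corpora(corpus_root: Path, out_dir: Path) -> list[Path]:
    """Zip every `corpus_root/<target>/` directory into
    `out_dir/<target>_seed_corpus.zip` and return the archive paths.

    Hypothetical helper, not taken from the GitPython build scripts.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    archives: list[Path] = []
    for target_dir in sorted(p for p in corpus_root.iterdir() if p.is_dir()):
        archive = out_dir / f"{target_dir.name}_seed_corpus.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for blob in sorted(target_dir.iterdir()):
                if blob.is_file():
                    # Store inputs at the archive root under their file name.
                    zf.write(blob, arcname=blob.name)
        archives.append(archive)
    return archives
```

Doing the packaging at build time keeps the repository itself free of binary archives while still giving the fuzzing infrastructure the zipped corpora it expects.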
@DaveLak DaveLak requested a review from Byron May 8, 2024 07:31
@DaveLak DaveLak marked this pull request as ready for review May 8, 2024 07:32
@Byron Byron merged commit c1a5b83 into main May 8, 2024
@Byron Byron deleted the refactor-gitpython-dir-and-improve-readme branch May 8, 2024 14:16