Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added fuzzing support #3007

Merged
merged 14 commits into from
Dec 6, 2023
Merged

Conversation

fuzzy-boiii23a
Copy link
Contributor

This just adds fuzzing support such that if any changes are made to libgumbo it's easy to fuzz these changes before merging something into the main branch.

@fuzzy-boiii23a
Copy link
Contributor Author

thoughts @flavorjones ?

@flavorjones
Copy link
Member

@fuzzy-boiii23a Thank you for opening this! I'm traveling this week but will make time to take a look.

@stevecheckoway
Copy link
Contributor

SanityCheckPointers (and as an aside, I've been trying to move away from such terminology) doesn't have any side effects and it doesn't return any values. 1. Have you checked that it doesn't get dead stripped? 2. How will you ensure that it doesn't get dead stripped in the future?

The else will definitely get pruned so if you're hoping that will check if the text pointer is accessible, I don't think that will do it.

There are also pointers for attributes that probably need to be checked as well.

@fuzzy-boiii23a
Copy link
Contributor Author

SanityCheckPointers (and as an aside, I've been trying to move away from such terminology) doesn't have any side effects and it doesn't return any values. 1. Have you checked that it doesn't get dead stripped? 2. How will you ensure that it doesn't get dead stripped in the future?

The else will definitely get pruned so if you're hoping that will check if the text pointer is accessible, I don't think that will do it.

There are also pointers for attributes that probably need to be checked as well.

hi @stevecheckoway , thanks for the heads up! i've made some changes and tested them using -03 to ensure i make it into each if else statement and i do and i also added checking attributes

@fuzzy-boiii23a
Copy link
Contributor Author

Hey @flavorjones hope you're going well and had a great trip, can you think of anything else that I could improve to make it easier to use?

@flavorjones
Copy link
Member

@fuzzy-boiii23a I've not been able to spend much time on OSS over the past few weeks, I appreciate your patience.

@fuzzy-boiii23a
Copy link
Contributor Author

@fuzzy-boiii23a I've not been able to spend much time on OSS over the past few weeks, I appreciate your patience.

Absolutely no stress at all :)

Copy link
Member

@flavorjones flavorjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for doing this work!! I really appreciate it, and I'm sorry it's taken me so long to review (I've been traveling a lot).

Can you also add a (very brief) summary of how to run the fuzzer to CONTRIBUTING.md? I've only ever used a fuzzer in one other project, and the configuration was much different from this.

  • What commands should I run?
  • What does output look like if no issues are found?
  • What does output look like if an issue is found?

gumbo-parser/fuzzer/build.sh Outdated Show resolved Hide resolved
gumbo-parser/fuzzer/build.sh Outdated Show resolved Hide resolved
@fuzzy-boiii23a
Copy link
Contributor Author

Thank you again for doing this work!! I really appreciate it, and I'm sorry it's taken me so long to review (I've been traveling a lot).

Can you also add a (very brief) summary of how to run the fuzzer to CONTRIBUTING.md? I've only ever used a fuzzer in one other project, and the configuration was much different from this.

  • What commands should I run?
  • What does output look like if no issues are found?
  • What does output look like if an issue is found?

Added those changes https://github.com/fuzzy-boiii23a/nokogiri/blob/fuzzing_support/CONTRIBUTING.md#fuzzing-your-gumbo-parser-changes hope you like them and no need to be sorry thanks for checking out the changes! hope your travel went well :)

Copy link
Member

@flavorjones flavorjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick response. I made a few small requests.

Also, I noticed that the user account associated with the last few commits was different, wanted to ask if that was intentional?

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
gumbo-parser/fuzzer/build.sh Outdated Show resolved Hide resolved
@fuzzy-boiii23a
Copy link
Contributor Author

Thanks for the quick response. I made a few small requests.

Also, I noticed that the user account associated with the last few commits was different, wanted to ask if that was intentional?

Addressed those changes and the user account was unintentional that's a personal account I use for non fuzzing related projects and forgot to update config :)

and
- set some bash safety flags
- make it runnable from any directory
- simplify the makefile execution
use separate compilation directories to avoid the jobs clobbering each
other if users run make with a `-j` flag
and rename the top-level target from `fuzzing` to `fuzzers`
@flavorjones
Copy link
Member

flavorjones commented Nov 20, 2023

@fuzzy-boiii23a I added a few commits primarily to clean up the build.sh script and allow building the fuzzers in parallel.

At this point, I'm able to run the fuzzers, and I've even found a couple of (very small) memory leaks in gumbo! (See #3036)

I will want to do some follow-up work to assemble a corpus that's useful and make that scriptable. What else do you think needs to be done? Anything? Are you OK if I squash-merge this?

@fuzzy-boiii23a
Copy link
Contributor Author

@fuzzy-boiii23a I added a few commits primarily to clean up the build.sh script and allow building the fuzzers in parallel.

At this point, I'm able to run the fuzzers, and I've even found a couple of (very small) memory leaks in gumbo! (See #3036)

I will want to do some follow-up work to assemble a corpus that's useful and make that scriptable. What else do you think needs to be done? Anything? Are you OK if I squash-merge this?

Thanks for that and that's great to hear!, I'm happy to assemble a corpus and a dictionary but for now I think everything is fine to be merged. I will get around to doing that next week.

@fuzzy-boiii23a
Copy link
Contributor Author

@fuzzy-boiii23a I added a few commits primarily to clean up the build.sh script and allow building the fuzzers in parallel.

At this point, I'm able to run the fuzzers, and I've even found a couple of (very small) memory leaks in gumbo! (See #3036)

I will want to do some follow-up work to assemble a corpus that's useful and make that scriptable. What else do you think needs to be done? Anything? Are you OK if I squash-merge this?

Alright so i added a dictionary and corpus and added how to use it in CONTRIBUTING.md which greatly improves coverage so i cannot think of any further improvements to make, looks to to merge to me thanks for the support @flavorjones

flavorjones added a commit that referenced this pull request Nov 21, 2023
**What problem is this PR intended to solve?**

The fuzzer in #3007 found a memory leak when incomplete tags are
encountered.

The repro added to the memory leak test suite is:

```html
<asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf foo="bar
```

which will leak the tag name without this fix.

cc @stevecheckoway 

**Have you included adequate test coverage?**

Yes, but added it to the memory leak suite which can only be run
manually today (see #1603 for context on plans to improve it).

**Does this change affect the behavior of either the C or the Java
implementations?**

No functional change.
@flavorjones
Copy link
Member

@fuzzy-boiii23a Thank you!

Can you help me understand what's in the gumbo_corpus.zip? How was it generated, how would I add to it, etc.?

@fuzzy-boiii23a
Copy link
Contributor Author

fuzzy-boiii23a commented Nov 21, 2023

@fuzzy-boiii23a Thank you!

Can you help me understand what's in the gumbo_corpus.zip? How was it generated, how would I add to it, etc.?

No problem, I generated that corpus by running the fuzzer overnight with a html corpus I grabbed from https://github.com/dvyukov/go-fuzz-corpus along with some scraping of html via the web and after running for 24 hours I performed corpus minimization (https://llvm.org/docs/LibFuzzer.html#id25) which will run each file through the fuzzer and if for example 10 files execute the same amount of code in the target, libfuzzer will discard 9 files and keep 1 that exercises the same functionality. You could improve the corpus by collecting heaps of html files and then performing minimization of these inputs using https://llvm.org/docs/LibFuzzer.html#id25

@fuzzy-boiii23a
Copy link
Contributor Author

@flavorjones I had a look for the public corpus that google currently uses for fuzzing nokogiri's implementation of gumbo and it can be downloaded from https://storage.googleapis.com/nokogiri-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/nokogiri_parse_fuzzer/public.zip I'd say that this is the best one to use as the corpus is updated and maintained across their daily run against gumbo which is about a billion executions on average

@fuzzy-boiii23a
Copy link
Contributor Author

Hey @flavorjones hope you're doing well, can we please finalise and merge this one to close it off?

@flavorjones
Copy link
Member

@fuzzy-boiii23a Thanks for the nudge. The only other thing I need you to educate me on is how to manage the oss-fuzz emails I've been receiving to my personal email.

In #2992 we talked about using the google group I created, https://groups.google.com/g/nokogiri-oss-fuzz. How can I configure the oss-fuzz settings?

@fuzzy-boiii23a
Copy link
Contributor Author

@fuzzy-boiii23a Thanks for the nudge. The only other thing I need you to educate me on is how to manage the oss-fuzz emails I've been receiving to my personal email.

In #2992 we talked about using the google group I created, https://groups.google.com/g/nokogiri-oss-fuzz. How can I configure the oss-fuzz settings?

Hi @flavorjones , you can manage oss fuzz by going to https://oss-fuzz.com/ and logging in using the google email you're receiving alerts on however you can have a look at crashes and how to reproduce them by checking out https://oss-fuzz.com/testcases?project=nokogiri&open=yes whilst logged in, unfortunately google group cannot be used as you won't be able to access the oss fuzz dashboard as mentioned in google/oss-fuzz#11004 (comment)

@flavorjones
Copy link
Member

@fuzzy-boiii23a Ah, thank you! The pointer to google/oss-fuzz#11004 is what I was missing (I had no idea how it was configured).

Really very appreciative of your work here!!! ❤️

@flavorjones flavorjones merged commit 0e34562 into sparklemotion:main Dec 6, 2023
124 checks passed
flavorjones added a commit that referenced this pull request Feb 3, 2024
- compile_commands should been included in 464f8d4
- gumbo fuzzer files should have been included in #3007 and #3093
flavorjones added a commit that referenced this pull request Feb 3, 2024
- compile_commands should been included in 464f8d4
- gumbo fuzzer files should have been included in #3007 and #3093
flavorjones added a commit that referenced this pull request Feb 3, 2024
- compile_commands should been included in 464f8d4
- gumbo fuzzer files should have been included in #3007 and #3093

(cherry picked from commit b90b2c1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants