Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate deeper fuzzing #975

Closed
turbolent opened this issue Jun 3, 2021 · 6 comments
Closed

Investigate deeper fuzzing #975

turbolent opened this issue Jun 3, 2021 · 6 comments
Assignees

Comments

@turbolent
Copy link
Member

turbolent commented Jun 3, 2021

Issue To Be Solved

We currently support fuzzing the parser and checker using go-fuzz. This helped during the development of the new parser last year.

Extend the fuzzing support to get deeper coverage and also support fuzzing the interpreter.

Currently, input is random, so the coverage for the checker is small, as only few inputs are valid syntactically. By providing syntactically, and further also semantically correct inputs, the coverage for the checker and interpreter can be improved.

Suggested Solution

  • Develop a good corpus, potentially by using Mainnet contracts and transactions
  • Consider grammar based input generation using e.g. Xsmith (talk), Dharma, etc.
  • Add support for fuzzing the interpreter (programs + inputs)
  • Consider improving mutations, using e.g. https://github.com/comby-tools/comby-mutation-server
  • Set up a continuous fuzzing process, potentially by leveraging cloud offerings

Stages

  1. Set up CI for the current fuzzing of the parser/checker
  2. Generate syntactically valid programs to get deeper coverage of the checker (parser won't reject)
  3. Generate semantically valid programs to get deeper coverage of the interpreter (checker won't reject)

Inspiration

@bluesign
Copy link
Contributor

bluesign commented Jun 4, 2021

Also native fuzzing for golang[0] can be useful I guess

[0] https://blog.golang.org/fuzz-beta

@fxamacker
Copy link
Member

Maybe we can use https://github.com/thepudds/fzgo until Go's proposed fuzzer is released. It's basically a prototype of Go's proposed fuzzer but it's already being used by some non-hobby projects. I might use it to stress test storage PoC.

I'd love to have custom mutators as an option in go-fuzz. 😆 I'd also love for go-fuzz to display cover as a percent during fuzzing.

A slightly customized version of dvyukov/go-fuzz@fca3906 was used to test fxamacker/cbor v2.3.0 release last week and I wish go-fuzz had the missing features.

Curated Lists of Fuzzing Resources

@turbolent
Copy link
Member Author

@bluesign wow, what are the odds! I had seen the GitHub issue tracking the development of that, and now they finally announced it just in time 🙂 We should definitely check that out!

@fxamacker Oh I hadn't seen this, thanks for sharing! Yes, we should also check that out and see how we can get good mutation for out use-case 😄 💯

@robert-e-davidson3
Copy link
Contributor

Discussed intent and scope with @turbolent and @j1010001. Here's the summary:


The goal of fuzzing is to quickly increase test coverage. Concretely this means fuzzing can run for a long time (say, a week) without finding any new bugs.

Our original fuzzing was quick and dirty: feed some bytes to the parser and fail on panics. The search space of this approach is so broad that it cannot run fast enough to provide much test coverage - almost every input it gives is syntactically invalid. This code lives in onflow/cadence and can be ignored.

We needed a grammar-based fuzzer to cut down the search space to just syntactically-valid inputs. The first idea was to write Cadence's grammar into the EBNF format so we could leverage existing tools. Unfortunately Cadence doesn't fit into EBNF cleanly enough to satisfy those tools.

Joe had a solution: modify the Cadence parser so that, at any point, it can give a list of valid tokens that can follow. This "parser-based fuzzer" is an implicit grammar-based fuzzer. That code lives in onflow/fuzzer as an old fork of onflow-cadence and was run on FuzzBuzz.io.

The next step is to evaluate the status of the onflow/fuzzer code and how it integrates with FuzzBuzz.io. Once that's established, rebase the latest Cadence code from onflow/cadence:master to onflow/fuzzer:master and make adjustments so the fuzzer still works.

That might be sufficient. If it isn't then we need to investigate alternatives such as go-fuzz or the built-in fuzzing support in Go 1.18. The approach here is to use mainnet contracts and transactions as the seed corpus. That will require maintenance because as the Cadence language is developed and so diverges from the deployed contracts.

If that isn't sufficient then we need to investigate writing and integrating our own mutation algorithm. Most likely this would merge both prior methods, mutating existing contracts and transactions randomly but based on the existing contracts.

@j1010001
Copy link
Member

@robert-e-davidson3
Copy link
Contributor

Deprecating this ticket in favor of epic #1309.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants