🧪 Add OSS-Fuzz set up #255

DavidKorczynski · 2023-03-15T11:53:13Z

This is a follow-up to #254 (comment)

@chrisjsewell I added some documentation which may be helpful to navigate the OSS-Fuzz infra. I hope this should make it easier to e.g. experiment and test with the fuzzing.

I'm the original author of the fuzzers and am happy to license them according to the MIT of this repository.

welcome · 2023-03-15T11:53:15Z

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out EBP's Code of Conduct and our Contributing Guide, as this will greatly help the review process.

Welcome to the EBP community! 🎉

Signed-off-by: David Korczynski <david@adalogics.com>

for more information, see https://pre-commit.ci

codecov · 2023-03-25T22:00:46Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (d1852a5) 96.07% compared to head (497fee8) 96.07%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #255   +/-   ##
=======================================
  Coverage   96.07%   96.07%           
=======================================
  Files          62       62           
  Lines        3236     3236           
=======================================
  Hits         3109     3109           
  Misses        127      127

Flag	Coverage Δ
pytests	`96.07% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

chrisjsewell · 2023-03-26T09:34:21Z

Thanks @DavidKorczynski!
As you can see, I just added a workflow to get an idea of how it runs; obviously it looks like it takes too long to run for all PRs, and would so would just be run as a maybe weekly cron job

From the README.md, it is not quite clear what the difference is between running the files in tests/fuzz directly, e.g. python tests/fuzz/fuzz_markdown.py, and following the steps to run using the git clone https://github.com/google/oss-fuzz steps, e.g. python infra/helper.py run_fuzzer markdown-it-py fuzz_markdown
What is the difference?
Is the code in google/oss-fuzz more thorough?

For the actual code, I'd like to understand it a little more, the language support is a little lacking 😅:

Is there a simple way to explain what "instrumenting" actually does?
what can you actually pass with sys.argv, I tried doing things like --help and it didn't seem to actually do anything
can you restrict the fuzzing to only a certain module or something? or does it always have to run through everything?
- is the way I run it in the workflow even right, or is it running through the whole of python's core code or something 😅

chrisjsewell · 2023-03-26T15:43:23Z

so yeh, the fuzzing hit the 6 hours time limit, is that right!?

DavidKorczynski · 2023-03-27T09:17:45Z

so yeh, the fuzzing hit the 6 hours time limit, is that right!?

When run as follows python3 ./fuzz_markdown.py the fuzzer will run forever or until a crash is found.

You can limit the fuzzer time using -max_total_time=X where X is the number of seconds to run. python3 ./fuzz_markdown.py -max_total_time=10 runs the fuzzer for 10 seconds where the timer starts post instrumentation.

what can you actually pass with sys.argv, I tried doing things like --help and it didn't seem to actually do anything

atheris uses libFuzzer under the hood, which means you can pass the commandline flags specified here: https://llvm.org/docs/LibFuzzer.html

Is there a simple way to explain what "instrumenting" actually does?

The instrumentation adds code that essentially correspond to counters throughout the Python code. These counters will be used by the fuzzer to trace what code was executed by a given input, and further uses this to check whether a input executes unique parts of markdown-it. This is needed because the fuzzer will find unique inputs that trigger unique parts of the code. Fuzzing is a "genetic mutational algorithm" and the instrumentation is used to separate which which seeds to save, I talk a small bit about here https://youtu.be/zIyIZxAZLzo?t=368 and from [06:08 - 10:30] approximately.

From the README.md, it is not quite clear what the difference is between running the files in tests/fuzz directly, e.g. python tests/fuzz/fuzz_markdown.py, and following the steps to run using the git clone https://github.com/google/oss-fuzz steps, e.g. python infra/helper.py run_fuzzer markdown-it-py fuzz_markdown
What is the difference?
Is the code in google/oss-fuzz more thorough?

No difference as such. It's the same program.

OSS-Fuzz will, however, wrap the code in a package using pyinstaller and offers some additional things e.g. code coverage visualisation to check how much of the target code is executed.

I would not necessarily recommend setting up a cron-job in the CI, as OSS-Fuzz will make sure the fuzzers run continuously. If anything, I'd advice to set it up in a way that tests if the fuzzers work (i.e. run for a short period of time) or use the CIFuzz job that comes accompanied with OSS-Fuzz: https://google.github.io/oss-fuzz/getting-started/continuous-integration/#integrating-into-your-repository

chrisjsewell · 2023-03-28T13:50:43Z

Thanks for the detailed response!

ERROR: 100.0% of fuzz targets seem to be broken. See the list above for a detailed information: https://github.com/executablebooks/markdown-it-py/actions/runs/4543755476/jobs/8008855531?pr=255

I don't think that I've quite mastered this yet 😅 what did I do wrong?

DavidKorczynski · 2023-03-28T13:57:28Z

I don't think that I've quite mastered this yet 😅 what did I do wrong?

I think it's the language attribute missing: https://google.github.io/oss-fuzz/getting-started/continuous-integration/#optional-configuration

If you set it to python I think it should work!

chrisjsewell · 2023-03-28T14:22:59Z

thanks, that seems to have worked: https://github.com/executablebooks/markdown-it-py/actions/runs/4543950408/jobs/8009331193?pr=255

could you explain the result for me:

...
2023-03-28 14:15:20,013 - root - INFO - Trying to reproduce crash using: /tmp/tmp6zgwo3_i/crash-d9e621f7e53a457e0c6b9133b72f77d5e81943e1.
2023-03-28 14:16:50,748 - root - INFO - Reproduce command returned: 1. Reproducible on /github/workspace/cifuzz-prev-build/fuzz_markdown.
2023-03-28 14:16:50,799 - root - INFO - Not uploading crashes because on OSS-Fuzz.

it seems there was a crash, but it did not upload it, so I can't reporoduce it?
are crashes only failing/uploaded if they are caused specifically by the changed code in the PR?

chrisjsewell · 2023-03-28T14:33:00Z

are crashes only failing/uploaded if they are caused specifically by the changed code in the PR?

oh yeh I should properly read the documentation lol: https://google.github.io/oss-fuzz/getting-started/continuous-integration/#how-it-works

DavidKorczynski · 2023-03-28T14:41:44Z

it seems there was a crash, but it did not upload it, so I can't reporoduce it?
are crashes only failing/uploaded if they are caused specifically by the changed code in the PR?

Yes -- this given crash was visible before, it seems to be this one: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55371

chrisjsewell · 2023-03-28T15:11:07Z

ok great, so from your readme/explanation, it seems:

the workflow could be helpful, for "early detection" of failures in PRs, although perhaps I don't want to wait 10 minutes for every PR commit 🤔
The command: python3 infra/helper.py reproduce markdown-it-py {FUZZER_NAME} {PATH_TO_MINIMIZED_TESTCASE}, seems the most useful, for quickly reproducing bugs reported by you guys. Perhaps I can set up a "tox env" to git clone + run this

I'm not sure of the usefulness of tests/fuzz/fuzz_markdown.py and tests/fuzz/fuzz_markdown_extended.py though; I'm not sure how/if I would actually ever use them

DavidKorczynski · 2023-03-28T15:58:01Z

Many projects prefer to have a shorter runtime -- I personally have mine on 120sec, but definitely tailor it to your needs and what fits your workflow. OSS-Fuzz will run the fuzzers continuously which will catch bugs that are more complex to reach.

I'm not sure of the usefulness of tests/fuzz/fuzz_markdown.py and tests/fuzz/fuzz_markdown_extended.py though; I'm not sure how/if I would actually ever use them

You're not meant to use them as such. Having a folder with fuzzers in this repo that OSS-Fuzz then fetches to run continuously serves the purpose of you being able to add/modify the fuzzers without having to make PRs on the OSS-Fuzz repo.

chrisjsewell · 2023-04-27T14:26:02Z

Hey @DavidKorczynski thanks for your patience 😅

In 2d46a43 and d1852a5 I separately added the CI workflow for PRs and a tox environment to reproduce crash files,

Then here, in 497fee8, I rewrote the README as I now understand the process.
If you could have a look and check that this makes sense, then I think this is good to merge!

DavidKorczynski · 2023-05-11T09:17:30Z

Looks great @chrisjsewell!

welcome · 2023-05-31T17:56:47Z

Congrats on your first merged pull request in this project! 🎉

Thank you for contributing, we are very proud of you! ❤️

Ref: executablebooks/markdown-it-py#255 Signed-off-by: David Korczynski <david@adalogics.com>

@hukkinj1

## 3.0.0 - 2023-06-03 ⚠️ This release contains some minor breaking changes in the internal API and improvements to the parsing strictness. **Full Changelog**: <executablebooks/markdown-it-py@v2.2.0...v3.0.0> ### ⬆️ UPGRADE: Drop support for Python 3.7 Also add testing for Python 3.11 ### ⬆️ UPGRADE: Update from upstream markdown-it `12.2.0` to `13.0.0` A key change is the addition of a new `Token` type, `text_special`, which is used to represent HTML entities and backslash escaped characters. This ensures that (core) typographic transformation rules are not incorrectly applied to these texts. The final core rule is now the new `text_join` rule, which joins adjacent `text`/`text_special` tokens, and so no `text_special` tokens should be present in the final token stream. Any custom typographic rules should be inserted before `text_join`. A new `linkify` rule has also been added to the inline chain, which will linkify full URLs (e.g. `https://example.com`), and fixes collision of emphasis and linkifier (so `http://example.org/foo._bar_-_baz` is now a single link, not emphasized). Emails and fuzzy links are not affected by this. * ♻️ Refactor backslash escape logic, add `text_special` [#276](executablebooks/markdown-it-py#276) * ♻️ Parse entities to `text_special` token [#280](executablebooks/markdown-it-py#280) * ♻️ Refactor: Add linkifier rule to inline chain for full links [#279](executablebooks/markdown-it-py#279) * ‼️ Remove `(p)` => `§` replacement in typographer [#281](executablebooks/markdown-it-py#281) * ‼️ Remove unused `silent` arg in `ParserBlock.tokenize` [#284](executablebooks/markdown-it-py#284) * 🐛 FIX: numeric character reference passing [#272](executablebooks/markdown-it-py#272) * 🐛 Fix: tab preventing paragraph continuation in lists [#274](executablebooks/markdown-it-py#274) * 👌 Improve nested emphasis parsing [#273](executablebooks/markdown-it-py#273) * 👌 fix possible ReDOS in newline rule [#275](executablebooks/markdown-it-py#275) * 👌 Improve performance of `skipSpaces`/`skipChars` [#271](executablebooks/markdown-it-py#271) * 👌 Show text of `text_special` in `tree.pretty` [#282](executablebooks/markdown-it-py#282) ### ♻️ REFACTOR: Replace most character code use with strings The use of `StateBase.srcCharCode` is deprecated (with backward-compatibility), and all core uses are replaced by `StateBase.src`. Conversion of source string characters to an integer representing the Unicode character is prevalent in the upstream JavaScript implementation, to improve performance. However, it is unnecessary in Python and leads to harder to read code and performance deprecations (during the conversion in the `StateBase` initialisation). See [#270](executablebooks/markdown-it-py#270), thanks to [@hukkinj1](https://github.com/hukkinj1). ### ♻️ Centralise indented code block tests For CommonMark, the presence of indented code blocks prevent any other block element from having an indent of greater than 4 spaces. Certain Markdown flavors and derivatives, such as mdx and djot, disable these code blocks though, since it is more common to use code fences and/or arbitrary indenting is desirable. Previously, disabling code blocks did not remove the indent limitation, since most block elements had the 3 space limitation hard-coded. This change centralised the logic of applying this limitation (in `StateBlock.is_code_block`), and only applies it when indented code blocks are enabled. This allows for e.g. ```md <div> <div> I can indent as much as I want here. <div> <div> ``` See [#260](executablebooks/markdown-it-py#260) ### 🔧 Maintenance changes Strict type annotation checking has been applied to the whole code base, [ruff](https://github.com/charliermarsh/ruff) is now used for linting, and fuzzing tests have been added to the CI, to integrate with Google [OSS-Fuzz](https://github.com/google/oss-fuzz/tree/master/projects/markdown-it-py) testing, thanks to [@DavidKorczynski](https://github.com/DavidKorczynski). * 🔧 MAINTAIN: Make type checking strict [#](executablebooks/markdown-it-py#267) * 🔧 Add typing of rule functions [#283](executablebooks/markdown-it-py#283) * 🔧 Move linting from flake8 to ruff [#268](executablebooks/markdown-it-py#268) * 🧪 CI: Add fuzzing workflow for PRs [#262](executablebooks/markdown-it-py#262) * 🔧 Add tox env for fuzz testcase run [#263](executablebooks/markdown-it-py#263) * 🧪 Add OSS-Fuzz set up by @DavidKorczynski in [#255](executablebooks/markdown-it-py#255) * 🧪 Fix fuzzing test failures [#254](executablebooks/markdown-it-py#254)

DavidKorczynski and others added 3 commits March 15, 2023 04:55

Add OSS-Fuzz set up

1163e9a

Signed-off-by: David Korczynski <david@adalogics.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

0da4309

for more information, see https://pre-commit.ci

Merge branch 'master' into fuzzers-upstream

461639e

chrisjsewell changed the title ~~Add OSS-Fuzz set up~~ 🧪 Add OSS-Fuzz set up Mar 25, 2023

Try adding CI job

d05a346

Update fuzz.yml

6d27f50

Update fuzz.yml

e78ec69

chrisjsewell added 2 commits April 26, 2023 16:46

Merge branch 'master' into fuzzers-upstream

cdee552

reword fuzz README.md

497fee8

chrisjsewell merged commit baa8658 into executablebooks:master May 31, 2023

DavidKorczynski added a commit to google/oss-fuzz that referenced this pull request May 31, 2023

markdown-it-py: clean up follow upstream merge

707153d

Ref: executablebooks/markdown-it-py#255 Signed-off-by: David Korczynski <david@adalogics.com>

DavidKorczynski mentioned this pull request May 31, 2023

markdown-it-py: clean up follow upstream merge google/oss-fuzz#10431

Merged

AdamKorcz pushed a commit to google/oss-fuzz that referenced this pull request May 31, 2023

markdown-it-py: clean up follow upstream merge (#10431)

1a5c616

Ref: executablebooks/markdown-it-py#255 Signed-off-by: David Korczynski <david@adalogics.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🧪 Add OSS-Fuzz set up #255

🧪 Add OSS-Fuzz set up #255

DavidKorczynski commented Mar 15, 2023

welcome bot commented Mar 15, 2023

codecov bot commented Mar 25, 2023 •

edited

Loading

chrisjsewell commented Mar 26, 2023

chrisjsewell commented Mar 26, 2023

DavidKorczynski commented Mar 27, 2023

chrisjsewell commented Mar 28, 2023

DavidKorczynski commented Mar 28, 2023

chrisjsewell commented Mar 28, 2023

chrisjsewell commented Mar 28, 2023

DavidKorczynski commented Mar 28, 2023

chrisjsewell commented Mar 28, 2023

DavidKorczynski commented Mar 28, 2023

chrisjsewell commented Apr 27, 2023

DavidKorczynski commented May 11, 2023

welcome bot commented May 31, 2023

🧪 Add OSS-Fuzz set up #255

🧪 Add OSS-Fuzz set up #255

Conversation

DavidKorczynski commented Mar 15, 2023

welcome bot commented Mar 15, 2023

codecov bot commented Mar 25, 2023 • edited Loading

Codecov Report

chrisjsewell commented Mar 26, 2023

chrisjsewell commented Mar 26, 2023

DavidKorczynski commented Mar 27, 2023

chrisjsewell commented Mar 28, 2023

DavidKorczynski commented Mar 28, 2023

chrisjsewell commented Mar 28, 2023

chrisjsewell commented Mar 28, 2023

DavidKorczynski commented Mar 28, 2023

chrisjsewell commented Mar 28, 2023

DavidKorczynski commented Mar 28, 2023

chrisjsewell commented Apr 27, 2023

DavidKorczynski commented May 11, 2023

welcome bot commented May 31, 2023

codecov bot commented Mar 25, 2023 •

edited

Loading