Improve data for (reviewed) withdrawn advisories #2420

Marcono1234 · 2023-06-18T17:40:50Z


Hello,
for a university project, a fellow student and I looked in December 2022 at the JSON data of the then 141 GitHub-reviewed withdrawn advisories in the GitHub Advisory Database and noticed multiple issues.

Disclaimer: Withdrawn advisories are only a small fraction of the total number of advisories (back then ~1.4% of all GitHub-reviewed advisories were withdrawn). So the work needed to improve this is possibly not justified, but maybe the points mentioned below are useful nonetheless.

Some general improvements to the OSV schema have been proposed in ossf/osv-schema#160; that issue also contains some numbers regarding the GitHub Advisory Database.

The following are points more specific to the GitHub Advisory Database:

- For 48 of the withdrawn non-duplicate advisories we found corresponding advisories in other databases (e.g. NVD), but only ~45% of those were withdrawn in the other database as well. This might indicate missing synchronization between the databases, which can further increase confusion for users.
- Based on the time values in the JSON data, for 13 advisories the withdrawal date was before the publication date, and for some others the withdrawal date was only a few seconds after the publication date. This might indicate an inconsistent definition of what these date values mean (maybe especially for advisories imported from other databases, but we did not investigate that further).
- The differentiation between Repository Advisories and Database Advisories with the exact same GHSA ID but different content can be confusing, see "GHSA-qq97-vm5h-rrhg out-of sync. Why does it have different states?" #224 (comment).
- For some advisories the title and / or description was completely removed, making it difficult to understand what the advisory was originally about, e.g. GHSA-8q5c-93vg-c747.
- Sometimes GitHub Issue references did not link to a specific summarizing comment, so you have to read the complete conversation to understand why something is not considered a vulnerability.
- A common reason for withdrawing advisories was that the maintainers had not been contacted in advance (this might have mostly / only affected advisories imported from NVD though).
- A comparatively large number of Rust advisories were withdrawn without a stated reason, while the corresponding CVE had not been withdrawn, e.g. GHSA-7mg7-m5c3-3hqj (RUSTSEC-2020-0116, CVE-2020-36436). We assume this might have been done because the advisories do not directly describe a vulnerability but only a bug which could break functionality (and might have security implications). However, no official reason why these advisories were withdrawn is documented anywhere. Edit: It appears they might actually be duplicates; for this example the actual advisory seems to be GHSA-686f-ch3r-xwmh, but this should really have been mentioned in the duplicating advisories.
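The date inconsistency mentioned above can be checked with a few lines of Python. The following is only an illustrative sketch, not necessarily the script used for the numbers above. It assumes that each advisory JSON file follows the OSV schema with RFC 3339 `published` and `withdrawn` timestamps that both carry a time zone; the function names and the 60 second threshold for "a few seconds" are arbitrary choices made for this example.

```python
import json
from datetime import datetime
from pathlib import Path


def parse_timestamp(value):
    """Parse an RFC 3339 timestamp such as '2022-12-01T10:00:00Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat()
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_date_anomalies(advisories, max_gap_seconds=60):
    """Yield (id, kind, gap_seconds) for withdrawn advisories with suspicious dates.

    `advisories` is an iterable of already parsed advisory JSON objects.
    `max_gap_seconds` is an arbitrary threshold chosen for this sketch.
    """
    for advisory in advisories:
        withdrawn = advisory.get("withdrawn")
        published = advisory.get("published")
        if not withdrawn or not published:
            # Not withdrawn (or no publication date), nothing to compare
            continue
        gap = (parse_timestamp(withdrawn) - parse_timestamp(published)).total_seconds()
        if gap < 0:
            yield advisory["id"], "withdrawn-before-published", gap
        elif gap <= max_gap_seconds:
            yield advisory["id"], "withdrawn-shortly-after-published", gap


def load_advisories(root):
    """Yield the parsed content of every JSON file below the directory `root`."""
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as file:
            yield json.load(file)
```

With a local clone of the Advisory Database, `find_date_anomalies(load_advisories("advisories/github-reviewed"))` would then list the affected advisories, assuming the reviewed advisories are stored below that directory.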

Notes about specific advisories:

- GHSA-crmx-v835-hcp4 (respectively the original CVE) might have been withdrawn erroneously, see https://blog.sonatype.com/cve-2017-17461-vulnerable-or-not; it appears Sonatype did not get the CVE reopened or request a new CVE, but instead created an advisory in their own database 🙄
- GHSA-364w-9g92-3grq is withdrawn with the statement:

  "is not a security vulnerability with Laravel itself, but rather a userland issue."

  But the linked CVE has not been withdrawn and says:

  "NOTE: this CVE Record is for Laravel Framework, and is unrelated to any reports concerning incorrectly written user applications for image upload."

Results and suggestions:

- There should be a uniform way to describe why an advisory was withdrawn, and this information should be required, see "Add separate field for withdrawn reason & add guidelines for withdrawing" ossf/osv-schema#160.
- When an advisory is withdrawn as a duplicate, the ID of the advisory it duplicates should be referenced.
- Synchronization between the GitHub Advisory Database and other databases should be improved (if possible in both directions?).
  - It might be good to add additional metadata to the JSON to indicate when an advisory was last synchronized with each other database.
  - For imported advisories it might be good to add additional metadata to the JSON indicating which database an advisory originally came from, so a user can at least manually check the other database for more up-to-date information.
  - If it is clearer which database an advisory originated from, it might also be easier for maintainers to find the correct CNA to withdraw an advisory.
- "Bot spam" on issues and pull requests can make them nearly unreadable, see for example "Javascript execution from template" handlebars-lang/handlebars.js#1267. I had contacted WhiteSource (now Mend) to adjust their bot to stop linking to issues and pull requests, so at least for them this should hopefully be mostly solved now. But the general problem still exists, and for example Dependabot's solution of using https://redirect.github.com/... URLs to avoid issue and pull request references feels to me more like a workaround than a solution. Maybe it would be better to additionally ignore any references from comments created by bots?
- Even without the "bot spam" issue, discussions can get pretty long, so a feature like pinned comments, as suggested by https://github.com/orgs/community/discussions/47912, would hopefully help make the comments containing the outcome stand out more.
- Maybe improve browsing the history of advisory edits? Advisories shown on https://github.com/advisories/ have the text "This advisory has been edited. See History." in the right sidebar, which links to the Git history of the advisory JSON file. But that is not very helpful because the bot often commits changes for multiple advisories at once (as "Advisory Database Sync" commits); maybe better viewing of the Git history for a single file would be a general improvement for GitHub. Also, for a lot of advisories the Git blame functionality is unfortunately limited due to "most advisories have been removed" #827 (maybe a force push would be justified, or maybe there are other ways to solve this?).
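To make the metadata suggestions from the list above a bit more concrete, here is a rough sketch of how such data could look and be checked. Everything in it is hypothetical: the keys `withdrawn_reason`, `duplicate_of`, `origin_database` and `last_synchronized` do not exist in the current data, placing them under the OSV `database_specific` object is just one possible choice, the IDs are placeholders, and the 30 day limit is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical example; none of the keys under "database_specific" exist in real data
EXAMPLE_ADVISORY = {
    "id": "GHSA-xxxx-xxxx-xxxx",  # placeholder ID
    "withdrawn": "2022-12-01T00:00:00Z",
    "database_specific": {
        "withdrawn_reason": "duplicate",
        "duplicate_of": "GHSA-yyyy-yyyy-yyyy",  # placeholder ID
        "origin_database": "NVD",
        "last_synchronized": {"NVD": "2022-11-30T00:00:00Z"},
    },
}


def check_withdrawn_metadata(advisory, now, max_sync_age=timedelta(days=30)):
    """Return a list of problems found in the (hypothetical) withdrawal metadata."""
    problems = []
    if "withdrawn" not in advisory:
        # Only withdrawn advisories are checked in this sketch
        return problems

    meta = advisory.get("database_specific", {})
    reason = meta.get("withdrawn_reason")
    if not reason:
        problems.append("missing withdrawn reason")
    elif reason == "duplicate" and not meta.get("duplicate_of"):
        problems.append("duplicate without referenced advisory")

    if not meta.get("origin_database"):
        problems.append("unknown origin database")

    # Report every database whose last synchronization is older than max_sync_age
    for database, timestamp in meta.get("last_synchronized", {}).items():
        synced = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        if now - synced > max_sync_age:
            problems.append(f"not synchronized with {database} since {timestamp}")
    return problems
```

A check like this could for example run as part of validating advisory changes, so that a withdrawn advisory without a reason or without a referenced duplicate is noticed early.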

In general though we view the GitHub Advisory Database positively and hope that, also in combination with advisories created and CVEs requested directly on GitHub by repository maintainers and the new Private Vulnerability Reporting feature, it will be (a lot) easier for maintainers to publish and adjust advisories. We also appreciate that the data of the Advisory Database is publicly available as a Git repository here. This allowed us to perform this analysis quite easily locally, without having to use some API and risk hitting rate limits.

Hopefully this information and the suggestions are helpful for you!

If you need the list of all advisories we considered and the category we put each of them in (e.g. "unknown reason", "duplicate", "duplicate without referenced advisory", ...), I can try to provide it.
