Add product synonyms #2819

anthonyharrison · 2023-03-14T18:36:37Z

#2685 is related

Some products have multiple names, and it would be good to handle this in an elegant way, particularly for the language and SBOM parsers, although the checkers may benefit as well. My idea is to have a list of synonyms that names can be checked against.

I am thinking particularly of Java packages, which can be listed as org.xxxx in a pom.xml file but referred to as xxxx.jar when scanning a directory or archive.
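
A minimal sketch of the synonym-list idea, assuming a hand-maintained table keyed by a canonical product name. The names `SYNONYMS` and `canonical_name` and the example entry are hypothetical illustrations, not existing cve-bin-tool code:

```python
from __future__ import annotations

# Hypothetical synonym table: canonical product name -> names seen in the wild.
SYNONYMS: dict[str, set[str]] = {
    "commons-compress": {
        "org.apache.commons:commons-compress",  # as it might appear in a pom.xml
        "commons-compress.jar",  # as found when scanning a directory or archive
    },
}

# Reverse index built once so each lookup is a single dictionary access.
_REVERSE: dict[str, str] = {
    alias.lower(): canonical
    for canonical, aliases in SYNONYMS.items()
    for alias in aliases | {canonical}
}


def canonical_name(name: str) -> str | None:
    """Return the canonical product name for any known synonym, else None."""
    return _REVERSE.get(name.lower())
```

A parser could call `canonical_name()` on whatever name it finds and use the result for the CVE lookup, falling back to the raw name when nothing matches.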

Thoughts?

terriko · 2023-03-14T21:50:44Z

The checkers already do have synonyms in that they support multiple {vendor, product} pairs. I'm wondering if maybe we should have a similar format for information for basically everything we detect.

I know @anthonyharrison knows what this looks like but for the benefit of writing it out here's what this could look like with my favourite multi-name example, beautifulsoup (it's my favourite because all the names are on the website for me to cut and paste easily):

- CPE / {vendor, product} pairs for nvd
- packaging names / {package_source, name}
  - {pypi, beautifulsoup4}
  - {pypi, bs4} (yes, it's a real package to limit typosquatting: https://pypi.org/project/bs4/)
  - {debian, python3-bs4}
  - {ubuntu, python3-bs4}
  - {fedora, python-beautifulsoup4}

That would let us potentially map all the package pairs to nvd lookup pairs as an N:N set, which I think is something we need.
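
One way the N:N mapping could be held in memory is sketched below. The package pairs come from the list in this comment; the NVD pair is a placeholder (`example_vendor` is made up), and the variable names are assumptions for illustration only:

```python
from __future__ import annotations

from collections import defaultdict

# (package_source, name) pairs for one product.
PACKAGE_PAIRS = [
    ("pypi", "beautifulsoup4"),
    ("pypi", "bs4"),
    ("debian", "python3-bs4"),
    ("ubuntu", "python3-bs4"),
    ("fedora", "python-beautifulsoup4"),
]

# Placeholder {vendor, product} pair; the vendor name here is invented.
NVD_PAIRS = [("example_vendor", "beautifulsoup4")]

# N:N mapping: every package pair points at every NVD pair for the product,
# and every NVD pair points back at every package pair.
package_to_nvd: dict[tuple[str, str], set[tuple[str, str]]] = defaultdict(set)
nvd_to_package: dict[tuple[str, str], set[tuple[str, str]]] = defaultdict(set)
for pkg in PACKAGE_PAIRS:
    for nvd in NVD_PAIRS:
        package_to_nvd[pkg].add(nvd)
        nvd_to_package[nvd].add(pkg)
```

Keeping both directions makes it cheap to answer "which NVD pairs do I query for this package?" as well as "which packages does this CVE entry affect?".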

The next question would be... how do we store and use this? We could potentially extend the existing checker format:

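from __future__ import annotations

from cve_bin_tool.checkers import Checker


class BeautifulSoupChecker(Checker):
    # No binary search patterns: this checker carries package metadata only.
    CONTAINS_PATTERNS: list[str] = []
    FILENAME_PATTERNS: list[str] = []
    VERSION_PATTERNS: list[str] = []
    VENDOR_PRODUCT = [("crummy_not_in_db", "beautifulsoup4")]
    # Proposed new attributes: one list of package names per packaging source.
    PYPI_PACKAGE = ["beautifulsoup4", "bs4"]
    DEBIAN_PACKAGE = ["python-bs4", "python3-bs4"]
    FEDORA_PACKAGE = ["python-beautifulsoup4"]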

Some notes here:

- Since bs4 doesn't actually have any CVEs, I'm using the same _not_in_db indicator we use for our own requirements scan; we could maybe do something fancier or codify that better.
- There's no entry in VERSION_PATTERNS, which would mean this couldn't be used in the binary scanner. That might be what we want, or maybe we'd want to add a pattern. I'm not sure which is the right way to go, or whether both should be an option. Either way, we'd definitely need to consider how to handle non-binary checkers (checkers without binary search patterns) and do it consistently.
- I opted for a variable for each packaging type. We might prefer to group the _PACKAGE ones into a single data structure that could be iterated through more easily. Not sure.
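
For comparison, grouping the per-source lists into a single structure might look like the sketch below. `BeautifulSoupMetadata`, `PACKAGE_NAMES`, and `matches` are hypothetical names, and the class is a plain stand-in rather than a real `Checker` subclass:

```python
from __future__ import annotations


class BeautifulSoupMetadata:
    """Hypothetical stand-in for a checker; not a real cve-bin-tool class."""

    VENDOR_PRODUCT = [("crummy_not_in_db", "beautifulsoup4")]
    # One mapping replaces PYPI_PACKAGE, DEBIAN_PACKAGE, and FEDORA_PACKAGE.
    PACKAGE_NAMES: dict[str, list[str]] = {
        "pypi": ["beautifulsoup4", "bs4"],
        "debian": ["python-bs4", "python3-bs4"],
        "fedora": ["python-beautifulsoup4"],
    }


def matches(checker: type, package_source: str, name: str) -> bool:
    """Return True if (package_source, name) is a known name for this checker."""
    return name in getattr(checker, "PACKAGE_NAMES", {}).get(package_source, [])
```

With this layout, supporting a new packaging source only needs a new dictionary key, not a new class attribute that every consumer has to know about.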

Open questions:

- Should this actually be in python code, or are we at the point where this should be json input or something else? (We went with python for the checkers to facilitate the regexes, but that might not apply to non-binary checkers.)
- Should the binary checker data and the packaging data be in separate file formats?
- Do we want to make it possible to have explicit non-matches? e.g. be able to say that {java_jar, json-parser} is not the same as {fedora, json-parser} and doesn't use the same nvd lookup pairs?
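
To make the json and explicit non-match questions concrete, here is a rough sketch. The JSON layout, the field names (`packages`, `not_packages`, `vendor_product`), the `example_vendor` value, and the `lookup` helper are all assumptions made up for illustration:

```python
from __future__ import annotations

import json

# Hypothetical JSON layout for one product; field names are illustrative only.
RAW = """
{
  "product": "json-parser",
  "vendor_product": [["example_vendor", "json-parser"]],
  "packages": [["fedora", "json-parser"]],
  "not_packages": [["java_jar", "json-parser"]]
}
"""


def lookup(entry: dict, package_source: str, name: str) -> list[tuple[str, str]]:
    """Return NVD lookup pairs for a package, or [] if unknown or excluded."""
    pair = [package_source, name]  # JSON arrays load as lists, so compare lists
    if pair in entry.get("not_packages", []):
        return []  # an explicit non-match always wins
    if pair in entry.get("packages", []):
        return [tuple(vp) for vp in entry["vendor_product"]]
    return []


entry = json.loads(RAW)
```

Here `lookup(entry, "fedora", "json-parser")` yields the NVD pair, while the same name from `java_jar` yields nothing, even if a looser name-only match would otherwise have hit.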

terriko · 2023-03-14T21:55:53Z

Design goals:

- We want to make it easy for people to do pull requests and add/fix data. (So, probably use a text-based format that can be edited by hand and handled similarly to code pull requests.)
- We may want to be able to list binary checkers and each package type's "checkers" (metadata) separately.
- We will want to integrate this data into the other metadata discussed in GSoC 2023 Project idea: Improved product representation & meta-info about products. #2633 (assuming we get someone to work on that project via summer of code).

ffontaine · 2023-03-15T13:21:55Z

release-monitoring.org could be helpful to retrieve the different package names or "synonyms" used by distributions.
Here is the web page for beautifulsoup4: https://release-monitoring.org/project/3779
release-monitoring.org is actively used by Fedora, Alpine, Arch Linux, and Buildroot: https://release-monitoring.org/distros
release-monitoring could also be used to retrieve and display the latest version for a given package.

anthonyharrison · 2023-03-15T17:24:50Z

@ffontaine Thanks for the reference to release-monitoring.org. I note it has an API which would make integration with cve-bin-tool relatively easy although it wouldn't work in offline mode unless we could mirror a local copy of the database.
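
A rough sketch of how the offline concern could be handled with a local cache of API responses. The endpoint path, the `packages`, `distro`, and `package_name` response fields, and the function names are assumptions that would need checking against the release-monitoring.org API documentation:

```python
from __future__ import annotations

import json
import urllib.request
from pathlib import Path

# Assumed endpoint; verify against the release-monitoring.org API documentation.
API_URL = "https://release-monitoring.org/api/project/{project_id}/"


def fetch_project(project_id: int, cache_dir: Path, offline: bool = False) -> dict:
    """Return project data from the local cache, fetching it first if allowed."""
    cache_file = cache_dir / f"{project_id}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    if offline:
        raise RuntimeError(f"project {project_id} is not cached and offline mode is on")
    with urllib.request.urlopen(API_URL.format(project_id=project_id)) as response:
        data = json.load(response)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))  # keep a copy for later offline runs
    return data


def distro_names(project: dict) -> list[tuple[str, str]]:
    """Extract (distro, package_name) pairs; the field names are assumptions."""
    return [(p["distro"], p["package_name"]) for p in project.get("packages", [])]
```

This only caches projects that were looked up while online; a full mirror of the database would need a bulk download, which this sketch does not attempt.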

ffontaine · 2023-03-15T18:31:41Z

Indeed, we're already using the API in buildroot to generate this web page: http://autobuild.buildroot.org/stats/master.html (the source code is here: https://git.buildroot.net/buildroot/tree/support/scripts/pkg-stats). I don't know if we can retrieve a local copy.

metabiswadeep · 2023-03-18T06:49:14Z

> We will want to integrate this data into the other metadata discussed in GSoC 2023 Project idea: Improved product representation & meta-info about products. #2633 (assuming we get someone to work on that project via summer of code)

@terriko So in that project the metadata that needs to be added can be added in the checker files of their respective products using extra parameters like LICENSE_INFO=[""] defined in it?

terriko · 2023-03-20T16:52:06Z

> > We will want to integrate this data into the other metadata discussed in GSoC 2023 Project idea: Improved product representation & meta-info about products. #2633 (assuming we get someone to work on that project via summer of code)
>
> @terriko So in that project the metadata that needs to be added can be added in the checker files of their respective products using extra parameters like LICENSE_INFO=[""] defined in it?

Maybe. There's an open question of whether this should actually go in the checker file itself or whether it should be a separate thing, and a proposal could go either way. I'll put some more thoughts directly in the gsoc issue.

terriko · 2023-03-27T20:12:03Z

Slightly pedantic note: it appears that there is a CPE for beautifulsoup:

  "cpe": "cpe:2.3:a:leonard_richardson:beautifulsoup4:4.12.0:*:*:*:*:*:*:*",

Although my assertion about it not having one above may have been incorrect, the fact that we'll likely recognize some number of products that don't have them stands.
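
For reference, a minimal sketch of pulling the {vendor, product} pair out of a CPE 2.3 string like the one above. `parse_cpe23` is a hypothetical helper, and it deliberately ignores escaped colons, which a complete parser would need to handle:

```python
from __future__ import annotations


def parse_cpe23(cpe: str) -> tuple[str, str, str]:
    """Return (vendor, product, version) from a CPE 2.3 formatted string.

    Simplified: splits on ':' and does not handle escaped colons.
    """
    fields = cpe.split(":")
    # "cpe", "2.3", then 11 attributes: part, vendor, product, version, ...
    if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 string: {cpe!r}")
    # fields[2] is the part ('a' = application); vendor, product, version follow.
    return fields[3], fields[4], fields[5]
```

Applied to the string above, this gives `leonard_richardson` as the vendor and `beautifulsoup4` as the product.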

ffontaine · 2023-03-28T08:02:35Z

Where did you get this CPE? I didn't find it on cvedetails.com or nvd.nist.gov

terriko · 2023-03-28T21:16:41Z

Hm, maybe it's just what's auto-generated by the sbom tool and not a real CPE id? It's still likely more correct than my previous entry, but maybe we do need to annotate these better.

anthonyharrison · 2023-03-30T09:48:50Z

@terriko @ffontaine sbom4python autogenerates the CPE and PURL references based on the project metadata. These may not be correct, but I do state this in the documentation: 'Whilst PURL and CPE references are automatically generated for each Python module, the accuracy of such references cannot be guaranteed as they are dependent on the validity of the data associated with the Python module.'
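
To illustrate why a generated reference can look plausible without existing in NVD, here is a sketch of deriving identifiers from package metadata. This is not sbom4python's actual implementation; `guess_purl` and `guess_cpe` are hypothetical helpers, and using the author name as the CPE vendor is an assumption:

```python
from __future__ import annotations

import re


def guess_purl(name: str, version: str) -> str:
    """Build a PyPI package URL; PyPI names are lowercased with '_' -> '-'."""
    return f"pkg:pypi/{name.lower().replace('_', '-')}@{version}"


def guess_cpe(author: str, name: str, version: str) -> str:
    """Build a CPE 2.3 string using the author as vendor (a guess, not an NVD entry)."""
    vendor = re.sub(r"\s+", "_", author.strip().lower())
    return f"cpe:2.3:a:{vendor}:{name.lower()}:{version}:*:*:*:*:*:*:*"
```

With an author of "Leonard Richardson", `guess_cpe` produces a string of the same shape as the one quoted earlier in the thread, which is well-formed but says nothing about whether NVD has a matching entry.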

terriko mentioned this issue Mar 21, 2023

GSoC 2023 Project idea: Improved product representation & meta-info about products. #2633

Closed
