Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: heuristic splitting on '-' for lookups #3839

Merged
merged 9 commits into from
Apr 3, 2024
Merged

Conversation

mastersans
Copy link
Member

@mastersans mastersans commented Feb 19, 2024

fixes #2956
fixes #2846

@mastersans mastersans marked this pull request as draft February 19, 2024 15:41
@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 92.85714% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 78.94%. Comparing base (d6cbe40) to head (58566ae).
Report is 132 commits behind head on main.

Files Patch % Lines
cve_bin_tool/sbom_manager/__init__.py 90.90% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3839      +/-   ##
==========================================
+ Coverage   75.41%   78.94%   +3.52%     
==========================================
  Files         808      818      +10     
  Lines       11983    12262     +279     
  Branches     1598     1442     -156     
==========================================
+ Hits         9037     9680     +643     
+ Misses       2593     2150     -443     
- Partials      353      432      +79     
Flag Coverage Δ
longtests ?
win-longtests 78.94% <92.85%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mastersans mastersans marked this pull request as ready for review February 19, 2024 17:01
@mastersans
Copy link
Member Author

Hey @terriko do we need anything else with this one other than tests, with initial review I'll get test written in no time.

Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good other than tests & docs, but you should ping @anthonyharrison for better code review since I'm kind of out of it right now.

(my other tab froze so this might post twice)

@mastersans
Copy link
Member Author

Hey @anthonyharrison could you take a look at this PR to review it, whenever you have a chance?

Copy link
Contributor

@anthonyharrison anthonyharrison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the prefix will lose the context. So processing a product name such as python3-openssl and golang-openssl will result in a product called openssl which will result in any openssl vulnerabilities and not those related to the Python or Go implementation. I wonder if we should be converting the product name into a purl e.g. python3-openssl becomes pkg:pypi/openssl@version.

I think we need to do more tests to check that removing the prefixes results in improved matching of product/vendors. I suspect that each language/product may have different rules which need to be considered. Maybe have a regex might be a more flexible approach rather than just splitting on '-'.

It would be good if the supported prefixes could be in a data file so it is easy to add additional prefixes without changing the code.

If removing the prefix doesn't result in a match, I wouldn't report this as a warning (it will be very noisy). Switch to debug.

@mastersans
Copy link
Member Author

mastersans commented Mar 12, 2024

@anthonyharrison we can surely use this prefix data for purl generation, but i guess it would be useful when the gsoc project regarding the purl is completed, other then that i am not sure how would converting into purl would help as currently we ingest purl if present and get same name and version pair and then find the vendor of those, should i include purl generation here??.

regarding regex i have mainly taken reference from here, many other projects and there names listed under mapping section, "language-" was most common pattern, well i agree that there was some different once too but those were very few.

I'll shift these prefix in a json file and I have switched warning for debug..

Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, so I like this code, but I think maybe we shouldn't make it active until we're prepared to handle de-duplication better because it's going to really raise our false positive rate.

@mastersans can you shove this into a common_prefix_split function or something and make unit-tests that call that directly, but not call it outside of tests for now? Then we could use it later for PURL generation and even try it as is once we've got appropriate de-duplication in place so we can manage the false positives before users see them, but you wouldn't be stuck with this open PR potentially going stale for a few months until we're ready.

@mastersans
Copy link
Member Author

@terriko I have made the necessary changes you suggested.

Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks! I'm really hopeful that we'll be able to make good use of this soon.

@terriko
Copy link
Contributor

terriko commented Apr 3, 2024

Github won't let me merge for some reason; I'm going to update the branch and see if that resolves it or if I have to dismiss the other review or what.

@terriko terriko added the awaiting maintainer Need a maintainer to respond / help out label Apr 3, 2024
@terriko terriko dismissed anthonyharrison’s stale review April 3, 2024 18:52

Github won't let me merge without dismissing the review, even though the concerns have been addressed by making this a separate function (for now)

@terriko terriko merged commit 11ba018 into intel:main Apr 3, 2024
23 checks passed
@mastersans mastersans deleted the i2956 branch April 4, 2024 05:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting maintainer Need a maintainer to respond / help out
Projects
None yet
4 participants