-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: heuristic splitting on '-' for lookups #3839
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3839 +/- ##
==========================================
+ Coverage 75.41% 78.94% +3.52%
==========================================
Files 808 818 +10
Lines 11983 12262 +279
Branches 1598 1442 -156
==========================================
+ Hits 9037 9680 +643
+ Misses 2593 2150 -443
- Partials 353 432 +79
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Hey @terriko do we need anything else with this one other than tests, with initial review I'll get test written in no time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good other than tests & docs, but you should ping @anthonyharrison for better code review since I'm kind of out of it right now.
(my other tab froze so this might post twice)
Hey @anthonyharrison could you take a look at this PR to review it, whenever you have a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the prefix will lose the context. So processing a product name such as python3-openssl and golang-openssl will result in a product called openssl which will result in any openssl vulnerabilities and not those related to the Python or Go implementation. I wonder if we should be converting the product name into a purl e.g. python3-openssl becomes pkg:pypi/openssl@version.
I think we need to do more tests to check that removing the prefixes results in improved matching of product/vendors. I suspect that each language/product may have different rules which need to be considered. Maybe have a regex might be a more flexible approach rather than just splitting on '-'.
It would be good if the supported prefixes could be in a data file so it is easy to add additional prefixes without changing the code.
If removing the prefix doesn't result in a match, I wouldn't report this as a warning (it will be very noisy). Switch to debug.
@anthonyharrison we can surely use this prefix data for purl generation, but i guess it would be useful when the gsoc project regarding the purl is completed, other then that i am not sure how would converting into purl would help as currently we ingest purl if present and get same name and version pair and then find the vendor of those, should i include purl generation here??. regarding regex i have mainly taken reference from here, many other projects and there names listed under mapping section, "language-" was most common pattern, well i agree that there was some different once too but those were very few. I'll shift these prefix in a json file and I have switched warning for debug.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so I like this code, but I think maybe we shouldn't make it active until we're prepared to handle de-duplication better because it's going to really raise our false positive rate.
@mastersans can you shove this into a common_prefix_split
function or something and make unit-tests that call that directly, but not call it outside of tests for now? Then we could use it later for PURL generation and even try it as is once we've got appropriate de-duplication in place so we can manage the false positives before users see them, but you wouldn't be stuck with this open PR potentially going stale for a few months until we're ready.
@terriko I have made the necessary changes you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks! I'm really hopeful that we'll be able to make good use of this soon.
Github won't let me merge for some reason; I'm going to update the branch and see if that resolves it or if I have to dismiss the other review or what. |
Github won't let me merge without dismissing the review, even though the concerns have been addressed by making this a separate function (for now)
fixes #2956
fixes #2846