-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: cpe validation and standardize tests data #4014
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4014 +/- ##
==========================================
+ Coverage 75.41% 80.45% +5.04%
==========================================
Files 808 820 +12
Lines 11983 12572 +589
Branches 1598 1950 +352
==========================================
+ Hits 9037 10115 +1078
+ Misses 2593 2038 -555
- Partials 353 419 +66
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
string_pattern = r"^(?P<scheme>.+):(?P<type>.+)/(?P<namespace>.+)/(?P<name>.+)@(?P<version>.+)\??(?P<qualifiers>.*)#?(?P<subpath>.*)$" | ||
|
||
elif string_type == "cpe23": | ||
string_pattern = r"^cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?\!\"#\$%&'\(\)\+,\-\.\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?\!\"#\$%&'\(\)\+,\-\.\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){4}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love a ridiculous regex, but I think this is going to be hard to maintain and it's almost certainly going to be vulnerable to regex denial of service.
Options:
- does anyone have a library for validating CPE strings we could leverage?
- In lieu of this, can we split on
:
and evaluate each piece separately with an explanation of what we're looking for in the comments? - In lieu of this, can we split on
:
and only evaluate the parts we're going to use later? (e.g.vendor, product, version = cpe[2], cpe[3], cpe[4]
) - In lieu of this, maybe we should validate and sanitize only
vendor, product, version
after the existing split?
I'm leaning towards the last one as potentially the right solution since that would allow us to have some util functions for sanitize_vendor()
, sanitize_product()
, sanitize_version(
) that we could re-use elsewhere in purl and triage work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, re-thinking my strategy here because I''m hoping to do a 3.3rc3 pre-release this afternoon.
I'm going to merge this as is, since it moves us forwards and gives us some validation (which is better than no validation!) as part of the pre-release, and I'll file my comments as a new issue.
fixes: #4013
References:
cpe2.3 and cpe2.3 schema