Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate ExternalPackageRef #439

Conversation

armintaenzertng
Copy link
Collaborator

fixes #373

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Copy link
Collaborator

@meretp meretp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the addition! I didn't look at each testcase and regex in detail, but overall this looks very good to me. Only two remarks from my side.
I'm open to discuss if a dictionary approach is suitable here but as I have another use case with the rdf writer, I think it makes sense to use such a mapping.

Comment on lines 107 to 115
CPE22TYPE_REGEX = r'^c[pP][eE]:/[AHOaho]?(:[A-Za-z0-9._\-~%]*){0,6}$'
CPE23TYPE_REGEX = r'^cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&\'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&\'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){4}$'
MAVEN_CENTRAL_REGEX = r'^[^:]+:[^:]+(:[^:]+)?$'
NPM_REGEX = r'^[^@]+@[^@]+$'
NUGET_REGEX = r'^[^/]+/[^/]+$'
BOWER_REGEX = r'^[^#]+#[^#]+$'
PURL_REGEX = r'^pkg:.+(\/.+)?\/.+(@.+)?(\?.+)?(#.+)?$'
SWH_REGEX = r'^swh:1:(snp|rel|rev|dir|cnt):[0-9a-fA-F]{40}$'
GITOID_REGEX = r'^gitoid:(blob|tree|commit|tag):(sha1:[0-9a-fA-F]{40}|sha256:[0-9a-fA-F]{64})$'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would import these regex statements from external_package_ref_validator.py to prevent possible mismatch in the future if one of the regex needs changes and simply to avoid duplications.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 71 to 78
if reference_type == "npm":
return validate_against_regex(locator, NPM_REGEX, "npm", context)
if reference_type == "nuget":
return validate_against_regex(locator, NUGET_REGEX, "nuget", context)
if reference_type == "bower":
return validate_against_regex(locator, BOWER_REGEX, "bower", context)
if reference_type == "purl":
return validate_against_regex(locator, PURL_REGEX, "purl", context)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of code duplication. We could use a dictionary instead to map ExternalPackageRefCategory to a set of valid types and map the regex expressions to the corresponding type.
As I am currently working on the rdf writer and there I need to check if the specified type is listed (so one of cpe22Type, cpe23Type, swid and so on) or not, I would need such a mapping as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point, the code is now much more concise. Please have a look if you can work with this in rdf now, too! :)

…fining them

Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Signed-off-by: Armin Tänzer <armin.taenzer@tngtech.com>
Copy link
Collaborator

@meretp meretp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good to me and yes, I can use it now, thanks a lot! 😊

@armintaenzertng armintaenzertng merged commit 6b3899f into spdx:refactor-python-tools Jan 25, 2023
@armintaenzertng armintaenzertng deleted the validateExternalPackageRef branch January 25, 2023 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants