-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(license): improve license normalization #7131
feat(license): improve license normalization #7131
Conversation
- Remove "THE " prefix - Remove "LICENSE " suffix - Fix lowercase key - Add BSD LICENSE 3 - Add ECLIPSE PUBLIC LICENSE 1.0 - Add MIT-0 - Add tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @pbaumard
Thanks for your work!
LGTM.
Can you fix tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @pbaumard
Sorry for delay.
Left comments. Take a look when you have time.
pkg/licensing/normalize.go
Outdated
var mapping = make(map[string]expression.SimpleExpr) | ||
|
||
func addMap(name, key string, hasPlus bool) { | ||
license := normalizeKeyAndSuffix(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC licenses are already normalized.
Do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The license
from normalizeKeyAndSuffix
is only used for the following assertion with the panic
error.
I added some comemnts to make it more explicit.
The check could be made in unit test but it means having mapping
public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand your idea - you want to check the newly added licenses (from another PR later).
I think we don't need to do this check every time we start Trivy.
The test can be done in a unit test, but this means that the mapping will be public.
You have added so many licenses, so I am not sure that many licenses will be added later.
Maybe a comment before mapping
to the instructions for the new licenses will be enough. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially with latest commit with the version regular expression, it's becoming quite difficult to make sure to only add standardized keys.
I added an InvalidMappingKeys
used in test and which might also be used later if adding new mappings from CLI or other mean becomes possible.
pkg/sbom/cyclonedx/marshal.go
Outdated
@@ -288,7 +290,7 @@ func (*Marshaler) Licenses(licenses []string) *cdx.Licenses { | |||
choices := lo.Map(licenses, func(license string, i int) cdx.LicenseChoice { | |||
return cdx.LicenseChoice{ | |||
License: &cdx.License{ | |||
Name: license, | |||
Name: NormalizeLicense(license), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use this function here?
trivy/pkg/licensing/normalize.go
Lines 735 to 737 in 907683d
func Normalize(name string) string { | |
return NormalizeLicense(name).String() | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the CycloneDX normalization from this PR to reduce the scope.
From cycloneDX spec that would ideally mean having :
- Id: "A valid SPDX license ID"
- Name: "If SPDX does not define the license used, this field may be used to provide the license name",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not check all fields for OS packages:
trivy/pkg/fanal/test/integration/library_test.go
Lines 321 to 324 in 051ac39
for i := 0; i < len(expectedPkgs); i++ { | |
require.Equal(t, expectedPkgs[i].Name, detail.Packages[i].Name, tc.name) | |
require.Equal(t, expectedPkgs[i].Version, detail.Packages[i].Version, tc.name) | |
} |
Therefore, there is no point in updating the OS packages for the files pkg/fanal/test/integration/testdata/goldens/packages/*.json.golden
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I confused you a little.
containerd
tests use full files.
I have updated these golden files in 39f796e.
Normalized SPDX license should be in Id field.
pkg/licensing/normalize.go
Outdated
var mapping = make(map[string]expression.SimpleExpr) | ||
|
||
func addMap(name, key string, hasPlus bool) { | ||
license := normalizeKeyAndSuffix(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand your idea - you want to check the newly added licenses (from another PR later).
I think we don't need to do this check every time we start Trivy.
The test can be done in a unit test, but this means that the mapping will be public.
You have added so many licenses, so I am not sure that many licenses will be added later.
Maybe a comment before mapping
to the instructions for the new licenses will be enough. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I confused you a little.
containerd
tests use full files.
I have updated these golden files in 39f796e.
}, | ||
"Version": "1.6.3-r0", | ||
"Arch": "x86_64", | ||
"SrcName": "apr", | ||
"SrcVersion": "1.6.3-r0", | ||
"Licenses": [ | ||
"ASL2.0" | ||
"Apache-2.02.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbaumard can you take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in latest commit.
pkg/licensing/normalize.go
Outdated
func addMap(name, key string, hasPlus bool) { | ||
mapping[name] = expression.SimpleExpr{License: key, HasPlus: hasPlus} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we don't need this function anymore and we can just put everything in the map right away
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in latest commit, even if I am not sure it is more readable that way.
pkg/licensing/normalize.go
Outdated
var versionRegexpString = "([A-UW-Z)]{2,})( LICENSE)?\\s*[,(-]?\\s*(V|V\\.|VERSION|VERSION-|-)?\\s*([1-9](\\.\\d)*)[)]?" | ||
|
||
// case insensitive version match anywhere in string | ||
var versionRegexp = regexp.MustCompile("(?i)" + versionRegexpString) | ||
|
||
// version suffix match | ||
var versionSuffixRegexp = regexp.MustCompile(versionRegexpString + "$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand where we need this.
Can you add an example in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to avoid using regex if possible.
Maybe we can avoid using this regex here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The many version variations are in normalize_test.
Most of the mappings in current trivy version or in oss-review-toolkit/ort are because of slight variations in the way the version is declared in the license.
So this regexp allows to:
- greatly limit the number of mappings
- avoid missing version mappings
This regexp is strict by checking only version suffixes.
pkg/licensing/normalize_test.go
Outdated
@@ -8,6 +8,219 @@ import ( | |||
"github.com/aquasecurity/trivy/pkg/licensing" | |||
) | |||
|
|||
func TestMap(t *testing.T) { | |||
assert.Empty(t, licensing.InvalidMappingKeys((nil))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert.Empty(t, licensing.InvalidMappingKeys((nil))) | |
assert.Empty(t, licensing.InvalidMappingKeys(nil)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvalidMappingKeys no more used in latest commit.
@@ -453,3 +450,23 @@ func TestParseApkInfo(t *testing.T) { | |||
}) | |||
} | |||
} | |||
|
|||
func TestParseLicense(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a redundant test.
We see this function working in TestParseApkInfo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestLaxSplitLicense was moved to normalize_test in latest commit.
pkg/licensing/normalize.go
Outdated
func InvalidMappingKeys(licenseToNormalized map[string]expression.SimpleExpr) []string { | ||
if licenseToNormalized == nil { | ||
licenseToNormalized = mapping | ||
} | ||
var invalid []string | ||
for key := range licenseToNormalized { | ||
standardized := standardizeKeyAndSuffix(key) | ||
if standardized.License != key { | ||
invalid = append(invalid, key) | ||
} | ||
} | ||
return invalid | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we move this function to the test.
To get mapping
, we can create a function like GetBuiltinRules() for secrets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normalize_test is now in package licensing to use internal mapping and functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbaumard Thanks for work!
And sorry for delay.
LGTM.
Left 1 small comment.
pkg/licensing/normalize_test.go
Outdated
[]string{ | ||
"Apache+", | ||
}, | ||
"Apache-2.0+", | ||
"Apache-2.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add field names for better readability in these tests?
e.g.:
[]string{ | |
"Apache+", | |
}, | |
"Apache-2.0+", | |
"Apache-2.0", | |
licenses: []string{ | |
"Apache+", | |
}, | |
normalized: "Apache-2.0+", | |
normalizedKey: "Apache-2.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field names have been added in last commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: knqyf263 <knqyf263@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an awesome contribution! I planned to add this kind of improvement but didn't find the time. Thanks for your great work and patience.
Description
Related issues
Checklist