Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evaluator.rules: Add handling for further license categories #77

Merged
merged 9 commits into from
Nov 17, 2022

Conversation

fviernau
Copy link
Member

See individual commits.

Context: #66.

@fviernau fviernau requested a review from a team as a code owner November 16, 2022 11:22

issue(
Severity.ERROR,
"The dependency ${pkg.metadata.id.toCoordinates()} is licensed under the ScanCode 'commercial' " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized too late that we're already doing this in other rules, but isn't it a bit risky to refer to ScanCode as a scanner here? This basically means users should not add commercial licenses from other scanners to the license-classifications.yml's commercial category, which creates an unnecessary tight coupling to the contents of evaluator.rules.kts.

Copy link
Member Author

@fviernau fviernau Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules refer to the category as defined in ScanCode. So, I do not see a problem with it, in particular since license-classifications.yml is generate from ScanCode license DB.

The error messages are consistent and follow the current approach. Changing that IMO is out of scope of this PR. We can discuss outside of this PR whether we want to ditch the mention of ScanCode from the error message, if this is what you're after.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO as long as classifications are generated from ScanCode it's good to mention it here as well to make clear where the classification is coming from.

@@ -43,11 +43,22 @@ val commercialLicenses = getLicensesForCategory("commercial")
val copyleftLicenses = getLicensesForCategory("copyleft")
val copyleftLimitedLicenses = getLicensesForCategory("copyleft-limited")
val freeRestrictedLicenses = getLicensesForCategory("free-restricted")
val genericLicenses = getLicensesForCategory("generic")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note, ScanCode 31.0.1 introduced an is_generic flag in its output (see e.g. here for LicenseRef-scancode-generic-cla). Maybe we should add support for such a flag in LicenseFinding, upgrade ScanCode (and the parser to populate that new field), and rely on that property instead of hard-coding license names.

Copy link
Member Author

@fviernau fviernau Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This category is not a scancode category but is created from the is_generic flag you mentioned. So, that is already taken into account.

I do not see how this can related to eliminating the below hard-coded license ID list.
Can you explain why you think this could be eliminated?

@@ -47,16 +47,22 @@ val genericLicenses = getLicensesForCategory("generic")
val permissiveLicenses = getLicensesForCategory("permissive")
val proprietaryFreeLicenses = getLicensesForCategory("proprietary-free")
val publicDomainLicenses = getLicensesForCategory("public-domain")
val unknownLicenses = getLicensesForCategory("unknown")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, we could add support for ScanCode's is_unknown to our data model and handle unknown findings generically.

Copy link
Member Author

@fviernau fviernau Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on the "generic" category discussion. It is analog.

@sschuberth sschuberth requested a review from a team November 16, 2022 12:59
@fviernau fviernau requested a review from sschuberth November 16, 2022 13:21
@fviernau fviernau force-pushed the handling-of-further-categories branch 2 times, most recently from 4d8489a to a300116 Compare November 16, 2022 21:45
@fviernau fviernau requested a review from a team as a code owner November 16, 2022 21:45
@fviernau fviernau force-pushed the handling-of-further-categories branch from a300116 to 4ce040c Compare November 17, 2022 08:35
Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Make the violation as severe as a copyleft violation.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
This makes the code more compact.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The handling for the 'unstated-licenses' category has been implemented,
so this comment has become obsolete.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
@fviernau fviernau force-pushed the handling-of-further-categories branch from 4ce040c to 936c2ee Compare November 17, 2022 12:44

issue(
Severity.ERROR,
"The dependency ${pkg.metadata.id.toCoordinates()} is licensed under the ScanCode 'commercial' " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO as long as classifications are generated from ScanCode it's good to mention it here as well to make clear where the classification is coming from.

val permissiveLicenses = getLicensesForCategory("permissive")
val proprietaryFreeLicenses = getLicensesForCategory("proprietary-free")
val publicDomainLicenses = getLicensesForCategory("public-domain")
val unstatedLicenses = getLicensesForCategory("unstated-license")

// Set of licenses, which are not acted upon by the below policy rules.
val ignoredLicenses = listOf(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not the same as genericLicenses? After reading the commit message I would have expected that all generic licenses are ignored?

Copy link
Member Author

@fviernau fviernau Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set of ignored licenses is currently a real subset of the union of 'generic' and 'unknown'.
It contains only some of the generic licenses which I believe should not be investigated.

Which part of the commit message makes you believe that all generic licenses are ignored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't obvious why a subset was chosen because the explanation sounds like it applies to generic licenses in general. With the added comments it's clearer.

// 'unkown' category:
"LicenseRef-scancode-free-unknown",
"LicenseRef-scancode-license-file-reference",
"LicenseRef-scancode-see-license"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the question in the previous commit, what makes these unknown licenses special so that they shall be ignored but not others from the unknownLicenses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view it's waste of time to investigate these.
Consider e.g. LicenseRef-scancode-see-license flags "See license in LICENSE".

How is that actionable? Create a curation which turns the detected license into NONE because the LICENSE file is scanned anyway? ...No, that's waste of time IMO and does add value.

Copy link
Member Author

@fviernau fviernau Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed 'free-unkown' from the list after looking again at it's triggers.
Even though it flags on things like: license: "MIT"

That gives quite a bit of things to investigate , as the effort is always limited, investigating 'free-unknow' may not be the best place to spend it. Anyhow, let's not add it to the ignore list for now to move this PR forward.

The generic license category contains license IDs which do not represent
an exact license, but triggers which only indicate that the tooling has
found something which may be a license.

Generic licenses do require investigation, as (if the finding is valid)
a dedicated license ID needs to be created. As some of these triggers
match quite broadly huge amount of manual effort can be created.
Requiring the investigation of all generic triggers in fact only leads
to moving the focus from working on the most important things towards
working mostly on irrelevant details with the result of not getting
anything done anymore.

So, introduce a hard-coded list of `ignoredLicenses` containing license
IDs which should not be flagged, to avoid that problem.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Handle the 'unkown' licenses in the same way as the 'generic' licenses,
because the associated license IDs also do not represent real licenses
but just triggers which indicate that the tooling found something it
does not know but may be relevant.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Signed-off-by: Frank Viernau <frank_viernau@epam.com>
@fviernau fviernau force-pushed the handling-of-further-categories branch from d726f69 to c2496af Compare November 17, 2022 14:23
@tsteenbe tsteenbe merged commit 6e5d940 into main Nov 17, 2022
@tsteenbe tsteenbe deleted the handling-of-further-categories branch November 17, 2022 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants