RFC: Improve License Clarity Scoring #2861

Closed
DennisClark opened this issue Feb 9, 2022 · 8 comments

Comments

@DennisClark
Member

DennisClark commented Feb 9, 2022

Replace the current license clarity scoring, which was originally associated with the Clearly Defined project, with the following structure:

Proposed Element Definitions

License Clarity

License Clarity is a set of criteria that indicate how clearly, comprehensively and accurately a software project has defined and communicated the licensing that applies to the project software. Note that this is not an indication of the license clarity of any software dependencies.

Score

The license clarity score is a value from 0-100 calculated by combining the weighted values determined for each of the scoring elements: Declared license, Identification precision, License texts, Declared copyright, Ambiguous compound licensing, and Conflicting license categories.

Declared license

When true, indicates that the software package licensing is documented at top-level or well-known locations in the software project, typically in a package manifest, NOTICE, LICENSE, COPYING or README file. Scoring Weight = 40.

Identification precision

Identification precision indicates how well the license statement(s) of the software identify known licenses that can be designated by precise keys (identifiers) as provided in a publicly available license list, such as the ScanCode LicenseDB, the SPDX license list, the OSI license list, or a URL pointing to a specific license text in a project or organization website. Scoring Weight = 40.

License texts

License texts are provided to support the declared license expression in files such as a package manifest, NOTICE, LICENSE, COPYING or README. Scoring Weight = 10.

Declared copyright

When true, indicates that the software package copyright is documented at top-level or well-known locations in the software project, typically in a package manifest, NOTICE, LICENSE, COPYING or README file. Scoring Weight = 10.

Ambiguous compound licensing

When true, indicates that the software has a license declaration that makes it difficult to construct a reliable license expression, such as in the case of multiple licenses where the conjunctive versus disjunctive relationship is not well defined. Scoring Weight = -10 (note negative weight).

Conflicting license categories

When true, indicates the declared license expression of the software is in the permissive category, but that other potentially conflicting categories, such as copyleft and proprietary, have been detected in lower level code. Scoring Weight = -20 (note negative weight).
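
As an illustration of how the weights above could combine into a single value, here is a minimal Python sketch. It assumes that every scoring element is reduced to a boolean and that the total is clamped to the 0-100 range; the names WEIGHTS and clarity_score are hypothetical and are not taken from the ScanCode code base.

```python
# Weights as stated in the element definitions of this proposal.
WEIGHTS = {
    "declared_license": 40,
    "identification_precision": 40,
    "license_texts": 10,
    "declared_copyright": 10,
    "ambiguous_compound_licensing": -10,
    "conflicting_license_categories": -20,
}


def clarity_score(elements):
    """Return a 0-100 score from a mapping of element name -> bool.

    Assumption: each element is a simple true/false flag and missing
    elements count as false.
    """
    total = sum(weight for name, weight in WEIGHTS.items() if elements.get(name))
    # Clamp so that negative weights never push the score below zero.
    return max(0, min(100, total))


example = {
    "declared_license": True,
    "identification_precision": True,
    "conflicting_license_categories": True,
}
print(clarity_score(example))  # 40 + 40 - 20 = 60
```

Under these assumptions, a package with all four positive elements and neither negative element scores 100, and a package whose only flag is a negative one scores 0 rather than a negative number.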

@pombredanne changed the title from "Improve License Clarity Scoring" to "RFC: Improve License Clarity Scoring" on Feb 11, 2022
@pombredanne
Member

@LeChasseur FYI, we would be interested in your take on this!

@pombredanne
Member

pombredanne commented Feb 11, 2022

@DennisClark the only edits I made were to add tags for titles and bold

@LeChasseur

The idea sounds good to me. If possible, I wouldn't give the same weight to NOTICE, LICENSE, COPYING or README when more than one of these files contains license information. In my experience most projects use LICENSE or COPYING. The README sometimes contains additional information that isn't a declaration of the applicable license. I would give such information less weight if LICENSE or COPYING exists.
However, GNU projects seem to have a slightly different system for using COPYING files (e.g. https://github.com/gcc-mirror/gcc). This needs consideration.
The term "NOTICE" comes from Apache-2.0. Mostly there are additional licenses and copyright notices if the main license is Apache-2.0.
In any case, it is very helpful for people doing license compliance work to have a suggestion of which license is the main license of the project.

JonoYang added a commit that referenced this issue Feb 17, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 18, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 19, 2022
    * Show boolean flags in scoring_elements to show what license judgement criteria were used

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 19, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
@JonoYang
Member

JonoYang commented Feb 22, 2022

@DennisClark

I've begun making modifications to the license clarity scoring, which I have created as a separate plugin for now. You can test out the new license clarity scoring by using the --license-clarity-score-2 and --classify options.

scancode -clip --classify --license-clarity-score-2 <package directory> --json-pp <json output location>

I had some questions about how the following criteria should be handled.

Ambiguous compound licensing

When true, indicates that the software has a license declaration that makes it difficult to construct a reliable license expression, such as in the case of multiple licenses where the conjunctive versus disjunctive relationship is not well defined. Scoring Weight = -10 (note negative weight).

  • Does this mean that in the case where we scan a package and get different license expressions from the key files, but there are no expressions that contain AND or OR, that we should modify the score by -10? I am assuming that if no license conjunction or disjunction is in the license expression, that no choice of license was detected from key files.
  • Do you know of any packages that would be a good test case for this?

Conflicting license categories

When true, indicates the declared license expression of the software is in the permissive category, but that other potentially conflicting categories, such as copyleft and proprietary, have been detected in lower level code. Scoring Weight = -20 (note negative weight).

  • I noticed in the previous license clarity scoring plugin that we checked for license consistency, where we check that the file-level licenses are consistent with the licenses detected in top-level key files. Should we do the same here or is it enough to check that the file-level licenses do not conflict with the top-level licenses, even if the file-level licenses are not stated at the top-level?
  • This is a more general licensing question, but is there a conflict when we detect copyleft-limited licensed files in a permissively licensed package? For example, a permissively licensed package that contains files licensed under gpl-2.0-plus WITH autoconf-simple-exception-2.0.

JonoYang added a commit that referenced this issue Feb 22, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 22, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
@DennisClark
Member Author

Regarding "Ambiguous ...": if more than one license is mentioned in a paragraph of text, and there is no "OR" and no "AND", then we have ambiguity. I realize that we may need to refine that after some testing. The main culprit is the mention of a "dual" or "Dual" license without any qualifier to indicate whether it is a choice or not.
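
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name looks_ambiguous and the plain-text matching of license names are hypothetical, and this is not the logic of ScanCode's check_for_license_ambiguity, which may work quite differently.

```python
import re


def looks_ambiguous(text, license_names):
    """Return True when text mentions more than one of the given license
    names but contains no "and"/"or" connective.

    Assumption: licenses are recognized by simple whole-word name matching,
    which is far cruder than real license detection.
    """
    lowered = text.lower()
    mentioned = [
        name
        for name in license_names
        if re.search(r"(?<!\w)" + re.escape(name.lower()) + r"(?!\w)", lowered)
    ]
    has_connective = re.search(r"\b(and|or)\b", lowered) is not None
    return len(mentioned) > 1 and not has_connective


names = ["MIT", "GPL-2.0"]
print(looks_ambiguous("Dual licensed: MIT / GPL-2.0", names))   # two licenses, no connective
print(looks_ambiguous("Licensed under MIT OR GPL-2.0", names))  # explicit choice
```

A check this naive would need the refinement mentioned above: for instance, any ordinary "and" or "or" elsewhere in the paragraph suppresses the flag even when it says nothing about the relationship between the licenses.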

Regarding "Conflicting...": it is nearly impossible that all file-level licenses will get stated at the top level, so we don't want to evaluate that anymore. Yes, it is "enough to check that the file-level licenses do not conflict with the top-level licenses, even if the file-level licenses are not stated at the top-level". A conflict does not necessarily indicate a "problem", since the files with conflicting licenses might be optionally deployed (tests, docs, etc.), but it does take away from license clarity. So the example in your question is a case of diminished clarity, but it is not necessarily a serious conflict.
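
A minimal Python sketch of the check described here, comparing categories rather than requiring every file-level license to appear at the top level. The function name, the category labels and the default set of conflicting categories are assumptions made for illustration; whether "copyleft limited" belongs in that set is a policy choice that the caller can make through the third parameter.

```python
# Hypothetical category labels; the proposal names copyleft and proprietary
# as examples of potentially conflicting categories.
DEFAULT_CONFLICTING = frozenset({"copyleft", "proprietary"})


def has_conflicting_categories(declared_categories, file_level_categories,
                               conflicting=DEFAULT_CONFLICTING):
    """Return True when the declared licensing is purely permissive but a
    potentially conflicting category was detected in lower-level files.

    File-level licenses that are merely absent from the top level do not
    count; only their categories are compared.
    """
    declared = {category.lower() for category in declared_categories}
    if declared != {"permissive"}:
        return False
    return any(category.lower() in conflicting for category in file_level_categories)


print(has_conflicting_categories(["Permissive"], ["Permissive", "Copyleft"]))
print(has_conflicting_categories(["Permissive"], ["Permissive"]))
```

Passing a wider set, for example one that also contains "copyleft limited", would flag the gpl-2.0-plus WITH autoconf-simple-exception-2.0 case from the question as diminished clarity.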

JonoYang added a commit that referenced this issue Feb 23, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 23, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 23, 2022
    * Show boolean flags in scoring_elements to show what license judgement criteria were used

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 23, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 23, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 23, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Feb 24, 2022
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Mar 1, 2022
    * If a package has conflicting or ambiguous licenses and the score is already zero, do not subtract from the score

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Mar 1, 2022
    * The classify plugin was determining the types of key files by checking the start or end of file names to see if they are a special type of file. However, the code checked the full filename with extension. This would cause us to not classify certain key files properly.

Signed-off-by: Jono Yang <jyang@nexb.com>
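
The file-name issue described in the commit message above can be illustrated with a short Python sketch. It assumes that key files are recognized by a base name that starts or ends with a known marker; the marker list and the function name are hypothetical and do not reproduce the classify plugin's actual code.

```python
import os

# Hypothetical markers for legal key files.
LEGAL_MARKERS = ("license", "copying", "notice")


def is_legal_key_file(path):
    """Return True when the file's base name, without its extension,
    starts or ends with one of the legal markers."""
    base_name, _extension = os.path.splitext(os.path.basename(path))
    base_name = base_name.lower()
    return base_name.startswith(LEGAL_MARKERS) or base_name.endswith(LEGAL_MARKERS)


# Matching on the full name "mit-license.txt" would miss this file, because
# the name neither starts with a marker nor ends with one.
print(is_legal_key_file("pkg/MIT-LICENSE.txt"))
print(is_legal_key_file("src/main.py"))
```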
JonoYang added a commit that referenced this issue Mar 1, 2022
    * Fix logic in check_
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Mar 1, 2022
    * Fix logic in check_for_license_ambiguity
    * Removed unused test file

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Mar 3, 2022
   * Add test for license ambiguity
   * Remove previous license clarity scoring plugin

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Mar 4, 2022
   * Add test for license ambiguity
   * Remove previous license clarity scoring plugin

Signed-off-by: Jono Yang <jyang@nexb.com>
pombredanne pushed a commit that referenced this issue Mar 9, 2022
   * Add test for license ambiguity
   * Remove previous license clarity scoring plugin

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Mar 9, 2022
   * Add test for license ambiguity
   * Remove previous license clarity scoring plugin

Signed-off-by: Jono Yang <jyang@nexb.com>
pombredanne added a commit that referenced this issue Mar 11, 2022
Instead use a function that preserves original order.

Reference: #2842
Reference: #2861
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member

At this stage I think we are mostly feature complete: the code is merged in develop and is ready to test and use there.
We will need thorough feedback and testing once we have a release out.

@pombredanne
Member

The next step will be to deprecate the old license scoring.

@JonoYang
Member

The new license clarity scoring changes have been merged into develop.


4 participants