Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade ScanCode to version 32.1.0 #8470

Merged
merged 4 commits into from
Apr 2, 2024
Merged

Upgrade ScanCode to version 32.1.0 #8470

merged 4 commits into from
Apr 2, 2024

Conversation

sschuberth
Copy link
Member

Please have a look at the individual commit messages for the details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message: Could you explain the change a bit more detailed? IIUC license findings that are references to license findings in other files are now ignored, because they already appear as findings for those other files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I've used your wording for the commit message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added another comment to the ScanCode issue, it could be considered here (and in the commit message) as well: aboutcode-org/scancode-toolkit#3648 (comment)

Copy link
Member Author

@sschuberth sschuberth Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that keeping findings that are in fact references is not currently supported by the ORT data model. That's because line ranges always have to refer to the file from the finding, not a references file. As such I consider (having an option for) keeping references to be out of scope of this PR as it would require larger refactorings.

Copy link
Member

@fviernau fviernau Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that keeping findings that are in fact references is not currently supported by the ORT data model. That's because line ranges always have to refer to the file from the finding, not a references file.

This indicates to me that you likely didn't get my point. Consider this:

  1. In older ScanCode versions we had License-scancode-unkown-license-reference findings
    with start / endLine pointing to the scanned file.
  2. In newer ScanCode version findings such as 1. do not appear, but they are replaced with the licenses
    detected in the referenced file.

This commit now filters out the findings. So, compared to older ScanCode version scan results the License-scancode-unkown-license-reference findings are missing. These missing findings can be represented in ORT's model, as the start / endLine corresponds to the actual file path. I think this is undesired and should be avoided.

Copy link
Member Author

@sschuberth sschuberth Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit now filters out the findings.

That's not correct, and I believe it's you who might be misunderstanding something.

When taking about "references" in this PR, this is not about findings that are named like License-scancode-unkown-license-reference. It is about literal entries in the raw ScanCode JSON, e.g. a LGPL-2.1-only finding in a file named COPYING, whereas the finding actually comes from file COPYING.LGPLv2.1 and the text in COPYING just refers to file COPYING.LGPLv2.1. Please read through #8190 again to get the full picture.

Copy link
Member Author

@sschuberth sschuberth Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I've asked for clarification over here.

See [1]. Resolves #8457.

[1]: https://github.com/nexB/scancode-toolkit/releases/tag/v32.1.0

Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
See [1] for context and the test data used.

[1]: aboutcode-org/scancode-toolkit#3648

Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
@sschuberth sschuberth force-pushed the scancode-32-1-0 branch 2 times, most recently from cbffdf2 to 38cc437 Compare April 2, 2024 14:45
…nces

License findings that are references to license findings in other files
are now ignored, because they already appear as findings for those other
files.

Fixes #8190.

Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
Make use of a new field added with ScanCode 32.1.0 in output file format
3.1.0.

Note that in the per-file `detectedLicenseExpression(Spdx)` fields
ScanCode seems to consolidate e.g. separate `gpl-2.0` and `gpl-2.0-plus`
matches to just `gpl-2.0-plus`, which seems to be a bug in ScanCode.
This is why this change also needs to update a test where previous the
ScanCode key to SPDX ID mapping did not succeed if ScanCode was run
without `--license-references`.

Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
@sschuberth sschuberth merged commit b1de439 into main Apr 2, 2024
36 checks passed
@sschuberth sschuberth deleted the scancode-32-1-0 branch April 2, 2024 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants