Dual-licenses identified as multi-licenses (>2); wrong recognition of copyright/holder #3797
Comments
Thanks for the detailed report! These are bugs alright.
You wrote:
We have a specific handler for Debian copyright files, whether machine-readable or not. You should give it a try: it is enabled with the --package option and knows about the specific structure of Debian copyright files. You also wrote:
There is an option Note about Debian:
The --filter-clues regression was introduced by the major license data structure changes and this regeneration of test fixtures: 6a91773#diff-4f79cdefc1686c77dd86c999ccba902ec305df1203c1e31ad613c1056a6162bb
@AyanSinhaMahapatra I have a fix for #3797 (comment)
@AyanSinhaMahapatra see f3f2c78. @Joerki this part only fixes the redundant copyright detection and restores the --filter-clues option.
* Detect odd name in copyright #3655
  Reported-by: Anton Augsburg @vw-anton
  Reference: #3655
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Do not detect trailing Distributed in copyright #3735
  Reported-by: Dimitris Iliou @dimitris-iliou
  Reference: #3735
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve misc. copyright detections
  Spotted in some common python libraries such as numpy and scipy
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Add new script to generate copyright tests
  Use an input file where each line is either:
  - a URL to fetch
  - a text to test
  Then generate a test data files pair accordingly
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright detection
  - Start detecting "is held by"
  - Do not include some trailing junk
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Detect NN/EMAIL copyright combo #3764
  Reference: #3764
  Reported-by: Anton Augsburg @vw-anton
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Detect NN/EMAIL copyright combo #3764
  Make detection of copyright with a single lowercase name more specific
  Reference: #3764
  Reported-by: Anton Augsburg @vw-anton
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Align license with improved copyrights
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright detection of "distributed"
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Do not detect some words as NNP
  This makes copyright detection more specific
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright tests
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Detect OpenStreetMap correctly
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Add new copyright detection tests
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright detection side-effects
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Enable generation of copyright test file
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright debug tracing
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Detect new form of copyright
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Do not add arbitrary space around markup
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve handle of parens in copyright
  Also improve NOTICEs, and other misc. variants
  Don not detect "The Initial Developer"
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Correctly filter copyrights in licenses #3797
  Reference: #3797
  Reported-by: Jörg Arndt @Joerki
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright detection
  Handle corner cases with markup
  Detect new copyright forms.
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Rename README file
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright detection
  * Handle better various parens, markup and quotes
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Improve copyright detection
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Refine copyright detection
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Use latest commoncode
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Enable generation of copyright test data files
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
* Do not regen demarkup tests
  Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
  Co-authored-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
---------
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Co-authored-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Description
This software is copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
Terms of the Perl programming language system itself
a) the GNU General Public License as published by the Free
Software Foundation; either version 1, or (at your option) any
later version, or
b) the "Artistic License"
--- The GNU General Public License, Version 1, February 1989 ---
This software is Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org.
This is free software, licensed under:
The GNU General Public License, Version 1, February 1989
<...>
That's all there is to it!
--- The Artistic License 1.0 ---
This software is Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org.
<...>
The End
{
"path": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"type": "file",
"name": "LICENSE",
"base_name": "LICENSE",
"extension": "",
"size": 18412,
"date": "2023-09-19",
"sha1": "f12894289cb0f379f24b8d63e2e761dbcba1b216",
"md5": "97c2218f01bb60644ec141f8761067e5",
"sha256": "9837f05336ef3cbacb6a96e1672a0426d81ad01191f214b8d48e22ca62338181",
"mime_type": "text/plain",
"file_type": "ASCII text",
"programming_language": null,
"is_binary": false,
"is_text": true,
"is_archive": false,
"is_media": false,
"is_source": false,
"is_script": false,
"package_data": [],
"for_packages": [],
"detected_license_expression": "(gpl-1.0-plus OR artistic-1.0) AND gpl-1.0 AND artistic-1.0",
"detected_license_expression_spdx": "(GPL-1.0-or-later OR Artistic-1.0) AND GPL-1.0-only AND Artistic-1.0",
"license_detections": [
{
"license_expression": "(gpl-1.0-plus OR artistic-1.0) AND gpl-1.0 AND artistic-1.0",
"license_expression_spdx": "(GPL-1.0-or-later OR Artistic-1.0) AND GPL-1.0-only AND Artistic-1.0",
"matches": [
{
"license_expression": "gpl-1.0-plus OR artistic-1.0",
"spdx_license_expression": "GPL-1.0-or-later OR Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 3,
"end_line": 11,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 59,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0-plus_or_artistic-1.0_2.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-1.0-plus_or_artistic-1.0_2.RULE",
"matched_text": "This is free software; you can redistribute it and/or modify it under\nthe same terms as the Perl 5 programming language system itself.\n\nTerms of the Perl programming language system itself\n\na) the GNU General Public License as published by the Free\n Software Foundation; either version 1, or (at your option) any\n later version, or\nb) the \"Artistic License\""
},
{
"license_expression": "gpl-1.0",
"spdx_license_expression": "GPL-1.0-only",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 13,
"end_line": 13,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 9,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0_10.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-1.0_10.RULE",
"matched_text": "--- The GNU General Public License, Version 1, February 1989 ---"
},
{
"license_expression": "gpl-1.0",
"spdx_license_expression": "GPL-1.0-only",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 17,
"end_line": 19,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 15,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0_37.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-1.0_37.RULE",
"matched_text": "This is free software, licensed under:\n\n The GNU General Public License, Version 1, February 1989"
},
{
"license_expression": "gpl-1.0",
"spdx_license_expression": "GPL-1.0-only",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 21,
"end_line": 270,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 2039,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0.LICENSE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-1.0.LICENSE",
"matched_text": " <...>"
},
{
"license_expression": "artistic-1.0",
"spdx_license_expression": "Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 273,
"end_line": 273,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 5,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "artistic-1.0_9.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/artistic-1.0_9.RULE",
"matched_text": "--- The Artistic License 1.0 ---"
},
{
"license_expression": "artistic-1.0",
"spdx_license_expression": "Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 277,
"end_line": 279,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 11,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "artistic-1.0_7.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/artistic-1.0_7.RULE",
"matched_text": "This is free software, licensed under:\n\n The Artistic License 1.0"
},
{
"license_expression": "artistic-1.0",
"spdx_license_expression": "Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 281,
"end_line": 378,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 761,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "artistic-1.0.SPDX.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/artistic-1.0.SPDX.RULE",
"matched_text": " The Artistic License\n\nPreamble <...>"
}
],
"identifier": "gpl_1_0_plus_or_artistic_1_0__and_gpl_1_0_and_artistic_1_0-b6665ce0-ba7e-787b-cb67-a365d0fe95da"
}
],
"license_clues": [],
"percentage_of_license_text": 98.67,
"copyrights": [
{
"copyright": "copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org",
"start_line": 1,
"end_line": 1
},
{
"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org",
"start_line": 15,
"end_line": 15
},
{
"copyright": "Copyright (c) 1989 Free Software Foundation, Inc.",
"start_line": 24,
"end_line": 24
},
{
"copyright": "copyrighted by the Free Software Foundation",
"start_line": 183,
"end_line": 184
},
{
"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org",
"start_line": 275,
"end_line": 275
}
],
"holders": [
{
"holder": "Mark Jason Dominus",
"start_line": 1,
"end_line": 1
},
{
"holder": "Mark Jason Dominus",
"start_line": 15,
"end_line": 15
},
{
"holder": "Free Software Foundation, Inc.",
"start_line": 24,
"end_line": 24
},
{
"holder": "the Free Software Foundation",
"start_line": 183,
"end_line": 184
},
{
"holder": "Mark Jason Dominus",
"start_line": 275,
"end_line": 275
}
],
"authors": [],
"emails": [
{
"email": "mjd@cpan.org",
"start_line": 1,
"end_line": 1
}
],
"urls": [
{
"url": "http://ftp.uu.net/",
"start_line": 326,
"end_line": 326
}
],
"files_count": 0,
"dirs_count": 0,
"size_count": 0,
"scan_errors": []
},
I started using ScanCode to see whether it gives me an alternative way to analyze Debian packages whose copyright files are not machine-readable.
Here I see two issues.
First: a dual license that is accompanied by the full license texts is not recognized as a single license expression.
Analysis result is:
"detected_license_expression": "(gpl-1.0-plus OR artistic-1.0) AND gpl-1.0 AND artistic-1.0",
ScanCode finds several license matches, but the final result should be a single expression, the dual license of Artistic or GPL-1:
"detected_license_expression": "gpl-1.0-plus OR artistic-1.0",
It is not a multi-license!
This also happens when ScanCode analyses a machine-readable Debian copyright file (I handle such files differently, without ScanCode).
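To make the expected result concrete, here is a minimal Python sketch of the kind of post-processing I have in mind. It is not ScanCode code and not how ScanCode combines matches; `collapse_dual_license` and its helpers are hypothetical names, and treating `gpl-1.0` as covered by `gpl-1.0-plus` is a simplifying assumption that would need legal review.

```python
def license_keys(expression):
    """Return the license keys of an expression, ignoring operators and parentheses."""
    tokens = expression.replace("(", " ").replace(")", " ").split()
    return {t for t in tokens if t not in ("AND", "OR", "WITH")}


def base_key(key):
    # Assumption: "gpl-1.0-plus" and "gpl-1.0" are treated as the same license family.
    suffix = "-plus"
    return key[: -len(suffix)] if key.endswith(suffix) else key


def collapse_dual_license(match_expressions):
    """
    Hypothetical heuristic: if exactly one match is an OR choice and every other
    match only names licenses already offered by that choice, return the choice
    alone. Otherwise join the distinct match expressions with AND.
    """
    unique = list(dict.fromkeys(match_expressions))  # de-duplicate, keep order
    choices = [e for e in unique if " OR " in e]
    if len(choices) == 1:
        choice = choices[0]
        offered = {base_key(k) for k in license_keys(choice)}
        others = [e for e in unique if e != choice]
        if all({base_key(k) for k in license_keys(e)} <= offered for e in others):
            return choice
    return " AND ".join(f"({e})" if " OR " in e else e for e in unique)


# The seven match expressions reported for the LICENSE file above.
matches = [
    "gpl-1.0-plus OR artistic-1.0",
    "gpl-1.0", "gpl-1.0", "gpl-1.0",
    "artistic-1.0", "artistic-1.0", "artistic-1.0",
]
print(collapse_dual_license(matches))
```

The sketch only collapses when the extra matches add no license beyond those in the choice; an unrelated license (say `mit`) would still be joined with AND.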
Second:
ScanCode reports a copyright notice for "Free Software Foundation". This is a misinterpretation: here the FSF notice is part of the license text itself and does not apply to the application.
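As a workaround on the JSON output, copyrights that sit inside a full license text could be dropped after the scan. The following Python sketch is only an illustration under stated assumptions: the function name is hypothetical, and using `matched_length >= 200` to recognize a full license text is my own threshold, not something ScanCode defines. The field names are the ones shown in the scan result above.

```python
def filter_copyrights_in_license_texts(file_entry, min_length=200):
    """
    Return the copyrights of one scanned file, leaving out those whose line span
    lies entirely inside a long license match.
    Assumption: a match with matched_length >= min_length is a full license text.
    """
    spans = [
        (match["start_line"], match["end_line"])
        for detection in file_entry["license_detections"]
        for match in detection["matches"]
        if match["matched_length"] >= min_length
    ]

    def inside_license_text(item):
        return any(
            start <= item["start_line"] and item["end_line"] <= end
            for start, end in spans
        )

    return [c for c in file_entry["copyrights"] if not inside_license_text(c)]


# Reduced copy of the scan result above: only the fields the function reads.
file_entry = {
    "license_detections": [{
        "matches": [
            {"start_line": 3, "end_line": 11, "matched_length": 59},
            {"start_line": 21, "end_line": 270, "matched_length": 2039},
            {"start_line": 281, "end_line": 378, "matched_length": 761},
        ],
    }],
    "copyrights": [
        {"copyright": "copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org", "start_line": 1, "end_line": 1},
        {"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org", "start_line": 15, "end_line": 15},
        {"copyright": "Copyright (c) 1989 Free Software Foundation, Inc.", "start_line": 24, "end_line": 24},
        {"copyright": "copyrighted by the Free Software Foundation", "start_line": 183, "end_line": 184},
        {"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org", "start_line": 275, "end_line": 275},
    ],
}

kept = filter_copyrights_in_license_texts(file_entry)
print([c["start_line"] for c in kept])
```

With the data above, the two Free Software Foundation entries (lines 24 and 183-184) fall inside the GPL text match at lines 21-270 and are dropped; the three Mark Jason Dominus notices remain.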
How To Reproduce
I scanned the source code of openssl-3.0.11 package.
subdir=openssl-3.0.11
docker run --rm -v ${PWD}/:/project scancode-toolkit -clipeu --license-text --verbose --json-pp /project/scancode-${subdir}.json /project/${subdir}
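To find other files with the same symptom in the resulting JSON, a small Python helper like the one below can list every file whose detected expression combines an OR choice with AND. This is a sketch, assuming the scan output has a top-level `files` list whose entries look like the one shown above; the function names are hypothetical.

```python
import json


def load_scan(path):
    """Load a ScanCode JSON result file into a dict."""
    with open(path, encoding="utf-8") as scan_file:
        return json.load(scan_file)


def files_mixing_or_and_and(scan):
    """
    Return (path, expression) pairs for files whose detected license expression
    contains both an OR choice and an AND.
    Assumption: `scan` has a top-level "files" list of per-file entries.
    """
    results = []
    for entry in scan.get("files", []):
        expression = entry.get("detected_license_expression") or ""
        if entry.get("type") == "file" and " OR " in expression and " AND " in expression:
            results.append((entry["path"], expression))
    return results
```

For the scan above this would be called as `files_mixing_or_and_and(load_scan("scancode-openssl-3.0.11.json"))`, with the file name taken from the docker command.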
I'm sad, since this project looks very promising (like other nexB projects, thanks for them, Philippe and others!).
I need to create an SBOM and attribution report to satisfy legal requirements. Correctness is crucial in that context.
System configuration
What OS are you running on? (Windows/MacOS/Linux)
Linux
What version of scancode-toolkit was used to generate the scan file?
main branch (32.1.0)
What installation method was used to install/run scancode? (pip/source download/other)
Docker build from source