Dual-licenses identified as multi-licenses (>2); wrong recognition of copyright/holder #3797

Open
Joerki opened this issue Jun 7, 2024 · 4 comments


Joerki commented Jun 7, 2024

Description

This software is copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

Terms of the Perl programming language system itself

a) the GNU General Public License as published by the Free
Software Foundation; either version 1, or (at your option) any
later version, or
b) the "Artistic License"

--- The GNU General Public License, Version 1, February 1989 ---

This software is Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org.

This is free software, licensed under:

The GNU General Public License, Version 1, February 1989

                GNU GENERAL PUBLIC LICENSE
                 Version 1, February 1989

<...>

That's all there is to it!

--- The Artistic License 1.0 ---

This software is Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org.

<...>

The End

{
"path": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"type": "file",
"name": "LICENSE",
"base_name": "LICENSE",
"extension": "",
"size": 18412,
"date": "2023-09-19",
"sha1": "f12894289cb0f379f24b8d63e2e761dbcba1b216",
"md5": "97c2218f01bb60644ec141f8761067e5",
"sha256": "9837f05336ef3cbacb6a96e1672a0426d81ad01191f214b8d48e22ca62338181",
"mime_type": "text/plain",
"file_type": "ASCII text",
"programming_language": null,
"is_binary": false,
"is_text": true,
"is_archive": false,
"is_media": false,
"is_source": false,
"is_script": false,
"package_data": [],
"for_packages": [],
"detected_license_expression": "(gpl-1.0-plus OR artistic-1.0) AND gpl-1.0 AND artistic-1.0",
"detected_license_expression_spdx": "(GPL-1.0-or-later OR Artistic-1.0) AND GPL-1.0-only AND Artistic-1.0",
"license_detections": [
{
"license_expression": "(gpl-1.0-plus OR artistic-1.0) AND gpl-1.0 AND artistic-1.0",
"license_expression_spdx": "(GPL-1.0-or-later OR Artistic-1.0) AND GPL-1.0-only AND Artistic-1.0",
"matches": [
{
"license_expression": "gpl-1.0-plus OR artistic-1.0",
"spdx_license_expression": "GPL-1.0-or-later OR Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 3,
"end_line": 11,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 59,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0-plus_or_artistic-1.0_2.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-1.0-plus_or_artistic-1.0_2.RULE",
"matched_text": "This is free software; you can redistribute it and/or modify it under\nthe same terms as the Perl 5 programming language system itself.\n\nTerms of the Perl programming language system itself\n\na) the GNU General Public License as published by the Free\n Software Foundation; either version 1, or (at your option) any\n later version, or\nb) the \"Artistic License\""
},
{
"license_expression": "gpl-1.0",
"spdx_license_expression": "GPL-1.0-only",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 13,
"end_line": 13,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 9,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0_10.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-1.0_10.RULE",
"matched_text": "--- The GNU General Public License, Version 1, February 1989 ---"
},
{
"license_expression": "gpl-1.0",
"spdx_license_expression": "GPL-1.0-only",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 17,
"end_line": 19,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 15,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0_37.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-1.0_37.RULE",
"matched_text": "This is free software, licensed under:\n\n The GNU General Public License, Version 1, February 1989"
},
{
"license_expression": "gpl-1.0",
"spdx_license_expression": "GPL-1.0-only",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 21,
"end_line": 270,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 2039,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "gpl-1.0.LICENSE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/gpl-1.0.LICENSE",
"matched_text": " <...>"
},
{
"license_expression": "artistic-1.0",
"spdx_license_expression": "Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 273,
"end_line": 273,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 5,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "artistic-1.0_9.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/artistic-1.0_9.RULE",
"matched_text": "--- The Artistic License 1.0 ---"
},
{
"license_expression": "artistic-1.0",
"spdx_license_expression": "Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 277,
"end_line": 279,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 11,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "artistic-1.0_7.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/artistic-1.0_7.RULE",
"matched_text": "This is free software, licensed under:\n\n The Artistic License 1.0"
},
{
"license_expression": "artistic-1.0",
"spdx_license_expression": "Artistic-1.0",
"from_file": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
"start_line": 281,
"end_line": 378,
"matcher": "2-aho",
"score": 100.0,
"matched_length": 761,
"match_coverage": 100.0,
"rule_relevance": 100,
"rule_identifier": "artistic-1.0.SPDX.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/artistic-1.0.SPDX.RULE",
"matched_text": " The Artistic License\n\nPreamble <...>"
}
],
"identifier": "gpl_1_0_plus_or_artistic_1_0__and_gpl_1_0_and_artistic_1_0-b6665ce0-ba7e-787b-cb67-a365d0fe95da"
}
],
"license_clues": [],
"percentage_of_license_text": 98.67,
"copyrights": [
{
"copyright": "copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org",
"start_line": 1,
"end_line": 1
},
{
"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org",
"start_line": 15,
"end_line": 15
},
{
"copyright": "Copyright (c) 1989 Free Software Foundation, Inc.",
"start_line": 24,
"end_line": 24
},
{
"copyright": "copyrighted by the Free Software Foundation",
"start_line": 183,
"end_line": 184
},
{
"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org",
"start_line": 275,
"end_line": 275
}
],
"holders": [
{
"holder": "Mark Jason Dominus",
"start_line": 1,
"end_line": 1
},
{
"holder": "Mark Jason Dominus",
"start_line": 15,
"end_line": 15
},
{
"holder": "Free Software Foundation, Inc.",
"start_line": 24,
"end_line": 24
},
{
"holder": "the Free Software Foundation",
"start_line": 183,
"end_line": 184
},
{
"holder": "Mark Jason Dominus",
"start_line": 275,
"end_line": 275
}
],
"authors": [],
"emails": [
{
"email": "mjd@cpan.org",
"start_line": 1,
"end_line": 1
}
],
"urls": [
{
"url": "http://ftp.uu.net/",
"start_line": 326,
"end_line": 326
}
],
"files_count": 0,
"dirs_count": 0,
"size_count": 0,
"scan_errors": []
},

I started using ScanCode to see whether it offers an alternative way to identify the licensing of Debian packages whose copyright files are not machine-readable.

Here I see two issues.

First: a dual license that is followed by the full printed texts of both licenses is not recognized as one single license choice.

The analysis result is:
"detected_license_expression": "(gpl-1.0-plus OR artistic-1.0) AND gpl-1.0 AND artistic-1.0",

ScanCode finds several license matches, but in the end there should be only one result: the choice between GPL-1.0 (or later) and the Artistic License:

"detected_license_expression": "gpl-1.0-plus OR artistic-1.0",

It is not a multi-license!

This also happens when ScanCode analyses a machine-readable Debian copyright file (I handle those files separately, without ScanCode).
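The expected behaviour can be sketched as a post-processing step over the `matches` of one detection. This is a hypothetical heuristic, not ScanCode's actual detection algorithm: if one match offers an `OR` choice and every other match only names licenses already offered by that choice, the other matches are treated as the reference texts and the choice alone is reported. The helper names (`base_key`, `license_keys`, `collapse_dual_license`) and the assumption that a `-plus` key such as `gpl-1.0-plus` is covered by the plain `gpl-1.0` text are illustrative choices.

```python
import re
from typing import Dict, List, Set

OPERATORS = {"AND", "OR", "WITH"}


def base_key(key: str) -> str:
    """Map an 'or later' key to the license text it refers to.

    Assumption: a '-plus' key (e.g. gpl-1.0-plus) is satisfied by the plain
    license text (gpl-1.0) printed further down in the same file.
    """
    return key[: -len("-plus")] if key.endswith("-plus") else key


def license_keys(expression: str) -> Set[str]:
    """Collect the base license keys named in a simple license expression."""
    tokens = re.findall(r"[\w.+-]+", expression)
    return {base_key(t) for t in tokens if t.upper() not in OPERATORS}


def collapse_dual_license(matches: List[Dict]) -> str:
    """Hypothetical heuristic: report a dual-license notice only once.

    If one match offers a choice (contains ' OR ') and every other match only
    names licenses already offered by that choice, return the choice alone.
    Otherwise AND together the distinct expressions, as in the scan above.
    """
    expressions = [m["license_expression"] for m in matches]
    for choice in (e for e in expressions if " OR " in e):
        offered = license_keys(choice)
        others = [e for e in expressions if e != choice]
        if all(license_keys(e) <= offered for e in others):
            return choice
    distinct = list(dict.fromkeys(expressions))  # de-duplicate, keep order
    return " AND ".join(f"({e})" if " OR " in e else e for e in distinct)


# License expressions of the seven matches reported for Text-Template's LICENSE.
matches = [{"license_expression": e} for e in (
    "gpl-1.0-plus OR artistic-1.0",
    "gpl-1.0", "gpl-1.0", "gpl-1.0",
    "artistic-1.0", "artistic-1.0", "artistic-1.0",
)]
print(collapse_dual_license(matches))  # gpl-1.0-plus OR artistic-1.0
```

The sketch ignores line positions; a real fix would presumably also need to check that the full license texts actually follow the dual-license notice in the same file before folding them into it.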

Second:
ScanCode reports a copyright notice for the "Free Software Foundation". This is a misinterpretation: here the FSF notice is part of the license text itself, not a copyright on the application.
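The expected result can be sketched as a filter over data already present in the scan: drop any copyright whose lines lie entirely inside a match that covers a full license text. This is an illustrative assumption about how such filtering could work, not the actual `--filter-clues` implementation; the function names and the 100-token `min_text_length` threshold used to tell a full license text from a short notice are hypothetical. The line spans and matched lengths are copied from the scan output above.

```python
from typing import Dict, List


def inside_license_text(item: Dict, matches: List[Dict], min_text_length: int = 100) -> bool:
    """True if the item's line span lies within a long license-text match.

    Assumption: a match with at least `min_text_length` matched tokens is a
    full license text (the GPL and Artistic bodies here), not a short notice.
    """
    return any(
        m["matched_length"] >= min_text_length
        and m["start_line"] <= item["start_line"]
        and item["end_line"] <= m["end_line"]
        for m in matches
    )


def filter_copyrights(copyrights: List[Dict], matches: List[Dict], min_text_length: int = 100) -> List[Dict]:
    """Keep only copyrights that are not part of a full license text."""
    return [c for c in copyrights if not inside_license_text(c, matches, min_text_length)]


# Line spans taken from the scan output of Text-Template-1.56/LICENSE.
matches = [
    {"start_line": 3, "end_line": 11, "matched_length": 59},
    {"start_line": 13, "end_line": 13, "matched_length": 9},
    {"start_line": 17, "end_line": 19, "matched_length": 15},
    {"start_line": 21, "end_line": 270, "matched_length": 2039},
    {"start_line": 273, "end_line": 273, "matched_length": 5},
    {"start_line": 277, "end_line": 279, "matched_length": 11},
    {"start_line": 281, "end_line": 378, "matched_length": 761},
]
copyrights = [
    {"copyright": "copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org", "start_line": 1, "end_line": 1},
    {"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org", "start_line": 15, "end_line": 15},
    {"copyright": "Copyright (c) 1989 Free Software Foundation, Inc.", "start_line": 24, "end_line": 24},
    {"copyright": "copyrighted by the Free Software Foundation", "start_line": 183, "end_line": 184},
    {"copyright": "Copyright (c) 2013 by Mark Jason Dominus mjd@cpan.org", "start_line": 275, "end_line": 275},
]
for c in filter_copyrights(copyrights, matches):
    print(c["start_line"], c["copyright"])
```

With these spans, the two FSF entries (lines 24 and 183-184) fall inside the GPL text match (lines 21-270) and are dropped, while the three Mark Jason Dominus notices are kept.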

How To Reproduce

I scanned the source code of the openssl-3.0.11 package.

subdir=openssl-3.0.11

docker run --rm -v ${PWD}/:/project scancode-toolkit -clipeu --license-text --verbose --json-pp /project/scancode-${subdir}.json /project/${subdir}
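To find other files in the same scan that may show this pattern, the JSON output can be post-processed. A minimal sketch, assuming the `--json-pp` output has a top-level `files` list whose entries carry `path` and `detected_license_expression` (as in the excerpt above); the helper names `load_scan` and `mixed_choice_files`, and the "contains both OR and AND" criterion, are hypothetical.

```python
import json
from typing import Dict, List


def load_scan(path: str) -> Dict:
    """Read a ScanCode JSON scan (e.g. the --json-pp output file)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mixed_choice_files(scan: Dict) -> List[str]:
    """List paths whose detected expression mixes an OR choice with AND terms.

    Assumption: the scan has a top-level "files" list whose entries carry
    "path" and "detected_license_expression" (null when nothing was found).
    Such mixes can be legitimate, so this only produces a review list.
    """
    paths = []
    for entry in scan.get("files", []):
        expression = entry.get("detected_license_expression") or ""
        if " OR " in expression and " AND " in expression:
            paths.append(entry["path"])
    return paths


# Stand-in for load_scan("scancode-openssl-3.0.11.json").
sample = {"files": [
    {"path": "openssl-3.0.11/external/perl/Text-Template-1.56/LICENSE",
     "detected_license_expression":
         "(gpl-1.0-plus OR artistic-1.0) AND gpl-1.0 AND artistic-1.0"},
    {"path": "openssl-3.0.11/external/perl",
     "detected_license_expression": None},
]}
print(mixed_choice_files(sample))
```

The criterion is deliberately coarse: it flags candidates for manual review rather than deciding which detections are wrong.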

I'm sad, since this project looks very promising (like other nexB projects; thanks for them, Philippe and others!).
I need to create an SBOM and an attribution report to satisfy legal requirements, and correctness is crucial in that scope.

System configuration

  • What OS are you running on? (Windows/MacOS/Linux)
    Linux

  • What version of scancode-toolkit was used to generate the scan file?
    main branch (32.1.0)

  • What installation method was used to install/run scancode? (pip/source download/other)
    Docker build from source

@Joerki Joerki added the bug label Jun 7, 2024
@Joerki Joerki changed the title Dual-licenses identified as dual/multi-licenses; wrong recognition of copyright/holder Dual-licenses identified as multi-licenses (>2); wrong recognition of copyright/holder Jun 7, 2024
@pombredanne
Member

Thanks for the detailed report! These are bugs alright.

You wrote:

I started to use ScanCode to see where I have an alternative to identify Debian packages that are not machine-readable.
....
This also happens when ScanCode analyses a machine-readable Debian copyright file (this file I handle differently without ScanCode).

We have a specific handler for Debian copyright files, both machine-readable and not. You should give it a try: it is used when you run with the --package option, and it knows about the specific structure of Debian copyright files.

You also wrote:

Second:
There is a copyright notice for "Free Software Foundation". The text is misinterpreted, because here FSF refers to the license itself and not to the application.

There is a --filter-clues option to stop reporting copyrights that are part of a license text (like "Free Software Foundation" here).
But it is not working in this case... this is a bug!

Note about Debian:

  • if you are into Debian licensing, there is an active IRC channel, #licenses, on Debian's OFTC IRC server.
  • ScanCode can write Debian machine-readable copyright files with this option:
   --debian FILE           Write scan output in machine-readable Debian
                            copyright format to FILE.

@pombredanne
Member

The --filter-clues regression was introduced with the major license data structure changes and this regeneration of test fixtures: 6a91773#diff-4f79cdefc1686c77dd86c999ccba902ec305df1203c1e31ad613c1056a6162bb

@pombredanne
Member

@AyanSinhaMahapatra I have a fix for #3797 (comment)

pombredanne added a commit that referenced this issue Jun 7, 2024
Reference: #3797
Reported-by: Jörg Arndt @Joerki
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member

@AyanSinhaMahapatra see f3f2c78
FWIW, I think we were trigger-happy in pushing updated test expectations when we did the major license detection data structure change.

@Joerki, this part only fixes the redundant copyright detection and restores the --filter-clues option.

AyanSinhaMahapatra added a commit that referenced this issue Jun 26, 2024
* Detect odd name in copyright #3655

Reported-by: Anton Augsburg @vw-anton
Reference: #3655
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Do not detect trailing Distributed in copyright #3735

Reported-by: Dimitris Iliou @dimitris-iliou
Reference: #3735
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve misc. copyright detections

Spotted in some common python libraries such as numpy and scipy

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Add new script to generate copyright tests

Use an input file where each line is either:
- a URL to fetch
- a text to test

Then generate a test data files pair accordingly

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright detection

- Start detecting "is held by"
- Do not include some trailing junk

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Detect NN/EMAIL copyright combo #3764

Reference: #3764
Reported-by: Anton Augsburg @vw-anton
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Detect NN/EMAIL copyright combo #3764

Make detection of copyright with a single lowercase name more specific

Reference: #3764
Reported-by: Anton Augsburg @vw-anton
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Align license with improved copyrights

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright detection of "distributed"

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Do not detect some words as NNP

This makes copyright detection more specific

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright tests

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Detect OpenStreetMap correctly

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Add new copyright detection tests

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright detection side-effects

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Enable generation of copyright test file

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright debug tracing

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Detect new form of copyright

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Do not add arbitrary space around markup

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve handling of parens in copyright

Also improve NOTICEs, and other misc. variants
Do not detect "The Initial Developer"

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Correctly filter copyrights in licenses #3797

Reference: #3797
Reported-by: Jörg Arndt @Joerki
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright detection

Handle corner cases with markup
Detect new copyright forms.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Rename README file

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright detection

* Better handle various parens, markup and quotes

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Improve copyright detection

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Refine copyright detection

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Use latest commoncode

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Enable generation of copyright test data files

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

* Do not regen demarkup tests

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>

Co-authored-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>

---------

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Co-authored-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>