Additional testing #14

Open
goneall opened this issue Mar 11, 2021 · 5 comments


Member

goneall commented Mar 11, 2021

I would like to use this code to replace the Java license matching used in the SPDX online tools.

Before making that change, I would like to test all of the SPDX listed licenses.

This can be done by downloading all of the license templates from the License List Data templates repo and the text files from the License List XML test files repo.

If we compare every file in the test files against the templates and each one matches its corresponding template from the templates directory, that would demonstrate that we have no false negatives.

We can also test for false positives by finding any matches against more than one template. There are some expected duplicates, which are all documented in the License List XML expected-warnings file.

Note that to keep the test and template files consistent, you should download the same tagged version (e.g. v3.12 used in the above links).
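
The two checks described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's test code: the function name `classify_results`, the shape of `matches` (the license ID of each test file mapped to the set of template IDs that matched it), the shape of `expected_duplicates`, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def classify_results(
    matches: Dict[str, Set[str]],
    expected_duplicates: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Split match results into false negatives and false positives.

    matches: hypothetical mapping from the license ID of a test file to the
        set of template IDs that matched it.
    expected_duplicates: hypothetical mapping from a license ID to the other
        IDs it is documented to match as well.
    """
    false_negatives: List[str] = []
    false_positives: Dict[str, Set[str]] = {}
    for license_id, matched in sorted(matches.items()):
        # False negative: the test file did not match its own template.
        if license_id not in matched:
            false_negatives.append(license_id)
        # False positive: a match that is neither the file's own template
        # nor a documented duplicate.
        allowed = {license_id} | expected_duplicates.get(license_id, set())
        unexpected = matched - allowed
        if unexpected:
            false_positives[license_id] = unexpected
    return false_negatives, false_positives


# Made-up sample data, for illustration only.
example_matches = {
    "MIT": {"MIT"},
    "BSD-2-Clause-NetBSD": {"BSD-2-Clause"},
    "GPL-2.0-only": {"GPL-2.0-only", "GPL-2.0"},
    "Zlib": set(),
}
example_duplicates = {"GPL-2.0-only": {"GPL-2.0"}}

false_negatives, false_positives = classify_results(example_matches, example_duplicates)
print(false_negatives)
print(false_positives)
```

Keeping the classification separate from the matcher itself means the same check could be reused for any tagged version of the test and template files.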

Collaborator

rtgdk commented Mar 21, 2021

@goneall Great initiative! Please let me know if I can be of any help with Python. I can write a Python script utility to download specific license templates/XML file repo and run the tool on them. Having this utility as part of the test utils may also help in generating future tests, e.g. for v3.13 or v2.x.
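
A download utility along these lines might start from something like the sketch below. It assumes that the two repos are `spdx/license-list-data` and `spdx/license-list-XML` on GitHub and that both are fetched through GitHub's tag archive URLs. The names `REPOSITORIES`, `archive_urls` and `download_and_extract` are hypothetical and only serve the illustration.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import Dict

# Assumed repository names; adjust them if the data lives elsewhere.
REPOSITORIES = {
    "templates": "spdx/license-list-data",
    "test_files": "spdx/license-list-XML",
}


def archive_urls(tag: str) -> Dict[str, str]:
    """Build the GitHub tag archive URL for each repository.

    The same tag is used for both so that templates and test files stay
    consistent with each other.
    """
    return {
        name: f"https://github.com/{repository}/archive/refs/tags/{tag}.tar.gz"
        for name, repository in REPOSITORIES.items()
    }


def download_and_extract(tag: str, target_dir: str) -> None:
    """Download both archives for the given tag and unpack them into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, url in archive_urls(tag).items():
        archive_path = target / f"{name}-{tag}.tar.gz"
        urllib.request.urlretrieve(url, archive_path)
        with tarfile.open(archive_path) as archive:
            archive.extractall(target)


urls = archive_urls("v3.12")
print(urls["templates"])
print(urls["test_files"])
```

Calling `download_and_extract("v3.12", "downloads")` would then fetch and unpack both archives; it is left out of the example run because it needs network access.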

Member Author

goneall commented Mar 21, 2021

> I can write a Python script utility to download specific license templates/XML file repo and run the tool on them.

@rtgdk that would be a great help if you could write such a utility.


sanjibansg commented Mar 27, 2021

I would like to contribute to this issue. Is this open currently? @rtgdk @goneall

Member Author

goneall commented Mar 27, 2021

@Kahanikaar Please feel free to contribute to this issue - thanks


stefan6419846 commented Jun 5, 2023

Given that https://github.com/m1kit/yalm-resources/blob/874bc2162b3edab7fabb6ed0a76e97dc7c828530/meta.json declares that it uses version 3.14 of the license list, I did some quick testing against that version and observed a success rate of 285/515 (55.34 %) for exact target matches.

Further observations:

  • ImageMagick license matching seems to always time out in yalm.licenses.SpdxLicense._test_regex for the (correct) regex file ImageMagick - even with 20 workers and a timeout of 180 minutes (which I consider rather large values for matching purposes).
  • 199 cases return None during matching (probably not due to a timeout).
  • 16 license file names start with depreciate_, while the results lack this prefix. Ignoring this prefix during comparison slightly increases the success rate to 289/515 (56.12 %) for exact target matches.

Testing code:

import json
from collections import defaultdict
from importlib import resources
from pathlib import Path

from yalm import detect_license, resources as data


# Prefix carried by some test file names but missing from the detected IDs.
PREFIX = 'depreciate_'

# Map each license ID to the set of IDs documented as expected duplicates.
duplicates = json.loads(resources.read_text(data, 'expected-duplicates.json'))
duplicate_mapping = defaultdict(set)
for entry in duplicates:
    duplicate_mapping[entry['from']].add(entry['to'])
duplicate_mapping = dict(duplicate_mapping)


correct, total, result_is_none = 0, 0, 0

# Process the test files in case-insensitive alphabetical order.
for path in sorted(
        Path('license-list-XML-3.14', 'test', 'simpleTestForGenerator').glob('*.txt'),
        key=lambda x: x.name.lower()
):
    # The file name without its extension is the expected license ID.
    expected = path.stem
    if expected.startswith(PREFIX):
        expected = expected[len(PREFIX):]
    result = detect_license(text=path.read_text(), timeout=60, num_workers=20)
    # A falsy result (such as None) is kept as-is; otherwise use the ID of the matched template.
    actual = result if not result else result.template.id
    # Count exact matches and documented duplicates as correct.
    if expected == actual or actual in duplicate_mapping.get(expected, set()):
        correct += 1
        print('✔', actual)
    else:
        print('✗', expected, actual)
    if actual is None:
        result_is_none += 1
    total += 1


print(f'{correct}/{total} ({correct / total:.2%}) detected correctly.')
# Uncomment to report how many files produced no result at all.
# print(result_is_none)

Complete results:

✔ 0BSD
✗ 389-exception None
✔ AAL
✔ Abstyles
✔ Adobe-2006
✔ Adobe-Glyph
✔ ADSL
✔ AFL-1.1
✔ AFL-1.2
✔ AFL-2.0
✔ AFL-2.1
✗ AFL-3.0 None
✔ Afmparse
✗ AGPL-1.0-only None
✗ AGPL-1.0-or-later None
✗ AGPL-1.0 None
✗ AGPL-3.0-only None
✗ AGPL-3.0-or-later None
✗ AGPL-3.0 None
✔ Aladdin
✗ AMDPLPA None
✔ AML
✔ AMPAS
✗ ANTLR-PD-fallback None
✔ ANTLR-PD
✗ Apache-1.0 None
✗ Apache-1.1 None
✔ Apache-2.0
✔ APAFML
✗ APL-1.0 None
✗ APSL-1.0 None
✗ APSL-1.1 None
✗ APSL-1.2 None
✗ APSL-2.0 None
✔ Artistic-1.0-cl8
✔ Artistic-1.0-Perl
✔ Artistic-1.0
✔ Artistic-2.0
✗ Autoconf-exception-2.0 None
✗ Autoconf-exception-3.0 None
✔ Bahyph
✔ Barr
✔ Beerware
✗ Bison-exception-2.2 GPL-2.0-with-bison-exception
✗ BitTorrent-1.0 None
✗ BitTorrent-1.1 None
✔ blessing
✔ BlueOak-1.0.0
✗ Bootloader-exception None
✔ Borceux
✔ BSD-1-Clause
✔ BSD-2-Clause-FreeBSD
✗ BSD-2-Clause-NetBSD BSD-2-Clause
✔ BSD-2-Clause-Patent
✔ BSD-2-Clause-Views
✔ BSD-2-Clause
✗ BSD-3-Clause-Attribution None
✔ BSD-3-Clause-Clear
✔ BSD-3-Clause-LBNL
✔ BSD-3-Clause-Modification
✔ BSD-3-Clause-No-Military-License
✔ BSD-3-Clause-No-Nuclear-License-2014
✔ BSD-3-Clause-No-Nuclear-License
✔ BSD-3-Clause-No-Nuclear-Warranty
✔ BSD-3-Clause-Open-MPI
✔ BSD-3-Clause
✔ BSD-4-Clause-Shortened
✗ BSD-4-Clause-UC BSD-4-Clause
✔ BSD-4-Clause
✔ BSD-Protection
✔ BSD-Source-Code
✔ BSL-1.0
✗ BUSL-1.1 None
✗ bzip2-1.0.5 None
✗ bzip2-1.0.6 None
✗ C-UDA-1.0 None
✗ CAL-1.0-Combined-Work-Exception None
✗ CAL-1.0 None
✔ Caldera
✔ CATOSL-1.1
✔ CC-BY-1.0
✔ CC-BY-2.0
✗ CC-BY-2.5-AU None
✔ CC-BY-2.5
✔ CC-BY-3.0-AT
✔ CC-BY-3.0-DE
✔ CC-BY-3.0-NL
✔ CC-BY-3.0-US
✗ CC-BY-3.0 None
✔ CC-BY-4.0
✔ CC-BY-NC-1.0
✔ CC-BY-NC-2.0
✔ CC-BY-NC-2.5
✔ CC-BY-NC-3.0-DE
✔ CC-BY-NC-3.0
✔ CC-BY-NC-4.0
✔ CC-BY-NC-ND-1.0
✔ CC-BY-NC-ND-2.0
✔ CC-BY-NC-ND-2.5
✔ CC-BY-NC-ND-3.0-DE
✔ CC-BY-NC-ND-3.0-IGO
✔ CC-BY-NC-ND-3.0
✔ CC-BY-NC-ND-4.0
✔ CC-BY-NC-SA-1.0
✗ CC-BY-NC-SA-2.0-FR None
✗ CC-BY-NC-SA-2.0-UK None
✔ CC-BY-NC-SA-2.0
✔ CC-BY-NC-SA-2.5
✔ CC-BY-NC-SA-3.0-DE
✗ CC-BY-NC-SA-3.0-IGO None
✔ CC-BY-NC-SA-3.0
✔ CC-BY-NC-SA-4.0
✔ CC-BY-ND-1.0
✗ CC-BY-ND-2.0 None
✔ CC-BY-ND-2.5
✔ CC-BY-ND-3.0-DE
✔ CC-BY-ND-3.0
✔ CC-BY-ND-4.0
✔ CC-BY-SA-1.0
✗ CC-BY-SA-2.0-UK None
✔ CC-BY-SA-2.0
✔ CC-BY-SA-2.1-JP
✔ CC-BY-SA-2.5
✔ CC-BY-SA-3.0-AT
✔ CC-BY-SA-3.0-DE
✗ CC-BY-SA-3.0 None
✔ CC-BY-SA-4.0
✔ CC-PDDC
✔ CC0-1.0
✗ CDDL-1.0 None
✗ CDDL-1.1 None
✗ CDL-1.0 None
✗ CDLA-Permissive-1.0 None
✔ CDLA-Permissive-2.0
✗ CDLA-Sharing-1.0 None
✗ CECILL-1.0 None
✔ CECILL-1.1
✗ CECILL-2.0 None
✗ CECILL-2.1 None
✗ CECILL-B None
✗ CECILL-C None
✗ CERN-OHL-1.1 None
✗ CERN-OHL-1.2 None
✗ CERN-OHL-P-2.0 None
✗ CERN-OHL-S-2.0 None
✗ CERN-OHL-W-2.0 None
✔ ClArtistic
✗ Classpath-exception-2.0 None
✗ CLISP-exception-2.0 None
✔ CNRI-Jython
✔ CNRI-Python-GPL-Compatible
✔ CNRI-Python
✗ Condor-1.1 None
✗ copyleft-next-0.3.0 None
✗ copyleft-next-0.3.1 None
✗ CPAL-1.0 None
✔ CPL-1.0
✔ CPOL-1.02
✔ Crossword
✔ CrystalStacker
✗ CUA-OPL-1.0 None
✗ Cube None
✔ curl
✗ D-FSL-1.0 None
✔ eCos-2.0
✗ GPL-1.0+ GPL-1.0
✗ GPL-2.0+ None
✗ GPL-2.0-with-autoconf-exception None
✔ GPL-2.0-with-bison-exception
✗ GPL-2.0-with-classpath-exception None
✗ GPL-2.0-with-font-exception None
✗ GPL-2.0-with-GCC-exception None
✗ GPL-3.0+ None
✗ GPL-3.0-with-autoconf-exception None
✔ GPL-3.0-with-GCC-exception
✗ LGPL-2.0+ None
✗ LGPL-2.1+ None
✗ LGPL-3.0+ None
✔ StandardML-NJ
✗ WXwindows None
✔ diffmark
✗ DigiRule-FOSS-exception None
✔ DOC
✔ Dotseqn
✔ DRL-1.0
✔ DSDP
✔ dvipdfm
✔ ECL-1.0
✔ ECL-2.0
✗ eCos-exception-2.0 None
✔ EFL-1.0
✔ EFL-2.0
✔ eGenix
✗ Entessa None
✔ EPICS
✗ EPL-1.0 None
✗ EPL-2.0 None
✔ ErlPL-1.1
✗ etalab-2.0 None
✗ EUDatagrid None
✗ EUPL-1.0 None
✗ EUPL-1.1 None
✗ EUPL-1.2 None
✗ Eurosym None
✗ Fair None
✗ Fawkes-Runtime-exception None
✗ FLTK-exception None
✗ Font-exception-2.0 None
✔ Frameworx-1.0
✗ FreeBSD-DOC None
✔ FreeImage
✗ freertos-exception-2.0 None
✔ FSFAP
✔ FSFUL
✔ FSFULLR
✔ FTL
✗ GCC-exception-2.0 None
✗ GCC-exception-3.1 None
✗ GD None
✗ GFDL-1.1-invariants-only GFDL-1.1
✗ GFDL-1.1-invariants-or-later GFDL-1.1
✗ GFDL-1.1-no-invariants-only GFDL-1.1
✗ GFDL-1.1-no-invariants-or-later GFDL-1.1
✗ GFDL-1.1-only GFDL-1.1
✗ GFDL-1.1-or-later GFDL-1.1
✔ GFDL-1.1
✗ GFDL-1.2-invariants-only GFDL-1.2
✗ GFDL-1.2-invariants-or-later GFDL-1.2
✗ GFDL-1.2-no-invariants-only GFDL-1.2
✗ GFDL-1.2-no-invariants-or-later GFDL-1.2
✗ GFDL-1.2-only GFDL-1.2
✗ GFDL-1.2-or-later GFDL-1.2
✔ GFDL-1.2
✗ GFDL-1.3-invariants-only GFDL-1.3
✗ GFDL-1.3-invariants-or-later GFDL-1.3
✗ GFDL-1.3-no-invariants-only GFDL-1.3
✗ GFDL-1.3-no-invariants-or-later GFDL-1.3
✗ GFDL-1.3-only GFDL-1.3
✗ GFDL-1.3-or-later GFDL-1.3
✔ GFDL-1.3
✔ Giftware
✔ GL2PS
✔ Glide
✔ Glulxe
✔ GLWTPL
✗ gnu-javamail-exception None
✔ gnuplot
✗ GPL-1.0-only GPL-1.0
✗ GPL-1.0-or-later GPL-1.0
✔ GPL-1.0
✗ GPL-2.0-only None
✗ GPL-2.0-or-later None
✗ GPL-2.0 None
✗ GPL-3.0-linking-exception None
✗ GPL-3.0-linking-source-exception None
✗ GPL-3.0-only None
✗ GPL-3.0-or-later None
✗ GPL-3.0 None
✗ GPL-CC-1.0 None
✗ gSOAP-1.3b None
✔ HaskellReport
✔ Hippocratic-2.1
✔ HPND-sell-variant
✗ HPND None
✔ HTMLTIDY
✗ i2p-gpl-java-exception None
✔ IBM-pibs
✔ ICU
✔ IJG
✗ ImageMagick None
✔ iMatix
✗ Imlib2 None
✔ Info-ZIP
✔ Intel-ACPI
✔ Intel
✗ Interbase-1.0 None
✔ IPA
✔ IPL-1.0
✔ ISC
✔ JasPer-2.0
✔ JPNIC
✔ JSON
✗ LAL-1.2 None
✗ LAL-1.3 None
✔ Latex2e
✔ Leptonica
✗ LGPL-2.0-only None
✗ LGPL-2.0-or-later None
✗ LGPL-2.0 None
✗ LGPL-2.1-only None
✗ LGPL-2.1-or-later None
✗ LGPL-2.1 None
✗ LGPL-3.0-linking-exception None
✗ LGPL-3.0-only None
✗ LGPL-3.0-or-later None
✗ LGPL-3.0 None
✗ LGPLLR None
✗ libpng-2.0 None
✗ Libpng None
✔ libselinux-1.0
✔ libtiff
✗ Libtool-exception None
✗ LiLiQ-P-1.1 None
✗ LiLiQ-R-1.1 None
✗ LiLiQ-Rplus-1.1 None
✔ Linux-OpenIB
✗ Linux-syscall-note None
✗ LLVM-exception None
✔ LPL-1.0
✔ LPL-1.02
✗ LPPL-1.0 None
✔ LPPL-1.1
✔ LPPL-1.2
✔ LPPL-1.3a
✔ LPPL-1.3c
✗ LZMA-exception None
✔ MakeIndex
✗ mif-exception None
✔ MirOS
✔ MIT-0
✗ MIT-advertising None
✔ MIT-CMU
✗ MIT-enna None
✗ MIT-feh None
✔ MIT-Modern-Variant
✔ MIT-open-group
✔ MIT
✔ MITNFA
✗ Motosoto None
✔ mpich2
✔ MPL-1.0
✗ MPL-1.1 None
✗ MPL-2.0-no-copyleft-exception None
✗ MPL-2.0 None
✔ MS-PL
✔ MS-RL
✔ MTLL
✔ MulanPSL-1.0
✔ MulanPSL-2.0
✗ Multics None
✔ Mup
✔ NAIST-2003
✔ NASA-1.3
✔ Naumen
✔ NBPL-1.0
✗ NCGL-UK-2.0 None
✔ NCSA
✔ Net-SNMP
✔ NetCDF
✔ Newsletr
✔ NGPL
✔ NIST-PD-fallback
✔ NIST-PD
✗ NLOD-1.0 None
✗ NLOD-2.0 None
✔ NLPL
✗ Nokia-Qt-exception-1.1 None
✗ Nokia None
✗ NOSL None
✔ Noweb
✔ NPL-1.0
✗ NPL-1.1 None
✔ NPOSL-3.0
✔ NRL
✔ NTP-0
✔ NTP
✗ Nunit None
✔ O-UDA-1.0
✗ OCaml-LGPL-linking-exception None
✗ OCCT-exception-1.0 None
✗ OCCT-PL None
✗ OCLC-2.0 None
✗ ODbL-1.0 None
✗ ODC-By-1.0 None
✔ OFL-1.0
✔ OFL-1.0
✔ OFL-1.0
✗ OFL-1.1-no-RFN OFL-1.1
✔ OFL-1.1
✔ OFL-1.1
✔ OGC-1.0
✔ OGDL-Taiwan-1.0
✗ OGL-Canada-2.0 None
✗ OGL-UK-1.0 None
✗ OGL-UK-2.0 None
✗ OGL-UK-3.0 None
✔ OGTSL
✗ OLDAP-1.1 NBPL-1.0
✔ OLDAP-1.2
✔ OLDAP-1.3
✔ OLDAP-1.4
✔ OLDAP-2.0.1
✔ OLDAP-2.0
✔ OLDAP-2.1
✔ OLDAP-2.2.1
✔ OLDAP-2.2.2
✔ OLDAP-2.2
✔ OLDAP-2.3
✔ OLDAP-2.4
✔ OLDAP-2.5
✔ OLDAP-2.6
✔ OLDAP-2.7
✔ OLDAP-2.8
✔ OML
✗ OpenJDK-assembly-exception-1.0 None
✗ OpenSSL None
✗ openvpn-openssl-exception None
✗ OPL-1.0 None
✔ OPUBL-1.0
✗ OSET-PL-2.1 None
✔ OSL-1.0
✔ OSL-1.1
✔ OSL-2.0
✔ OSL-2.1
✔ OSL-3.0
✔ Parity-6.0.0
✗ Parity-7.0.0 None
✗ PDDL-1.0 None
✗ PHP-3.0 None
✗ PHP-3.01 None
✔ Plexus
✔ PolyForm-Noncommercial-1.0.0
✔ PolyForm-Small-Business-1.0.0
✔ PostgreSQL
✗ PS-or-PDF-font-exception-20170817 None
✗ PSF-2.0 None
✔ psfrag
✗ psutils None
✔ Python-2.0
✔ Qhull
✔ QPL-1.0
✗ Qt-GPL-exception-1.0 None
✗ Qt-LGPL-exception-1.1 None
✗ Qwt-exception-1.0 None
✔ Rdisc
✔ RHeCos-1.1
✗ RPL-1.1 None
✗ RPL-1.5 None
✔ RPSL-1.0
✔ RSA-MD
✗ RSCPL None
✔ Ruby
✔ SAX-PD
✗ Saxpath None
✔ SCEA
✔ Sendmail-8.23
✔ Sendmail
✔ SGI-B-1.0
✔ SGI-B-1.1
✔ SGI-B-2.0
✔ SHL-0.5
✔ SHL-0.51
✗ SHL-2.0 None
✗ SHL-2.1 None
✔ SimPL-2.0
✔ SISSL-1.2
✔ SISSL
✔ Sleepycat
✗ SMLNJ StandardML-NJ
✔ SMPPL
✗ SNIA None
✗ Spencer-86 None
✔ Spencer-94
✔ Spencer-99
✗ SPL-1.0 None
✗ SSH-OpenSSH None
✔ SSH-short
✗ SSPL-1.0 None
✗ SugarCRM-1.1.3 None
✗ Swift-exception None
✔ SWL
✗ TAPR-OHL-1.0 None
✔ TCL
✔ TCP-wrappers
✔ TMate
✗ TORQUE-1.1 None
✔ TOSL
✔ TU-Berlin-1.0
✔ TU-Berlin-2.0
✗ u-boot-exception-2.0 None
✔ UCL-1.0
✔ Unicode-DFS-2015
✔ Unicode-DFS-2016
✔ Unicode-TOU
✗ Universal-FOSS-exception-1.0 None
✗ Unlicense None
✔ UPL-1.0
✔ Vim
✔ VOSTROM
✔ VSL-1.0
✔ W3C-19980720
✔ W3C-20150513
✔ W3C
✗ Watcom-1.0 None
✔ Wsuipa
✔ WTFPL
✗ WxWindows-exception-3.1 None
✔ X11
✔ Xerox
✗ XFree86-1.1 None
✔ xinetd
✔ Xnet
✗ xpp None
✔ XSkat
✔ YPL-1.0
✔ YPL-1.1
✔ Zed
✗ Zend-2.0 None
✔ Zimbra-1.3
✔ Zimbra-1.4
✗ zlib-acknowledgement None
✗ Zlib None
✗ ZPL-1.1 None
✔ ZPL-2.0
✔ ZPL-2.1

For comparison: running https://github.com/nexB/scancode-toolkit on these examples yields a success rate of 465/515 (90.29 %), and it correctly detects ImageMagick as well.
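
To break a result listing like the one above down into correct detections, missing results and wrong IDs, a small tally could look like this. It is a sketch only: the function name `tally` and the category names are hypothetical, and the four sample lines are copied from the listing rather than being the full data set.

```python
from collections import Counter
from typing import Iterable


def tally(lines: Iterable[str]) -> Counter:
    """Count result lines by outcome.

    Expected line shapes, as printed by the testing code:
      "✔ <actual>"             -> correct
      "✗ <expected> None"      -> no_result
      "✗ <expected> <actual>"  -> wrong_license
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "✔":
            counts["correct"] += 1
        elif parts[0] == "✗":
            if parts[-1] == "None":
                counts["no_result"] += 1
            else:
                counts["wrong_license"] += 1
    return counts


sample = [
    "✔ 0BSD",
    "✗ 389-exception None",
    "✗ Bison-exception-2.2 GPL-2.0-with-bison-exception",
    "✔ AAL",
]
summary = tally(sample)
rate = summary["correct"] / sum(summary.values())
print(summary, f"{rate:.2%}")
```

Separating missing results from wrong IDs may help to tell matcher failures apart from cases where a closely related license was detected instead.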
