Additional testing #14

goneall · 2021-03-11T18:51:27Z

I would like to use this code to replace the Java license matching used in the SPDX online tools.

Before making that change, I would like to test all of the SPDX listed licenses.

This can be done by downloading all of the license templates from the License List Data templates repo and the text files from the License List XML test files repo.
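
A minimal sketch of that download step, assuming both repositories live on GitHub as spdx/license-list-data and spdx/license-list-XML, publish matching release tags such as v3.12, and can be fetched through GitHub's tag-archive URLs. The repo names, URL pattern and helper names are assumptions for illustration. Passing one tag for both repositories keeps the templates and test files on the same version.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

# Assumption: both SPDX repos are hosted under the "spdx" GitHub organization
# and share release tags such as "v3.12".
REPOS = ("spdx/license-list-data", "spdx/license-list-XML")


def archive_url(repo: str, tag: str) -> str:
    """Return the GitHub zip-archive URL for `repo` at release `tag`."""
    return f"https://github.com/{repo}/archive/refs/tags/{tag}.zip"


def extract_archive(data: bytes, dest: Path) -> list[Path]:
    """Unpack an in-memory zip archive into `dest`; return its top-level dirs."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
        roots = {name.split("/", 1)[0] for name in zf.namelist()}
    return sorted(dest / root for root in roots)


def download_release(tag: str, dest: Path) -> list[Path]:
    """Download and unpack both repos at the *same* tag (needs network access)."""
    dest.mkdir(parents=True, exist_ok=True)
    extracted: list[Path] = []
    for repo in REPOS:
        with urlopen(archive_url(repo, tag)) as response:
            extracted += extract_archive(response.read(), dest)
    return extracted
```

For example, download_release("v3.12", Path("spdx-data")) should leave directories such as license-list-data-3.12 and license-list-XML-3.12 under spdx-data, assuming GitHub's usual archive layout; the templates and test files are then read from inside those directories.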

If we compare every file in the test files against the templates from the templates directory and each one matches its own template, that would demonstrate we have no false negatives.
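
One way to express that false-negative check, as a sketch: the matcher is passed in as a plain function (a hypothetical interface, so yalm, the Java tools, or anything else could be wrapped behind it), and each test file is assumed to be named after its license ID, as in license-list-XML's test/simpleTestForGenerator directory.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical matcher interface: takes a license text and returns the
# SPDX IDs of every template that matches it.
Matcher = Callable[[str], Iterable[str]]


def false_negatives(test_dir: Path, match: Matcher) -> list[str]:
    """Return the IDs of test texts that do NOT match their own template.

    Assumes each test file is named <license-id>.txt.
    """
    missed = []
    for path in sorted(test_dir.glob("*.txt")):
        license_id = path.stem
        matched = set(match(path.read_text(encoding="utf-8")))
        if license_id not in matched:
            missed.append(license_id)
    return missed
```

An empty list would mean no false negatives. Test files whose names carry an extra prefix would need their stem normalized before the comparison.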

We can also test for false positives by finding any text that matches more than one template. There are some expected duplicates, all of which are documented in the License List XML expected-warnings file.
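
The false-positive check can be sketched the same way. This assumes the matches have already been collected into a dict from each test file's license ID to every template ID that matched it, and that the expected-warnings file has been parsed into a similar dict of accepted duplicates; that parsing is left out because the file's format is not described here, and the function name is hypothetical.

```python
from collections.abc import Mapping


def unexpected_multi_matches(
    matches: Mapping[str, set[str]],
    expected_duplicates: Mapping[str, set[str]],
) -> dict[str, set[str]]:
    """Report, per test text, the extra template IDs that matched it.

    Matching its own template is fine; any other matching template counts
    as a potential false positive unless it is listed as an expected
    duplicate for that license.
    """
    report: dict[str, set[str]] = {}
    for license_id, matched in matches.items():
        allowed = {license_id} | set(expected_duplicates.get(license_id, set()))
        extra = set(matched) - allowed
        if extra:
            report[license_id] = extra
    return report
```

An empty report means every multi-template match is explained by the expected-warnings file.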

Note that to keep the test and template files consistent, you should download the same tagged version from both repos (e.g. v3.12, as used in the links above).

rtgdk · 2021-03-21T12:57:37Z

@goneall Great initiative! Please let me know if I can be of any help with Python. I can write a Python utility script that downloads a specific version of the license templates and the XML file repo and runs the tool on them. Having this utility as part of the test utils might also help generate future tests, e.g. for v3.13 or v2.x.

goneall · 2021-03-21T23:43:13Z

> I can write a python script utility to download specific license templates/xml file repo and run the tool on them.

@rtgdk that would be a great help if you could write such a utility

sanjibansg · 2021-03-27T10:11:45Z

I would like to contribute to this issue. Is this open currently? @rtgdk @goneall

goneall · 2021-03-27T15:30:38Z

@Kahanikaar Please feel free to contribute to this issue - thanks

stefan6419846 · 2023-06-05T12:12:40Z

Given that https://github.com/m1kit/yalm-resources/blob/874bc2162b3edab7fabb6ed0a76e97dc7c828530/meta.json declares that it uses version 3.14 of the license list, I did some quick testing and observed a success rate of 285/515 (55.34 %) for exact target matches.

Further observations:

- ImageMagick license matching seems to always time out in yalm.licenses.SpdxLicense._test_regex for the (correct) regex file ImageMagick, even with 20 workers and a timeout of 180 minutes (which I consider rather large values for matching purposes).
- 199 cases return None during matching (probably not because of a timeout).
- 16 test files start with the prefix depreciate_, which the detected IDs lack. Ignoring this prefix during comparison slightly increases the success rate to 289/515 (56.12 %) for exact target matches.

Testing code:

```python
import json
from collections import defaultdict
from importlib import resources
from pathlib import Path

from yalm import detect_license, resources as data

# license-list-XML test files, checked out at the tag matching yalm's data (3.14).
TEST_DIR = Path('license-list-XML-3.14', 'test', 'simpleTestForGenerator')
DEPRECATED_PREFIX = 'depreciate_'

# Map each license ID to the IDs it is an expected duplicate of.
duplicates = json.loads(
    resources.files(data).joinpath('expected-duplicates.json').read_text(encoding='utf-8')
)
duplicate_mapping = defaultdict(set)
for entry in duplicates:
    duplicate_mapping[entry['from']].add(entry['to'])
duplicate_mapping = dict(duplicate_mapping)

correct, total, result_is_none = 0, 0, 0

for path in sorted(TEST_DIR.glob('*.txt'), key=lambda p: p.name.lower()):
    # The file name, minus any deprecation prefix, is the expected SPDX ID.
    expected = path.stem.removeprefix(DEPRECATED_PREFIX)  # Python 3.9+
    result = detect_license(
        text=path.read_text(encoding='utf-8'), timeout=60, num_workers=20
    )
    actual = result.template.id if result else None
    # Accept an exact match or any documented duplicate of the expected ID.
    if expected == actual or actual in duplicate_mapping.get(expected, set()):
        correct += 1
        print('✔', actual)
    else:
        print('✗', expected, actual)
    if actual is None:
        result_is_none += 1
    total += 1

print(f'{correct}/{total} ({correct / total:.2%}) detected correctly.')
print(f'{result_is_none} results were None.')
```

Complete results:

✔ 0BSD
✗ 389-exception None
✔ AAL
✔ Abstyles
✔ Adobe-2006
✔ Adobe-Glyph
✔ ADSL
✔ AFL-1.1
✔ AFL-1.2
✔ AFL-2.0
✔ AFL-2.1
✗ AFL-3.0 None
✔ Afmparse
✗ AGPL-1.0-only None
✗ AGPL-1.0-or-later None
✗ AGPL-1.0 None
✗ AGPL-3.0-only None
✗ AGPL-3.0-or-later None
✗ AGPL-3.0 None
✔ Aladdin
✗ AMDPLPA None
✔ AML
✔ AMPAS
✗ ANTLR-PD-fallback None
✔ ANTLR-PD
✗ Apache-1.0 None
✗ Apache-1.1 None
✔ Apache-2.0
✔ APAFML
✗ APL-1.0 None
✗ APSL-1.0 None
✗ APSL-1.1 None
✗ APSL-1.2 None
✗ APSL-2.0 None
✔ Artistic-1.0-cl8
✔ Artistic-1.0-Perl
✔ Artistic-1.0
✔ Artistic-2.0
✗ Autoconf-exception-2.0 None
✗ Autoconf-exception-3.0 None
✔ Bahyph
✔ Barr
✔ Beerware
✗ Bison-exception-2.2 GPL-2.0-with-bison-exception
✗ BitTorrent-1.0 None
✗ BitTorrent-1.1 None
✔ blessing
✔ BlueOak-1.0.0
✗ Bootloader-exception None
✔ Borceux
✔ BSD-1-Clause
✔ BSD-2-Clause-FreeBSD
✗ BSD-2-Clause-NetBSD BSD-2-Clause
✔ BSD-2-Clause-Patent
✔ BSD-2-Clause-Views
✔ BSD-2-Clause
✗ BSD-3-Clause-Attribution None
✔ BSD-3-Clause-Clear
✔ BSD-3-Clause-LBNL
✔ BSD-3-Clause-Modification
✔ BSD-3-Clause-No-Military-License
✔ BSD-3-Clause-No-Nuclear-License-2014
✔ BSD-3-Clause-No-Nuclear-License
✔ BSD-3-Clause-No-Nuclear-Warranty
✔ BSD-3-Clause-Open-MPI
✔ BSD-3-Clause
✔ BSD-4-Clause-Shortened
✗ BSD-4-Clause-UC BSD-4-Clause
✔ BSD-4-Clause
✔ BSD-Protection
✔ BSD-Source-Code
✔ BSL-1.0
✗ BUSL-1.1 None
✗ bzip2-1.0.5 None
✗ bzip2-1.0.6 None
✗ C-UDA-1.0 None
✗ CAL-1.0-Combined-Work-Exception None
✗ CAL-1.0 None
✔ Caldera
✔ CATOSL-1.1
✔ CC-BY-1.0
✔ CC-BY-2.0
✗ CC-BY-2.5-AU None
✔ CC-BY-2.5
✔ CC-BY-3.0-AT
✔ CC-BY-3.0-DE
✔ CC-BY-3.0-NL
✔ CC-BY-3.0-US
✗ CC-BY-3.0 None
✔ CC-BY-4.0
✔ CC-BY-NC-1.0
✔ CC-BY-NC-2.0
✔ CC-BY-NC-2.5
✔ CC-BY-NC-3.0-DE
✔ CC-BY-NC-3.0
✔ CC-BY-NC-4.0
✔ CC-BY-NC-ND-1.0
✔ CC-BY-NC-ND-2.0
✔ CC-BY-NC-ND-2.5
✔ CC-BY-NC-ND-3.0-DE
✔ CC-BY-NC-ND-3.0-IGO
✔ CC-BY-NC-ND-3.0
✔ CC-BY-NC-ND-4.0
✔ CC-BY-NC-SA-1.0
✗ CC-BY-NC-SA-2.0-FR None
✗ CC-BY-NC-SA-2.0-UK None
✔ CC-BY-NC-SA-2.0
✔ CC-BY-NC-SA-2.5
✔ CC-BY-NC-SA-3.0-DE
✗ CC-BY-NC-SA-3.0-IGO None
✔ CC-BY-NC-SA-3.0
✔ CC-BY-NC-SA-4.0
✔ CC-BY-ND-1.0
✗ CC-BY-ND-2.0 None
✔ CC-BY-ND-2.5
✔ CC-BY-ND-3.0-DE
✔ CC-BY-ND-3.0
✔ CC-BY-ND-4.0
✔ CC-BY-SA-1.0
✗ CC-BY-SA-2.0-UK None
✔ CC-BY-SA-2.0
✔ CC-BY-SA-2.1-JP
✔ CC-BY-SA-2.5
✔ CC-BY-SA-3.0-AT
✔ CC-BY-SA-3.0-DE
✗ CC-BY-SA-3.0 None
✔ CC-BY-SA-4.0
✔ CC-PDDC
✔ CC0-1.0
✗ CDDL-1.0 None
✗ CDDL-1.1 None
✗ CDL-1.0 None
✗ CDLA-Permissive-1.0 None
✔ CDLA-Permissive-2.0
✗ CDLA-Sharing-1.0 None
✗ CECILL-1.0 None
✔ CECILL-1.1
✗ CECILL-2.0 None
✗ CECILL-2.1 None
✗ CECILL-B None
✗ CECILL-C None
✗ CERN-OHL-1.1 None
✗ CERN-OHL-1.2 None
✗ CERN-OHL-P-2.0 None
✗ CERN-OHL-S-2.0 None
✗ CERN-OHL-W-2.0 None
✔ ClArtistic
✗ Classpath-exception-2.0 None
✗ CLISP-exception-2.0 None
✔ CNRI-Jython
✔ CNRI-Python-GPL-Compatible
✔ CNRI-Python
✗ Condor-1.1 None
✗ copyleft-next-0.3.0 None
✗ copyleft-next-0.3.1 None
✗ CPAL-1.0 None
✔ CPL-1.0
✔ CPOL-1.02
✔ Crossword
✔ CrystalStacker
✗ CUA-OPL-1.0 None
✗ Cube None
✔ curl
✗ D-FSL-1.0 None
✔ eCos-2.0
✗ GPL-1.0+ GPL-1.0
✗ GPL-2.0+ None
✗ GPL-2.0-with-autoconf-exception None
✔ GPL-2.0-with-bison-exception
✗ GPL-2.0-with-classpath-exception None
✗ GPL-2.0-with-font-exception None
✗ GPL-2.0-with-GCC-exception None
✗ GPL-3.0+ None
✗ GPL-3.0-with-autoconf-exception None
✔ GPL-3.0-with-GCC-exception
✗ LGPL-2.0+ None
✗ LGPL-2.1+ None
✗ LGPL-3.0+ None
✔ StandardML-NJ
✗ WXwindows None
✔ diffmark
✗ DigiRule-FOSS-exception None
✔ DOC
✔ Dotseqn
✔ DRL-1.0
✔ DSDP
✔ dvipdfm
✔ ECL-1.0
✔ ECL-2.0
✗ eCos-exception-2.0 None
✔ EFL-1.0
✔ EFL-2.0
✔ eGenix
✗ Entessa None
✔ EPICS
✗ EPL-1.0 None
✗ EPL-2.0 None
✔ ErlPL-1.1
✗ etalab-2.0 None
✗ EUDatagrid None
✗ EUPL-1.0 None
✗ EUPL-1.1 None
✗ EUPL-1.2 None
✗ Eurosym None
✗ Fair None
✗ Fawkes-Runtime-exception None
✗ FLTK-exception None
✗ Font-exception-2.0 None
✔ Frameworx-1.0
✗ FreeBSD-DOC None
✔ FreeImage
✗ freertos-exception-2.0 None
✔ FSFAP
✔ FSFUL
✔ FSFULLR
✔ FTL
✗ GCC-exception-2.0 None
✗ GCC-exception-3.1 None
✗ GD None
✗ GFDL-1.1-invariants-only GFDL-1.1
✗ GFDL-1.1-invariants-or-later GFDL-1.1
✗ GFDL-1.1-no-invariants-only GFDL-1.1
✗ GFDL-1.1-no-invariants-or-later GFDL-1.1
✗ GFDL-1.1-only GFDL-1.1
✗ GFDL-1.1-or-later GFDL-1.1
✔ GFDL-1.1
✗ GFDL-1.2-invariants-only GFDL-1.2
✗ GFDL-1.2-invariants-or-later GFDL-1.2
✗ GFDL-1.2-no-invariants-only GFDL-1.2
✗ GFDL-1.2-no-invariants-or-later GFDL-1.2
✗ GFDL-1.2-only GFDL-1.2
✗ GFDL-1.2-or-later GFDL-1.2
✔ GFDL-1.2
✗ GFDL-1.3-invariants-only GFDL-1.3
✗ GFDL-1.3-invariants-or-later GFDL-1.3
✗ GFDL-1.3-no-invariants-only GFDL-1.3
✗ GFDL-1.3-no-invariants-or-later GFDL-1.3
✗ GFDL-1.3-only GFDL-1.3
✗ GFDL-1.3-or-later GFDL-1.3
✔ GFDL-1.3
✔ Giftware
✔ GL2PS
✔ Glide
✔ Glulxe
✔ GLWTPL
✗ gnu-javamail-exception None
✔ gnuplot
✗ GPL-1.0-only GPL-1.0
✗ GPL-1.0-or-later GPL-1.0
✔ GPL-1.0
✗ GPL-2.0-only None
✗ GPL-2.0-or-later None
✗ GPL-2.0 None
✗ GPL-3.0-linking-exception None
✗ GPL-3.0-linking-source-exception None
✗ GPL-3.0-only None
✗ GPL-3.0-or-later None
✗ GPL-3.0 None
✗ GPL-CC-1.0 None
✗ gSOAP-1.3b None
✔ HaskellReport
✔ Hippocratic-2.1
✔ HPND-sell-variant
✗ HPND None
✔ HTMLTIDY
✗ i2p-gpl-java-exception None
✔ IBM-pibs
✔ ICU
✔ IJG
✗ ImageMagick None
✔ iMatix
✗ Imlib2 None
✔ Info-ZIP
✔ Intel-ACPI
✔ Intel
✗ Interbase-1.0 None
✔ IPA
✔ IPL-1.0
✔ ISC
✔ JasPer-2.0
✔ JPNIC
✔ JSON
✗ LAL-1.2 None
✗ LAL-1.3 None
✔ Latex2e
✔ Leptonica
✗ LGPL-2.0-only None
✗ LGPL-2.0-or-later None
✗ LGPL-2.0 None
✗ LGPL-2.1-only None
✗ LGPL-2.1-or-later None
✗ LGPL-2.1 None
✗ LGPL-3.0-linking-exception None
✗ LGPL-3.0-only None
✗ LGPL-3.0-or-later None
✗ LGPL-3.0 None
✗ LGPLLR None
✗ libpng-2.0 None
✗ Libpng None
✔ libselinux-1.0
✔ libtiff
✗ Libtool-exception None
✗ LiLiQ-P-1.1 None
✗ LiLiQ-R-1.1 None
✗ LiLiQ-Rplus-1.1 None
✔ Linux-OpenIB
✗ Linux-syscall-note None
✗ LLVM-exception None
✔ LPL-1.0
✔ LPL-1.02
✗ LPPL-1.0 None
✔ LPPL-1.1
✔ LPPL-1.2
✔ LPPL-1.3a
✔ LPPL-1.3c
✗ LZMA-exception None
✔ MakeIndex
✗ mif-exception None
✔ MirOS
✔ MIT-0
✗ MIT-advertising None
✔ MIT-CMU
✗ MIT-enna None
✗ MIT-feh None
✔ MIT-Modern-Variant
✔ MIT-open-group
✔ MIT
✔ MITNFA
✗ Motosoto None
✔ mpich2
✔ MPL-1.0
✗ MPL-1.1 None
✗ MPL-2.0-no-copyleft-exception None
✗ MPL-2.0 None
✔ MS-PL
✔ MS-RL
✔ MTLL
✔ MulanPSL-1.0
✔ MulanPSL-2.0
✗ Multics None
✔ Mup
✔ NAIST-2003
✔ NASA-1.3
✔ Naumen
✔ NBPL-1.0
✗ NCGL-UK-2.0 None
✔ NCSA
✔ Net-SNMP
✔ NetCDF
✔ Newsletr
✔ NGPL
✔ NIST-PD-fallback
✔ NIST-PD
✗ NLOD-1.0 None
✗ NLOD-2.0 None
✔ NLPL
✗ Nokia-Qt-exception-1.1 None
✗ Nokia None
✗ NOSL None
✔ Noweb
✔ NPL-1.0
✗ NPL-1.1 None
✔ NPOSL-3.0
✔ NRL
✔ NTP-0
✔ NTP
✗ Nunit None
✔ O-UDA-1.0
✗ OCaml-LGPL-linking-exception None
✗ OCCT-exception-1.0 None
✗ OCCT-PL None
✗ OCLC-2.0 None
✗ ODbL-1.0 None
✗ ODC-By-1.0 None
✔ OFL-1.0
✔ OFL-1.0
✔ OFL-1.0
✗ OFL-1.1-no-RFN OFL-1.1
✔ OFL-1.1
✔ OFL-1.1
✔ OGC-1.0
✔ OGDL-Taiwan-1.0
✗ OGL-Canada-2.0 None
✗ OGL-UK-1.0 None
✗ OGL-UK-2.0 None
✗ OGL-UK-3.0 None
✔ OGTSL
✗ OLDAP-1.1 NBPL-1.0
✔ OLDAP-1.2
✔ OLDAP-1.3
✔ OLDAP-1.4
✔ OLDAP-2.0.1
✔ OLDAP-2.0
✔ OLDAP-2.1
✔ OLDAP-2.2.1
✔ OLDAP-2.2.2
✔ OLDAP-2.2
✔ OLDAP-2.3
✔ OLDAP-2.4
✔ OLDAP-2.5
✔ OLDAP-2.6
✔ OLDAP-2.7
✔ OLDAP-2.8
✔ OML
✗ OpenJDK-assembly-exception-1.0 None
✗ OpenSSL None
✗ openvpn-openssl-exception None
✗ OPL-1.0 None
✔ OPUBL-1.0
✗ OSET-PL-2.1 None
✔ OSL-1.0
✔ OSL-1.1
✔ OSL-2.0
✔ OSL-2.1
✔ OSL-3.0
✔ Parity-6.0.0
✗ Parity-7.0.0 None
✗ PDDL-1.0 None
✗ PHP-3.0 None
✗ PHP-3.01 None
✔ Plexus
✔ PolyForm-Noncommercial-1.0.0
✔ PolyForm-Small-Business-1.0.0
✔ PostgreSQL
✗ PS-or-PDF-font-exception-20170817 None
✗ PSF-2.0 None
✔ psfrag
✗ psutils None
✔ Python-2.0
✔ Qhull
✔ QPL-1.0
✗ Qt-GPL-exception-1.0 None
✗ Qt-LGPL-exception-1.1 None
✗ Qwt-exception-1.0 None
✔ Rdisc
✔ RHeCos-1.1
✗ RPL-1.1 None
✗ RPL-1.5 None
✔ RPSL-1.0
✔ RSA-MD
✗ RSCPL None
✔ Ruby
✔ SAX-PD
✗ Saxpath None
✔ SCEA
✔ Sendmail-8.23
✔ Sendmail
✔ SGI-B-1.0
✔ SGI-B-1.1
✔ SGI-B-2.0
✔ SHL-0.5
✔ SHL-0.51
✗ SHL-2.0 None
✗ SHL-2.1 None
✔ SimPL-2.0
✔ SISSL-1.2
✔ SISSL
✔ Sleepycat
✗ SMLNJ StandardML-NJ
✔ SMPPL
✗ SNIA None
✗ Spencer-86 None
✔ Spencer-94
✔ Spencer-99
✗ SPL-1.0 None
✗ SSH-OpenSSH None
✔ SSH-short
✗ SSPL-1.0 None
✗ SugarCRM-1.1.3 None
✗ Swift-exception None
✔ SWL
✗ TAPR-OHL-1.0 None
✔ TCL
✔ TCP-wrappers
✔ TMate
✗ TORQUE-1.1 None
✔ TOSL
✔ TU-Berlin-1.0
✔ TU-Berlin-2.0
✗ u-boot-exception-2.0 None
✔ UCL-1.0
✔ Unicode-DFS-2015
✔ Unicode-DFS-2016
✔ Unicode-TOU
✗ Universal-FOSS-exception-1.0 None
✗ Unlicense None
✔ UPL-1.0
✔ Vim
✔ VOSTROM
✔ VSL-1.0
✔ W3C-19980720
✔ W3C-20150513
✔ W3C
✗ Watcom-1.0 None
✔ Wsuipa
✔ WTFPL
✗ WxWindows-exception-3.1 None
✔ X11
✔ Xerox
✗ XFree86-1.1 None
✔ xinetd
✔ Xnet
✗ xpp None
✔ XSkat
✔ YPL-1.0
✔ YPL-1.1
✔ Zed
✗ Zend-2.0 None
✔ Zimbra-1.3
✔ Zimbra-1.4
✗ zlib-acknowledgement None
✗ Zlib None
✗ ZPL-1.1 None
✔ ZPL-2.0
✔ ZPL-2.1
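
The totals can be recomputed from a log like the one above. The sketch below is a hypothetical helper (not part of yalm) that splits the printed lines into correct detections, cases where no license was returned, and cases where a different license was returned, assuming the "✔ <id>" / "✗ <expected> <actual>" format printed by the testing code.

```python
from collections import Counter
from typing import Iterable


def tally(lines: Iterable[str]) -> Counter:
    """Classify result lines printed by the testing code.

    '✔ <id>'                -> correct
    '✗ <expected> None'     -> none (no license returned)
    '✗ <expected> <actual>' -> wrong (a different license returned)
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "✔":
            counts["correct"] += 1
        elif parts[0] == "✗":
            counts["none" if parts[-1] == "None" else "wrong"] += 1
    return counts
```

Separating "none" from "wrong" distinguishes texts the matcher gave up on from texts it attributed to a related license, such as the GFDL and GPL-1.0 variants above.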

For comparison: running https://github.com/nexB/scancode-toolkit on the same examples achieves a success rate of 465/515 (90.29 %), and it correctly detects ImageMagick as well.

m1kit mentioned this issue on Jun 8, 2021 in "Additional testing" m1kit/yalm-python#11 (Closed)

m1kit mentioned this issue on Aug 23, 2021 in "Sync up with GSoC work" #17 (Merged)

goneall mentioned this issue on Jun 2, 2023 in "Decrease distribution size" #23 (Open)
