Additional testing #14
Comments
@goneall Great initiative! Please let me know if I can be of any help with Python. I can write a Python script utility that downloads a specific version of the license templates and XML test files from their repos and runs the tool on them. Having this utility as part of the test utils may also help in generating future tests, e.g. for v3.13 or v2.x.
@rtgdk that would be a great help if you could write such a utility
@Kahanikaar Please feel free to contribute to this issue - thanks
Given that https://github.com/m1kit/yalm-resources/blob/874bc2162b3edab7fabb6ed0a76e97dc7c828530/meta.json declares that it uses version 3.14 of the license list, I did some quick testing against that version and observed a success rate of 285/515 (55.34 %) for exact target matches. Further observations:
Testing code:

```python
import json
from collections import defaultdict
from importlib import resources
from pathlib import Path

from yalm import detect_license, resources as data

# Map each license ID to the set of IDs that are expected to match as well.
duplicates = json.loads(resources.read_text(data, 'expected-duplicates.json'))
duplicate_mapping = defaultdict(set)
for entry in duplicates:
    duplicate_mapping[entry['from']].add(entry['to'])
duplicate_mapping = dict(duplicate_mapping)

correct, total, result_is_none = 0, 0, 0
for path in sorted(
    Path('license-list-XML-3.14', 'test', 'simpleTestForGenerator').glob('*.txt'),
    key=lambda x: x.name.lower()
):
    # The expected license ID is the file name without its extension.
    expected = path.stem
    if expected.startswith('depreciate_'):
        # Strip the 11-character prefix.
        expected = expected[11:]
    result = detect_license(text=path.read_text(), timeout=60, num_workers=20)
    actual = result if not result else result.template.id
    # Count exact matches and documented duplicates as correct.
    if expected == actual or actual in duplicate_mapping.get(expected, set()):
        correct += 1
        print('✔', actual)
    else:
        print('✗', expected, actual)
        if actual is None:
            result_is_none += 1
    total += 1
print(f'{correct}/{total} ({correct / total:.2%}) detected correctly.')
# print(result_is_none)
```

Complete results:
For comparison: Running https://github.com/nexB/scancode-toolkit on these examples has a success rate of 465/515 (90.29 %) and correctly detects ImageMagick as well.
I would like to use this code to replace the Java license matching used in the SPDX online tools.
Before making that change, I would like to test all of the SPDX listed licenses.
This can be done by downloading all of the license templates from the License List Data templates repo and the text files from the License List XML test files repo.
If we run a license compare on every file in the test files and each one matches its corresponding template from the templates directory, that would demonstrate we have no false negatives.
We can also test for false positives by finding any matches against more than one template. There are some expected duplicates which are all documented in the License List XML expected-warnings file.
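The two checks described above could be sketched roughly as follows. This is only an illustration, not part of the tool: `build_duplicate_map` and `classify_matches` are hypothetical helper names, the `from`/`to` entry shape is assumed to be the one used by the testing code in this thread, and the license IDs are placeholders rather than real expected duplicates.

```python
from collections import defaultdict


def build_duplicate_map(entries):
    """Group expected duplicates as {license_id: {ids that may also match}}.

    Assumes each entry looks like {"from": ..., "to": ...}.
    """
    mapping = defaultdict(set)
    for entry in entries:
        mapping[entry["from"]].add(entry["to"])
    return dict(mapping)


def classify_matches(matches, duplicate_map):
    """Split results into false negatives and unexpected extra matches.

    matches: {expected_id: set of template IDs that matched the test file}
    """
    false_negatives = []
    false_positives = {}
    for expected, matched in sorted(matches.items()):
        # False negative: the file did not match its own template.
        if expected not in matched:
            false_negatives.append(expected)
        # False positive: a match that is neither the expected template
        # nor a documented duplicate of it.
        allowed = {expected} | duplicate_map.get(expected, set())
        extra = matched - allowed
        if extra:
            false_positives[expected] = sorted(extra)
    return false_negatives, false_positives


# Placeholder data for illustration only.
duplicate_map = build_duplicate_map([{"from": "License-A", "to": "License-B"}])
matches = {
    "License-A": {"License-A", "License-B"},  # documented duplicate, fine
    "License-C": set(),                       # no match at all
    "License-D": {"License-D", "License-E"},  # undocumented extra match
}
false_negatives, false_positives = classify_matches(matches, duplicate_map)
print("false negatives:", false_negatives)
print("false positives:", false_positives)
```

Keeping the classification separate from the matcher means the same report could be produced for any matching tool, as long as it can return every template a file matched.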
Note that to keep the test and template files consistent, you should download the same tagged version (e.g. v3.12 used in the above links).
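One way to keep both downloads on the same tag is to derive both archive URLs from a single tag string. The sketch below assumes that both repos live under the `spdx` organization on GitHub and that GitHub's usual `archive/refs/tags/<tag>.tar.gz` URL pattern applies; `archive_url` and `download_tag` are hypothetical helper names, not part of any existing utility.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: both repos are hosted under github.com/spdx.
REPOS = ("license-list-data", "license-list-XML")


def archive_url(repo, tag):
    """Build the tarball URL for one repo at the given tag."""
    return f"https://github.com/spdx/{repo}/archive/refs/tags/{tag}.tar.gz"


def download_tag(tag, dest):
    """Download and unpack both repos at the same tag into dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for repo in REPOS:
        archive = dest / f"{repo}-{tag}.tar.gz"
        urllib.request.urlretrieve(archive_url(repo, tag), archive)
        # The archives are trusted here; unpack them next to the tarball.
        with tarfile.open(archive) as tar:
            tar.extractall(dest)
    return dest


# Show the URLs that would be fetched, without touching the network.
for repo in REPOS:
    print(archive_url(repo, "v3.12"))
```

Calling `download_tag("v3.12", "spdx-v3.12")` would then fetch the templates and the test files together, so the two cannot drift apart by version.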