Improve debian license detection #2390 #2558
Merged
Uses the filter_licenses flag to return all license matches, not only unique ones, in the tests for unstructured copyright files, which set with_details to True. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Add functions to check the consistency of a debian copyright file which, if enabled, raise an exception in the following cases:
- Unstructured file
- Has other paragraphs detected
- Has duplicate license paragraphs
- Has paragraphs with a license but no license name
- Not all licenses in license paragraphs are used
- License expressions are not parsable
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
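The checks listed in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `Paragraph` class, the function names and the use of `ValueError` are all hypothetical, each files paragraph is assumed to carry a single license name, and the "license expressions are not parsable" check is left out because it needs a license expression parser.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Simplified, hypothetical stand-in for a parsed copyright paragraph."""
    kind: str               # "files", "license" or "other"
    license_name: str = ""  # e.g. "GPL-2+"
    license_text: str = ""  # only set when the paragraph carries a text


def find_inconsistencies(paragraphs, is_structured=True):
    """Return a list of human-readable problems (empty if consistent)."""
    if not is_structured:
        return ["unstructured file"]

    errors = []
    if any(p.kind == "other" for p in paragraphs):
        errors.append("has other paragraphs")

    # Count how often each named license paragraph occurs.
    license_names = Counter(
        p.license_name for p in paragraphs
        if p.kind == "license" and p.license_name
    )
    if any(count > 1 for count in license_names.values()):
        errors.append("duplicate license paragraphs")

    if any(p.license_text and not p.license_name for p in paragraphs):
        errors.append("license text without a license name")

    # A license paragraph is "used" if some files paragraph names it.
    used = {p.license_name for p in paragraphs if p.kind == "files"}
    if set(license_names) - used:
        errors.append("unused license paragraphs")

    return errors


def check_consistency(paragraphs, is_structured=True):
    """Raise if any inconsistency is found (assumed behaviour when enabled)."""
    errors = find_inconsistencies(paragraphs, is_structured)
    if errors:
        raise ValueError("; ".join(errors))
```

Returning a list of problems from one function and raising from a thin wrapper keeps the checks usable both in strict mode and for reporting.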
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
AyanSinhaMahapatra force-pushed the 2390-improve-debian-license-detection branch 2 times, most recently from 942f592 to b81cb96 (June 17, 2021 07:29)
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
AyanSinhaMahapatra force-pushed the 2390-improve-debian-license-detection branch from b81cb96 to cb074fc (June 23, 2021 11:29)
Adds a filter for unstructured debian copyright files: license intros that are perfectly matched are discarded because, in the context of a debian copyright file, the license texts/notices are also present and not just the intro. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds tests which fail if there is an unknown license detection or a license detection issue with low match coverage present in the test cases. Also traces the detections in case of failures. Fixes some test expectations. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Add new rules and modify existing rules to get debian-slim license detections correct. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Modify test expectations after license detection improvements in debian-slim copyright files. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Improves license detection by modifying unstructured license intro detection with full coverage. Fixes matched_text bug in unknown debian license by setting the lines. Removes unknown copyright detections. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds license_matches property to get the LicenseMatch objects out of LicenseDetection objects directly as a property. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Structured debian copyright files are detected by the 'format: ' first line, and this adds more format links commonly encountered in debian copyright files, thus detecting structured copyright files better. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
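As a rough illustration of detecting a structured file from its first line, here is a minimal sketch. It is not the code from this PR: the function name is hypothetical and `KNOWN_FORMAT_LINKS` is an assumed, illustrative list that only contains the machine-readable copyright format 1.0 URL in its http and https forms, not the set of links this commit adds.

```python
# Assumed, illustrative list of accepted format links (trailing "/" removed).
KNOWN_FORMAT_LINKS = (
    "https://www.debian.org/doc/packaging-manuals/copyright-format/1.0",
    "http://www.debian.org/doc/packaging-manuals/copyright-format/1.0",
)


def is_structured(text, known_links=KNOWN_FORMAT_LINKS):
    """Return True if the first non-empty line is a known 'Format:' field."""
    first = next(
        (line.strip() for line in text.splitlines() if line.strip()), ""
    )
    # Split only on the first colon so the URL's own colon stays in the value.
    name, sep, value = first.partition(":")
    if not sep or name.strip().lower() != "format":
        return False
    return value.strip().rstrip("/").lower() in known_links
```

For example, a file starting with `Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/` is treated as structured, while free-form text is not.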
Adds rules to remove unknown-license-references in common debian copyright files. Regenerates test files with removed unknowns. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Use the licensing.dedup function from license-expression library to simplify the licenses without losing any license-expression specific information. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
For structured debian copyright files, return an attribute with the primary license detected, and for unstructured files return None. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds the debian copyright file which caused an exception, as it says it's a structured debian copyright file, but doesn't have structured paragraphs in it. See - aboutcode-org/scancode.io#219 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
AyanSinhaMahapatra force-pushed the 2390-improve-debian-license-detection branch from 8b050bc to 95568bc (June 28, 2021 16:28)
Fixes bug in get_license_expression by not adding a LicenseDetection object when there are no license matches in an other paragraph, in a structured debian copyright file. See - aboutcode-org/scancode.io#219 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Modify get_license_expression to not pass None values to combine_expression, and also handle the case where all license detections are None by getting an expression from the license_matches, or raising an Error if there are no license_matches. See - aboutcode-org/scancode.io#219 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
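The fallback logic described here can be sketched as below. This is a simplified illustration, not the PR's code: expressions are plain strings, the signature is hypothetical, and `combine` is a stand-in that joins with AND rather than the real combine_expression.

```python
def combine(expressions):
    """Hypothetical stand-in for combine_expression: AND-join unique items."""
    unique = list(dict.fromkeys(expressions))  # keeps first-seen order
    if len(unique) == 1:
        return unique[0]
    # Parenthesize compound expressions so the AND-join stays unambiguous.
    return " AND ".join(f"({e})" if " " in e else e for e in unique)


def get_license_expression(detected_expressions, match_expressions):
    """Combine detection expressions, falling back to match expressions."""
    # Never hand None (or empty) values to the combining step.
    expressions = [e for e in detected_expressions if e]
    if not expressions:
        # All detections were None: fall back to the raw license matches.
        expressions = [e for e in match_expressions if e]
    if not expressions:
        raise ValueError("no license matches to build an expression from")
    return combine(expressions)
```

Filtering first and raising last means the error is reserved for the case where neither source yields an expression.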
AyanSinhaMahapatra force-pushed the 2390-improve-debian-license-detection branch from 95568bc to ffdec4e (June 28, 2021 16:40)
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Remove filter_licenses flag from get_license_expression functions as licensing.dedup() makes this function redundant. Also renames filter_licenses to filter_duplicates. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Remove copyrights which start with "none", like `Copyright: none`. Regenerate tests to remove nones from results. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
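A minimal sketch of such a filter, assuming copyrights are plain strings that may or may not still carry the `Copyright:` field name. The function names are hypothetical and the prefix test follows the commit message literally.

```python
def is_none_copyright(copyright_text):
    """Return True if the statement, minus a 'Copyright:' prefix, starts with 'none'."""
    text = copyright_text.strip().lower()
    prefix = "copyright:"
    if text.startswith(prefix):
        text = text[len(prefix):].strip()
    return text.startswith("none")


def drop_none_copyrights(copyrights):
    """Keep only copyright statements that are not 'none' placeholders."""
    return [c for c in copyrights if not is_none_copyright(c)]
```

Note that a literal prefix test would also drop a holder whose name begins with "None", so a real implementation may want a stricter comparison.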
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Move the consistency error function into the EnhancedDebianCopyright class and have the error texts added directly instead of through a dict lookup. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Rename the rules appropriately, with some modifications and additions. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds a function to check whether the keys in debian expression can be substituted successfully, and adds an UnknownMatch if there are inconsistencies. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
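One way to picture the substitution check: split the debian license expression into its keys and report those with no known replacement. This is a hedged sketch only. The function name, the operator set and the shape of the `detected` mapping (lowercased debian license name to detected expression) are assumptions, and the UnknownMatch handling from the commit is not reproduced.

```python
import re

# Assumed set of operator words that are not license keys.
OPERATORS = {"and", "or", "with"}


def unresolved_keys(debian_expression, detected):
    """Return the keys of a debian expression that cannot be substituted."""
    # Split on whitespace, commas and parentheses; drop empty tokens.
    tokens = [t for t in re.split(r"[\s,()]+", debian_expression) if t]
    return [
        t for t in tokens
        if t.lower() not in OPERATORS and t.lower() not in detected
    ]
```

A caller could treat a non-empty result as the inconsistency case and record an unknown match for the remaining keys.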
AyanSinhaMahapatra force-pushed the 2390-improve-debian-license-detection branch from 4624992 to 467d23d (July 2, 2021 06:53)
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds a function to do license detection on the license name if it is not in the seen keys. Regenerates test and adds rules. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Removes old debian copyright parsing functions and also removes debian_licenses.txt. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
@pombredanne Tests are all green now.
pombredanne commented Jul 2, 2021
LGTM! merging ... thank you ++ that's a big one.
Separately we will need a CHANGELOG entry 👍
Improves Debian License Detection by making the following enhancements:
- `unknown-license-reference`
- `none` and `unknown` copyrights
- `extra_data`
- `debian_licenses.txt`
See:
Tasks
Run tests locally to check for errors.