Solve license detection bugs #1963
I've also reproduced and checked other bugs: #1933 #1932 #1931 #1930 #1929 #1928 #1927 #1926 #1925 #1924 #1923 #1922 #1921 #1920 #1919 #1918 #1917 #1915. All of these are already solved by you, @pombredanne, in the 12-03-license-updates branch. I've collected all the files needed to reproduce these bugs; the initial scan reproduces them (scan results file). Then all the rules added by @pombredanne were indexed, and the scan results are here (scan results files). Comparing the two, you can see that the bugs are solved. Some of them remained, and are solved in this PR: #1908 #1910 #1911 #1912 #1914.
Codecov Report
@@ Coverage Diff @@
## 12-03-license-updates #1963 +/- ##
=========================================================
- Coverage 79.61% 78.94% -0.67%
=========================================================
Files 131 131
Lines 17642 16946 -696
=========================================================
- Hits 14045 13378 -667
+ Misses 3597 3568 -29
Continue to review full report at Codecov.
@AyanSinhaMahapatra This is looking great! You really grok license detection now!
See my comments in line.
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the License for more information.

Copyright (C) 1984, 1989, 1990, 2000, 2001, 2002, 2003, 2004, 2005, 2006
In most cases, we try to avoid keeping copyright statements in license detection rules, so IMHO you could remove these two lines.
This works perfectly. Pushing changes with removed lines.
@@ -0,0 +1,20 @@
GNU Libltdl is free software; you can redistribute it and/or
It would be interesting to also have another rule (and maybe have only that rule) WITHOUT the Libltdl reference.
For instance, rather than starting with:
GNU Libltdl is free software; you can redistribute it and/or
use this:
GNU is free software; you can redistribute it and/or
and remove the word Libltdl elsewhere?
Because license detection ignores words it does not know about (such as Libltdl), this will not have any impact and the text will still be detected with the automaton... And it can also exactly detect several other variants.
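To illustrate the point about unknown words, here is a minimal toy sketch. It is not ScanCode's tokenizer or index: the `known_tokens` helper and the way the vocabulary is built from a single rule are assumptions made only for this example. It shows why a rule written without the project name can still line up token for token with texts that do contain one.

```python
import re


def known_tokens(text, vocabulary):
    """Lowercase the text, split it into alphanumeric tokens,
    and keep only the tokens present in the vocabulary.

    Hypothetical helper for illustration; not a ScanCode API.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok in vocabulary]


# Rule text written WITHOUT the project-specific word.
rule_text = "GNU is free software; you can redistribute it and/or"

# Assumption: the vocabulary only contains words seen in rule texts.
vocabulary = set(re.findall(r"[a-z0-9]+", rule_text.lower()))

# Two scanned texts that differ only by a word the vocabulary does not know.
with_libltdl = "GNU Libltdl is free software; you can redistribute it and/or"
with_other = "GNU Foobar is free software; you can redistribute it and/or"

rule_tokens = known_tokens(rule_text, vocabulary)

# Both texts reduce to the same token sequence as the rule.
assert known_tokens(with_libltdl, vocabulary) == rule_tokens
assert known_tokens(with_other, vocabulary) == rule_tokens
```

Under these assumptions, a single generic rule covers every variant that differs only by an unknown project name, which is the design choice suggested in the comment above.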
This works perfectly for this case: with only that rule, it would also detect texts that are exactly the same apart from "Libltdl". I was wondering if I can keep the original rule with a higher relevance and the non-Libltdl one with a lower relevance score, so it detects other similar texts but, in case of multiple matches, prioritizes other more relevant matches. I'm pushing that.
IMHO there's some work to be done in standardizing relevance scores and minimum coverages across all the rules, after gathering statistics on false positive/unknown matches, as these come up regularly in bugs. This would be one of the smaller, important tasks of the GSoC project. What do you think?
Is there some doc on how relevance scores are given out to rules? I see the general trend in rule names/texts vs their relevance/relevance scores, and how they affect license scores, but are there any strict guidelines?
IMHO only the one without Libltdl would be enough
IMHO there's some work to be done in standardizing relevance scores and minimum coverages across all the rules, after gathering statistics on false positive/unknown matches, as these come up regularly in bugs. This would be one of the smaller, important tasks of the GSoC project. What do you think?
excellent idea 👍
Is there some doc on how relevance scores are given out to rules? I see the general trend in rule names/texts vs their relevance/relevance scores, and how they affect license scores, but are there any strict guidelines?
The relevance is computed dynamically based on the size of a rule OR stored and assigned manually as a curated value based on judgment.
See
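As a rough sketch of the "computed from size OR stored manually" idea described above: the function name `effective_relevance`, the `full_length` threshold of 20 words, and the linear scaling are all assumptions for illustration. This is not ScanCode's actual formula.

```python
def effective_relevance(rule_length, stored_relevance=None, full_length=20):
    """Return a relevance between 0 and 100 for a rule.

    Hypothetical sketch:
    - a manually curated (stored) relevance always wins;
    - otherwise, rules at least `full_length` words long are fully relevant;
    - shorter rules get a relevance proportional to their length.
    The threshold of 20 is an assumed value, not taken from ScanCode.
    """
    if stored_relevance is not None:
        return stored_relevance
    if rule_length >= full_length:
        return 100
    return int(100 * rule_length / full_length)


# A long rule needs no stored relevance under this scheme.
assert effective_relevance(200) == 100
# A short rule is scaled down by its length.
assert effective_relevance(5) == 25
# A curated value overrides the computed one.
assert effective_relevance(5, stored_relevance=90) == 90
```

With a scheme like this, a stored relevance is only worth keeping for rules that are short or genuinely ambiguous; large, unambiguous rules can rely on the computed value.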
IMHO only the one without Libltdl would be enough
Done.
Even if the |
@AyanSinhaMahapatra in order to have an efficient discussion on each issue it may be simpler to have the comments you pasted above each in their own respective ticket, otherwise this is going to be a tad hairy to track comments and replies.
@pombredanne Added commits solving #1908.
@AyanSinhaMahapatra maybe you could find a slightly more descriptive commit message?
Add requested changes
is not super useful. Think about it this way: 6 months from now, you are looking at this same message. What information does this message tell you? IMHO not much.
Try to find something more explicit... tell a (mini) story, or enough for this to be useful now for others, and for you and others in the future.
@@ -1,7 +1,7 @@
 license_expression: lgpl-2.0-plus WITH libtool-exception-2.0
 is_license_notice: yes
 minimum_coverage: 90
-relevance: 100
+relevance: 90
Why is this not 100% relevant? Is there any ambiguity that this is lgpl-2.0-plus WITH libtool-exception-2.0? I do not think so. Furthermore, this is a large enough rule, so you could/should eschew storing a relevance there IMHO.
I forgot to remove this change, i.e. after doing this. Fixing this and the commit message too. Thanks for pointing out.
16cc484 to 3cfea5e
Thank you ++
We are almost there!
@@ -0,0 +1,4 @@
license_expression: gpl-2.0-plus WITH bison-exception-2.2
This should instead be:
license_expression: bsd-new AND gpl-2.0-plus WITH bison-exception-2.2
is_license_notice: yes
minimum_coverage: 99
referenced_filenames:
- Copyright.txt
You may also want to create a second rule with only the BSD parts:
Distributed under the OSI-approved BSD License (the "License");
see accompanying file Copyright.txt for details.
Right! Updating.
So I had already created a second rule with only the BSD parts in this commit, but didn't add bsd-new AND to the license_expression of gpl-2.0-plus WITH bison-exception-2.2, which I've pushed now. Does this work?
@@ -0,0 +1 @@
label = "GPL2"
Since this is very close to a possible license tag of sorts, could there be a word before or after that could be included to make this more specific?
Yes! Adding a line before.
@@ -0,0 +1 @@
label = "GPL3"
Same comment as for the label = gpl2 rule
Yes, will add a line before.
@@ -0,0 +1 @@
S5PC100_GPL4(0)
Was this really detected?
No, only GPL1, GPL2 and GPL3 were detected. On second thought this is really unnecessary, removing and renaming these.
@@ -0,0 +1 @@
S5PC100_GPL0(0)
Was this really detected?
No. On second thought this is really unnecessary, removing these.
@@ -0,0 +1 @@
label = "GPL4"
Was this really detected?
No. On second thought this is really unnecessary, removing these.
@@ -0,0 +1 @@
label = "GPL0"
Was this really detected?
No. On second thought this is really unnecessary, removing these.
LGTM! Thanks... Almost there: there is still a minor nitpicking/rename needed.
@@ -0,0 +1,34 @@
Distributed under the OSI-approved BSD License (the "License");
As a convention, we name the rule after the license expression...
So maybe use something like bsd-new_and_gpl-2.0-plus_with_bison-exception_....
(within reason, unless there are many, many licenses)
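The naming convention above can be sketched as a small helper. This is only an illustration: `rule_base_name` is a hypothetical function, not part of ScanCode, and it assumes a simple expression without parentheses. Any suffix appended after the base name (the elided part in the suggestion above) is left to the rule author.

```python
def rule_base_name(license_expression):
    """Derive a rule file base name from a license expression by
    lowercasing it and joining its space-separated parts with underscores.

    Hypothetical helper; assumes no parentheses in the expression.
    """
    return "_".join(license_expression.lower().split())


expression = "bsd-new AND gpl-2.0-plus WITH bison-exception-2.2"
name = rule_base_name(expression)
assert name == "bsd-new_and_gpl-2.0-plus_with_bison-exception-2.2"
```

Naming the rule after its expression makes it possible to tell what a rule detects from the file listing alone, which is presumably why the convention is applied "within reason" for expressions with many licenses.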
Done.
ea3be0c to 3e09782
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Solves aboutcode-org#1912 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Solves aboutcode-org#1914 aboutcode-org#1911 Add negative and gpl with exceptions rules Add rules and minimum coverage for false positive case. Add new GPL rule Add new bsd license rules Add bzip2 Rule Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Add bsd-new to gpl with bison exception yml file. Also add referenced copyright file. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
3e09782 to 7984a2e
This looks ready to merge to me. The failing tests are not related to this.
Adds functions that take input data from ScanCode scan results, in JSON, and structure them similarly to the ClearlyDefined database data, so the same functions can be used on them. Adds a Jupyter Notebook to explain the function calls and data, and JSON files as samples, from the issues in aboutcode-org/scancode-toolkit#1963. Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Fixes #1907 #1910 #1911 #1912 #1914
Tasks
Run tests locally to check for errors.