Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

References to MIT / GPL dual-licensing not detected #257

Closed
sschuberth opened this issue May 4, 2016 · 13 comments · Fixed by #258
Closed

References to MIT / GPL dual-licensing not detected #257

sschuberth opened this issue May 4, 2016 · 13 comments · Fixed by #258

Comments

@sschuberth
Copy link
Collaborator

In the libjson package, the sh*.css and sh*.js file say

 * @license
 * Dual licensed under the MIT and GPL licenses.

It's probably hard to detect as these are only references that do not include the license texts, but still it'd be great if ScanCode would detect these.

@pombredanne
Copy link
Member

@sschuberth Thanks. This is a data bug in the current develop or master branch warranting a new rule.
In the branch for #86 this would likely have been caught without even having a new rule.

pombredanne added a commit that referenced this issue May 4, 2016
Reported-by: Sebastian Schuberth
pombredanne added a commit that referenced this issue May 4, 2016
#257 Detect dual mit or gpl in javascript
@pombredanne
Copy link
Member

@sschuberth the latest commit fixes this in develop. Thank you ++ for reporting this!
You can grab the develop zip to use this until the next release, though there are also many new things in develop and it is not guaranteed to be 100% stable yet.

@sschuberth
Copy link
Collaborator Author

Thank a lot for the quick fix! I'll try out the develop branch, though I'm running into issue #259.

@sschuberth
Copy link
Collaborator Author

Your fix basically seems to work, except that the GPL entry in the JSON report does not have the spdx_license_key set. But I guess this is because the SPDX key requires you to know about the version (as it is part of the key name), and you do not know about the version in this case, right?

However, you probably could set spdx_license_key to GPL-1.0+ in that case, or?

@pombredanne
Copy link
Member

pombredanne commented May 4, 2016

Well, this is something that demands a little thoughts: in this very case, the syntax highlighter is under MIT or GPL-2.0 .... but technically outside of this context, you are right that any GPL version could be picked.

I was thinking in these cases to get away in the future from using a bare GPL and instead use GPL 2.0 or later . The bare GPL is somewhat of a kludge IMHO.

Also we have full support for license expressions in the making in a near future including the ability to reason about licenses equivalence. In that context, we may even get something better coming up.
And more abilities to consider a whole licensing context will come up afterwards to consider not only the detected license in code, but also the main license files that may exist nearby or be referenced.
With all that said, this is not yet there.

So short term, I am fine with spdx_license_key: GPL-1.0+ for the bare gpl. Would you mind to submit a quick PR? I will merge this in.

@sschuberth
Copy link
Collaborator Author

I'd like to give this some more thought over the weekend. Because another issue is that GPL-1.0+ (including the literal +) is not a valid SPDX key in recent versions of the specification anymore. Instead, the GPL-1.0 part is the key, and the + is a modifier to it. I'm not sure whether you're willing to make these modifiers part of spdx_license_key ...

@pombredanne
Copy link
Member

The future parser for license expression is going to be based on https://github.com/bastikr/boolean.py that I co-maintain. It will be able to deal flexibly with any form of expressions including the + quirks of SPDX...
and mixing license keys from SPDX and ScanCode. So no problem there.
What this means technically is that GPL-1.0+ will be understood as GPL-1.0 or later and logically equivalent to GPL-1.0 or GPL-2.0 or GPL-3.0

@sschuberth
Copy link
Collaborator Author

Please see PR #261.

@pombredanne
Copy link
Member

@sschuberth #261 has been merged. Thank you!

@sschuberth
Copy link
Collaborator Author

Thanks. I've verified it to be working as expected. So from my point of view this issue could be closed.

The only thing that leaves me a bit concerned is that it seems as if your added rule is very specific, i.e. it needs to match the license text exactly. Isn't there a way to make the test at least slightly more generic WRT the wording? Or will all of that be part of the approximate license detection?

@pombredanne
Copy link
Member

@sschuberth Approximate license detection will be handling the fuzzy parts. I am planning to merge this in develop this week... hopefully!

And as a general "rule" the more specific a rule is, the better. This sounds counter intuitive at first. But the general approach of approximate detection is an index-assisted sequence alignment kinda similar to BLAST or an inverted index-assisted diff. Exact matches will be found exactly and approximate matches are approximate ... but still found. Trying to guess what part should be "generic" works against the algorithm... so much so that I may drop entirely the need and the support for "templatized" rules parts with {{some text that can change}} markup. This does no longer seem to be of much value in that new context.
`

@pombredanne
Copy link
Member

FWIW, the last task on approximate detection ATM for me is refining the tests and rule data: it detects so much better that there are a lot of tests that proved to be incomplete, where expectations were not comprehensive enough.

@balusarakesh
Copy link
Collaborator

issue fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants