Changing a single word in a license shows as exact match when it should show a difference #140
Comments
I did some testing and the spdx_license_matcher/computation.py score calculated on line 31 is 1.0 even though there are some changes. One possible solution is to run the SPDX tools java compare even on the "perfect matches" to make sure it really matches. This could be simplified by just treating all matches as close matches and checking all licenses. Another approach would be to improve the accuracy so that a single word change causes the score to be less than 1.0 and changing the limit to 1.0. @Ugtan Let me know what you think. |
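For illustration, here is a minimal sketch of the second approach: a score that drops below 1.0 as soon as one word changes. It is not the code in spdx_license_matcher/computation.py. The helper names `tokenize` and `word_similarity` are hypothetical, and it assumes a word-level comparison using Python's standard `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_similarity(text_a, text_b):
    """Return a ratio in [0.0, 1.0] computed over word tokens.

    The ratio is 1.0 only when both token sequences are identical, so a
    single inserted, removed, or replaced word pushes it below 1.0.
    """
    matcher = SequenceMatcher(None, tokenize(text_a), tokenize(text_b), autojunk=False)
    return matcher.ratio()


# Hypothetical sample text, not taken from a real license.
original = "Permission is hereby granted to use this software"
changed = "Permission is not hereby granted to use this software"

print(word_similarity(original, original) == 1.0)  # identical text
print(word_similarity(original, changed) < 1.0)    # one word inserted
```

With a score like this, the limit for a "perfect match" could be set to exactly 1.0, and anything lower would fall through to the close-match check.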
@goneall I haven't had a chance to look at it yet, but I will after the end of this week (I have exams this week). As far as I remember, the copyright text is not considered substantial under the SPDX guidelines, so that might be the reason: the copyright is ignored, and hence changes to it are ignored as well. But I do think the matching process can be improved, and I will try to find the best possible solution soon. Thanks |
True - the problem can also be duplicated by changing one word in the body of the license text outside the copyright. The code will come back with a 1.0 score and a match even though there is a difference in the license text. Since we are dealing with licenses where a single word can make a very big difference (e.g. inserting the word "not"), we need to make sure they don't show up as matching. BTW - the matching algorithm depends on the license XML marking copyright text as |
You are right. The matching algorithm definitely needs some tweaks to be able to give accurate results.
I will try my best to work on the issue, though it will require a bit more involvement, and the problem I'm currently facing is a lack of time. So, it might take a while. @goneall |
The Java license matching already takes into account the alt/optional tags. You are correct in that it is a lot more involved - it took me a few tries to get the Java version right. If you can fix the code so that it would return a score of less than 1.0 when even one word changes, it should work. This is because the Java matching code would be called if it has a high enough score but is less than 1.0. The Java code will only match if it complies with the alt/optional markup. With the current implementation returning a score of 1.0 when there is a single word different, it matches and the Java code doesn't get invoked. |
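The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `resolve_match`, the `java_compare` callable, and the 0.9 threshold are hypothetical stand-ins, not names or values from the project.

```python
def resolve_match(score, java_compare, threshold=0.9):
    """Decide the outcome for one candidate license.

    score        -- similarity score in [0.0, 1.0] from the Python matcher
    java_compare -- hypothetical callable; returns True when the Java
                    comparison accepts the text under the alt/optional markup
    threshold    -- assumed lower bound for a "high enough" score
    """
    if score >= 1.0:
        # A perfect score is trusted as-is, so the Java code is never invoked.
        return "exact match"
    if score >= threshold:
        # High but imperfect score: defer to the Java comparison.
        return "match" if java_compare() else "close match"
    return "no match"


# A one-word change must score below 1.0 to reach the Java check at all.
print(resolve_match(1.0, lambda: False))   # exact match
print(resolve_match(0.97, lambda: False))  # close match
print(resolve_match(0.97, lambda: True))   # match
```

This shows why the current behavior is a problem: when a single changed word still scores 1.0, the first branch is taken and the markup-aware comparison is skipped.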
Oh.. Makes sense. I will try to improve the matching algorithm and I will ping you if I need help with something. Thanks for the clarification. :) |
@Ugtan Ping - do you have some bandwidth to work on this issue? I want to get this issue resolved before deploying. |
Hello @goneall, I do want to work on the issue. Tomorrow is my last exam of the semester and I think I will get enough time to work on the issue from the day after tomorrow. |
@Ugtan Pinging you on status of the issue - let me know how it is going. |
@Ugtan Are you working on this one? Or should we open it for new contributors? |
Hello @rtgdk. I'm indeed working on the issue. Sorry, it took me a while to write it all over again. I will try to push the changes by tomorrow. |
@Ugtan Pinging you again on the issue - we could open this for others to work on if you don't have the bandwidth |
@rtgdk - I take it we should leave this one open? |
To reproduce, create a new license request and copy/paste the text for the Apple MIT License. Then change the copyright from
Copyright: Copyright (c) 2006 by Apple Computer, Inc., All Rights Reserved.
to
Copyright: Me, All Rights Reserved.
It should show a close match and give you the option of submitting an issue or a new license request.