Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new flag in License Data Model definition #2548

Merged
merged 24 commits into from
Aug 6, 2021

Conversation

akugarg
Copy link
Collaborator

@akugarg akugarg commented Jun 11, 2021

Signed-off-by: akugarg akanksha.garg2k@gmail.com

Fixes #2549

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁

@AyanSinhaMahapatra
Copy link
Member

@akugarg btw when you link the issue using the keywords it'll be closed. This is a general big-picture issue, so feel free to create smaller issues and link to them.

Also, the commit messages are the same here and they should be more specific and different IMO. Atleast:

  • Add is_unknown flag in data model
  • Add is_unknown flag to data files of unknown rules

If you could rebase to change the messages that'd be great.

akugarg added 2 commits June 11, 2021 18:23
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@akugarg
Copy link
Collaborator Author

akugarg commented Jun 11, 2021

@AyanSinhaMahapatra I have made the requested changes.

@AyanSinhaMahapatra
Copy link
Member

AyanSinhaMahapatra commented Jun 11, 2021

  1. Did you do this with the script checking if the license-expression in .yml file has unknown in it?

From discuss@gitter
Akanksha Garg:
are there instances where license can be considered unknown and may not be indicated in file name?

Ayan Sinha Mahapatra:
File name is just a naming convention. The key value pairs in the .yml files is what are used in scancode, and not the filename. And these are structured .yml files so faily easy to read and search in them specificly.

I see two issues:

  • I see cases of adding is_unknown flag in false-positive rules (i.e. is_false_positive flag set to true True), which should not be the case.
  • There are files which don't have unknown in their names, but have unknown in their license expressions also. These are files with is_license_intro set to True and some other cases.
  1. Paste the script here for future references.

Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@akugarg
Copy link
Collaborator Author

akugarg commented Jun 11, 2021

import os, glob

os.chdir("git/scancode-toolkit/src/licensedcode/data/rules")
print("Changed")
to_find="unknown"
fp="is_false_positive: yes"
for file in glob.glob("*.yml"):
f=0
file1 = open(file,"r+")
for line in file1:
if fp in line:
break
else:
if to_find in line:
file1.write("is_unknown: yes")
break

Used the above script for finding the "unknown" tag in license expression and for ignoring the false-positive case.
@AyanSinhaMahapatra Hope this works.
I have not yet added for category "unknown-spdx"

Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@AyanSinhaMahapatra
Copy link
Member

@akugarg btw, in addition to processing the unknown-spdx case, we also need to

  1. add the is_unknown flag to the License class also, like in the Rule, and add them to the respective .yml files in the licenses folder. Like here
  2. Add a test to the validate_licenses funtion, (which is run when loading the licenses/rules) as a sanity check to make sure every rule with is_unknown set true has unknown in it's license expression. This means adding the test to the Rule.validate() function here: https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/models.py#L866

@akugarg
Copy link
Collaborator Author

akugarg commented Jun 14, 2021

For adding to license class I have to add similar tag like ones present here https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/models.py#L102 set to default = false
right?

@AyanSinhaMahapatra
Copy link
Member

Yes!

akugarg added 2 commits June 14, 2021 13:27
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@akugarg
Copy link
Collaborator Author

akugarg commented Jun 15, 2021

When I am trying to add flag for license class and trying to run a scan for samples directory it is showing this error..
error
For this https://github.com/nexB/scancode-toolkit/blob/develop/samples/JGroups/LICENSE

@AyanSinhaMahapatra
Copy link
Member

@akugarg you could push your changes for addition of the flag to License Model if you want us to have a look at it (there's no point guessing where the failure is without the code), and always run the specific tests for licensedcode first and fix failures there before actually running a whole scan.

Also, you could also try to fix your tests first for the addition of is_exception to the Rule model only, instead of trying to debug both tasks at once.

@akugarg
Copy link
Collaborator Author

akugarg commented Jun 15, 2021

OKay! I will first fix the test cases for flag addition to rule class only.

Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@akugarg
Copy link
Collaborator Author

akugarg commented Jun 16, 2021

@AyanSinhaMahapatra Please have a look at this once!

akugarg added 2 commits June 17, 2021 14:59
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@AyanSinhaMahapatra
Copy link
Member

@akugarg The test expectations need to be changed in many cases like this. Btw, note that you can set regen=True in the tests to regenerate and then check the diff carefully, to see if the changes make sense, it's faster. Be careful to not commit regen as True, and do this one function at a time.

There's also another test to be added, which fetches the corresponding license object for is_unknown rules and checks if the is_unknown falg for the license is set True (and category: Unstated License),

We're nearly there! 👍

@akugarg akugarg force-pushed the add_new_flag branch 2 times, most recently from 1100c2e to e62be42 Compare June 20, 2021 04:41
akugarg and others added 3 commits June 20, 2021 10:13
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@akugarg
Copy link
Collaborator Author

akugarg commented Jul 1, 2021

In order means that instead of having the flag added at end in YAML files as done here we should have inserted in between (may be after license expression or some flags) according to as defined in object definition right?
Also can I proceed as follows :
add this flag above other flags instead in definition itself.
then add this flag after license expressions for the required files.

@AyanSinhaMahapatra
Copy link
Member

AyanSinhaMahapatra commented Jul 6, 2021

@akugarg I think you should instead use load_rules() and then check for unknown in license expression explicitly, set the is_unknown flag to be True for them, and then use Rule.dump() to overwrite the rules that should have is_unknown as True. This should take care of both the order and missing line returns.

There are similar functions for licenses also.

You should also merge to latest before doing this (delete the commits adding the is_unknown flag to .yml files by rebasing, merge to latest develop, and then add the changes as suggested), some more unknown rules were added.

akugarg and others added 2 commits July 7, 2021 12:32
Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
@pombredanne
Copy link
Member

@akugarg this is ready to merge imho but you have one failing test

@akugarg
Copy link
Collaborator Author

akugarg commented Jul 15, 2021

@pombredanne I am not able to figure out what's causing this failing check or if it is something related to what I added.

@akugarg akugarg force-pushed the add_new_flag branch 6 times, most recently from db4dcd7 to a4194e6 Compare July 22, 2021 12:11
@akugarg
Copy link
Collaborator Author

akugarg commented Jul 22, 2021

@AyanSinhaMahapatra
All green! :)

@akugarg akugarg requested a review from pombredanne July 22, 2021 12:59
Copy link
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good now.

src/scancode/api.py Outdated Show resolved Hide resolved
src/scancode/api.py Show resolved Hide resolved
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few nits for your consideration, but this is good to go otherwise! Thanks

src/licensedcode/models.py Outdated Show resolved Hide resolved
src/scancode/api.py Outdated Show resolved Hide resolved
src/scancode/api.py Show resolved Hide resolved
This reverts commit a1843cf.

Signed-off-by: akugarg <akanksha.garg2k@gmail.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good!
Merging

@pombredanne pombredanne merged commit 0dd52df into aboutcode-org:develop Aug 6, 2021
@akugarg akugarg deleted the add_new_flag branch August 8, 2021 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add new is_unknown flag in data model definition
3 participants