Custom License Rules folder #2471
Comments
This feature should be useful for many ScanCode users.
This makes sense. @richardfontana requested this feature in #480, and I reckon I have been slow to act because I feared fragmenting the database of licenses. In hindsight, that (unexpressed) concern was probably not a valid one. That said, if these are just a few proprietary licenses and headers, it could be well worth adding them to scancode anyway. To implement this feature, here are some thoughts: A) the base approach to get these extra rules into scancode:
I am leaning towards 2., as otherwise this may be complicated to deploy. B) how these rules and licenses would be consumed:
I am not sure which is best. @tardyp We could have a quick live session to iron out a path!
I didn't see #480, as I only focused my search on the keyword RULES. I like what I see there, especially the idea from @DennisClark to automatically create this custom folder based on Unknown License findings. In my first scans with scancode, we ended up with a big pile of unknown licenses, which is normal, as we want to use scancode to make sure our proprietary software is not mixed up with open source, and that our devs use packaging techniques to compose software. I spent some time yesterday experimenting with the source code of scancode, and indeed discovered the huge license library and the need to cache the index. I am not sure whether there is really a use case for custom licenses where those numbers would be so big that they need to be cached as well. The needed refactoring of the cache module seems quite scary to me. What I like about a secondary index is that we could skip primary matching altogether if the secondary index match score is high enough. This could open the path to a quick scan mode that we could put in the pre-commit CI.
FWIW, any incorrect detection is treated as a bug (so tickets are mucho welcome!) AND @AyanSinhaMahapatra 's https://github.com/nexB/scancode-analyzer/ is a new, emerging tool to spot and potentially fix these issues using multiple approaches including some ML.
No worries there, it's not that complicated
Question: if you were to use a secondary index in your case, would you see an exclusive use of that index for a given scan run, and not the main one? Or would you see the use of both at the same time?
I am not saying these are incorrect detections: those are mostly files under our proprietary license, and I don't expect scancode to magically detect it.
I would use both. For each file, if the secondary index detects with a 100% score that this is our copyright, don't bother running the rest of the rules.
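To make the "use both, but stop early" idea above concrete, here is a minimal sketch. It is not ScanCode's API: `Match`, `Matcher`, `scan_text` and `exact_header_matcher` are hypothetical names, and the two indexes are modelled as plain functions that take the text of a file and return scored matches.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Match:
    rule_id: str
    score: float  # 0.0 to 100.0


# Hypothetical stand-in for an index: file text in, scored matches out.
Matcher = Callable[[str], List[Match]]


def scan_text(text: str, custom: Matcher, main: Matcher,
              threshold: float = 100.0) -> List[Match]:
    """Query the custom (secondary) index first; only fall back to the
    main index when no custom match reaches the threshold."""
    custom_matches = custom(text)
    if any(m.score >= threshold for m in custom_matches):
        # Confident custom hit: skip the (expensive) main index entirely.
        return custom_matches
    return custom_matches + main(text)


def exact_header_matcher(rule_id: str, header: str) -> Matcher:
    """Toy custom matcher: a 100% match if the header appears verbatim."""
    def match(text: str) -> List[Match]:
        return [Match(rule_id, 100.0)] if header in text else []
    return match
```

With this shape, a pre-commit "quick scan" mode would amount to passing a main matcher that is only reached for files the custom rules do not fully explain.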
Hey, may I know how scancode-toolkit creates its dataset for the agent?
Hi @codeakki,
I am closing this in favor of the older #480
Short Description
Our internal code has copyright headers that we would like to properly categorize.
We don't think it makes sense to upstream those rules, and we want to avoid forking scancode.
Thus we would like to add an option to scancode to provide a folder path which would contain custom .yml + RULE files.
Possible Labels
Select Category
How This Feature will help you/your organization
This would help us to use scancode to categorize proprietary code we get from subcontractors
Possible Solution/Implementation Details
User would say
scancode -clip --json-pp --custom_licenses=/path/to/licenses --custom_rules=/path/to/rules - path/to/code
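As a rough illustration of what consuming such a folder could involve, here is a small sketch that pairs each rule text with its metadata file. It assumes (this is not confirmed in this issue) that a rule is a `name.RULE` text file with a sibling `name.yml`; `collect_rule_pairs` is a hypothetical helper, not part of scancode.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def collect_rule_pairs(rules_dir: str) -> Tuple[Dict[str, Tuple[Path, Path]], List[Path]]:
    """Return ({rule name: (rule file, yml file)}, [rule files with no yml]).

    Assumed layout: <name>.RULE next to <name>.yml in a flat folder.
    """
    root = Path(rules_dir)
    pairs: Dict[str, Tuple[Path, Path]] = {}
    orphans: List[Path] = []
    for rule_file in sorted(root.glob("*.RULE")):
        data_file = rule_file.with_suffix(".yml")
        if data_file.is_file():
            pairs[rule_file.stem] = (rule_file, data_file)
        else:
            # Rule text without metadata: report it rather than guess.
            orphans.append(rule_file)
    return pairs, orphans
```

Reporting unpaired files up front would let a custom folder fail fast with a clear message instead of producing partial rules.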
Can you help with this Feature
We are willing to provide a PR for this feature