Extended Metadata for License Policy Development #230
Comments
I'll work on this |
@vishwak381 Thanks for picking this up. I think the first step will be to look at the references provided in the original issue description and identify which license-related metadata can be aggregated from open sources. For example, can we find out if
We first need to define the useful metadata for a license and then gather the data as a JSON document. This will lay the foundation for enabling richer policy decisions based on this data. |
license_metadata.json
I defined some useful metadata for a license and then gathered the data as a JSON document. |
@vishwak381 Thanks. Can you define the following please?
I think we need some information about it before we can publish this. |
OSI Approved Licenses
Open source licenses are licenses that comply with the Open Source Definition. In brief, they allow software to be freely used, modified, and shared. To be approved by the Open Source Initiative (also known as the OSI), a license must go through the Open Source Initiative's license review process.
FSF Approved Licenses
The FSF classifies licenses according to these criteria:
FSF-approved only means that the license qualifies as a free software license (according to the Free Software Definition).
Compatibility with other licenses
If we want to combine two free programs into one, or merge code from one into the other, we need to check whether their licenses allow combining them. For example, the Apache License 2.0 is incompatible with GPL-2.0. This means that you cannot combine or redistribute code licensed under the Apache License 2.0 with code licensed under GPL-2.0. The terms of the two licenses conflict, making it legally problematic to apply them together.
Sources
https://opensource.org/osd |
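A minimal sketch of how the compatibility field described above might be queried. The table layout and the `can_combine` helper are assumptions for illustration only; the single pair listed is the one mentioned in this comment, and a real table would have to come from a vetted data source.

```python
from typing import Optional

# Hypothetical compatibility table keyed by unordered pairs of SPDX IDs.
# Only the pair named in this thread is listed.
INCOMPATIBLE_PAIRS = {
    frozenset({"Apache-2.0", "GPL-2.0"}),
}


def can_combine(license_a: str, license_b: str) -> Optional[bool]:
    """Return True/False when the answer is known, None when there is no data."""
    if license_a == license_b:
        return True
    if frozenset({license_a, license_b}) in INCOMPATIBLE_PAIRS:
        return False
    # No entry: report "unknown" rather than assuming compatibility.
    return None


print(can_combine("Apache-2.0", "GPL-2.0"))
print(can_combine("MIT", "Apache-2.0"))
```

Returning `None` for unlisted pairs keeps "no data" distinct from "compatible", which matters if a policy later treats unknown pairs as needing review.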
@vishwak381 Thanks. This is good. I think we are good to go ahead and create the same for all licenses available here: https://github.com/spdx/license-list-data/blob/main/json/licenses.json Let us create a single JSON file similar to the above example but with our additional metadata. I am guessing we may discover additional useful metadata, so our JSON schema will be extended in future. That's the reason to have our own JSON schema extending the SPDX license JSON. I created a spec for the JSON. You can write Python code to generate the JSON as per the spec. This will ensure our JSON is always correct and avoid mistakes. See here for an example and let me know in case of any issue: https://github.com/safedep/examples/tree/main/sdk/license-meta-python |
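As a rough sketch of the generation step suggested above, the snippet below extends entries shaped like the SPDX licenses.json list with extra fields. The inline sample stands in for the downloaded file, and the output field names (`osi_approved`, `fsf_libre`, `incompatible_with`) are assumptions, not the spec linked in this comment.

```python
import json

# Trimmed stand-in for SPDX licenses.json: a "licenses" array whose entries
# carry "licenseId", "name", "isOsiApproved" and, when true, "isFsfLibre".
SPDX_SAMPLE = {
    "licenses": [
        {"licenseId": "MIT", "name": "MIT License",
         "isOsiApproved": True, "isFsfLibre": True},
        {"licenseId": "0BSD", "name": "BSD Zero Clause License",
         "isOsiApproved": True},
    ]
}


def extend_entry(entry: dict) -> dict:
    """Map one SPDX entry onto a hypothetical extended metadata record."""
    return {
        "spdx_id": entry["licenseId"],
        "name": entry["name"],
        "osi_approved": entry.get("isOsiApproved", False),
        # SPDX omits isFsfLibre when it is not set, so default to False.
        "fsf_libre": entry.get("isFsfLibre", False),
        # Hypothetical extension field, empty until a data source fills it.
        "incompatible_with": [],
    }


def build_metadata(spdx_doc: dict) -> dict:
    """Build a dict of extended records keyed by SPDX ID."""
    return {e["licenseId"]: extend_entry(e) for e in spdx_doc["licenses"]}


metadata = build_metadata(SPDX_SAMPLE)
print(json.dumps(metadata, indent=2, sort_keys=True))
```

Generating the file from code rather than by hand means new attributes can be added by changing `extend_entry` and re-running it.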
@vishwak381 Can you check the GitHub licenses API? It seems to provide various metadata related to open source licenses. Maybe you can verify your generated JSON using data from there: https://docs.github.com/en/rest/licenses/licenses?apiVersion=2022-11-28#get-a-license |
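One way the verification suggested above could look, as a sketch. `GITHUB_SAMPLE` is a hand-written stand-in for a "get a license" response (no network call is made here), `OURS` is a hypothetical generated record, and `diff_against_github` is an illustrative helper name.

```python
# Stand-in for a GitHub "get a license" response, reduced to the fields
# compared below. A real check would fetch this from the API.
GITHUB_SAMPLE = {
    "key": "mit",
    "spdx_id": "MIT",
    "permissions": ["commercial-use", "modifications", "distribution", "private-use"],
    "conditions": ["include-copyright"],
    "limitations": ["liability", "warranty"],
}

# Hypothetical generated record with one limitation deliberately missing.
OURS = {
    "spdx_id": "MIT",
    "permissions": ["commercial-use", "distribution", "modifications", "private-use"],
    "conditions": ["include-copyright"],
    "limitations": ["liability"],
}


def diff_against_github(ours: dict, github: dict) -> list:
    """Return human-readable mismatches between our record and GitHub's."""
    problems = []
    if ours.get("spdx_id") != github.get("spdx_id"):
        problems.append(
            "spdx_id: ours=%r github=%r" % (ours.get("spdx_id"), github.get("spdx_id"))
        )
    for field in ("permissions", "conditions", "limitations"):
        mine = set(ours.get(field, []))
        theirs = set(github.get(field, []))
        if mine != theirs:
            problems.append(
                "%s: missing=%s extra=%s"
                % (field, sorted(theirs - mine), sorted(mine - theirs))
            )
    return problems


for problem in diff_against_github(OURS, GITHUB_SAMPLE):
    print(problem)
```

Comparing as sets ignores ordering differences, so only genuinely missing or extra values are reported.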
@abhisek can this be assigned to me ? |
@insaaniManav I have assigned this to you. I think this issue will require multiple PRs. We should probably start by doing a survey of different data sources that can help us build License metadata JSON. We need rich information about a license code such as
This will help users of |
@abhisek |
@insaaniManav Consider this: I am adding a new library in my application. My application is a SaaS application.
How can What do you think? |
That sounds really nice, let me share a preview of such a JSON:
{
"license_name": "AFL-1.2",
"spdx_id": "AFL-1.2",
"summary": "Academic Free License 1.2 allows distribution and modification with some conditions on sublicensing and attribution.",
"permissions": {
"commercial_use": true,
"modification": true,
"distribution": true,
"sublicensing": false,
"private_use": true,
"reverse_engineering": false
},
"limitations": {
"liability": true,
"warranty": true,
"patent_use": false,
"trademark_use": false
},
"obligations": {
"attribution": true,
"disclosure_source": true,
"same_license": false,
"network_use_disclosure": false,
"state_changes": true
},
"usage_context": {
"recommended_for_saas": false,
"risk_level": "Medium",
"notes": "AFL-1.2 includes provisions that may restrict certain SaaS applications, particularly around sublicensing. Consult an expert if unsure."
},
"popular_use_cases": [
"Academic software",
"Personal projects",
"Non-SaaS commercial applications"
],
"links": {
"full_text": "https://opensource.org/licenses/AFL-1.2",
"choosealicense": "https://choosealicense.com/licenses/afl-1.2/",
"further_reading": [
"https://opensource.org/licenses/AFL-1.2",
"https://tldrlegal.com/license/academic-free-license-v1.2-(afl-1.2)"
]
},
"compliance_tips": [
"Provide attribution in application documentation or UI.",
"Disclose source code if modifications are made and distributed.",
"Avoid sublicensing under a different license without consulting legal advice."
]
}
@abhisek How does this JSON look, provided we can enrich this for popular licenses? |
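To illustrate how a record shaped like the sample above might answer the SaaS question raised earlier, here is a small sketch. The `evaluate_for_saas` helper and its rules are assumptions for illustration; they are not part of vet and carry no legal weight.

```python
def evaluate_for_saas(meta: dict) -> tuple:
    """Return (allowed, reasons) for using a license in a SaaS application."""
    reasons = []
    context = meta.get("usage_context", {})
    obligations = meta.get("obligations", {})
    # Flag only an explicit "false"; a missing key means "no data".
    if context.get("recommended_for_saas") is False:
        reasons.append("not recommended for SaaS")
    if obligations.get("network_use_disclosure"):
        reasons.append("network use triggers source disclosure")
    return (not reasons, reasons)


# Subset of the AFL-1.2 sample from this thread.
sample = {
    "spdx_id": "AFL-1.2",
    "obligations": {"network_use_disclosure": False},
    "usage_context": {"recommended_for_saas": False, "risk_level": "Medium"},
}

allowed, reasons = evaluate_for_saas(sample)
print(allowed, reasons)
```

Returning the reasons alongside the verdict lets a policy report why a package was flagged instead of only blocking it.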
@insaaniManav Looks great. Is this a sample or did you find any data source for this? For
{
"academic_software": true,
"personal_projects": true,
"commercial_saas": false
}
However, if It should also be consistent with |
@abhisek This is just a sample. There is no rock-solid data source that could provide all the fields to us, so I would need to manually list all the licenses and fill in these fields as per my understanding of the text. What list of licenses should we support?
Yes, I understand that it should be consistent. This was just a sample list of attributes; I wanted an idea of whether the keys are okay. |
@insaaniManav Do not do this manually. Too much effort and not scalable i.e. cannot repeat when we need more attributes. Do a survey of data sources and see what we can automate for now even if all data points are not available. Also we cannot manually fill these based on license text. It requires legal expertise. |
Then I think our best bet is the GitHub licenses API. SPDX provides us with text info, but the most structured info is provided by the GitHub API itself. Or maybe we can somehow combine the SPDX list with the GitHub list. |
@abhisek we can also look at https://github.com/github/choosealicense.com/tree/gh-pages/_licenses for some extra metadata |
@insaaniManav I agree. A good start is to aggregate data sources like the GitHub licenses API and others, and see how close we can get to our target JSON based on these data sources. Once we have reasonable data sources that solve user problems, at least partially, we will automate the process of generating our JSON from them. |
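The aggregation idea above can be sketched as a priority merge that also records where each field came from. The source names, field names and `merge_sources` helper are hypothetical; the inline dictionaries stand in for data that would be loaded from the real sources.

```python
def merge_sources(spdx_id: str, sources: list) -> dict:
    """Merge per-license fields from ordered (name, data) sources.

    Earlier sources win; "_provenance" records which source supplied each field.
    """
    merged = {"spdx_id": spdx_id, "_provenance": {}}
    for source_name, data in sources:
        for field, value in data.get(spdx_id, {}).items():
            if field not in merged:
                merged[field] = value
                merged["_provenance"][field] = source_name
    return merged


# Illustrative stand-ins for two data sources keyed by SPDX ID.
spdx_data = {"MIT": {"name": "MIT License", "osi_approved": True}}
github_data = {"MIT": {"name": "MIT License", "conditions": ["include-copyright"]}}

record = merge_sources("MIT", [("spdx", spdx_data), ("github", github_data)])
print(record)
```

Keeping provenance per field makes it possible to see which attributes are still uncovered by any automated source.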
A great source is available from the ScanCode project |
Currently, vet supports writing simple policies to block packages with licenses identified by the SPDX code, e.g. GPL-2.0. However, background knowledge is required to identify the permissiveness of different licenses. Also, there is a "context" in which a license is applicable. For example, certain licenses may be suitable for use in SaaS applications but may not be suitable for use in a statically linked single-artefact binary shipped commercially to end customers.
To make licensing checks related use-cases more useful, we need to
References to look at
How to go about it
vet or fetched from API in future