Extended Metadata for License Policy Development #230

Open
abhisek opened this issue Jul 12, 2024 · 19 comments

@abhisek
Member

abhisek commented Jul 12, 2024

Currently, vet supports writing simple policies to block packages with licenses identified by SPDX code, e.g. GPL-2.0. However, background knowledge is required to judge how permissive each license is. There is also a "context" in which a license applies. For example, certain licenses may be suitable for use in SaaS applications but not in a statically linked, single-artefact binary shipped commercially to end customers.

To make license-check use cases more useful, we need to:

  1. Provide metadata along with SPDX code of the license
  2. Provide common use-cases (e.g. SaaS) metadata related to permissiveness of the license

References to look at

How to go about it

  • Create a protobuf message to represent a license with its metadata
  • Extend the protobuf spec for filter input with license metadata
  • Embed JSON or other data files to be shipped along with vet or fetched from API in future
  • Create a package to access available license data (repository pattern) without coupling to the data source
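
A minimal sketch of the repository idea in the last bullet, written in Python for brevity (vet itself may well implement this differently). The names `LicenseMeta`, `LicenseRepository`, `JsonLicenseRepository`, and the `tags` field are hypothetical and chosen only for illustration. The point is that callers depend on the interface, so embedded JSON could later be swapped for an API-backed source.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseMeta:
    """Hypothetical license record; field names are illustrative only."""
    spdx_id: str
    name: str
    tags: tuple = ()


class LicenseRepository(ABC):
    """Callers depend on this interface, not on where the data lives."""

    @abstractmethod
    def get(self, spdx_id: str) -> Optional[LicenseMeta]:
        ...


class JsonLicenseRepository(LicenseRepository):
    """Backed by a JSON document, e.g. a data file shipped with the tool."""

    def __init__(self, raw_json: str) -> None:
        records = json.loads(raw_json)
        self._by_id = {
            r["spdx_id"].lower(): LicenseMeta(
                r["spdx_id"], r["name"], tuple(r.get("tags", []))
            )
            for r in records
        }

    def get(self, spdx_id: str) -> Optional[LicenseMeta]:
        # Case-insensitive lookup is a convenience of this sketch.
        return self._by_id.get(spdx_id.lower())


# Stand-in for an embedded data file; the content is a placeholder.
EMBEDDED = (
    '[{"spdx_id": "GPL-2.0", '
    '"name": "GNU General Public License v2.0 only", '
    '"tags": ["copyleft"]}]'
)
repo: LicenseRepository = JsonLicenseRepository(EMBEDDED)
```

An API-backed implementation would subclass `LicenseRepository` in the same way, leaving policy code untouched.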
@abhisek abhisek added the enhancement New feature or request label Jul 12, 2024
@vishwak381

I'll work on this

@abhisek
Member Author

abhisek commented Jul 31, 2024

@vishwak381 Thanks for picking this up.

I think the first step is to look at the references provided in the original issue description and identify which license-related metadata can be aggregated from open sources. For example, for Apache License 2.0, can we find out:

  • Whether it is compatible with SaaS use
  • Whether it is compatible with commercial usage
  • What restrictions it imposes
  • Etc.

We first need to define which metadata is useful for a license and then gather the data as a JSON document. This will lay the foundation for richer policy decisions based on this data.

@vishwak381

license_metadata.json

I defined some different useful metadata for a license and then gathered the data as a JSON document.

@abhisek
Member Author

abhisek commented Aug 4, 2024

@vishwak381 Thanks. Can you define the following please?

  1. What do is_osi_approved and is_fsf_approved mean?
  2. What does compatibility with other licenses mean?

I think we need some information about it before we can publish this.

@vishwak381

vishwak381 commented Aug 4, 2024

OSI Approved Licenses

Open source licenses are licenses that comply with the Open Source Definition – in brief, they allow software to be freely used, modified, and shared. To be approved by the Open Source Initiative (also known as the OSI) a license must go through the Open Source Initiative’s license review process.

FSF Approved Licenses

The FSF classifies licenses according to these criteria:

  • Whether it qualifies as a free software license.

  • Whether it is a copyleft license.

  • Whether it is compatible with the GNU GPL. Unless otherwise specified, compatible licenses are compatible with both GPLv2 and GPLv3.

  • Whether it causes any particular practical problems.

FSF-approved only means that the license qualifies as a free software license (according to the Free Software Definition).

Compatibility with other licenses

If we want to combine two free programs into one, or merge code from one into the other, we have to ask whether their licenses allow combining them.
There is no problem merging programs that have the same license, if it is a reasonably behaved license, as nearly all free licenses are.
What then when the licenses are different? In general we say that several licenses are compatible if there is a way to merge code under those various licenses while complying with all of them. The result, often, is a program with parts under various different compatible licenses—but not always. Such combinability, or the absence of it, is a characteristic of a given set of licenses, and is not dependent on what order you mention them in. The set of licenses also controls which license is required for the combined program.

For example, the Apache License 2.0 is incompatible with GPL-2.0. This means that you cannot combine code licensed under the Apache License 2.0 with code licensed under GPL-2.0 and redistribute the result. The terms of the two licenses conflict, making it legally problematic to apply them together.
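
As a rough illustration of the "set of licenses, independent of order" idea, here is a minimal Python sketch. It stores incompatibilities as unordered pairs, so the order in which licenses are listed cannot matter. The table holds only the Apache-2.0 / GPL-2.0 pair mentioned above; the names `INCOMPATIBLE_PAIRS`, `conflicting_pairs`, and `are_compatible` are hypothetical. A missing pair means "not recorded here", not "legally compatible".

```python
from itertools import combinations

# Unordered pairs recorded as conflicting. Only the pair discussed in
# this comment is listed; a real table would need a vetted data source.
INCOMPATIBLE_PAIRS = {
    frozenset({"Apache-2.0", "GPL-2.0"}),
}


def conflicting_pairs(license_ids):
    """Return every unordered pair in license_ids recorded as incompatible."""
    unique = sorted(set(license_ids))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if frozenset({a, b}) in INCOMPATIBLE_PAIRS
    ]


def are_compatible(license_ids):
    """True when no recorded conflict exists within the set of licenses."""
    return not conflicting_pairs(license_ids)
```

Calling `are_compatible(["GPL-2.0", "Apache-2.0"])` and `are_compatible(["Apache-2.0", "GPL-2.0"])` gives the same answer because the lookup key is a `frozenset`.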

Sources

https://opensource.org/osd
https://www.gnu.org/licenses/license-list.en.html
https://www.apache.org/licenses/LICENSE-2.0
https://www.gnu.org/licenses/gpl-3.0.en.html

@abhisek
Member Author

abhisek commented Aug 5, 2024

@vishwak381 Thanks. This is good. I think we are good to go ahead in creating the same for all licenses as available here:

https://github.com/spdx/license-list-data/blob/main/json/licenses.json

Let us go ahead and create a single JSON file similar to the above example but with our additional metadata. I am guessing we may discover additional useful metadata, so our JSON schema will grow in future. That's the reason to have our own JSON schema extending the SPDX license JSON.

I created a spec for the JSON. You can write Python code to generate the JSON as per the spec. This will ensure our JSON is always correct and avoid mistakes. See this example, and let me know if you run into any issues:

https://github.com/safedep/examples/tree/main/sdk/license-meta-python
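
For illustration only, a minimal Python sketch of extending SPDX entries with extra metadata. It does not use the spec or SDK linked above. The input is a trimmed, hand-written sample shaped like the `licenses` array in the SPDX `licenses.json`, with illustrative values and a made-up `LicenseRef-Example` entry. The output keys, the `EXTRA` table, and the mapping of `isFsfLibre` to `is_fsf_approved` are assumptions.

```python
import json

# Hand-written sample shaped like the SPDX licenses.json "licenses" array.
# Values are illustrative, not copied from the real list.
SPDX_SAMPLE = {
    "licenses": [
        {
            "licenseId": "Apache-2.0",
            "name": "Apache License 2.0",
            "isOsiApproved": True,
            "isFsfLibre": True,
        },
        # Made-up entry showing that isFsfLibre may be absent.
        {"licenseId": "LicenseRef-Example", "name": "Example License",
         "isOsiApproved": False},
    ]
}

# Hypothetical additional metadata keyed by SPDX id (placeholder content).
EXTRA = {"Apache-2.0": {"usage_context": {"notes": "placeholder"}}}


def build_extended(spdx_doc, extra):
    """Map each SPDX entry to our own record shape and merge extra metadata."""
    out = []
    for entry in spdx_doc["licenses"]:
        record = {
            "spdx_id": entry["licenseId"],
            "name": entry["name"],
            "is_osi_approved": bool(entry.get("isOsiApproved", False)),
            # Treating a missing flag as False is a simplification.
            "is_fsf_approved": bool(entry.get("isFsfLibre", False)),
        }
        record.update(extra.get(entry["licenseId"], {}))
        out.append(record)
    return out


extended = build_extended(SPDX_SAMPLE, EXTRA)
print(json.dumps(extended, indent=2))
```

Because the generator is the only writer of the file, adding a new attribute later means changing `build_extended` and re-running it rather than editing JSON by hand.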

@abhisek
Member Author

abhisek commented Aug 30, 2024

@vishwak381 Can you check the GitHub licenses API? It seems to provide various metadata related to open source licenses. Maybe you can verify your generated JSON using data from there:

https://docs.github.com/en/rest/licenses/licenses?apiVersion=2022-11-28#get-a-license
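
A minimal sketch of such a verification step, in Python. To stay runnable offline it uses a trimmed, hand-written sample shaped like the "Get a license" response (`spdx_id`, `permissions`, `conditions`, `limitations`) instead of calling the API. The `GENERATED` record and the `KEY_MAP` between our keys and GitHub's permission strings are assumptions made for illustration.

```python
# Trimmed sample shaped like a "Get a license" response (illustrative).
GITHUB_SAMPLE = {
    "key": "mit",
    "spdx_id": "MIT",
    "permissions": ["commercial-use", "modifications", "distribution", "private-use"],
    "conditions": ["include-copyright"],
    "limitations": ["liability", "warranty"],
}

# Hypothetical record from our own generated JSON, with one deliberate error.
GENERATED = {
    "spdx_id": "MIT",
    "permissions": {
        "commercial_use": True,
        "modification": True,
        "distribution": True,
        "private_use": False,
    },
}

# Assumed mapping from our permission keys to GitHub's permission strings.
KEY_MAP = {
    "commercial_use": "commercial-use",
    "modification": "modifications",
    "distribution": "distribution",
    "private_use": "private-use",
}


def diff_permissions(generated, github):
    """Return the permission keys where our record disagrees with GitHub's."""
    granted = set(github.get("permissions", []))
    mismatches = {}
    for ours, theirs in KEY_MAP.items():
        expected = theirs in granted
        actual = generated["permissions"].get(ours)
        if actual is not None and actual != expected:
            mismatches[ours] = {"generated": actual, "github": expected}
    return mismatches
```

Keys that exist only in our record (for example `sublicensing`) are skipped here, since there is no agreed counterpart to compare against.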

@insaaniManav
Member

@abhisek can this be assigned to me ?

@abhisek abhisek assigned insaaniManav and unassigned abhisek and vishwak381 Nov 4, 2024
@abhisek
Member Author

abhisek commented Nov 4, 2024

@insaaniManav I have assigned this to you. I think this issue will require multiple PRs. We should probably start with a survey of the different data sources that can help us build the license metadata JSON. We need rich information about a license code such as Apache-2.0 that can be used for better policy decisions. For example:

  • Can I use libraries with License X in my SaaS backend?
  • Can I use libraries with License X in my frontend?
  • Can I statically compile libraries with License X? (The typical case is Go/Rust, where most libraries are built from source.)

This will help vet users write better policies without having to be experts in OSS licenses to decide whether a license is acceptable in their own software project.
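
To make the questions above concrete, here is a minimal Python sketch that turns license metadata plus a usage context into a list of obligations worth reviewing. Everything in it is an assumption: the context names, the `REVIEW_TRIGGERS` table, and the obligation keys (borrowed from the sample JSON discussed later in this thread). It encodes no legal judgement; the real rules would have to come from a vetted data source.

```python
# Hypothetical rule table: obligations that deserve review per usage context.
# The pairings are placeholders, not legal advice.
REVIEW_TRIGGERS = {
    "saas_backend": ["network_use_disclosure"],
    "frontend": ["disclosure_source", "same_license"],
    "static_link": ["disclosure_source", "same_license"],
}


def review_flags(license_meta, context):
    """Return the obligations of this license that are flagged for the context."""
    obligations = license_meta.get("obligations", {})
    return [key for key in REVIEW_TRIGGERS[context] if obligations.get(key, False)]


# Made-up license record used only to exercise the function.
EXAMPLE = {
    "spdx_id": "LicenseRef-Example",
    "obligations": {
        "network_use_disclosure": False,
        "disclosure_source": True,
        "same_license": True,
    },
}
```

An empty list from `review_flags(EXAMPLE, "saas_backend")` means "nothing flagged by these rules", which a policy could treat as allow, while a non-empty list could block or request manual review.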

@insaaniManav
Member

insaaniManav commented Nov 4, 2024

@abhisek
So the current scope is the GitHub list of licenses. For every license, let's first look at the fields that will be in the JSON, and then we can create a script to generate it. Does that work?

@abhisek
Member Author

abhisek commented Nov 4, 2024

@insaaniManav Consider this:

I am adding a new library to my application. My application is a SaaS application. vet finds that the library uses the license AFL-1.2 (available in the SPDX license list). I don't know if AFL-1.2 can be used in my SaaS application. Now I have the following choices:

  1. Read the entire license text and figure out if it is allowed in my app
  2. Consult some website like https://choosealicense.com/
  3. Ask someone who is an expert in OSS licenses

How can vet make life easier for this persona by having the necessary information for AFL-1.2 available, so they can make a decision without being a licensing expert? Of course, we cannot have all possible data to answer all possible questions. So we should probably do a bit of a survey, see what information is publicly available, and aggregate it in a single JSON for use by vet.

What do you think?

@insaaniManav
Member

insaaniManav commented Nov 6, 2024

That sounds really nice. Here is a preview of such a JSON:

{
  "license_name": "AFL-1.2",
  "spdx_id": "AFL-1.2",
  "summary": "Academic Free License 1.2 allows distribution and modification with some conditions on sublicensing and attribution.",
  "permissions": {
    "commercial_use": true,
    "modification": true,
    "distribution": true,
    "sublicensing": false,
    "private_use": true,
    "reverse_engineering": false
  },
  "limitations": {
    "liability": true,
    "warranty": true,
    "patent_use": false,
    "trademark_use": false
  },
  "obligations": {
    "attribution": true,
    "disclosure_source": true,
    "same_license": false,
    "network_use_disclosure": false,
    "state_changes": true
  },
  "usage_context": {
    "recommended_for_saas": false,
    "risk_level": "Medium",
    "notes": "AFL-1.2 includes provisions that may restrict certain SaaS applications, particularly around sublicensing. Consult an expert if unsure."
  },
  "popular_use_cases": [
    "Academic software",
    "Personal projects",
    "Non-SaaS commercial applications"
  ],
  "links": {
    "full_text": "https://opensource.org/licenses/AFL-1.2",
    "choosealicense": "https://choosealicense.com/licenses/afl-1.2/",
    "further_reading": [
      "https://opensource.org/licenses/AFL-1.2",
      "https://tldrlegal.com/license/academic-free-license-v1.2-(afl-1.2)"
    ]
  },
  "compliance_tips": [
    "Provide attribution in application documentation or UI.",
    "Disclose source code if modifications are made and distributed.",
    "Avoid sublicensing under a different license without consulting legal advice."
  ]
}

@abhisek How does this JSON look, provided we can enrich it for popular licenses?

@abhisek
Member Author

abhisek commented Nov 6, 2024

@insaaniManav Looks great. Is this a sample or did you find any data source for this?

For popular_use_cases, it will be good to standardise the keys if there is a comprehensive list of such keys. Something like:

{
  "academic_software": true,
  "personal_projects": true,
  "commercial_saas": false
}

However, if popular_use_cases are just suggestions, then labels are good enough.

It should also be consistent with permissions attributes. For example, popular_use_cases should not mention Commercial SaaS if permissions.commercial_use is false
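
A minimal Python sketch of that consistency check, assuming the standardised boolean form of popular_use_cases shown above. The `REQUIRES` table, which says which use-case keys depend on which permission, and the `non_saas_commercial` key are hypothetical.

```python
# Hypothetical mapping: use-case key -> permission it depends on.
REQUIRES = {
    "commercial_saas": "commercial_use",
    "non_saas_commercial": "commercial_use",
}


def inconsistent_use_cases(record):
    """Return use cases marked true whose required permission is not granted."""
    permissions = record.get("permissions", {})
    problems = []
    for use_case, enabled in record.get("popular_use_cases", {}).items():
        needed = REQUIRES.get(use_case)
        if enabled and needed and not permissions.get(needed, False):
            problems.append(use_case)
    return problems


# Made-up record that violates the rule on purpose.
BAD_RECORD = {
    "permissions": {"commercial_use": False},
    "popular_use_cases": {"academic_software": True, "commercial_saas": True},
}
```

Running a check like this inside the generator would let it fail before publishing an inconsistent JSON.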

@insaaniManav
Member

insaaniManav commented Nov 6, 2024

@abhisek This is just a sample. There is no rock-solid data source that could provide all the fields, so I would need to list all the licenses manually and fill in these fields based on my understanding of the text. Which list of licenses should we support?
popular_use_cases is just a suggestion list.

It should also be consistent with permissions attributes. For example, popular_use_cases should not mention Commercial SaaS if permissions.commercial_use is false

Yup, I understand that it should be consistent. This was just a sample list of attributes; I wanted an idea of whether the keys are okay.

@abhisek
Member Author

abhisek commented Nov 6, 2024

@insaaniManav Do not do this manually. It is too much effort and not scalable, i.e. we cannot repeat it when we need more attributes. Do a survey of data sources and see what we can automate for now, even if not all data points are available.

Also we cannot manually fill these based on license text. It requires legal expertise.

@insaaniManav
Member

Then I think our best bet is the GitHub licenses API. SPDX provides us with text info, but the most structured info is provided by the GitHub API itself. Or maybe we can somehow combine the SPDX list with the GitHub list.

@insaaniManav
Member

@abhisek we can also look at https://github.com/github/choosealicense.com/tree/gh-pages/_licenses for some extra metadata

@abhisek
Member Author

abhisek commented Nov 7, 2024

@insaaniManav I agree. A good start is to aggregate data sources like the GitHub license API and others. See how close we can get to our target JSON based on these data sources.

Once we have reasonable data sources that solve user problems, at least partially, we will automate the process of generating our JSON from them.
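
A minimal Python sketch of what such aggregation could look like, assuming each source has already been converted to a dictionary keyed by SPDX id. The function name `merge_sources`, the `provenance` field, and the sample records are hypothetical. Sources are given in priority order, and the first source to supply a field wins.

```python
def merge_sources(sources):
    """Merge per-source records keyed by SPDX id.

    sources: list of (source_name, {spdx_id: {field: value}}) in priority order.
    Each merged record also notes which source supplied each field.
    """
    merged = {}
    for source_name, records in sources:
        for spdx_id, fields in records.items():
            target = merged.setdefault(
                spdx_id, {"spdx_id": spdx_id, "provenance": {}}
            )
            for key, value in fields.items():
                if key not in target:  # earlier (higher-priority) sources win
                    target[key] = value
                    target["provenance"][key] = source_name
    return merged


# Made-up inputs standing in for already-normalised source data.
SOURCES = [
    ("spdx", {"MIT": {"name": "MIT License", "is_osi_approved": True}}),
    ("github", {"MIT": {"name": "MIT", "permissions": ["commercial-use"]}}),
]
merged = merge_sources(SOURCES)
```

Recording provenance per field makes it easier to see how much of the target JSON each source actually covers.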

@abhisek
Member Author

abhisek commented Nov 21, 2024

A great source is available from the ScanCode project:
https://github.com/aboutcode-org/scancode-licensedb/tree/main/docs


3 participants