FSF License Metadata API

The FSF is interested in having SPDX expose some FSF metadata in the SPDX license list. The cleanest way to do that is for the FSF to provide their annotated license list in a format that is more convenient for automated tools. For example, the OSI provides an API which, while currently non-canonical, gives convenient access to OSI license annotations.

This repository scrapes the FSF list and provides the scraped data in a JSON API for others to consume. Ideally we'll hand this repository over to the FSF once they're ready to maintain it, or we'll deprecate this repository if they decide to provide a different API.

Endpoints

You can pull an array of identifiers from https://wking.github.io/fsf-api/licenses.json.

You can pull an object with all the license data from https://wking.github.io/fsf-api/licenses-full.json.
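As a rough sketch of consuming the two endpoints above, the following Python uses only the standard library. The helper names (`endpoint_url`, `fetch_json`, `missing_ids`, `main`) are hypothetical and not part of this repository; the sketch assumes `licenses.json` is a JSON array of IDs and `licenses-full.json` is a JSON object keyed by those IDs, as described in this README.

```python
import json
from urllib.request import urlopen

# Base URL taken from the endpoints listed above.
BASE_URL = "https://wking.github.io/fsf-api"


def endpoint_url(name, base_url=BASE_URL):
    """Build the URL for a top-level endpoint such as 'licenses'."""
    return f"{base_url}/{name}.json"


def fetch_json(url, timeout=10):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def missing_ids(identifiers, full):
    """Return IDs present in the array but absent from the full object."""
    return sorted(set(identifiers) - set(full))


def main():
    """Fetch both endpoints and report any mismatch between them."""
    ids = fetch_json(endpoint_url("licenses"))
    full = fetch_json(endpoint_url("licenses-full"))
    print(len(ids), "identifiers in licenses.json")
    print("missing from licenses-full.json:", missing_ids(ids, full))
```

Calling `main()` performs the two network requests; the other helpers are pure functions and can be exercised offline.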

You can pull an individual license from a few places:

License properties

Licenses have the following properties:

  • id: a short slug identifying the license. In licenses-full.json, this information is in the root object key and is not duplicated in the value.

  • name: a short string naming the license.

  • uris: an array of URIs for the license. The first entry in this array will always be an entry on the FSF's HTML page. The order of the remaining entries is not significant.

  • tags: an array of FSF categories for the license. The FSF currently defines the following categories:

  • identifiers: an object with mappings to other license lists. This API currently attempts to maintain the following mappings:

    • spdx: For licenses with SPDX IDs, the spdx value will hold an array of SPDX identifiers. Licenses may have multiple SPDX entries when the SPDX list defines per-grant IDs that share the same license (e.g. GPL-3.0-only and GPL-3.0-or-later). The first entry in the spdx array is the one that most closely matches the FSF license. For example, the FSF's GNUGPLv3 text has:

      However, most software released under GPLv2 allows you to use the terms of later versions of the GPL as well.

      and the GPLv3 text suggests an “any later version” grant, so GPL-3.0-or-later is the first SPDX identifier, GPL-3.0-only is the second, and the deprecated GPL-3.0 is the third.
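The property layout above can be illustrated with a small sketch. The `SAMPLE` record below is hand-written for illustration: its `name` and `tags` values are assumptions rather than data copied from the API, and only the `spdx` ordering follows the GPLv3 example in this README. `primary_spdx` and `with_id` are hypothetical helper names.

```python
# Illustrative data shaped like licenses-full.json: the license ID is the
# root object key and is not repeated inside the value.
SAMPLE = {
    "GNUGPLv3": {
        "name": "GNU GPL version 3 (illustrative name)",
        "tags": ["gpl-3-compatible"],  # assumed tag, for illustration only
        "identifiers": {
            "spdx": ["GPL-3.0-or-later", "GPL-3.0-only", "GPL-3.0"],
        },
    },
}


def primary_spdx(record):
    """Return the first (closest-matching) SPDX ID, or None if there is none."""
    spdx = record.get("identifiers", {}).get("spdx", [])
    return spdx[0] if spdx else None


def with_id(full):
    """Copy each record into a list, adding the root key as an 'id' field."""
    return [{"id": license_id, **record} for license_id, record in full.items()]


for entry in with_id(SAMPLE):
    print(entry["id"], "->", primary_spdx(entry))
```

Treating only the first `spdx` entry as the closest match follows the ordering rule stated above; the remaining entries are still valid mappings and may matter to consumers that need the deprecated or `-only` identifiers.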

Caveats

There are currently some hacks in the pulling script:

  • SPLITS, which:

  • IDENTIFIERS, which maps FSF identifiers to other schemes. Ideally this would be based on automated license-text comparison, but in order for that to work this API would have to expose the license text that the FSF considered for each ID. Currently, the FSF's HTML page links to license source, but not in a consistent enough way for me to extract the text.

  • TAG_OVERRIDES, which sets tags where the human-readable text on the FSF's annotated list has more detail than the easily-machine-readable content. For example, the FSF currently only distinguishes between gpl-2-compatible and gpl-3-compatible in text, so licenses that are only compatible with one or the other need tag overrides.

Until these hacks are addressed, license IDs and the tags and identifiers fields should be taken with a grain of salt.
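To make the idea of a tag override concrete, here is a minimal sketch of how such a table could be applied after scraping. This is not the pulling script's actual code: the shape of the overrides mapping, the `apply_tag_overrides` function, and the `ExampleLicense` ID are all assumptions made for illustration.

```python
def apply_tag_overrides(licenses, tag_overrides):
    """Return a copy of `licenses` with tags replaced where an override exists.

    `licenses` maps license ID -> record; `tag_overrides` maps license ID ->
    iterable of tags. Both shapes are assumptions for this sketch.
    """
    result = {}
    for license_id, record in licenses.items():
        updated = dict(record)  # shallow copy so the input is not mutated
        if license_id in tag_overrides:
            updated["tags"] = sorted(tag_overrides[license_id])
        result[license_id] = updated
    return result


# Hypothetical scraped entry where the machine-readable markup could not
# distinguish GPLv2 compatibility from GPLv3 compatibility.
scraped = {
    "ExampleLicense": {
        "name": "Example License",
        "tags": ["gpl-2-compatible", "gpl-3-compatible"],
    },
}

# Hypothetical override recording that the human-readable text only claims
# GPLv3 compatibility.
overrides = {"ExampleLicense": {"gpl-3-compatible"}}

fixed = apply_tag_overrides(scraped, overrides)
print(fixed["ExampleLicense"]["tags"])
```

Replacing the whole tag list, rather than adding or removing individual tags, keeps each override self-describing, at the cost of needing an update whenever the FSF page changes a license's other tags.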

Contributing

Contributions are welcome!