Store tags in the licenses.json index? #1
On Fri, Oct 20, 2017 at 08:19:27PM +0000, goneall wrote:
> Please add a field on the JSON representing if an FSF license is
> considered Free / Libre.
Already there [1], for example:
$ curl -s https://wking.github.io/fsf-api/Expat.json | jq .tags
[
  "glp-compatible",
  "libre"
]
That information isn't in licenses.json, since that's trimmed down to
only provide an index [2]. That keeps the size manageable as the
per-license metadata grows (e.g. to include the full license text,
if/when we start providing that).
[1]: https://github.com/wking/fsf-api/blob/bf13c0196caf5711242081a221ab6563fa5a055c/pull.py#L20-L26
[2]: https://github.com/wking/fsf-api/blob/bf13c0196caf5711242081a221ab6563fa5a055c/pull.py#L157-L164
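For readers who want the libre check without jq, here is a minimal Python sketch of the per-license lookup described above. It assumes each per-license document is a JSON object with an optional `tags` array, as in the Expat output; the names `is_libre` and `fetch_license` are hypothetical and not part of fsf-api.

```python
import json
from urllib.request import urlopen

# Base URL taken from the curl example above.
BASE_URL = "https://wking.github.io/fsf-api"


def is_libre(license_data):
    """Return True if parsed per-license JSON carries the 'libre' tag."""
    return "libre" in license_data.get("tags", [])


def fetch_license(fsf_id, base_url=BASE_URL):
    """Fetch and parse one per-license document (makes a network request)."""
    with urlopen(f"{base_url}/{fsf_id}.json") as response:
        return json.load(response)


# Offline example: a record shaped like the Expat output above.
expat = {"tags": ["glp-compatible", "libre"]}
print(is_libre(expat))
```

With network access, `is_libre(fetch_license("Expat"))` would perform the same check against the published file.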
Please re-open if there's still something missing, or if you just want to kick around how/where to store this information.
Got it - I was just looking at the index. Would you be OK adding the tags to the index? It would save me quite a few file opens and a bit of code complexity.
Now, if only I could figure out how to re-open ;)
From the Stack Overflow post it looks like I can't re-open the issue if I didn't close it.
On Fri, Oct 20, 2017 at 02:42:03PM -0700, goneall wrote:
> Got it - I was just looking at the index. Would you be OK adding
> the tags to the index? It would save me quite a few file opens and
> a bit of code complexity.
This is for a pull into spdx/license-list-data, right? Can you point
me at your WIP code so I can take a stab at the “fetch tags from the
per-license JSON” approach? Then if you don't like what I come up
with, I'll add them to the index here.
I'm planning on updating the License Generator to read the JSON file from the repository. The code is in the middle of a major refactoring (it has gotten a bit out of hand with the addition of several new output formats). I'll add a comment once I check in the refactored code. I'll also add a hook for updating the FSF Libre flag.
Just updated the code repository for the tools. I added a class, FsfDataParser, specifically to parse any FSF data. Not much in the class yet - just a hook. I also added hooks to pull external data for both licenses and exceptions during the license processing, which should make it easier to add OSI checking later.
BTW - I don't mind doing file-by-file parsing if you're opposed to adding the tags to the index. It would just be easier for me to process one file (and it may be easier for other users of the metadata as well).
I've filed spdx/tools#112 filling that in.
I agree, and as you point out in #5, the size increase by including tags is only around 55%. I'm less concerned about that specific bump, and more concerned about starting down a slippery slope. Scoping the index as “only what you need to find FSF ID(s) (for a given name or SPDX ID)” is more objective than trying to decide on a per-property basis which properties will be needed by aggregators. Of course, I may have already started down that slippery slope by including
Note: Moving conversation from spdx/tools#112 (comment)
I understand the desire to have a "clean" implementation of an API, but the same argument could even be made for storing the SPDX ID. That being said, my opinion is that we should include the flag in the index for practical reasons.
Can you describe any practical issues with including the flag in the index (e.g. performance or maintenance issues)? And do you believe these issues outweigh the practical issues with not including the information in the index? There are several examples of indexes and file system directory structures where metadata is stored in the index itself.
One other consideration: fsfLibre is a core piece of FSF information and, in my opinion, the main reason for the FSF list. Therefore it should be included in the index.
On Thu, Oct 26, 2017 at 01:06:49PM -0700, goneall wrote:
> Can you describe any practical issues with including the flag in the
> index (e.g. performance or maintenance issues)?
I don't see a practical issue with including tags in the index, but I
do see size/efficiency issues if we grow the fat index used by the OSI
[1,2,3,4,5]. license-list-XML (with all our data) currently has
around 4.5MB in src/. The FSF is unlikely to have opinions on as many
licenses as we carry, but still, a fat index will be much bigger than
#6's 2.2kB.
My issue with including tags (or as in #6, even names and SPDX IDs),
is that it's very hard to draw a clear line about what information is
important enough to go into the index, and what information is
peripheral enough to be kept out. If we let tags in now, we have to
repeat this discussion for each property when someone else comes along
asking to have it added to the index.
> And do you believe these issues outweigh the practical issues with
> not including the information in the index?
With #6, figuring out the FSF metadata for any license is a single
request (e.g. to [6]). I expect the main index consumers will be:
* Pages like [7], if the FSF decides to render that from the API
(currently we render the API from their list, but with a canonical
API it would be DRYer to go the other way).
* Tools constructing FSF ↔ whatever ID mappings.
That latter case will need access to the full text of the license, and
that's what I'm concerned about stuffing into the index. Folks who
are interested in a fat index can still reconstruct it on their side
with:
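  # FIRST is non-empty only until the first entry has been written.
  FIRST=1
  echo '{' >fat-index.json
  for LICENSE in $(curl -s https://wking.github.io/fsf-api/licenses.json | jq -r '.[]')
  do
    # Separate entries with a comma, skipping the first one.
    if test -z "${FIRST}"
    then
      echo ',' >>fat-index.json
    fi
    FIRST=
    # printf avoids the non-portable 'echo -n'.
    printf '"%s": ' "${LICENSE}" >>fat-index.json
    curl -s "https://wking.github.io/fsf-api/${LICENSE}.json" >>fat-index.json
  done
  echo '}' >>fat-index.json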
or similar, which allows them to trim out anything they aren't
interested in. That's a few hundred requests, but you only have to
make them once (whenever you decide to compile your local fat index).
A more modern approach to this would be to serve the metadata with
GraphQL [8] or similar, so consumers could ask for what they want
explicitly. But that seems over-engineered for data the size of this
license list.
[1]: #6 (comment)
[2]: https://github.com/OpenSourceOrg/api/blob/c903651ef26c35202d6561b61b97d29ead1e08c5/doc/endpoints.md#licenses
[3]: https://github.com/OpenSourceOrg/api/blob/c903651ef26c35202d6561b61b97d29ead1e08c5/api.go#L52
[4]: https://github.com/OpenSourceOrg/api/blob/c903651ef26c35202d6561b61b97d29ead1e08c5/reload.go#L28
[5]: https://github.com/OpenSourceOrg/api/blob/c903651ef26c35202d6561b61b97d29ead1e08c5/license/license.go#L67
[6]: https://wking.github.io/fsf-api/spdx/MIT.json
[7]: https://www.gnu.org/licenses/license-list.html
[8]: http://graphql.org/
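The same client-side reconstruction can be sketched in Python, including the "trim out anything they aren't interested in" step. This is an illustration under assumptions, not part of fsf-api: it assumes licenses.json is a list of FSF IDs (as the `jq -r '.[]'` loop implies) and that each per-license file is a JSON object. `build_fat_index`, `fetch_json`, and the `keep` parameter are hypothetical names. The fetcher is passed in so the logic can be exercised without network access.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://wking.github.io/fsf-api"


def fetch_json(url):
    """Fetch and parse a JSON document (makes a network request)."""
    with urlopen(url) as response:
        return json.load(response)


def build_fat_index(license_ids, fetch=fetch_json, keep=None, base_url=BASE_URL):
    """Map each license ID to its per-license metadata.

    keep: optional collection of property names to retain;
    None keeps every property.
    """
    fat_index = {}
    for license_id in license_ids:
        data = fetch(f"{base_url}/{license_id}.json")
        if keep is not None:
            # Drop the properties this consumer does not care about.
            data = {key: value for key, value in data.items() if key in keep}
        fat_index[license_id] = data
    return fat_index


# Offline demonstration with a stub fetcher and made-up records.
stub = {
    f"{BASE_URL}/Expat.json": {"name": "Expat License", "tags": ["libre"]},
}
print(json.dumps(build_fat_index(["Expat"], fetch=stub.__getitem__, keep={"tags"})))
```

Against the live site, `build_fat_index(fetch_json(f"{BASE_URL}/licenses.json"))` would issue one request per license, matching the few hundred requests mentioned above.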
I don't think drawing the line is that difficult. If the information is small and is high use, it should be included in the index (and also in the details). If the information is large, it should only be present in the details. I concede that small and large are judgement calls, but I don't see that being a big issue. License text is clearly a large amount of data and should not be in an index.
This is the approach we took on the license list JSON format, and I have not heard any users of that format raise concerns about which information is stored at the index or detail level once the detail level was added for capturing the license text and template data.
We could take another approach and have two indexes: one small, and one large containing the extra data. I still would not include the license text in the larger file, but all other metadata could be added.
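The "small and high use" rule could be sketched as follows. This is a hypothetical illustration only: `make_index`, `index_fields`, and `max_bytes` are names invented here, the allowlist stands in for the "high use" judgement call, and the byte cap stands in for "small". Neither fsf-api nor the SPDX tools is known to implement it this way.

```python
import json


def make_index(details, index_fields, max_bytes=256):
    """Build a trimmed index from full per-license records.

    A property is copied into the index only if it is allowlisted
    (a stand-in for "high use") and its JSON encoding is at most
    max_bytes (a stand-in for "small").
    """
    index = {}
    for license_id, record in details.items():
        index[license_id] = {
            key: value
            for key, value in record.items()
            if key in index_fields
            and len(json.dumps(value).encode("utf-8")) <= max_bytes
        }
    return index


# Made-up record: tags are small, the license text is not.
details = {
    "Expat": {
        "name": "Expat License",
        "tags": ["libre"],
        "text": "x" * 5000,
    },
}
print(make_index(details, index_fields={"name", "tags", "text"}))
```

Even when `text` is allowlisted, the size cap keeps it out of the index, which mirrors the position that license text belongs only in the details.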
Done via #6.
Please add a field on the JSON representing if an FSF license is considered Free / Libre.
The licenses listed on the Free Software Foundation License List have the following designations:
The Free / Libre designation would be a license identified as 1, 2, or 3 in the above list.