-
Notifications
You must be signed in to change notification settings - Fork 591
license field needs more guidance #196
Comments
Agreed. In particular, in the 'Further Metadata Field Guidance,' material for 'accepted values, usage notes, and example. @benbalter @waldoj - any thoughts? |
Huh. Funny that we hadn't considered this previously. I'm reminded of the town I lived in as a kid, a planned community near D.C. Famously, the Rouse Brothers, who planned the community, didn't include any cemeteries. It hadn't occurred to them that people would die. Even our example for this isn't good. If this is a URL then it seems like we'd want to point to the terms of use. I mean, if the data is actually copyrighted, period, then presumably it wouldn't be on an API. There must be some terms of uses, a license under which people can use that data, even if it's not a license in the cookie-cutter sense like Apache, GPL, MIT, Creative Commons, etc. It seems to me that we wouldn't want agencies sharing copyrighted data via an API that has no license allowing it to be reused. And if this is a text field, then I imagine we just need to agree on and document a string (e.g., |
Taking a page from CKAN, we've actually been leaning towards text box for name of license. Given the pros and cons both ways, my suggestion is to clarify in the instructions that it's a text field where the provider can write in the license or paste a URL instead. I imagine that seems heretically open to some but any thoughts on if there's another path that is a clear consensus winner that we should adopt instead? @konklone, @JoshData - any thoughts? |
If you really have to stick with just one field, then a URL seems like the right thing to ask for. You could also split it out into |
I think in the August meeting we said we'd provide (a link to?) some non-exhaustive list of possible values. Since @waldoj mentioned "terms of use" generally, I had a bit of pause before replying. What if the terms are "by downloading this data you agree to quack like a duck"? If we allow arbitrary terms, we should not include an accessURL for the dataset. If an agency wants to create a click-wrap style agreement, they should be responsible for ensuring that the user actually agrees before getting the data. Because otherwise, who's really going to view the license page to see what they are agreeing to? IANAL but this seems to differ from a copyright license agreement (or a public domain dedication). In these cases no terms apply until the user makes a copy. (Let's assume that the act of downloading doesn't constitute making a copy.) There are no terms just to download/view/analyze. So I guess what I'm saying is, the guidance should be specific about this. You can point to a URL containing a copyright license agreement. |
Thanks everyone for the helpful discussion. Sorry for the gap in the guidance here and the accidental "non-example." 😳 Seems that there is a relative consensus forming around the URL approach, which I would agree with. @waldoj, care to do a pass at pull request clarifying this in the further metadata field guidance? Thanks! |
For starters, I don't see any way around that this needs to be a URL. Licenses must be unambiguous, and anything short of an actual copy of the license (or, rather, a pointer to where it can be found) is ambiguous.
The downside of a URL is that it makes it more difficult for machine recognition of the license that's in use. That is, "GPL" is an easily recognized string, if ambiguous. But most major licenses have canonical locations. GPL 2.0? Link to |
If we accept that we need to use URLs, then that solves the issue raised by this ticket, because a copyrighted dataset can have a URL that points to the actual license. (Of course, this necessitates that the license be available online, but that's going to be functionally true no matter what—simply writing "copyrighted" in this field, if we didn't allow URLs, would raise many questions but answer none, making the data useless to people.) But it does raise a new problem of what's to be done with work that has no license—that is, it's in the public domain. What is the canonical URL for that? Should the field be defined but left blank (the embodiment of a non-license), or is that in practice going to be ambiguous (I think so)? Would it be inappropriate to encourage that non-standard licenses (i.e., copyrighted datasets) be marked up with Creative Commons Rights Expression Language RDF? That would make it possible to harvest license data that would otherwise be a black box. If anybody has any feelings about any of the points I've raised in this pair of comments, please don't be shy. Otherwise, after I let this sit for a couple of weeks, I'm inclined to start filing pull requests to act on some of these ideas, and I want to make sure that every perspective has had a fair airing first. |
Hello I am new to this discussion as I just started supporting the CDC in this effort; however I come from a pretty deep private sector data governance / data quality background so tend to view through that lens. My comment is that a blank itself is ambiguous. a blank really means that the field has not been populated with a value. That might be because the data was evaluated and there is no license ( as in common rights) but more often will mean that the data steward did not populate that field (Didn't know, wasn't available at time, the URL moved...) As I have looked at the available data from the three main open data sites I am seeing this blank more often than not. |
Agreed with @raking08. CCREL is interesting but.... I think we'd need to write more guidance on how to actually do that. It's something that I think would appear on the target URL and not in the data.json file itself, so we could add that suggestion later on too without affecting the POD schema too. Back to @waldoj's question about non-licensed federal government data. We usually talk about this data as being in the public domain, but it's only in the domestic public domain. Calling it a 17 USC § 105 exemption would be more accurate. So I think we have two options:
I'm agnostic between the choices. I have a feeling @gbinal would find the first easier to have implemented? |
Project Open Data could also make a URL whose sole purpose is to describe the state of non-licensed federal government data -- Even separately from improving the data.json schema, this would be a nice permalink for the country to have. |
I'd be a little hesitant to coin URLs on a github domain though. |
Agreed, Josh. |
That is exactly what inspired me to file #216! |
There seems to be broad consensus around this field being a URL. In the above commit, I'm proposing that the field be clarified to be a string 'URL' and link to the list of example open data licenses as potential options. I think this would address this issue specifically and would suggest a new issue expanding the list of licenses (along with their authoritative URLs) on http://project-open-data.github.io/license-examples/ |
Changes that still need to be addressed are changes in structure and should we add usage notes additions here or no?: * Adds optional describedByType field at the dataset and distribution level (#291, #332) * Changes contactPoint field to an object that contains the name (fn) and email address (hasEmail) (#358) * Adds fn field as part of contactPoint replacing earlier use of contactPoint (#358) * Changes publisher field to an object that allows multiple levels of organizations (#296) * Changes accessURL field to represent indirect access and to exist only within distribution (#217, #335) * Changes format field to a human readable description and to exist only within distribution (#272, #293) * Adds optional description field for use within distribution (#248) * Adds optional title field for use within distribution (#248) * Changes accrualPeriodicity field to use ISO 8601 date syntax (#292) * Changes distribution field to become required-if-applicable and to always contain the accessURL or downloadURL fields (#217) * Changes license field to be a URL (#196)
I'm interested in continuing this conversation. A URL is a good first step - however it's unclear what should be at the other side of that URL. As a developer, when enumerating over data, I want to have structured license information that enables me to understand the capabilities this license permits or restricts. So can the license point to a JSON such as: {
"title": "CC0",
"version": "1.0",
"description": "The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.",
"capabilities": {
"commercial_use": true,
"attribution_required": false,
"allows_redistribution": true,
"share_modifications": false
},
"logo": "https://creativecommons.org/images/deed/nolaw.png",
"link": "https://creativecommons.org/publicdomain/zero/1.0/"
} does something like that exist? |
Huh. That's such a totally obvious-in-retrospect idea that it seems like it has to exist. And yet I've never encountered it. |
The package manifest in Node-land, |
@ajturner @waldoj @konklone actually @aaronsw's spirit lives on throughout the web in part because he helped establish the licensing metadata standard for this (as a teenager have you) and it's now part of the foundation of Creative Commons that's already widely in use. @waldoj as I understand it this is the CCREL approach you and @JoshData were referring to. Given the era this came from it was originally done as RDF and is now RDFa instead of JSON, but it appears embedded in the pages found with the URLs we use for Creative Commons License Deeds. For example, the CC-BY Deed includes the following RDFa: <h3 resource="http://creativecommons.org/ns#Reproduction"
rel="cc:permits">You are free to:</h3>
<ul class="license-properties">
<li class="license share"
rel="cc:permits"
resource="http://creativecommons.org/ns#Distribution">
<strong>Share</strong> — copy and redistribute the material in any medium or format
</li>
<li class="license remix"
rel="cc:permits"
resource="http://creativecommons.org/ns#DerivativeWorks">
<strong>Adapt</strong> — remix, transform, and build upon the material
</li>
<li class="license commercial">
for any purpose, even commercially.
</li>
<li id="more-container"
class="license-hidden">
<span id="devnations-container" />
</li>
</ul> That said, it would be nice to not have to parse the RDFa out of the HTML here - and not to even have to parse the HTML in the first place. Within the HTML there is also a link to the raw RDF XML of the deed: <link rel="alternate" type="application/rdf+xml" href="rdf" /> In context, that href resolves to https://creativecommons.org/licenses/by/4.0/rdf but ideally this would be provided in alternate formats including JSON or JSON-LD and ideally could be requested directly with HTTP using Content Negotiation based on an
As @JoshData said, I think this is still pretty obscure and non-obvious, so guidance would be needed, but in practice I think most of these URLs will be used more as commonly understood unique identifiers and human readable landing pages rather than a bespoke set of conditions that need to be parsed. Of course, that thought implies that we actually would have a canonical URL for something like "U.S. Public Domain" At this point, I think we should have something like that (in addition to guidance on using alternatives like CC0 for expanding use internationally) and that it should function in a very similar manner as the Creative Commons Public Domain Mark, but be specific and limited to the U.S. jurisdiction as is the case with U.S. Government Works (Title 17 § 105) and other reasons like Title 37 § 202.1, Title 17 § 1302, or expired copyrights (Title 17 § 303). This should have a simple human readable page like Creative Commons deeds, include machine readable rights, and also link to the full legal documentation. I don't think this URL should live on the Project Open Data site since this isn't specific to data, but as we look to establish this somewhere we may need to provide an interim URL such as http://www.usa.gov/copyright.shtml or something on project-open-data.cio.gov I'm cc'ing @peterspdx here in reference to https://github.com/creativecommons/Localized-Public-Domain-Mark |
Well, this is delightful. :) Thanks for a very informative and engaging explanation, @philipashlock! |
Wow. A whole bunch of great stuff there, @philipashlock. I had no idea about license metadata, or what Creative Commons' plans for a version 2 are. Also https://github.com/creativecommons/Localized-Public-Domain-Mark looks like exactly what we'd want (though it's new and not finalized)? |
@konklone @philipashlock @JoshData yeah I believe that's in more of an idea stage right now. Diane Peters (@peterspdx) is on point. |
ICYMI, I wanted to flag the recent licensing guidance updates, created with #454 + #456. You'll notice that a new section for U.S. Government works has been added, highlighting the importance of a worldwide public domain dedication (CC-0) and providing a default URL for U.S. Government works: Interestingly, I learned today that the White House's Flickr page has separately and of their own accord been referencing the same U.S. Government Works page, see this example. Great minds point to a common temporary solution! The content of the U.S. Government Works page is a work in progress and will ideally incorporate a Creative Commons-like solution to a U.S. specific version of the Public Domain Mark or to an equivalent legal/human-readable/machine-readable solution. |
One minor point: RDFa is an RDF serialization. But providing a JSON-LD alternative would help. Maybe someone can add information on the CC Wiki. There is a page on RDFa and RDF/XML material is available but no info about JSON-LD. |
@philipashlock, I would like to expound a bit on @akuckartz's comment regarding RDF. You said, "originally done as RDF and is now RDFa instead of JSON", and @akuckartz rightly pointed out that RDFa is an RDF serialization. RDF is a format-independent data model. There are, as of last year, 7 standardized syntaxes for that data model, including a JSON one known as JSON-LD. Please see http://json-ld.org/ for more information, or http://www.w3.org/TR/json-ld/ for the standard. A more accurate way to phrase your statement would have been something like, "was written in the RDF data model and serialized as RDF/XML. We all know that RDF/XML sucked, so is now available in other standard RDF formats, including RDFa, which is easily convertible using widely available tooling to JSON-LD. That gets us where we need to go." :-) There is no reason to jettison the work done in describing or representing licenses on the Web because you want to use JSON. JSON-LD is JSON. That allows you to make use of the Creative Commons work, and still integrate easily into existing Web tools. Please don't reinvent the wheel on licensing for the Web. The CC folks have put a lot of very valuable thought into this, as have other parts of the Web community. |
Some datasets are copyrighted -- can't address that in current field or point to specific license guidance via URL.
The text was updated successfully, but these errors were encountered: