-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CC0-1.0 accommodation for text used in ~30k repos on github.com #489
Conversation
Evidence: https://github.com/search?utf8=%E2%9C%93&q=filename%3ALICENSE+%22For+more+information%2C+please+see+%3Chttp%3A%2F%2Fcreativecommons.org%2Fpublicdomain%2Fzero%2F1.0%2F%3E%22&type=Code The changes stem from the CC0-1.0 text at https://choosealicense.com/licenses/cc0-1.0/ This should be replaced with the text from CC or SPDX, but it _might_ be worth matching texts with the trivial changes included.
src/CC0-1.0.xml
Outdated
@@ -6,7 +6,7 @@ | |||
<crossRef>http://creativecommons.org/publicdomain/zero/1.0/legalcode</crossRef> | |||
</crossRefs> | |||
<titleText> | |||
<p>Creative Commons CC0 1.0 Universal</p> | |||
<p><optional>Creative Commons </optional>CC0 1.0 Universal</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for making “Creative Commons” optional. Following their legalcode.txt
, we may want:
<titleText>
<optional><p>Creative Commons<optional> Legal Code</optional></p><optional>
<p>CC0 1.0 Universal</p>
</titleText>
Since at least 2009-03-14, their HTML legal code has used <h1>
and <h2>
for those titles, but we don't have sized headers in our XML format (yet?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://spdx.org/licenses/CC0-1.0.html has "Creative Commons CC0 1.0 Universal" all on one line though.
src/CC0-1.0.xml
Outdated
@@ -143,6 +143,6 @@ | |||
</list> | |||
</item> | |||
</list> | |||
|
|||
<optional><p>For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/></p></optional> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't know where this line comes from (I can't find it anywhere on creativecommons.org, anyway), I'd rather use an <alt>
to keep it out of the canonical text. Something like:
<alt match="For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was introduced at github/choosealicense.com#126 with no explanation of that line but I guess it was an attempt to provide something equivalent to the "Back to Commons Deed" link present at the time. Anyway, you're right, I'll use your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it seems <p>
still warranted . One way to get it would be to wrap in <optional>
, but this probably isn't ideal:
<optional><p><alt match="For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt></p></optional>
It looks to me as if <alt match>
hasn't been used with mixed content so far. Can it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - text between <alt ...
and </alt>
will be considered the "original text" and rendered in the web page and any generated text. You can include formatting for that text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it seems
<p>
still warranted .
Why? The canonical form has no content there; certainly not an empty paragraph. And the match
content need only match the whitespace-normalized license content (per the matching guidelines). So I don't think we need <p>
at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it didn't occur to me that valid HTML output is irrelevant for text not included in the output. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… it didn't occur to me that valid HTML output is irrelevant for text not included in the output.
I think there's a larger issue here about “how do we inform folks reading HTML renderings of the set of allowed alternatives?”. The current previews just use:
<span class="replacable-license-text"><copyright holder></span>
and similar. I think we should at least add a title
attribute with the regexp, although I'm not very excited about exposing the regexps directly to end-users, many of whom may not be familiar with regexps.
But that's all orthogonal to this PR ;).
src/CC0-1.0.xml
Outdated
@@ -143,6 +143,6 @@ | |||
</list> | |||
</item> | |||
</list> | |||
|
|||
<alt><p>For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/></p></alt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does <alt>
work without a name
and match
? And while I think we want to match files which include this line, I don't think we want to include it in our canonical text. So I still prefer the phrasing I'd suggested here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I didn't fully understand -- it's including text in a match
that keeps it out of canonical text, not wrapping it in <alt>
-- I'll go with your suggestion, thanks for explaining.
I think this should use the optional tag as the text in question could be there or missing and still be a match. This is how we've defined/used the optional tag otherwise, so we should be consistent |
this also will allow the text to be displayed on the website as optional, like this: https://spdx.org/licenses/Apache-1.1.html (in blue text) |
@jlovejoy I started with optional and switched to alt at @wking's suggestion in order to keep the text out of the text displayed on the website, see #489 (comment) ... the text should be matched as there are lots of legitimate copies of the license with it, but there's no reason for a licensor to include the nonstandard text. Alt hides it while enabling matching, IIUC. Happy to be wrong! |
as this ties into a bigger conversation as to how/when we use these tags, we are going to boot this for further discussion as it could have a knock-on effect with other licenses/markup |
On Thu, Dec 21, 2017 at 07:03:33PM +0000, Jilayne Lovejoy wrote:
as this ties into a bigger conversation as to how/when we use these
tags…
I think we've *been* using alt vs. optional based on which form we
consider canonical (e.g. see #429, which very narrowly bears on this
issue). There is work on display for empty-canonical-alts in
spdx/tools#120, but I think that's a tooling issue and not a “what
should we put in the template XML?” issue.
… we are going to boot this for further discussion as it could have
a knock-on effect with other licenses/markup
What sort of knock-on effect do you foresee? I don't see a reason to
punt on this at the moment.
|
src/CC0-1.0.xml
Outdated
@@ -147,6 +147,7 @@ | |||
</list> | |||
</item> | |||
</list> | |||
<alt match="For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may need <
, etc. escapes in the match
value to get valid XML.
The remaining failure https://travis-ci.org/spdx/license-list-XML/jobs/334218553#L471-L472 looks to be related to #487 rather than this change:
|
src/CC0-1.0.xml
Outdated
@@ -147,6 +147,7 @@ | |||
</list> | |||
</item> | |||
</list> | |||
<alt match="For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need >
-> >
in the match
value too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary, and the validate-schema test passes. Happy to escape it though if that's stylistically preferred. The remaining failing test isn't contingent on the >
being escaped or not (tried locally).
That's actually two distinct errors:
This can be "fixed" by replacing the
This can be "fixed" with something like <titleText>
<optional><p>Creative Commons Legal Code</p></optional>
<p><optional>Creative Commons </optional>CC0 1.0 Universal</p>
</titleText> but that allows for more variations than seen in the wild. I suspect both of these have to do with the matcher implementation being slightly less flexible than the schema would imply, but I haven't tried to confirm this by looking at the code. I'll play with it some more, just leaving this as a note to self, but hints welcome. 😄 |
I think the first is a knock-on from the second, because master passes. |
I don't understand the knock-on causal relation, but you're right. Fixed the second by wrapping the |
Just catching up on the conversation. It looks like we ended with a reasonable solution IMHO. BTW - the error reporting for parsing errors is not very satisfactory for the tool that compare the code to the canonical text. I've been working on trying to improve it, but it can get quite complicated when there are nested alt's and optionals and it is not clear where the error occurs. Suggestions and tool improvements are welcome. |
On Sat, Jan 27, 2018 at 09:37:32PM -0800, goneall wrote:
It looks like we ended with a reasonable solution IMHO.
If the matching implementation has trouble with empty alt elements,
perhaps it could wrap them in optional tags as a pre-processing step
so we didn't have to remember to do that here? On the other hand,
test failures will remind us, so not a big deal if we leave things as
they stand.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the need for the wrapping <optional>
is unfortunate, but I'm fine with this PR landing as it stands.
Doesn't it? I thought the issue was that choosealicense had been including the line that this PR is adding. It's not part of the upstream text, and with the empty |
I didn't grok this well enough to resolve that this was the outstanding issue. Basically if the previous commenters you, @mlinksva, and @goneall are happy (that that's the question, and it's adequately resolved), then I'm happy. |
The title question was addressed in #487, which has been merged. This PR is also good to go. |
Based on today's discussion here and on @goneall review on Jan 28, I understand this is ready to merge. Glad we'll be able to match on a commonly used variant; thanks all. |
Evidence: https://github.com/search?utf8=%E2%9C%93&q=filename%3ALICENSE+%22For+more+information%2C+please+see+%3Chttp%3A%2F%2Fcreativecommons.org%2Fpublicdomain%2Fzero%2F1.0%2F%3E%22&type=Code
The changes stem from the CC0-1.0 text at https://choosealicense.com/licenses/cc0-1.0/
The text at choosealicense.com should be replaced with the text from CC or SPDX, but it might be worth matching texts with the trivial changes so widely deployed.
(This edit is independent of/should not conflict with #487, which addresses a different line present in the text from CC, but not in the text in this repo or at choosealicense.com.)