Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CC0-1.0 accommodation for text used in ~30k repos on github.com #489

Merged
merged 8 commits into from
Apr 5, 2018

Conversation

mlinksva
Copy link
Contributor

Evidence: https://github.com/search?utf8=%E2%9C%93&q=filename%3ALICENSE+%22For+more+information%2C+please+see+%3Chttp%3A%2F%2Fcreativecommons.org%2Fpublicdomain%2Fzero%2F1.0%2F%3E%22&type=Code

The changes stem from the CC0-1.0 text at https://choosealicense.com/licenses/cc0-1.0/

The text at choosealicense.com should be replaced with the text from CC or SPDX, but it might be worth matching texts with the trivial changes so widely deployed.

(This edit is independent of/should not conflict with #487, which addresses a different line present in the text from CC, but not in the text in this repo or at choosealicense.com.)

src/CC0-1.0.xml Outdated
@@ -6,7 +6,7 @@
<crossRef>http://creativecommons.org/publicdomain/zero/1.0/legalcode</crossRef>
</crossRefs>
<titleText>
<p>Creative Commons CC0 1.0 Universal</p>
<p><optional>Creative Commons </optional>CC0 1.0 Universal</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for making “Creative Commons” optional. Following their legalcode.txt, we may want:

<titleText>
  <optional><p>Creative Commons<optional> Legal Code</optional></p><optional>
  <p>CC0 1.0 Universal</p>
</titleText>

Since at least 2009-03-14, their HTML legal code has used <h1> and <h2> for those titles, but we don't have sized headers in our XML format (yet?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://spdx.org/licenses/CC0-1.0.html has "Creative Commons CC0 1.0 Universal" all on one line though.

src/CC0-1.0.xml Outdated
@@ -143,6 +143,6 @@
</list>
</item>
</list>

<optional><p>For more information, please see &lt;http://creativecommons.org/publicdomain/zero/1.0/&gt;</p></optional>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't know where this line comes from (I can't find it anywhere on creativecommons.org, anyway), I'd rather use an <alt> to keep it out of the canonical text. Something like:

<alt match="For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was introduced at github/choosealicense.com#126 with no explanation of that line but I guess it was an attempt to provide something equivalent to the "Back to Commons Deed" link present at the time. Anyway, you're right, I'll use your suggestion!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems <p> still warranted . One way to get it would be to wrap in <optional>, but this probably isn't ideal:

      <optional><p><alt match="For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt></p></optional>

It looks to me as if <alt match> hasn't been used with mixed content so far. Can it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - text between <alt ... and </alt> will be considered the "original text" and rendered in the web page and any generated text. You can include formatting for that text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems <p> still warranted .

Why? The canonical form has no content there; certainly not an empty paragraph. And the match content need only match the whitespace-normalized license content (per the matching guidelines). So I don't think we need <p> at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it didn't occur to me that valid HTML output is irrelevant for text not included in the output. 😄

Copy link
Contributor

@wking wking Nov 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… it didn't occur to me that valid HTML output is irrelevant for text not included in the output.

I think there's a larger issue here about “how do we inform folks reading HTML renderings of the set of allowed alternatives?”. The current previews just use:

<span class="replacable-license-text">&lt;copyright holder&gt;</span>

and similar. I think we should at least add a title attribute with the regexp, although I'm not very excited about exposing the regexps directly to end-users, many of whom may not be familiar with regexps.

But that's all orthogonal to this PR ;).

src/CC0-1.0.xml Outdated
@@ -143,6 +143,6 @@
</list>
</item>
</list>

<alt><p>For more information, please see &lt;http://creativecommons.org/publicdomain/zero/1.0/&gt;</p></alt>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does <alt> work without a name and match? And while I think we want to match files which include this line, I don't think we want to include it in our canonical text. So I still prefer the phrasing I'd suggested here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I didn't fully understand -- it's including text in a match that keeps it out of canonical text, not wrapping it in <alt> -- I'll go with your suggestion, thanks for explaining.

@jlovejoy
Copy link
Member

I think this should use the optional tag as the text in question could be there or missing and still be a match. This is how we've defined/used the optional tag otherwise, so we should be consistent

@jlovejoy
Copy link
Member

this also will allow the text to be displayed on the website as optional, like this: https://spdx.org/licenses/Apache-1.1.html (in blue text)

@mlinksva
Copy link
Contributor Author

@jlovejoy I started with optional and switched to alt at @wking's suggestion in order to keep the text out of the text displayed on the website, see #489 (comment) ... the text should be matched as there are lots of legitimate copies of the license with it, but there's no reason for a licensor to include the nonstandard text. Alt hides it while enabling matching, IIUC. Happy to be wrong!

@jlovejoy
Copy link
Member

as this ties into a bigger conversation as to how/when we use these tags, we are going to boot this for further discussion as it could have a knock-on effect with other licenses/markup

@wking
Copy link
Contributor

wking commented Dec 26, 2017 via email

src/CC0-1.0.xml Outdated
@@ -147,6 +147,7 @@
</list>
</item>
</list>
<alt match="For more information, please see <http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need &lt;, etc. escapes in the match value to get valid XML.

@mlinksva
Copy link
Contributor Author

The remaining failure https://travis-ci.org/spdx/license-list-XML/jobs/334218553#L471-L472 looks to be related to #487 rather than this change:

Test for license CC0-1.0 failed: Test for license ID CC0-1.0 failed due to difference found Variable text rule upstreamLink did not match the compare text at end of text while processing rule var: upstreamLink. Last optional text was not found due to the optional difference:
Normal text of license does not match starting at line 1 column 21 "1" when comparing to template text " Legal Code"

src/CC0-1.0.xml Outdated
@@ -147,6 +147,7 @@
</list>
</item>
</list>
<alt match="For more information, please see &lt;http://creativecommons.org/publicdomain/zero/1.0/>" name="upstreamLink"></alt>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need > -> &gt; in the match value too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary, and the validate-schema test passes. Happy to escape it though if that's stylistically preferred. The remaining failing test isn't contingent on the > being escaped or not (tried locally).

@mlinksva
Copy link
Contributor Author

mlinksva commented Jan 28, 2018

That's actually two distinct errors:

Test for license CC0-1.0 failed: Test for license ID CC0-1.0 failed due to difference found Variable text rule upstreamLink did not match the compare text at end of text while processing rule var: upstreamLink

This can be "fixed" by replacing the <alt match... with <optional.... But that then fails to distinguish between canonical and non-canonical text.

Last optional text was not found due to the optional difference:
Normal text of license does not match starting at line 1 column 21 "1" when comparing to template text " Creative Commons Legal Code"

This can be "fixed" with something like

      <titleText>
         <optional><p>Creative Commons Legal Code</p></optional>
         <p><optional>Creative Commons </optional>CC0 1.0 Universal</p>
      </titleText>

but that allows for more variations than seen in the wild.

I suspect both of these have to do with the matcher implementation being slightly less flexible than the schema would imply, but I haven't tried to confirm this by looking at the code. I'll play with it some more, just leaving this as a note to self, but hints welcome. 😄

@wking
Copy link
Contributor

wking commented Jan 28, 2018

That's actually two distinct errors...

I think the first is a knock-on from the second, because master passes.

@mlinksva
Copy link
Contributor Author

I don't understand the knock-on causal relation, but you're right. Fixed the second by wrapping the alt with empty default in optional. Not certain that's the right thing, but the tests pass now...

@goneall
Copy link
Member

goneall commented Jan 28, 2018

Just catching up on the conversation. It looks like we ended with a reasonable solution IMHO.

BTW - the error reporting for parsing errors is not very satisfactory for the tool that compare the code to the canonical text. I've been working on trying to improve it, but it can get quite complicated when there are nested alt's and optionals and it is not clear where the error occurs. Suggestions and tool improvements are welcome.

@wking
Copy link
Contributor

wking commented Jan 28, 2018 via email

@bradleeedmondson
Copy link
Contributor

Putting a ping on @goneall @mlinksva @wking -- did we reach a decision on this? Current result of this PR seems to substantively add only one line, and not address the title question.

Copy link
Contributor

@wking wking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the need for the wrapping <optional> is unfortunate, but I'm fine with this PR landing as it stands.

@wking
Copy link
Contributor

wking commented Apr 5, 2018

Current result of this PR seems … not address the title question.

Doesn't it? I thought the issue was that choosealicense had been including the line that this PR is adding. It's not part of the upstream text, and with the empty <alt> here the new line is not included in our canonical wording. But the new empty <alt> does allow for matching choosealicense's previous wording, which is what the title is claiming.

@bradleeedmondson
Copy link
Contributor

Doesn't it? I thought the issue was that choosealicense had been including the line that this PR is adding. It's not part of the upstream text, and with the empty here the new line is not included in our canonical wording. But the new empty does allow for matching choosealicense's previous wording, which is what the title is claiming.

I didn't grok this well enough to resolve that this was the outstanding issue. Basically if the previous commenters you, @mlinksva, and @goneall are happy (that that's the question, and it's adequately resolved), then I'm happy.

@mlinksva
Copy link
Contributor Author

mlinksva commented Apr 5, 2018

The title question was addressed in #487, which has been merged. This PR is also good to go.

@bradleeedmondson
Copy link
Contributor

Based on today's discussion here and on @goneall review on Jan 28, I understand this is ready to merge. Glad we'll be able to match on a commonly used variant; thanks all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants