Strip Markdown markup prior to comparison #247
The issue is that the linked file is Markdown-formatted, whereas the known license is not. In theory, we can strip Markdown formatting from licenses prior to comparison, since the formatting has no legal significance. If you clone this repo locally and run
We'll likely want to remove the non-word characters somewhere in https://github.com/benbalter/licensee/blob/master/lib/licensee/content_helper.rb.
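The opening comment proposes stripping Markdown formatting before comparison. Below is a minimal regex-based Ruby sketch of that idea, targeting only the constructs that come up later in this thread (ATX headers such as `### Preamble ###`, bold such as `**0.**`, and horizontal rules). `strip_markdown` and the three constants are hypothetical names, not part of Licensee's API.

```ruby
# Hypothetical sketch, NOT part of Licensee's API: remove a few Markdown
# constructs that carry no legal meaning before comparing license texts.
# Note: '#' is escaped as '\#' so Ruby does not treat '#{' as interpolation.
HEADER = /^ {0,3}\#{1,6}[ \t]+(.*?)(?:[ \t]+\#+)?[ \t]*$/ # "### Preamble ###"
BOLD   = /(\*\*|__)(.+?)\1/                                # "**0.**", "__0.__"
RULE   = /^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$/           # "---", "* * *"

def strip_markdown(text)
  text
    .gsub(HEADER) { Regexp.last_match(1) } # keep the header text, drop the #s
    .gsub(BOLD) { Regexp.last_match(2) }   # keep the emphasized text only
    .gsub(RULE, '')                        # drop horizontal rules entirely
end

sample = "### Preamble\n\n**0.** This License applies to any program\n\n---\n"
puts strip_markdown(sample)
```

Regexes like these are admittedly brittle: they miss setext headers, italics, links, and bold spans that wrap across lines.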
What's the point in suggesting that the filename can be named
Indeed, it seems that it is partially done already: https://github.com/benbalter/licensee/blob/master/lib/licensee/content_helper.rb#L145-L152
I tried, but it is not easy to see it:
A file that has a
Upstream is not consistent about this:

    $ curl -s https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html | grep USA
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
    $ curl -s https://www.gnu.org/licenses/old-licenses/gpl-2.0.txt | grep USA
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

So support both forms. I've stuck with our old comma version as canonical.

Reported by 1138-4EB [1].

[1]: licensee/licensee#247 (comment)
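The commit message above accepts both forms of the FSF address while keeping the comma version canonical. One minimal way to express that in Ruby, assuming the address sits on a single line, is a regex with an optional comma plus a rewrite to the canonical form. `FSF_ADDRESS`, `CANONICAL`, and `normalize_fsf_address` are hypothetical names, not taken from the actual commit.

```ruby
# Hypothetical sketch: match the FSF address with or without the comma
# before "USA" and rewrite it to the comma form, treated here as canonical.
# Assumes the address is not wrapped across lines.
FSF_ADDRESS = /51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,? USA/i
CANONICAL   = '51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA'

def normalize_fsf_address(text)
  text.gsub(FSF_ADDRESS, CANONICAL)
end

txt  = 'Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.'
html = 'Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.'
puts normalize_fsf_address(txt) == normalize_fsf_address(html) # => true
```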
On Thu, Dec 14, 2017 at 04:12:12PM +0000, 1138-4EB wrote:
> 51 franklin street, fifth floor, boston, ma [-02110-1301-]{+02110-1301,+} usa ev
Looks like the FSF is not consistent about this comma. I've filed
spdx/license-list-XML#514 to track the difference in SPDX.
> changing it is not allowed. {+###+} preamble the licenses for most software are
This `###` markup should be stripped in Licensee.
> follow. [-gnu general public license-]{+###+} terms and conditions for copying,
This looks like a title (which should be optional?) being dropped, and
Markdown's header markup (which should be ignored) being added.
> [-0.-]{+**0.**+} this license applies to any program or other work which contain
This Markdown bullet bolding should be ignored.
> possibility of such damages. {+### end of terms and conditions ### how to apply+}
…
This is a post-license footer. Licensee should ignore it, possibly
switching on “end of terms and conditions”.
So things look close. Perhaps it's just the comma that's causing
trouble, and the other changes are showing up as “we wouldn't
ordinarily care about these differences, but since the comma broke
matching, here they are…”?
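The reply above suggests ignoring the post-license footer by switching on “end of terms and conditions”. A minimal Ruby sketch of that cut follows. `END_OF_TERMS` and `strip_post_terms_footer` are hypothetical names, and the sketch assumes the same cut would be applied to the known license text as well, so that both sides are compared on equal terms.

```ruby
# Hypothetical sketch, not Licensee's API: cut the text off right after the
# "END OF TERMS AND CONDITIONS" marker so the GPL's trailing
# "How to Apply These Terms" appendix does not affect matching.
END_OF_TERMS = /end of terms and conditions/i

def strip_post_terms_footer(text)
  match = END_OF_TERMS.match(text)
  return text unless match # no marker: leave the text untouched

  text[0...match.end(0)]   # keep the marker itself, drop what follows
end

md = "### END OF TERMS AND CONDITIONS ###\n\n### How to Apply These Terms to Your New Programs\n"
puts strip_post_terms_footer(md) # => ### END OF TERMS AND CONDITIONS
```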
The WHATWG has run into what is probably this issue, during the transition of its specifications for HTML, etc. from CC0 to CC-BY (whatwg/sg#51). They were unable to get GitHub to correctly display those specifications’ repository licenses as not CC0 using Markdown license files; after realizing what was happening, they are switching to plain-text license files instead.
Thinking through how to implement this, rendering the Markdown to HTML and stripping tags feels too heavyweight, and using regex feels too flimsy. I believe we can strip all non-word characters, as they shouldn't have legal significance for length comparison purposes (and they are stripped prior to building the wordset anyway).
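The approach described above (strip every non-word character before measuring length and building the word set) can be sketched in a few lines of Ruby. The method names are hypothetical, and the Sørensen-Dice score is a stand-in similarity metric chosen for illustration, not necessarily the formula Licensee uses. Underscores are stripped too, because Ruby's `\w` would otherwise keep Markdown's `_` emphasis markers.

```ruby
require 'set'

# Hypothetical sketch of the approach above; not Licensee's implementation.
# Lowercase, turn every character that is neither a letter/digit nor
# whitespace (Markdown's #, *, _, plus ordinary punctuation) into a space,
# then collapse runs of whitespace.
def normalize(text)
  text.downcase
      .gsub(/[^[:alnum:][:space:]]/, ' ')
      .gsub(/[[:space:]]+/, ' ')
      .strip
end

# The set of distinct words used for overlap-based matching.
def wordset(text)
  normalize(text).split(' ').to_set
end

# Sorensen-Dice coefficient of the two word sets, scaled to 0..100.
def similarity(a, b)
  x = wordset(a)
  y = wordset(b)
  return 100.0 if x.empty? && y.empty?

  200.0 * (x & y).size / (x.size + y.size)
end

plain    = '0. This License applies to any program or other work which contains'
markdown = '**0.** This License applies to any program or other work which contains'

puts normalize(plain) == normalize(markdown)               # => true
puts normalize(plain).length - normalize(markdown).length  # => 0
puts similarity(plain, markdown)                           # => 100.0
```

Because the markup disappears before the length is measured, the Markdown file no longer looks longer than the plain-text license.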
With #249, this is now detected, but with two odd changes:
Thanks @benbalter! Just a couple of questions:
Any license
We'll need to update Licensee on GitHub.com, which we do on a regular basis.
Great. Thanks again for the fast and effective response.
Has this gone live on GitHub? I have these licenses in Markdown format in some of my projects: Or, any tips on making Licensee recognise my Markdown-formatted GPL-v3 license?
@Elioty, you should have a look at idleberg/Creative-Commons-Markdown#10. It seems that, although licensee can properly detect the license in some repos, GitHub fails to display it. However, this is not the case for your Markdown-formatted GPL-v3. On the one hand, the format you use is not exactly the same as the Markdown version available at gnu.org:
On the other hand, licensee does not currently detect either of them when run in the root of the repo. However, the one from GNU is properly detected when passed as an argument (quite weird):
It seems that gpl-2.0.txt is properly detected (e.g. torvalds/linux/blob/master/COPYING), but gpl-2.0.md is not (e.g. 1138-4EB/license-test/blob/master/LICENSE.md).
I hope I am not doing anything wrong, as LICENSE.md is a valid name according to help.github.com/articles/adding-a-license-to-a-repository, and the content is an exact match to a known license.