WIP: Easier handling of snippets in source code files #464
Hi @mxmehl, thanks for getting this discussion going. Personally, I'm indifferent on whether to add Snippet-specific tags at all. I guess I'm skeptical about whether they will be adopted in significant measure, but of course I could be wrong about this. That said, I'm (at least currently) opposed to incorporating a different meaning for I'd suggest that if this Snippet proposal is adopted (which as noted, I'm not convinced on), then something like My final comment is just to note that as part of the 3.0 spec, the formats and names of the license-related fields will likely be changing to some degree, as part of the new 'licensing' profile and the 3.0 revamp. So whatever the decision is on this, I'd suggest not planning to put it into use until 3.0 is more settled since things may change significantly. |
Simply also allowing +1 to @swinslow reservations about wide adoption of this and patience for the 3.0 naming. |
I get your point with the context problem, and so Regarding measurable use, at REUSE we receive quite a lot of demand for this feature. Re-using code from Stack Overflow (under CC-BY-SA-4.0) seems to be common practice, and allowing a simple way to mark that would surely increase proper usage. Regarding nesting, numbering these would also be fine, but I see two problems:
|
Unsurprisingly, as I suggested this issue in REUSE, I’m in favour of this being adopted in SPDX, if it does not break anything. Especially in front-end web development, I see snippets being used a lot, both when writing code and when concatenating JS and CSS files. Regarding nesting, I am slightly against it in order to keep things simple. But on the other hand, @zvr’s suggestion of numbered snippets (or other unique IDs) does have its benefits, one of them being easier debugging when a snippet is missing a beginning or ending tag. |
Wauw, I like this idea a lot, regardless of the SPDX technicalities. I regularly add a couple of lines with a source and copyright annotation, but the annotation in SPDX is currently quite coarse. |
Based on ACT tooling meeting discussion - For consideration earlier in 2.3 vs. 3.0 - to be discussed. |
@kestewart , was there an outcome whether this would be in 2.3 or 3.0 yet? In any case, did we decide whether we’ll use non-nesting generic I’d like to use this ASAP even if it’s not part of the spec/standard yet. |
@swinslow - Discussed on the tech call - we'd like your review. If you're good with this PR, we'll merge and include it into 2.3 |
Thanks all. @goneall three comments from me, one procedural and two substantive: Procedurally, I don't think I'm the sole voice on behalf of the Legal Team community about this, so I'll email the Legal Team mailing list about this shortly (similarly to the email I sent earlier today on the license namespaces draft annex). cc @jlovejoy and @pmadick as well in case they have direct input here. Substantively, as currently drafted I'm a strong -1 on this and I don't think it should be merged. As I mentioned above, this appears to give a second meaning to If the identifier for this were changed to My other substantive question (and a more minor one) is, why is this limited to use of "third-party" code? I could see that use of third-party code could be the most common use case, but I could also see a situation where e.g. someone is combining their own code from multiple projects under multiple licenses into a single file, and wants to call them out as separately-licensed snippets. I guess I'd think that the format would work for designating separate snippets of code, whether that's for a third-party or for one's own code. |
Another follow-up question here: As drafted, this is ambiguous as to whether the license identifier and/or the copyright text tags are each mandatory and/or each optional within a designated snippet. Personally I think they should both be optional and presumed to have the meaning |
I’m OK with either option. Since AFAIK the default when finding several Let’s assume a project with
As said, personally, I don’t see a reason to add another +1 for both of the tag options from my side.
On this point I fully agree. I don’t know how the “third party” limitation crept in, I don’t see any reason for such limitation either. |
I agree to @silverhook's analysis regarding the As soon as we've clarified the tag question I can update the PR and remove the third-party thingy. No idea what I was thinking when adding it, probably it was rather meant as an example. |
@silverhook This is a really good point. So, looking at the current spec and the way it describes short identifiers in source files:
@silverhook I think your statement is to say that, in the absence of other information, when there's multiple And so, the proposal here wouldn't change the effect of that approach for a person or tool that ignores the Does all that make sense, and fit with your take on this? If so, I might be starting to be persuaded that I'm wrong on this one :) I do think though that if this approach were taken, using the same I plan to bring this up on the Legal Team call this Thursday, for anyone who's available to join and participate in the discussion. Even if I'm getting on board with this, I'd like to see whether there's broader consensus from Legal Team participants on making this change. |
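To make the point about snippet-unaware tools concrete, here is a minimal sketch of a scanner that ignores snippet markers and simply collects every license identifier it finds in a file. The marker names `SPDX-SnippetBegin` and `SPDX-SnippetEnd`, the sample file and the function name `collect_license_identifiers` are assumptions made for illustration only; the thread had not settled on tag names at this point.

```python
import re

# Matches the tag and captures the license expression that follows it.
LICENSE_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*$")


def collect_license_identifiers(text):
    """Return every license expression in the text, in order of appearance.

    Snippet markers are deliberately ignored, as a snippet-unaware tool would do.
    """
    found = []
    for line in text.splitlines():
        match = LICENSE_TAG.search(line)
        if match:
            found.append(match.group(1))
    return found


# Hypothetical shell file with one marked snippet (marker names are assumed).
SAMPLE = """\
# SPDX-License-Identifier: GPL-3.0-or-later
echo "main script"
# SPDX-SnippetBegin
# SPDX-License-Identifier: CC-BY-SA-4.0
echo "copied snippet"
# SPDX-SnippetEnd
"""

print(collect_license_identifiers(SAMPLE))
```

Such a tool still sees both identifiers in the file; it just cannot tell which lines each one applies to.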
If you are effectively tagging multiple source code snippets from different origins, which is intrusive since it modifies the code, could this exceptional use be better handled by suggesting going a bit further in the modification and splitting things into different files, one for each origin and license? That way, IMHO, there is no need to specify anything new, just some guidelines for this use case. What about this simpler way? |
You described exactly my train of thought, yes :)
Having them at the top does make sense, but I think the question here is one of “should” vs. “must”. If it’s a “should” (= in general, but there may be reasonable exceptions), it’s fine as is. If it’s a “must”, then the spec already differs from real-life examples.
I’d love to, but SPDX calls have regularly clashed with some of my work meetings for the past few years :/ |
@pombredanne , you’re of course right! But what we want to do here is not to encourage people to copy-paste stuff when it’s better to have it separate, but to help properly mark/document when this was already done. I’m pretty sure you saw several files by now as well that were basically:
…I would just like some order in that (and as @swinslow said, in JS this gets really ugly really fast) |
At first I liked the idea of bringing increased semantic understanding for SPDX-License-Identifier tags. However, I hate this idea. Why? Because one license per file is the easiest to manage, and #include is a trivial way to nest files with multiple licenses (and a case that anybody doing compliance will have to cope with anyway due to the widespread use of #include). Given such a trivial workaround, that puts my initial analysis at "no way, unless there's something compelling here". And yes, this is dictating policy, but sometimes that's a good thing. And I don't think there is. It would create more confusion in the day-to-day maintenance of these files. Software isn't created once, but multiple times as each leaf grows or branch is pruned. Will all these markers get properly moved around? Will they all get updated correctly? Do I update the whole file copyright or not? And is such a trivial inclusion even covered by copyright? It isn't always, since its use may be fair, or the code in question may not have any protectable content. As we delve down to sub-file levels of licensing, how does one opt out of using that? How does one even know that the code compiles in and isn't eliminated by optimization? Etc. So, not only does it complicate understanding what the license is, it does so in a way that's horrible to maintain. I'm strongly against this; it would only create problems for FreeBSD. |
What do you intend to update? If you’re referring to copyright notices, may I refer you to this post I wrote?
In what way? You can easily ignore them and simply not use the snippet-level tags. Requiring all code (also already written) to be (re-)written to use the include directive seems like a much bigger change and one that I don’t see happening. I do agree that whenever possible, it perhaps should be preferable, but it’s not always possible. |
hmmm... having now read this whole thread, I'm not sure how I feel. My gut is - why do we need this? and isn't this just over-complicating things? My leaning is not to add another license tag. Please no. I also note the earlier comments about holding off on this until 3.0 but the tech team discussed and seems ok with the possibility for 2.3?? |
I fear we’ve overcomplicated this proposal quite a bit through these years of slow discussion. The original proposal and its reasoning started out very simple. Let’s step back a bit. Also, let’s please decide on the tags first, and then we’ll modify the proposed draft PR. I volunteer to either help draft a new PR if one is needed or draft it myself (in agreement with the rest of the REUSE team, of course). Since this is blocking the release of the new version of the REUSE spec and has been stuck for about 2 years, I’d really appreciate it if we could push this forward. For this reason, and to save everyone’s time, I’ve spent a whole day summarising everything I could find and think of on this topic so far.
Introduction
In REUSE our goal is to provide a spec that makes it easy to mark (your own) source code with appropriate license and copyright info. We try to rely on SPDX as much as possible in order to a) leverage and promote the power of the awesome standard it is (to the extent that it’s easy to generate an SPDX document from a REUSE-compliant source code package), and b) not introduce any clashes or ambiguities between the two specs. This all began in 2019 with fsfe/reuse-docs#34. After stumbling upon this problem/need several times at work, and noticing others in and around REUSE mentioning similar issues that could elegantly be solved by introducing snippet-level tags, I suggested that we figure out how to properly equip snippets in source code with license and copyright info where that info differs from the rest of the file. I have seen use cases in the wild that cover snippets of both third-party and first-party code. And since SPDX has had optional tags for snippets since version 2.1 of its spec (i.e. 2016, IIRC), it seemed reasonable to try to use SPDX as much as possible.
This is especially so since one of the main goals of REUSE is that the license and copyright info in a REUSE-compliant source code package can very easily be translated to a valid SPDX Document, not with tools that have to make assumptions, but through simple and reliable one-to-one tag translation. So what we wanted was to make sure that the snippet-level info could be just as easily translated to an SPDX Document. In 2020 the discussion started on both the SPDX Legal and REUSE mailing lists, and also in this issue we are all reading now.
What the proposal actually was/is
In REUSE we want to introduce a way to mark snippets – we hear and see the need for that. In order to achieve that we need to identify three things:
We saw that the SPDX spec already has Clause 9: Snippet Information. But it does not answer all of our questions. With the basics out of the way, let’s dive into the individual options.
License tag options
SPDX offers the following options that we could use in SPDX by leveraging
Personally, I’m mostly in favour of 1), as I see several practical benefits, and no practical downsides, bar some theoretical ones, which are already solved through Annex H: SPDX File Tags. But at this stage, I’d just want to settle for a tag we can use.
Copyright tag options
SPDX offers the following options that we could use in SPDX by leveraging
Here I think we can agree that
Snippet beginning and end tags
This is the only part of the proposal where a new tag might need to be introduced into the SPDX spec. Obviously we need a way to mark the beginning and the end of a snippet in source code for any of this to be useful. Furthermore, and as a bonus to SPDX, what we propose to include is markers in source code, which a tool could directly translate into the already existing So there are two ways we can go about this in REUSE (we’ll replace “SPDX-” with “REUSE-” in the tags if SPDX wants none of this – though this would be unfortunate, since it would mean the first tag in REUSE that would not be an SPDX- tag):
I have a preference for 2) since IMHO it solves more problems and provides additional info to be translated into SPDX Documents, while the added complexity is not too much to ask.
Bonus question: How to mark origin/provenance
This is not part of this proposal, but something we are wondering about within REUSE as well. Namely, if the snippet is of a different origin, how best to provide the origin, as a link or description. Reading the SPDX spec, it seems like
Not breaking SPDX
This should not break anything in SPDX, and so far I’ve not seen an argument that showed what it would break. If there is something, please let me know. We really don’t want to break anything in SPDX. Whichever new tags (if any) SPDX adopts for this, they should be optional, just as snippet-level tags already are. If you don’t care about providing (or reading) snippet-level info, simply don’t, as
If we want to manage snippets in source code files, regardless of which tag options we decide upon, we might also want to make the following line in H.2 Format a bit clearer still by emphasising that the file tags are most often at the top, but, esp. if there are snippets, they could be elsewhere as well:
To reiterate, this is not just about third-party snippets, and our intention is not to promote the method of copy-pasting third-party code (just as Clause 9: Snippet Information of SPDX doesn’t), but instead to properly mark snippets that carry a different license and/or copyright notice than the rest of the file (just as Clause 9: Snippet Information of SPDX does). |
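As a rough illustration of the marker-to-document translation described in the summary above, the sketch below turns non-nested begin/end markers into line ranges with an attached license expression, which is the kind of information a snippet entry in an SPDX Document records. The marker names, the `Snippet` class and `find_snippets` are hypothetical and chosen only for this example; the thread had not yet decided on the tags.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

BEGIN = "SPDX-SnippetBegin"  # assumed marker name
END = "SPDX-SnippetEnd"      # assumed marker name
LICENSE_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*$")


@dataclass
class Snippet:
    start_line: int                # line of the begin marker (1-based)
    end_line: int                  # line of the end marker (1-based)
    license: Optional[str] = None  # first license expression inside the snippet


def find_snippets(text: str) -> List[Snippet]:
    """Collect non-nested snippets; a begin marker without an end is dropped."""
    snippets = []
    current = None
    for number, line in enumerate(text.splitlines(), start=1):
        if BEGIN in line:
            current = Snippet(start_line=number, end_line=number)
        elif END in line and current is not None:
            current.end_line = number
            snippets.append(current)
            current = None
        elif current is not None and current.license is None:
            match = LICENSE_TAG.search(line)
            if match:
                current.license = match.group(1)
    return snippets


SAMPLE = """\
#!/bin/sh
# SPDX-License-Identifier: GPL-3.0-or-later
echo "own code"
# SPDX-SnippetBegin
# SPDX-License-Identifier: CC-BY-SA-4.0
echo "copied code"
# SPDX-SnippetEnd
echo "own code again"
"""

for snippet in find_snippets(SAMPLE):
    print(snippet.start_line, snippet.end_line, snippet.license)
```

Whether the recorded range should include the marker lines themselves is a design choice this sketch makes arbitrarily; the file-level identifier on line 2 is left untouched because it sits outside any snippet.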
Thanks @silverhook for the clear summary. Recent arguments seem to point to a dilemma: should SPDX support snippets at all, or should it nudge projects to move copied snippets into separate files? In my experience snippets are so small that creating a dedicated file and doing the import is too much overhead and not worth it. Typical use cases I have encountered are complex regexes or clever one-liners. So I certainly see value in snippets and hope this idea makes it into SPDX. I don't see a need for nested snippets. It is not a use case I have personally encountered. Also, as source code is sequential by nature, you can always have a sequence of snippet sections to indicate the different origins. Nesting can be harder to parse, so if that complexity can be avoided I think that would help the SPDX ecosystem. |
Thanks, @silverhook for the analysis. My "votes":
Please note that the In contrast with @nicorikken comment above, I have encountered nested snippets: for example, a 20-line inside a 100-line snippet. |
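For a sense of what nesting would cost a parser, here is a minimal stack-based sketch that pairs begin and end markers, records the nesting depth of each snippet, and reports unbalanced markers (the debugging benefit mentioned earlier in the thread). The marker names and the function `snippet_ranges` are assumptions for illustration, and the markers here are unnumbered.

```python
BEGIN = "SPDX-SnippetBegin"  # assumed marker name
END = "SPDX-SnippetEnd"      # assumed marker name


def snippet_ranges(lines):
    """Pair begin/end markers with a stack.

    Returns (ranges, errors) where each range is (start_line, end_line, depth),
    depth 0 being an outermost snippet, and errors lists unbalanced markers.
    """
    stack = []   # line numbers of begin markers that are still open
    ranges = []
    errors = []
    for number, line in enumerate(lines, start=1):
        if BEGIN in line:
            stack.append(number)
        elif END in line:
            if stack:
                start = stack.pop()
                ranges.append((start, number, len(stack)))
            else:
                errors.append(f"line {number}: end marker without a begin marker")
    for start in stack:
        errors.append(f"line {start}: begin marker without an end marker")
    return ranges, errors


example = [
    "echo outer file",
    "# SPDX-SnippetBegin",
    "echo outer snippet",
    "# SPDX-SnippetBegin",
    "echo inner snippet",
    "# SPDX-SnippetEnd",
    "echo outer snippet again",
    "# SPDX-SnippetEnd",
    "# SPDX-SnippetEnd",
]
ranges, errors = snippet_ranges(example)
print(ranges)
print(errors)
```

With unnumbered markers an end tag can only be matched to the most recent open begin tag, so a single missing tag shifts every later pairing; unique IDs per snippet would let a tool name the exact snippet that is unbalanced.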
Was discussed in the SPDX Tech Team call (minutes) and the SPDX Legal Team call. TL;DR:
|
@silverhook - Have you updated or created the PR for this issue? If so, can you link it here? |
@goneall Working on it. I’m battling with the question whether to introduce a new H.3 subsection and push the existing one into H.4 or if I need to be more creative in order not to break the existing order. |
Thanks @silverhook |
Also created #719 with an alternative, IMHO better, structure. |
Can be closed in favour of the approved #719 |
I'm going to go ahead and close this to avoid possible confusion. |
This PR relates to a discussion on spdx-legal about snippet handling in SPDX. It is supposed to go hand in hand with a specification update of REUSE for which we seek to make marking licensing and copyright of snippets in source code files easier.
The format we would aim for is the following, e.g. for a shell file:
Note on SnippetCopyrightText and Copyright: per REUSE's specification, both are perfectly fine, so both should also be supported in snippets.
Following @goneall's request, I've added this to Annex E. However, I am wondering whether snippet-information.md also needs some update, e.g. to make clear that nesting is not allowed (see ML thread).
To be honest, I am a bit lost here as to how SPDX specifies things, and whether they are different for SPDX documents than for source code files. Help and guidance are most welcome!
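As a sketch of what supporting both notations inside a snippet could mean for a tool, the following matcher accepts either a snippet copyright tag or a traditional Copyright line. The `SPDX-` prefix on the tag, the pattern and the helper name `snippet_copyright_notice` are assumptions made for this example, not something this PR defines.

```python
import re

# Accept either the (assumed) tag form or a traditional "Copyright ..." line.
# The word boundaries keep "Copyright" from matching inside other tag names.
COPYRIGHT_IN_SNIPPET = re.compile(
    r"(?:SPDX-SnippetCopyrightText:|\bCopyright\b)\s*(?P<notice>\S.*?)\s*$"
)


def snippet_copyright_notice(line):
    """Return the copyright notice found on the line, or None if there is none."""
    match = COPYRIGHT_IN_SNIPPET.search(line)
    return match.group("notice") if match else None


print(snippet_copyright_notice("# SPDX-SnippetCopyrightText: 2020 Jane Doe"))
print(snippet_copyright_notice("# Copyright 2020 Jane Doe"))
print(snippet_copyright_notice("# SPDX-FileCopyrightText: 2020 Jane Doe"))
```

A tool using something like this would still need to know whether it is currently inside a snippet in order to decide whether a plain Copyright line relates to the snippet or to the whole file.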
Variants
There are some variants we could discuss:
SPDX-License-Identifier is a beautiful, well-known tag which I would prefer to also be valid for snippets. However, since we already have SnippetCopyrightText, one could think about SPDX-Snippet-License-Identifier. But I would see a big problem caused by the missing/additional dash in the tag name, something that is already quite confusing.
SnippetBegin or Snippet-Begin. AFAIU, camel-case is preferred in SPDX, so I took that.
SnippetCopyrightText by FileCopyrightText, but it's quite obvious that the scope would be wrong. As said above, it's always possible to use the traditional Copyright lines, and tools would just have to interpret them correctly, knowing whether they relate to the full file or just a snippet.