Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORT reports error, if copyrightText in SPDX Document is empty #7222

Closed
hanna-modica opened this issue Jun 29, 2023 · 14 comments
Closed

ORT reports error, if copyrightText in SPDX Document is empty #7222

hanna-modica opened this issue Jun 29, 2023 · 14 comments
Labels
needs info An issue where further information is required

Comments

@hanna-modica
Copy link
Contributor

ORT is currently only supporting SPDX version 2.2, where is not clearly specified what to do if the copyrightText of a file is empty.

In Version 2.3 it is now defined to use NOASSERTION if the copyrightText is empty. Since this is not a major version update it should be the same in version 2.2.

See https://github.com/spdx/spdx-spec/blob/development/v2.3/schemas/spdx-schema.json#L520 and https://github.com/spdx/spdx-spec/blob/development/v2.2/schemas/spdx-schema.json#L441 to compare.

Another option would be to simply assume NONE if the string is empty. Either way, ORT should not report an error.

This seems to be similar to #6985

Excerpt from one of our users spdx json file:

"files": [
  {
   "fileName": "/etc/X11/Xsession.d/90gpg-agent",
   "SPDXID": "SPDXRef-File-etc-X11-Xsession.d-90gpg-agent-a17b5f10826a8aca",
   "checksums": [
    {
     "algorithm": "SHA1",
     "checksumValue": "0000000000000000000000000000000000000000"
    }
   ],
   "licenseConcluded": "NOASSERTION",
   "copyrightText": "",
   "comment": "layerID: sha256:f9eef85611c1fa226ded9a9d4fcc070448e050b7282bf662f32167ade49c4a03"
  },
  {
   "fileName": "/etc/alternatives/README",
   "SPDXID": "SPDXRef-File-etc-alternatives-README-2dd9e31817f87f65",
   "checksums": [
    {
     "algorithm": "SHA1",
     "checksumValue": "0000000000000000000000000000000000000000"
    }
   ],
   "licenseConcluded": "NOASSERTION",
   "copyrightText": "",
   "comment": "layerID: sha256:cdd7c73923174e45ea648d66996665c288e1b17a0f45efdbeca860f6dafdf731"
  },
@sschuberth
Copy link
Member

sschuberth commented Jun 29, 2023

So, this is about reading SPDX files generated by another party via the SpdxDocumentFile "package manager", right? I.e. it is not about SPDX files written by ORT, which indeed already now should never have a blank string as copyrightText.

If I'm reading your request right, you say that ORT should not complain about copyrightText being set to a blank string when reading such SPDX files. However, the specs you quote only refer to the case when the "field is not present", not to the case when it is present but set to a blank string.

The specs over here are also not super clear about how to deal with blank strings, but given that the field description says "Any text related to a copyright notice, even if not complete" I'd personally say that a blank string is not related to a copyright notice, and it's also not an incomplete copyright notice (but simply no copyright notice at all).

TL;DR, in my opinion the above SPDX example is invalid according to the specs, and ORT rightfully refuses to parse it. Unfortunately, https://tools.spdx.org/ is currently down so I cannot easily run the validator on it.

@sschuberth
Copy link
Member

However, I believe #7224 is a correct thing to do.

@sschuberth
Copy link
Member

TL;DR, in my opinion the above SPDX example is invalid according to the specs, and ORT rightfully refuses to parse it.

Moreover, the semantics of a blank copyrightText are really unclear: Should it correspond to NOASSERTION or NONE, or should the blank string be taken as a literal copyright statement (which does not really make sense, as it adds no value)? That's one more argument why I believe a blank string should be invalid here.

@sschuberth sschuberth added needs info An issue where further information is required spdx-utils About the SPDX utility library labels Jun 29, 2023
@hanna-modica
Copy link
Contributor Author

So, this is about reading SPDX files generated by another party via the SpdxDocumentFile "package manager", right? I.e. it is not about SPDX files written by ORT, which indeed already now should never have a blank string as copyrightText.

If I'm reading your request right, you say that ORT should not complain about copyrightText being set to a blank string when reading such SPDX files.

Yes, that is correct. I have provided your answer to the user who raised this issue with us and will get back to you.

@MP91
Copy link

MP91 commented Jun 30, 2023

Hey @sschuberth @hanna-modica, I raised this issue and I totally agree that this is definitely not clear in any documentation.

I just ran our spdx.json with these entries trough the validator and it says it is valid. I'm definitely no expert in spdx and don't know what exactly is validated there or which version is taken into account...

But since everybody and every tool generating spdx can have different interpretations and ORT I guess wants to be a general analyzer, I would like to see ORT handling everything gracefully.

I personally would vote for an empty string being the same as if the field is not present and therefore means NOASSERTION. This would, at least in our case, also fit the values for "licenseConcluded" and could also fit the description of NOASSERTION
"if

  • the SPDX document creator has made no attempt to determine this field; or

  • the SPDX document creator has intentionally provided no information (no meaning should be implied from the absence of an assertion)."

(https://spdx.github.io/spdx-spec/v2.3/file-information/#88-copyright-text-field)

So an empty string means the creator intentionally did not specify this.

But again there is plenty of room for different interpretations...

Would there even be a difference between NONE and NOASSERTION for the copyrightText? I thought the license is more important. In version 2.3 the copyrightText is even not mandatory anymore...

@sschuberth
Copy link
Member

sschuberth commented Jun 30, 2023

I guess wants to be a general analyzer, I would like to see ORT handling everything gracefully.

Actually, to me an important aspect of ORT also is to enforce best practices and standards. Sure, we could be lenient when reading and strict when writing, but that probably does not help much to question ambiguous SPDX documents that are in the wild, in order to bubble up to the tools that created them and fix these.

also fit the values for "licenseConcluded"

I believe you, like @hanna-modica, are mixing up the cases where licenseConcluded is not specified at all vs. set to an empty string: It's out of question now that an absent licenseConcluded should be interpreted as NOASSERTION. But what if someone would set a licenseConcluded to the empty string? It's not specified that this should map to NOASSERTION; NONE would just make as much sense, IMO.

Would there even be a difference between NONE and NOASSERTION for the copyrightText?

Yes: NONE means that you have tried to find out the copyright, and came to the conclusion that there is none. NOASSERTION can either mean that you didn't try in the first place, or that you tried but came to no conclusion.

To me the question boils down to: Why risk that something could be misinterpreted by using unspecified values? Instead, as an SPDX document author I'd simply play safe and avoid empty / blank strings being written for licenseConcluded, and be explicit about what I mean by using NOASSERTIONor NONE.

@MP91
Copy link

MP91 commented Jun 30, 2023

Actually, to me an important aspect of ORT also is to enforce best practices and standards. Sure, we could be lenient when reading and strict when writing, but that probably does not help much to question ambiguous SPDX documents that are in the wild, in order to bubble up to the tools that created them and fix these.

I would expect the SPDX community to enforce best practices and standards. At the end ORT and all other tools are just users of SPDX and as already mentioned can have different interpretations of unclear statements.

I believe you, like @hanna-modica, are mixing up the cases where licenseConcluded is not specified at all vs. set to an empty string: It's out of question now that an absent licenseConcluded should be interpreted as NOASSERTION. But what if someone would set a licenseConcluded to the empty string? It's not specified that this should map to NOASSERTION; NONE would just make as much sense, IMO.

You are totally right that it is the same case for licenseConcluded, but in the example above licenseConcluded is set to NOASSERTION but for the copyrightText we have the empty string. That's why I meant in this case one could assume the empty string also means NOASSERTION as it would match the license.

Instead, as an SPDX document author I'd simply play safe and avoid empty / blank strings being written for licenseConcluded, and be explicit about what I mean by using NOASSERTIONor NONE.

The licenseConcluded is set to NOASSERTION but the copyrightText is not.
Interestingly on the linked specs from the website this field is marked as mandatory: https://spdx.github.io/spdx-spec/v2.2.2/file-information/#88-copyright-text-field

Whereas it is not mandatory in the linked specs from the GitHub repo: https://spdx.github.io/spdx-spec/v2-draft/file-information/#88-copyright-text-field

If it is not mandatory I would even say that ORT could totally ignore it if there is no valid entry.

Lets see what the SPDX guys tell us.

@sschuberth
Copy link
Member

Lets see what the SPDX guys tell us.

👍🏻

@sschuberth
Copy link
Member

I just ran our spdx.json with these entries trough the validator and it says it is valid.

Interesting. When I set the text for an existing copyrightText to the empty string in https://github.com/spdx/tools-java/blob/master/testResources/SPDXJSONExample-v2.2.spdx.json, then https://tools.spdx.org/app/validate/ tells me

The following warning(s) were raised: [Relationship error: Relationship error: Missing required copyright text for ./src/org/spdx/parser/DOAPProject.java in ./src/org/spdx/parser/DOAPProject.java in glibc in glibc in SPDX-Tools-v2.0]

@MP91
Copy link

MP91 commented Jul 3, 2023

Indeed interesting, if I just add an file from our spdx to your example it is still valid. Thats what I added to files section

    {
      "fileName": "/etc/X11/Xsession.d/90gpg-agent",
      "SPDXID": "SPDXRef-File-etc-X11-Xsession.d-90gpg-agent-a17b5f10826a8aca",
      "checksums": [
        {
          "algorithm": "SHA1",
          "checksumValue": "0000000000000000000000000000000000000000"
        }
      ],
      "licenseConcluded": "NOASSERTION",
      "copyrightText": "",
      "comment": "layerID: sha256:f9eef85611c1fa226ded9a9d4fcc070448e050b7282bf662f32167ade49c4a03"
    },

But I also tried just removing the copyrightText from one file directly in the example and got the same warning. So there has to be some other setting or link or something that triggers that...

@goneall
Copy link

goneall commented Jul 3, 2023

Interesting. When I set the text for an existing copyrightText to the empty string in https://github.com/spdx/tools-java/blob/master/testResources/SPDXJSONExample-v2.2.spdx.json, then https://tools.spdx.org/app/validate/ tells me

This is because in version 2.2 of the spec, the copyrighText was required. In version 2.3, this restriction was removed. If you try an empty copyright in the 2.3 examples, it should return valid.

@sschuberth
Copy link
Member

This is because in version 2.2 of the spec, the copyrighText was required. In version 2.3, this restriction was removed.

But note that "not present" is not the same thing as "present and set to empty".

@MP91
Copy link

MP91 commented Jul 4, 2023

This is because in version 2.2 of the spec, the copyrighText was required. In version 2.3, this restriction was removed. If you try an empty copyright in the 2.3 examples, it should return valid.

See my comment above. It is valid if I copy the mentioned text into the SPDX2.2 example.

@sschuberth
Copy link
Member

ORT is currently only supporting SPDX version 2.2, where is not clearly specified what to do if the copyrightText of a file is empty.

With the statement from here, it has been clarified that proving a blank string for the copyrightText fields as part of an SPDX 2.2 document (which is the only version ORT supports currently, also see this, although it refers more to output than input of SPDX) is invalid, and the SPDX producer should use NOASSERTION instead.

In that regard ORT's behavior is strict, but correct, so I'm closing this. For making SPDX parsing optionally less strict also see #8052 (at the example of licenseInfoInFiles).

@sschuberth sschuberth closed this as not planned Won't fix, can't repro, duplicate, stale Feb 19, 2024
@sschuberth sschuberth removed the spdx-utils About the SPDX utility library label Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs info An issue where further information is required
Projects
None yet
Development

No branches or pull requests

4 participants