More useful information on expected-warnings #117

Open
m1kit opened this issue Sep 2, 2021 · 9 comments

@m1kit
Collaborator

m1kit commented Sep 2, 2021

The expected-warnings file is useful not only during testing but also in the context of license matching.

Though the file owner is spdx/license-list-XML, I think the main user of the file is this repo.
I'd like to leave some discussion here.

See the comments here for the details.

@m1kit m1kit self-assigned this Sep 2, 2021
@goneall
Member

goneall commented Sep 2, 2021

I like the idea of adding a more structured file for expected duplicates.

Since the LicenseListPublisher is used by only a small number of organizations, we don't need to worry too much about compatibility.

One question - should we:
A) add an "expected duplicates" JSON file that this utility would check, generating no warnings for the duplicate licenses listed in it, or
B) take a more general approach and change the expected warnings file to a JSON file that contains the expected duplicates alongside other sections of expected warnings?

Approach A) may be more usable by other utilities, whereas B) makes more sense for the LicenseListPublisher.

I'm leaning to A) to make the file format more usable to other utilities.

@m1kit - what do you think?

@m1kit
Collaborator Author

m1kit commented Sep 2, 2021

Oh, I was also writing some similar ideas at the same time😂
Thanks anyway, @goneall !

I have three ideas in my mind.

JSON (Similar to your plan B)

One possible format is JSON like this

[{
  "type": "duplicated-license",
  "license-ids": [
    "LGPL-2.1",
    "LGPL-2.1-only"
  ],
  "prefer": "LGPL-2.1-only"
},
{
  // more expected warnings here
}]

This format is flexible enough to accommodate future updates (new expected warning types).
To keep the publisher simple, we could also store the exact warning messages:

[{
  "type": "duplicated-license",
  "license-ids": [
    "LGPL-2.1",
    "LGPL-2.1-only"
  ],
  "warnings": [
    "Duplicates licenses: LGPL-2.1, LGPL-2.1-only",
    "Duplicates licenses: LGPL-2.1-only, LGPL-2.1"
  ],
  "prefer": "LGPL-2.1-only"
}]

It's like a hybrid of your Plan A and B.

CSV (just another format of JSON)

Maybe it is not easy to parse JSON in Java.

We could store the data in CSV format instead, like this (though it is less flexible):

"message","from","to","prefer"
"Duplicates licenses: LGPL-2.1, LGPL-2.1-only","LGPL-2.1","LGPL-2.1-only","LGPL-2.1-only"
"Duplicates licenses: LGPL-2.1-only, LGPL-2.1","LGPL-2.1-only","LGPL-2.1","LGPL-2.1-only"

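To illustrate the CSV option, here is a minimal sketch of reading the expected warning messages with only the JDK. The class and method names (`ExpectedWarningsCsv`, `splitLine`, `expectedMessages`) are hypothetical and not part of LicenseListPublisher. It assumes every field is double-quoted as in the sample above, which matters because the message column itself contains a comma.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; names are illustrative, not an existing API. */
class ExpectedWarningsCsv {

    /** Splits one CSV line; commas inside double quotes stay in the field, "" is an escaped quote. */
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Returns the first ("message") column of every non-blank data row, skipping the header row. */
    static Set<String> expectedMessages(List<String> lines) {
        Set<String> messages = new LinkedHashSet<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.trim().isEmpty()) {
                messages.add(splitLine(line).get(0));
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
            "\"message\",\"from\",\"to\",\"prefer\"",
            "\"Duplicates licenses: LGPL-2.1, LGPL-2.1-only\",\"LGPL-2.1\",\"LGPL-2.1-only\",\"LGPL-2.1-only\"",
            "\"Duplicates licenses: LGPL-2.1-only, LGPL-2.1\",\"LGPL-2.1-only\",\"LGPL-2.1\",\"LGPL-2.1-only\"");
        Set<String> expected = expectedMessages(csv);
        // A warning is suppressed only if its exact text appears in the file.
        String warning = "Duplicates licenses: LGPL-2.1, LGPL-2.1-only";
        System.out.println(expected.contains(warning) ? "expected" : "unexpected");
    }
}
```

Matching on the exact message text is what forces the file to list both orderings of each pair, which is one reason the structured formats may be preferable.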
XML (Similar to your plan A)

I think the data here is related to obsoletedBys in license-list-XML.
I wonder whether we could somehow define the similarity of templates in the XML.

Then we can pull data from XML and generate expected-warnings dynamically in a format specific to LicenseListPublisher.

@m1kit
Collaborator Author

m1kit commented Sep 2, 2021

I forgot to mention my preference.

I think adding some info to the XML is the best option, if possible.

Or, we could keep a generic list of expected duplicates in a separate file somewhere in license-list-XML and dynamically generate a file for this library.

@goneall
Member

goneall commented Sep 3, 2021

Or, we could keep a generic list of expected duplicates in a separate file somewhere in license-list-XML and dynamically generate a file for this library.

I like this idea as it would make the information more generally accessible and usable. We could replace the current expectedwarnings file with a "KnownDuplicates.xml".

Although I tend to like JSON better than XML due to readability, the fact that the license-list-XML repo is primarily XML format would favor the XML format over JSON.

We can update this library to read the XML file and process it directly.

I'm tempted to just remove the expected warnings functionality since it is only currently used for known duplicates.

I would like the XML to deserialize into a Java object using one of the standard libraries without too much effort. Here's what I'm thinking might work (although I would want to test this out in code before finalizing):

<expectedDuplicates>
    <duplicatedLicenseSet>
        <licenseIds>
            <licenseId>LGPL-2.1</licenseId>
            <licenseId>LGPL-2.1-only</licenseId>
            <licenseId>LGPL-2.1-or-later</licenseId>
        </licenseIds>
        <prefer>LGPL-2.1</prefer>
        <comment>The LGPL-2.1-only should be used if only the 2.1 version of the license is allowed, the LGPL-2.1-or-later should be used if any later version of 2.1 may be used. If unsure which applies, the LGPL-2.1 identifier should be used</comment>
    </duplicatedLicenseSet>
</expectedDuplicates>
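As a rough illustration of reading such a file, the sketch below uses the JDK's built-in DOM parser rather than a binding library, so it runs without extra dependencies. It is not the deserialization approach proposed in this thread. `KnownDuplicatesReader` and `DuplicateSet` are hypothetical names, and the element names are taken from the draft XML, which may still change.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the draft expectedDuplicates format. */
class KnownDuplicatesReader {

    /** One duplicatedLicenseSet element: the duplicate IDs and the preferred ID (null if absent). */
    static final class DuplicateSet {
        final List<String> licenseIds;
        final String prefer;

        DuplicateSet(List<String> licenseIds, String prefer) {
            this.licenseIds = licenseIds;
            this.prefer = prefer;
        }
    }

    /** Parses the XML text; throws IllegalArgumentException if it is not well-formed. */
    static List<DuplicateSet> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so external entities cannot be pulled in.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<DuplicateSet> result = new ArrayList<>();
            NodeList sets = doc.getElementsByTagName("duplicatedLicenseSet");
            for (int i = 0; i < sets.getLength(); i++) {
                Element set = (Element) sets.item(i);
                List<String> ids = new ArrayList<>();
                NodeList idNodes = set.getElementsByTagName("licenseId");
                for (int j = 0; j < idNodes.getLength(); j++) {
                    ids.add(idNodes.item(j).getTextContent().trim());
                }
                NodeList preferNodes = set.getElementsByTagName("prefer");
                String prefer = preferNodes.getLength() > 0
                        ? preferNodes.item(0).getTextContent().trim()
                        : null;
                result.add(new DuplicateSet(ids, prefer));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse known-duplicates XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input; the prefer element is closed with a matching tag so it is well-formed.
        String xml = "<expectedDuplicates><duplicatedLicenseSet><licenseIds>"
                + "<licenseId>LGPL-2.1</licenseId><licenseId>LGPL-2.1-only</licenseId>"
                + "</licenseIds><prefer>LGPL-2.1</prefer></duplicatedLicenseSet></expectedDuplicates>";
        for (DuplicateSet set : parse(xml)) {
            System.out.println(set.licenseIds + " -> prefer " + set.prefer);
        }
    }
}
```

A pair of licenses reported as duplicates could then be treated as expected when both IDs appear in the same `DuplicateSet`, independent of the order in which they are reported.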

@m1kit
Collaborator Author

m1kit commented Sep 11, 2021

Hi, I agree with the "KnownDuplicates.xml" idea.

I'd like to work on this - introduce the file on license-list-XML.
I have a few questions about how to do it.

  • Do I have to write .xsd to define the schema?
    • If so, what is a recommended way to write a .xsd file? (I'm unfamiliar with it)

@goneall
Member

goneall commented Sep 11, 2021

I'd like to work on this - introduce the file on license-list-XML.

That would be great :)

I have a few additional suggestions on the file I've been thinking about - I'll add those as separate comments.

Do I have to write .xsd to define the schema?

A schema would be really nice to have for validating and even generating code.

If so, what is a recommended way to write a .xsd file?

There are a number of ways to create the XSD file. Since we need to change the Java application to use the XSD file, I have a suggested approach:

  • Write a simple Java class (POJO) which represents the XSD structure. This can be used by the LicenseRDFaGenerator.
  • Annotate the Java class with the @XmlType annotation and use JAXB to generate the schema - note that there are a lot of JAXB-based tools, some built into IDEs like IntelliJ or Eclipse
  • Review the XSD to make sure it looks reasonable
  • Add code to validate and deserialize the XML file - see this code for an example
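The validation step in the list above could look roughly like the sketch below, which uses the JDK's `javax.xml.validation` API. JAXB itself has not shipped with the JDK since Java 11, so the POJO and schema-generation steps are left out here. The embedded XSD is a hand-written assumption based on the draft XML in this thread, not a generated or agreed schema, and `KnownDuplicatesValidator` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical validator for the draft expectedDuplicates format. */
class KnownDuplicatesValidator {

    // Assumed schema: one or more sets, each with two or more IDs, a preferred ID, and an optional comment.
    static final String XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:element name='expectedDuplicates'>",
        "    <xs:complexType><xs:sequence>",
        "      <xs:element name='duplicatedLicenseSet' maxOccurs='unbounded'>",
        "        <xs:complexType><xs:sequence>",
        "          <xs:element name='licenseIds'>",
        "            <xs:complexType><xs:sequence>",
        "              <xs:element name='licenseId' type='xs:string' minOccurs='2' maxOccurs='unbounded'/>",
        "            </xs:sequence></xs:complexType>",
        "          </xs:element>",
        "          <xs:element name='prefer' type='xs:string'/>",
        "          <xs:element name='comment' type='xs:string' minOccurs='0'/>",
        "        </xs:sequence></xs:complexType>",
        "      </xs:element>",
        "    </xs:sequence></xs:complexType>",
        "  </xs:element>",
        "</xs:schema>");

    /** Compiles the embedded schema; failure here means the XSD itself is broken. */
    static Schema loadSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Embedded XSD is invalid", e);
        }
    }

    /** Returns true if the document is well-formed and conforms to the schema. */
    static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<expectedDuplicates><duplicatedLicenseSet><licenseIds>"
                + "<licenseId>LGPL-2.1</licenseId><licenseId>LGPL-2.1-only</licenseId>"
                + "</licenseIds><prefer>LGPL-2.1-only</prefer></duplicatedLicenseSet></expectedDuplicates>";
        System.out.println(isValid(xml) ? "valid" : "invalid");
    }
}
```

Validating before deserializing means a mistyped or mismatched element is rejected up front with a schema error instead of silently producing an incomplete object.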

@goneall
Member

goneall commented Sep 11, 2021

I would like to suggest we broaden the scope of the XML file to include other potential license issues which generate warnings in the LicenseRDFaGenerator. If we merge in PR #20, there will be more expected warnings where the OSI approved flag doesn't match the OSI data.

I would like to name the file something different from "expectedwarnings" since I would like the file to be usable for a number of other purposes. Perhaps something like "KnownLicenseIssues.xml"?

@goneall
Member

goneall commented Sep 11, 2021

I did some quick analysis of warning sources to see if we want to include any additional sections in the XML file for expected license issues.

The only one I think we should add is something to describe a list of license IDs where the OSI Approved flag doesn't match the OSI provided data (see PR #20 for context).

Below are other warnings which can be added as sections, but are not as likely to occur:

@goneall
Member

goneall commented Apr 9, 2023

@m1kit - It's been a while for this issue - are you still interested in contributing? If not, I'll close the issue.
