Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace / by OR in the license field of Rust crates #2990

Merged
merged 1 commit into from
Sep 1, 2020
Merged

Replace / by OR in the license field of Rust crates #2990

merged 1 commit into from
Sep 1, 2020

Conversation

gferon
Copy link

@gferon gferon commented Aug 26, 2020

cargo supports arbitrary formats to specify multiple licenses, but / should be interpreted as OR.

It looks like the consensus is that crates.io should only accept SPDX 2.1 license expressions, but this is not the reality for all crates as it's not validated.

Related issue in cargo: rust-lang/cargo#2039

@sschuberth
Copy link
Member

Thanks for this @gferon, but can you please put the text from your comment in the commit message body?

@gferon
Copy link
Author

gferon commented Aug 26, 2020

Thanks for this @gferon, but can you please put the text from your comment in the commit message body?

done. it looks like there's an unrelated test failure in CI (in the Sbt analyzer).

.toSortedSet()
sortedSetOf<String>().apply {
add(node["license"].textValueOrEmpty().replace("/", " OR ").trim())
}
Copy link
Member

@sschuberth sschuberth Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could theoretically be written as

sortedSetOf(node["license"].textValueOrEmpty().replace("/", " OR ").trim())

but I actually liked the trimming of the individual elements in the original code.

Anyway, I believe we have a more fundamental API problem here: The return value of this function is used to populate the declaredLicenses field, and that's defined as val declaredLicenses: SortedSet<String>. Strictly speaking we somewhat have the same problem with the individual entries in the set: It's unclear whether set entries have an AND or OR relation among each other.

IIRC we deliberately did not want to chose any relation here, but declaredLicenses should literally take whatever string is used in the package manager's meta data. And as some package managers have multiple license fields, declaredLicenses is a collection / set instead of a single string.

Long story short, I'm not sure if even the original split by / was correct, as that's already not literally taking the meta data. I believe this needs some discussion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, what is there to discuss? I just read the whole story and in the Rust ecosystem the / was used as a OR in a SPDX license expression, which is what my commit tries to address.

If you believe the toolkit should get the raw value, and it can handle it, I can change it so we don't process anything here. In the meantime, I'd argue that it's better to cover more cases (correctly) over anything else.

Copy link
Member

@sschuberth sschuberth Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, what is there to discuss?

Put very briefly, whether a string in the declaredLicenses set should be allowed to be an SPDX expression or not. There was a time when we treated every single string as a single license, i.e. something like "Apache-2.0 OR MIT" was interpreted as a literal single license name, not as a license expression. But that should indeed be handled now by the introduction of declaredLicensesProcessed.

It's unclear whether set entries have an AND or OR relation among each other.

Actually, looking again at how we generate declaredLicensesProcessed, I realize we do relate set entries always with AND.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To conclude my thinking-out-aloud: I'd be fine with the intent of this change, but I'd implement it as

private fun extractDeclaredLicenses(node: JsonNode) =
    sortedSetOf(
        node["license"].textValueOrEmpty().split('/')
            .map { it.trim() }
            .filter { it.isNotEmpty() }
            .joinToString(" OR ")
    )

Let's see what others say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically I agree with interpreting the / as or.
I wonder how the and operator looks like in Rust - any idea?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fviernau it doesn't exist, which is why the recommendation was to start using SPDX expressions instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sschuberth I have applied your suggestion.

sschuberth
sschuberth previously approved these changes Aug 27, 2020
@sschuberth
Copy link
Member

Looks like we have test failures as probably expected results now need to be updated. Could you please amend you commit with the necessary changes, @gferon?

@sschuberth
Copy link
Member

Thanks for the update @gferon, but please squash your commits as we aim for each commit in a PR to pass tests on its own.

@gferon
Copy link
Author

gferon commented Aug 27, 2020

Thanks for the update @gferon, but please squash your commits as we aim for each commit in a PR to pass tests on its own.

why not switch the contribution model to squash on merge then?

EDIT: done

@gferon gferon requested a review from sschuberth August 27, 2020 11:03
@sschuberth
Copy link
Member

why not switch the contribution model to squash on merge then?

Because we do want to keep individual commits in PRs and the final history (as long as they pass tests individually) for better / more fine grained traceability of the development process, and to improve running tools like git bisect.

@sschuberth
Copy link
Member

EDIT: done

Well, you of course need to adjust your commit message then to not contain two sign-offs... that is, just remove all of the second's commit message, and stick with only the message of the first commit.

And while at it, please also prefix the commit message with "Cargo: " to highlight the scope of modified files.

@gferon
Copy link
Author

gferon commented Aug 31, 2020

@sschuberth I just updated the test files, they should pass now. I also had to change the code a little, feel free to suggest something else, I don't really know Kotlin.

.map { it.trim() }
.filter { it.isNotEmpty() }
.toSortedSet()

return if(licenses.isEmpty()) sortedSetOf<String>() else sortedSetOf(licenses.joinToString(" OR "))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be (note the space after if, and the removed explicit type):

if (licenses.isEmpty()) sortedSetOf() else sortedSetOf(licenses.joinToString(" OR "))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the case distinction actually needed, e.g. why not just sortedSetOf(licenses.joinToString(" OR "))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that would create a SortedSet containing an empty string if licenses is empty, and not the wanted empty set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, fully missed that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (licenses.isEmpty()) sortedSetOf() else sortedSetOf(licenses.joinToString(" OR "))

When I tried that, it didn't compile, I can try again, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also just tried, for me it compiles fine without the explicit type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

.map { it.trim() }
.filter { it.isNotEmpty() }
.toSortedSet()

return if(licenses.isEmpty()) sortedSetOf<String>() else sortedSetOf(licenses.joinToString(" OR "))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the case distinction actually needed, e.g. why not just sortedSetOf(licenses.joinToString(" OR "))?

cargo supports the legacy '/' format to specify multiple licenses and
the consensus is that crates.io should only accept SPDX 2.1 license
expressions, but this is not enforced as of yet.

see: rust-lang/cargo#2039

Signed-off-by: Gabriel Féron <gabriel.feron@here.com>
@gferon
Copy link
Author

gferon commented Sep 1, 2020

took a while, but tests are finally passing 🎈

@sschuberth sschuberth dismissed fviernau’s stale review September 1, 2020 08:56

Misunderstandng clarified.

@sschuberth sschuberth merged commit c343abe into oss-review-toolkit:master Sep 1, 2020
@gferon gferon deleted the rust-spdx-or branch September 1, 2020 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants