Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo-license fails to normalize the license of libm 0.2.9 and 0.2.10` #78

Open
sosthene-nitrokey opened this issue Oct 28, 2024 · 5 comments · May be fixed by #79
Open

cargo-license fails to normalize the license of libm 0.2.9 and 0.2.10` #78

sosthene-nitrokey opened this issue Oct 28, 2024 · 5 comments · May be fixed by #79

Comments

@sosthene-nitrokey
Copy link
Contributor

sosthene-nitrokey commented Oct 28, 2024

The license of libm is now: MIT AND (MIT OR Apache-2.0).

After being normalized, it shows up as Apache-2.0) OR MIT AND (MIT, which does not make sense.

It looks like it's caused only by the sort_unstable call. Why is that required?

@sosthene-nitrokey sosthene-nitrokey changed the title cargo-license fails to normalize the license of libm 0.2.9 and 0.2.8` cargo-license fails to normalize the license of libm 0.2.9 and 0.2.10` Oct 28, 2024
@ijackson
Copy link

It looks like it's caused only by the sort_unstable call.

The underlying bug is the splitting on OR, which is fundamentally wrong since SPDX licence strings can contain parens. So a SPDX string must either be taken verbatim, or parsed properly.

Why is that required?

I presume that this is an attempt to normalise the licence string. If the SPDX strings are normalised, they can be deduplicated, so that a minimum number of different licences are reported.

But, if normalisation is desirable, the right way to implement it would be to properly parse the SPDX string. This is nontrivial (and would come with its own subtleties).

IMO this normalisation ought not to have been attempted with this fundamentally incorrect algorithm. But simply abolishing it would probably be too disruptive.

In the meantime I suggest the following bodge: if the string contains any ( or ), completely skip the normalisation (including the splitting and the sort_unstable). That would fix this particular case by outputting the original string, without mangling it.

(Also, sort_unstable ought to be sort. sort_unstable isn't appropriate here, because we want the output to remain the same across different builds of cargo license.)

@ijackson
Copy link

FTAOD my proposed bodge leaves the algorithm correct (in the sense that it would never corrupt a licence string); it's just not optimal and rather unprincipled. Currently the algorithm is broken.,

@nwhitehead
Copy link

Just chiming in that I saw this exact issue. I was putting together a NOTICE file for a project and did cargo license to get all the dependencies. My editor showed the mismatched parentheses for libm in red which made me notice there was something wrong.

@sosthene-nitrokey sosthene-nitrokey linked a pull request Feb 12, 2025 that will close this issue
@dalance
Copy link
Collaborator

dalance commented Feb 17, 2025

I checked #79.

I think using spdx instead of the ad-hoc normalization logic is a correct way, but spdx::Expression::canonicalize doesn't sort license order.
Sorting seems to be necessary to normalize licenses with different order like MIT OR APACHE-2.0 and APACHE-2.0 OR MIT.

If this PR is merged as is, the output changes significantly.
So I think it is better to keep the current logic for simple, sortable case, and add spdx canonicalize for complex case like including (), as the same as @ijackson 's idea.

@robin-nitrokey
Copy link

It would also be possible to use spdx::Expression::parse after calling canonicalize and then sort the parsed expression. This would stay close to the current behavior while fixing the bad cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants