Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add support for PEP 639 #681

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

dnicolodi
Copy link
Member

@dnicolodi dnicolodi commented Oct 12, 2024

Closes gh-270

Copy link
Contributor

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good from a quick review. I'll keep this in mind when I implement it (very soon) in scikit-build-core.

mesonpy/__init__.py Outdated Show resolved Hide resolved
@@ -39,6 +39,14 @@ versions.
Meson 1.3.0 or later is required for compiling extension modules
targeting the Python limited API.

.. option:: 1.6.0

Meson 1.6.0 or later is required to support ``license`` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a link to a PR or something for the 1.6 change? curious as to what needed updates in meson, was expecting it to be just meson-python.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably due to depending on mesonbuild/meson#13783

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, as @eli-schwartz says. Meson support is required only to get the license and license files declared in meson.build when they are declared dynamic or when there is no project section in pyproject.toml.

@dnicolodi
Copy link
Member Author

Looks pretty good from a quick review. I'll keep this in mind when I implement it (very soon) in scikit-build-core.

It is a bit uglier than what I would like because I wanted to keep compatibility with older pyproject-metadata. meson-python supports anything back to version 0.7.1.

@QuLogic
Copy link
Member

QuLogic commented Oct 16, 2024

I gave this a shot in matplotlib/matplotlib#28982, and License-Expression is correctly pulled from meson.build, as are License-File from the main project, but what I'd be missing there is License-File from the subprojects.

@dnicolodi
Copy link
Member Author

I'm not very well versed in licensing issues, but I don't think the way PEP 639 supports declaring package licenses works well for this use case. The main problem is that it expects the license of the package's sdist and wheel to be exactly the same. For packages like matplotlib that link statically with some libraries, this is not the case: the license of the source code is the matplotlib license, the license of the wheel is the union of the matplotlib license with the license of all the libraries linked to it.

Concretely, the issue is that in the most common case the subprojects directory does not contain the source code of the subprojects but only references to them (in the matplotlib case in the form of meson wrap files). However, the License-File fields need to point to actual files distributed in the sdist and in the wheel. The sdist in these cases does not contain the license files, thus the License-File fields for the subprojects cannot be added. The metadata information for the sdist and the wheel need to be the same, thus the fields cannot be added to the wheel metadata either.

I don't know how the problem could be solved. The most formally right solution would to allow source and binary distributions to have different licenses. But this is a quite significant change in behavior for the package metadata system.

The pragmatic solution is to add the subprojects' license files to the source distribution and accept that the sdist as a whole might have a more restrictive license than what it could have.

@eli-schwartz
Copy link
Member

However, the License-File fields need to point to actual files distributed in the sdist and in the wheel. The sdist in these cases does not contain the license files, thus the License-File fields for the subprojects cannot be added. The metadata information for the sdist and the wheel need to be the same, thus the fields cannot be added to the wheel metadata either.

This is not true. Metadata can be dynamic and vary between the sdist and the wheel. It's already dynamic as a result of being calculated from meson introspection, so you just need to mark that in the sdist as well.

The decision to make sdist metadata static and forbidden to change is... definitely a decision of all time. But the good news is it comes with an escape hatch. This escape hatch is needed since the spec also says that it's illegal per the spec to apply open source patches to an sdist metadata if the fields are marked as static. 🤷

@henryiii
Copy link
Contributor

Can't this already be done using dynamic-metadata? The point of dynamic-metadata is to allow the wheels to have different metadata than the source distribution. And the new entries are valid listed in dynamic-metadata. It might not be shown effectively by PyPI, etc, but it should at least work. And METADATA 2.4 > 2.2, so it's always available to use.

@henryiii
Copy link
Contributor

henryiii commented Oct 26, 2024

Are you just making sure the license is already normalized, I guess? Without a packaging release, we don't have license-expression normalization yet. The PR is merged but there hasn't been a release yet.

@rgommers
Copy link
Contributor

Can't this already be done using dynamic-metadata? The point of dynamic-metadata is to allow the wheels to have different metadata than the source distribution.

Perhaps yes, although it's quite hard to come up with a reasonable way of using that, since meson-python cannot know whether or not auditwheel will be used after meson-python itself is done. More importantly, declaring metadata is dynamic in pyproject.toml is kinda awful, because (a) it removes useful data that static analysis tools would otherwise be able to read, and (b) it makes pip & co unable to use any static metadata from pyproject.toml directly (in practice today, not in principle), thereby potentially having to do much more work in some cases.

The better way would be to actually make it a first class use case with something like:

[project]
license = "MIT"

[pypi-wheels]
# platform specifiers need some more thought - maybe reuse the cibuildwheel syntax?
license = {
    win32 = "MIT AND Other",
    linux = "MIT AND LGPL-2.1-and-later",
}
# Add `license-files` in a similar way

This could perhaps be done in a tool-specific way first, with [tool.meson-python.pypi-wheels]. And if we'd come up with something that works for both meson-python and scikit-build-core, that would probably be a good basis for future standardization?

What we do right now in NumPy is quite a bit more hacky:
https://github.com/numpy/numpy/blob/88a6a0070d9d50f034c3f31f5e1d9ceea5f1065e/tools/wheels/cibw_before_build.sh#L9-L20

@eli-schwartz
Copy link
Member

More importantly, declaring metadata is dynamic in pyproject.toml is kinda awful, because (a) it removes useful data that static analysis tools would otherwise be able to read, and (b) it makes pip & co unable to use any static metadata from pyproject.toml directly (in practice today, not in principle), thereby potentially having to do much more work in some cases.

Both of these reasons are invalid.

  • static analysis tools per definition cannot know whether you are running auditwheel, what would they "analyze" about this anyway? Furthermore, static analysis tools that care about licenses do not depend on clumsy functionality such as "pypa metadata", which in nearly all cases isn't useful because it isn't even a python software being analyzed. e.g. https://www.fossology.org/ and friends. There are anyways existing standards that unlike "pypa metadata" actually allow you to declare your licenses in a machine-readable fashion, including breaking down licenses file by file and recording the licenses of vendored software, git submodules, and much much more
  • pip & co already have to deal with being unable to use dynamic metadata from pyproject.toml directly -- static metadata works quite fine -- and this doesn't matter in practice today OR in principle anytime given that most metadata fields are simply ignored by pip altogether due to not participating in install-time dependency resolution

Also none of this matters if you expect people to use wheels

Also also none of this matters if you expect people to use PyPI-based sdists, since you can emplace the license metadata into the sdist and mark it as "dynamic" at the same time -- it is trivially parseable and simply isn't accurate for anything beyond the sdist itself due to the fact that the sdist can be configured and built and download new (checksummed) dependencies over the internet and change its effective license in, depending on the project, maybe even dozens of different configurations that are only machine-knowable by asking the resulting configured build to announce itself via introspection.

@rgommers
Copy link
Contributor

Also none of this matters if you expect people to use wheels [...] Also also none of this matters if you expect people to use PyPI-based sdists,

Not quite, my problem is the opposite. I want metadata in pyproject.toml to be generically correct to the extent possible (PyPI is a special case with weird rules/behavior). The problem with declaring anything as dynamic is that you have to delete the static info completely. Removing license = ...' under [project]completely because you have to dodynamic = ['license']` just makes the metadata much less discoverable.

Maybe I misunderstood what @henryiii meant by "dynamic-metadata". If it's not dynamic = ['license'] but only marked dynamic in PKG-INFO then I retract my point. There's no way I think to actually use that though without talking about UX in pyproject.toml.

@eli-schwartz
Copy link
Member

eli-schwartz commented Oct 27, 2024

If you just want people to use pyproject.toml and nothing other than pyproject.toml then what is the purpose of PKG-INFO in a sdist?

That file contains possibly still-dynamic versions of the metadata that was discovered during sdist building -- it is machine readable and there's no real reason not to use it as a statically deducible alternative to dynamic = values for scenarios where you don't want to run the project metadata generation hook.

And for licenses, it's quite reasonable to say that a dynamic license in pyproject.toml which is also dynamic in PKG-INFO, is intended to be varying across sdists versus wheels.

@rgommers
Copy link
Contributor

If you just want people to use pyproject.toml and nothing other than pyproject.toml then what is the purpose of PKG-INFO in a sdist?

PKG-INFO is a generated file, so that's not something that package authors can use - there must be some UX either inside pyproject.toml or in some other way (e.g., a CLI flag to the build backend) to be able to put anything inside PKG-INFO.

PKG-INFO also isn't version-controlled, and I'm interested in tools that analyze a repo rather than an sdist.

@eli-schwartz
Copy link
Member

PKG-INFO also isn't version-controlled, and I'm interested in tools that analyze a repo rather than an sdist.

Ok so maybe what you want in order to analyze a repo for license information is actually https://www.fossology.org/ ?

@eli-schwartz
Copy link
Member

Anyway I think people should have the right to use dynamic metadata if they want to, and use pyproject.toml metadata if they want to.

With that in mind, the ability to have different metadata for sdist vs. wheel by making use of "Dynamic" entries in current versions of the Core Metadata, is a powerful escape hatch for those who choose to do so.

@rgommers
Copy link
Contributor

Anyway I think people should have the right to use dynamic metadata if they want to, and use pyproject.toml metadata if they want to.

+1 completely agreed. My only point is that as a Python package author I don't wanted to be forced to use dynamic metadata via dynamic = under [project] when I statically know what the licenses of the project, the sdist and the PyPI wheels I publish are.

With that in mind, the ability to have different metadata for sdist vs. wheel by making use of "Dynamic" entries in current versions of the Core Metadata, is a powerful escape hatch for those who choose to do so.

For build backend authors/features, sure. Not directly relevant to package authors though - Core Metadata is a thing between packaging tools and index servers, not something one can directly access or really ever thinks about at all as a package author. Hence my proposal for a UX that package authors can use to put in their statically known sdist and PyPI wheels metadata, which the build backend can then translate to a Dynamic entry in Core Metadata.

@henryiii
Copy link
Contributor

FYI, having something listed in the project table and also dynamic allowing the metadata to be added to is still something I want to write a PEP for.

@dnicolodi
Copy link
Member Author

dnicolodi commented Oct 29, 2024

One change I'd suggest in this PR is to unconditionally require pyproject-metadata>=0.9.0, since we don't want projects to have to manage a transitive dependency like that when starting to use PEP 639 metadata.

I added a commit on top that bumps the required pyproject-metadata to 0.9.0.

@rgommers Is there anything else to be done before we can merge this?

I have no idea of what the timeline to get support for PEP 639 in PyPI is and I don't know if waiting to land this till that is available is a good idea or not.

@rgommers rgommers added this to the v0.18.0 milestone Oct 29, 2024
@rgommers
Copy link
Contributor

@rgommers Is there anything else to be done before we can merge this?

It's looking pretty good, but let me try to do a slightly more careful line-by-line review - so far I just quickly browsed the diff and tested with PyWavelets.

I have no idea of what the timeline to get support for PEP 639 in PyPI is and I don't know if waiting to land this till that is available is a good idea or not.

That's a good point. pypi/warehouse#16949 is moving fairly quickly it looks like, but it may be wise to ensure that this PR lines up with what lands in PyPI somehow before merging it. Not 100% sure about that, but without that it seems to me that there's a risk of some users trying out the new feature and ending up with wheels with license-related issues or that cannot be uploaded.

@rgommers
Copy link
Contributor

Something I noticed that is correct but mildly surprised me is that the .dist-info directory gets quite nested. E.g., for NumPy:

% tree -a numpy-2.2.0.dev0.dist-info/licenses/
numpy-2.2.0.dev0.dist-info/licenses/
├── .spin
│   └── LICENSE
├── LICENSE.txt
├── numpy
│   ├── _build_utils
│   │   └── tempita
│   │       └── LICENSE.txt
│   ├── _core
│   │   └── include
│   │       └── numpy
│   │           └── libdivide
│   │               └── LICENSE.txt
│   └── linalg
│       └── lapack_lite
│           └── LICENSE.txt
└── vendored-meson
    └── meson
        └── COPYING

Not a problem, and matches the PEP (as usual the PEP is short of examples though, so one needs to read carefully), so just noting here as "indeed looks as it should".

I noticed that canonicalize_license_expression relies on the not-yet-released packaging 24.2. I tested locally that it works, but that's another thing that could potentially change still. And rolling out PyPI support also depends on packaging 24.2. So probably best to merge this post packaging and PyPI feature availability.

Copy link
Contributor

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now went over this more carefully, and it all LGTM.

The one thing that came to mind was that including Apache/LGPL license files is probably going to trigger some license scanner tooling and give some distro maintainers work to silence the false positives, but not much that can be done about that.

Ready to merge I think - but I'd hold off until packaging and PyPI have support.

@dnicolodi
Copy link
Member Author

The one thing that came to mind was that including Apache/LGPL license files is probably going to trigger some license scanner tooling and give some distro maintainers work to silence the false positives, but not much that can be done about that.

There is no technical reason why these files need to be named like that or include the actual licenses text. I just copied them over from the source code of a package I had at hand. They can be replaced with placeholders.

@rgommers
Copy link
Contributor

There is no technical reason why these files need to be named like that or include the actual licenses text. I just copied them over from the source code of a package I had at hand. They can be replaced with placeholders.

Perhaps worth doing? Using MIT AND BSD-3-Clause might save some minor trouble here. Only if you feel like it though, I'm fine either way.

@henryiii
Copy link
Contributor

henryiii commented Nov 8, 2024

packaging 24.2 is out!

(Also, noticed there's no quick and easy link to the source from readthedocs - see https://github.com/scientific-python/repo-review/blob/7fe5485a8ef2aaba8539a18a49a186380e9b39f9/docs/conf.py#L45-L61 which I think I stole from something like pip or maybe furo's own docs)

@dnicolodi
Copy link
Member Author

Also, noticed there's no quick and easy link to the source from readthedocs

I'm not sure about what you mean with this. meson-python documentation is not on readthedocs and we have a "Source Code" link in the left sidebar (and typing this made me realize that the capitalization of the title of the link is inconsistent with the other titles).

@dnicolodi
Copy link
Member Author

dnicolodi commented Nov 8, 2024

packaging 24.2 is out!

Thanks. I'll add tests making sure the validation of the license field works as expected.

@henryiii
Copy link
Contributor

henryiii commented Nov 8, 2024

By readthedocs I meant https://mesonbuild.com/meson-python/. Okay, I was looking for a GitHub icon on the bottom or top or similar, but that's not bad.

Having PEP 639 metadata supported or not based on the version of a
transitive dependency would not make for a great user experience.
Supporting PEP 639 metadata requires pyproject-metadata 0.9.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow license to be listed in the dynamic section
5 participants