Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add License-File field to package metadata #2645

Merged
merged 9 commits into from
May 23, 2021
Merged

Conversation

cdce8p
Copy link
Contributor

@cdce8p cdce8p commented Apr 17, 2021

Summary of changes

This PR adds the License-File field to the package metadata. Although not yet part of the official metadata spec, including this field will enable third party tools easier access to all included license files through importlib.metadata. This is especially useful if a file doesn't have an easily recognizable name or is in an unusual location.

setuptools already parses the license_file and license_files options and cross-references them with all project files to see which ones should be included. With this change, the result of that would be made available for others to use as well.

Closes #2631

--
For reference: PEP 639, currently still in a draft, plans to add the License-File field to the metadata spec, link

--
Example use case

from importlib import metadata
meta = metadata.metadata('my-project')
project_files = metadata.files('my-project')

license_file_paths = [
    path for path in project_files
    if str(path) in meta.get_all('License-File')
]

Open questions

1. What should the value of License-Path be for files not in main project directory?
From a technical standpoint it would probably be best to use the actual relative path and mirror that in the .dist-info directory. However, at the moment wheel flattens the folder structure and just puts all license files into .dist-info, AFAIK. This could lead to name conflicts.

2. Should license files be copied into the .egg-info directory if the legacy install is used?

3. How should license_files and MANIFEST.in continue to work together?
At the moment, the MANIFEST.in is parse at the end and thus overwrites all previous file includes. However, license_files_computed will only look at the license_file and license_files options (unless we parse the manifest there too). That means that the License-File field might contain files that are later excluded by the manifest.
-> I would recommend to change the behavior so that the license_file and license_files options take precedent over the manifest. A license file is most likely excluded by accident anyway. Additionally, specifying license_files = (without any values) won't add any default patterns and thus not match any license files.

Pull Request Checklist

setuptools/dist.py Outdated Show resolved Hide resolved
if path not in files and os.path.isfile(path):
files.add(path)

self.metadata.license_files_computed = sorted(files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we retain order of license files rather than enforcing order? For the manifest, that's less important, but for metadata, it may prove useful to honor the user's supplied order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user only provides patterns which are used for glob. As such, I don't think retaining any order would make much sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I'll probably re-write it to avoid sorting or at least avoid sorting user-supplied values.

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is going. Just a couple of concerns.

@cdce8p
Copy link
Contributor Author

cdce8p commented May 20, 2021

I've updated the PR. Additionally, I also added a commit to change when license files are added to the source list. As explained in the open questions section (3. How should license_files and MANIFEST.in continue to work together?), to make sure a license file which is listed in License-File is included, the license_files option needs to be able to overwrite the MANIFEST.in

@@ -406,7 +411,8 @@ class Distribution(_Distribution):
'long_description_content_type': None,
'project_urls': dict,
'provides_extras': ordered_set.OrderedSet,
'license_files': ordered_set.OrderedSet,
'license_file': None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had kind-of hoped this would go away, especially since the value is deprecated. I tried removing it and got an error, but I think it can be eliminated by being more defensive at the one point where it's queried.

Copy link
Contributor Author

@cdce8p cdce8p May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to remove it if you want. Personally, I would also be open to removing the option license_file entirely and return an error. Should be easy enough for the user to fix then.

@@ -406,7 +411,8 @@ class Distribution(_Distribution):
'long_description_content_type': None,
'project_urls': dict,
'provides_extras': ordered_set.OrderedSet,
'license_files': ordered_set.OrderedSet,
'license_file': None,
'license_files': None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it probably makes sense to just declare the default here rather than declare a null value and then set a default if the value is a null value.

Copy link
Contributor Author

@cdce8p cdce8p May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial idea as well. However, with that we can't determined if the user didn't specify the option (thus the defaults would be loaded) or left it empty (in which case no license files would be included). Both would equal an empty list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's very interesting. And the test cases capture this aspect. Consider me impressed.

@jaraco
Copy link
Member

jaraco commented May 21, 2021

I added a couple more comments, but I'll take the first stab at adjusting the behavior.

@cdce8p cdce8p force-pushed the license-file branch 2 times, most recently from 91f8e31 to af7b7b5 Compare May 22, 2021 10:48
@cdce8p
Copy link
Contributor Author

cdce8p commented May 22, 2021

@jaraco Pushed new change to address some of your points. First, I modified the logic to match license files so that the user sorting is kept as best as possible. Second, I removed the default license_file metadata attribute. Let me know if you what me to change anything else.

@cdce8p
Copy link
Contributor Author

cdce8p commented May 22, 2021

Small update: I needed to revert the removal of the license_file attribute. Without it the deprecation warning wouldn't be printed (even with -Wd).

@jaraco
Copy link
Member

jaraco commented May 22, 2021

I spent some time trying to simplify the logic by having the defaults for "license_files" be defined in the defaults for license files (rather than having a null value as the default and defining defaults in the finalizer), but when doing so, I encountered two problems:

At some point soon, I'd like to remove support for license_file and remove that check for overriding a non-False default. For now this should do.

I've added one additional commit that removes the sorting entirely and uses a generator to build the list. I see I don't have push access to the PR, so I'll try sending a PR to your branch. If that doesn't work, I'll probably just merge it all together manually.

@cdce8p
Copy link
Contributor Author

cdce8p commented May 22, 2021

I've added one additional commit that removes the sorting entirely and uses a generator to build the list. I see I don't have push access to the PR, so I'll try sending a PR to your branch. If that doesn't work, I'll probably just merge it all together manually.

Strange. I've allowed edits / commits by maintainers 🤔 Anyway, I merged your changes and, while I was add it, rebased the PR to resolve the merge conflict.

@jaraco jaraco merged commit 75e6496 into pypa:main May 23, 2021
@cdce8p cdce8p deleted the license-file branch May 23, 2021 00:11
@jaraco
Copy link
Member

jaraco commented May 23, 2021

Strange. I've allowed edits / commits by maintainers 🤔 Anyway, I merged your changes and, while I was add it, rebased the PR to resolve the merge conflict.

Seems the mistake is on my side, an artifact of having switched from hub to gh and gh not handling pr checkouts as elegantly (cli/cli#2189).

@hroncok
Copy link
Contributor

hroncok commented Jul 23, 2021

I think I might have found a problem with this, see #2739

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR] Include License-File field in package metadata
3 participants