-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include Extension.depends
in manifests and sdists by default
#4000
Include Extension.depends
in manifests and sdists by default
#4000
Conversation
Oooh! A nice and round PR number. Lucky! |
Thank you very much @RuRo. Nice way of using the This would be mostly related to header right?
Yeah, |
@abravalheri regarding absolute paths or paths outside the project root, I am honestly not sure. It is not clear to me, what such a path should even signify (in either When building
If you want, I can change it so that When building |
Perfect, I think that as long as we have something avoiding the user from adding a file from outside the repo, that should be fine. (As you said, it does not make much sense to have those things in a wheel or sdist). Thank you very much for checking. I was in doubt, because I have never used However, I did find a few examples that use files from outside the directory that contains |
As far as my understanding goes, stuff like
Actually, it seems that even absolute paths are sometimes used in
In this case, it doesn't seem like this package is intended to be distributed in a
I think, that the most safe behaviour would be:
This should at the very least not break any currently working builds (even if they do some dark magic/nonsense with P.S. Do you know, what is the canonical way to obtain the path to the project root (the directory with |
Thank you very much @RuRo, I agree overall with this approach.
I believe that is probably: Footnotes
|
I added the slightly more relaxed logic for better backwards compatibility and warning messages as discussed. Although, I used |
cdc0ddb
to
43abba2
Compare
Thank you very much @RuRo, I added a commit that simplifies a little bit the checks and another one changing the changelog entry. I hope that is OK with you. To be honest I prefer the log messages you have proposed because they give more clear indications to the user. However I am a bit worried that we can have a backlash of the community in the style "don't tell me what to do, I know what I am doing" (after all this is a feature used by power users). Another reason why I made it a oneliner is because projects that use external depends, are likely to have more than one (e.g. mypyc/lib-rt has 2, scipy can also have a couple), and oneliners are OK to repeat multiple times (imho). Trying to choose the battles to fight and save up the I plan to add a test that covers both "../xxxxx" paths and "/x/y/z" paths, just for the sake of completeness. But feel free to beat me to that if you have the time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, if we are already using pathlib
here, we should probably go all the way with it:
project_root = Path(self.distribution.src_root or os.curdir).resolve()
and then
abs_path = (project_root / dep).resolve() # this handles both absolute and relative `dep`s
rel_path = abs_path.relative_to(project_root).as_posix()
yield rel_path
setuptools/command/build_ext.py
Outdated
is_excluded = os.path.commonpath([d_abs, root]) != root | ||
return is_absolute, is_excluded, d_rel | ||
|
||
path = os.path.abspath(dep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If dep
is not an absolute path, abspath
will resolve it relative to the CWD. It should be treated as relative to project_root
instead.
I am not sure, under which circumstances can project_root
be different from os.curdir
, but given that we are using self.distribution.src_root
when it's not None
, I am assuming that this can in fact happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, thank you very much for the review @RuRo
setuptools/command/build_ext.py
Outdated
|
||
path = os.path.abspath(dep) | ||
rel_path = str(Path(path).relative_to(project_root)) | ||
assert ".." not in rel_path # abspath should have taken care of that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extreme nitpick: dotdot../..in/path..components
is a perfectly valid relative path with ..
in it.
setuptools/command/build_ext.py
Outdated
path = os.path.abspath(dep) | ||
rel_path = str(Path(path).relative_to(project_root)) | ||
assert ".." not in rel_path # abspath should have taken care of that | ||
yield rel_path.replace(os.sep, "/") # POSIX-style relative paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar extreme nitpick: ./filename_with_\_character
is a valid POSIX filename. I think, the correct thing to do here is rel_path = whatever.as_posix()
instead of str(whatever).replace(...)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is good suggestion to use .as_posix()
I have a hard time to keep track of which version of Python each method in pathlib.Path
was introduced, but I am assuming that .as_posix()
should be available now for all Python versions that are still active.
Just note that in a POSIX system "./filename_with_\_character".replace(os.sep, "/")
will just return "./filename_with_\_character"
(that is how as_posix()
is implemented anyway).
Regarding log messages, I am fine with whatever level of verbosity you deem appropriate. Regarding extra tests, I'm unfortunately a bit busy with other stuff right now, so feel free to implement these tests yourself. |
Hi @RuRo, thank you very much for the review comments, I will iterate over them once I have the time.
Regarding the suggestion to use |
Yes, and as the docs point out, attempting to do path traversal without resolving symlinks is actually invalid, since
Yeah, in this case, I see no reason not to include |
That is a tricky one... In your original implementation you also used Let me think about that... what would be the best approach here? Probably the test case should be something like: # rushed and probably wrong pseudo-code to express brain dump
mkdir -p /tmp/headers/dir1
touch /tmp/headers/dir1/f.h
ln -s /tmp/headers myheaders
ext_modules=Extension(..., depends=["myheaders/dir1/../dir1/f.h"]) This should be fine, because when producing the sdist we can copy Maybe I am overthinking this (and that is probably true). But I am trying to be mindful to not introduce backwards incompatibilities. |
Yes, my original implementation didn't resolve symlinks before path traversal, and I now believe that to be a mistake (although not a very significant one).
With the current implementation,
I think that we should keep symlink handling mostly outside the scope of this PR for now. The simple rule of "completely resolve the path specified in P.S. symlink handling seems to already be a known problem (#415), so I'm leaning heavily towards "don't try to be too clever with symlinks" for now |
Hi @RuRo thank you very much for the comments. Please find my take, inline below.
I understand the concerns and I agree that we can leave the scope of the PR as slim as possible. But only as long as we don't introduce backwards incompatibility in this PR.
People might have different opinions if it is correct or not, but this is the behaviour we observe nowadays: > docker run --rm -it python:3.8-bullseye /bin/bash
mkdir -p /tmp/headers/dir1
touch /tmp/headers/dir1/f.h
rm -rf /tmp/headers /tmp/myproj
mkdir -p /tmp/headers/dir1
touch /tmp/headers/dir1/f.h
mkdir /tmp/myproj
cd /tmp/myproj
ln -s /tmp/headers myheaders
cat <<EOF > pyproject.toml
[build-system]
requires = ["setuptools", "Cython"]
build-backend = "setuptools.build_meta"
EOF
cat <<EOF > setup.py
from setuptools import Extension, setup
setup(
name="myproj",
version="42",
ext_modules=[
Extension("hello", sources=["hello.pyx"], depends=["myheaders/dir1/../dir1/f.h"])
],
)
EOF
echo 'print("hello")' > hello.pyx
cat <<EOF > MANIFEST.in
global-include *.h
EOF
python -m venv /tmp/venv
/tmp/venv/bin/python -m pip install build
# ...
# Successfully built myproj-42.tar.gz and myproj-42-cp38-cp38-linux_x86_64.whl
/tmp/venv/bin/python -m build
tar tf dist/*.tar.gz
# myproj-42/
# myproj-42/MANIFEST.in
# myproj-42/PKG-INFO
# myproj-42/hello.pyx
# myproj-42/myheaders/
# myproj-42/myheaders/dir1/
# myproj-42/myheaders/dir1/f.h
# myproj-42/myproj.egg-info/
# myproj-42/myproj.egg-info/PKG-INFO
# myproj-42/myproj.egg-info/SOURCES.txt
# myproj-42/myproj.egg-info/dependency_links.txt
# myproj-42/myproj.egg-info/top_level.txt
# myproj-42/pyproject.toml
# myproj-42/setup.cfg
# myproj-42/setup.py So they are included without an error.
I am very happy to do that, as long as when I think that before merging the PR, we should make sure to not introduce backwards incompatible behaviour. Maybe the approach proposed ("completely resolve the path ...") already covers that.
My take on #415 is that it is asking for a change in behaviour in setuptools. The current behaviour seem to have been implemented via PR, so it is probably deliberate (someone thought it was better that way)1. Changing it might be controversial between groups of users (by Hyrum's law that is very likely someone depends on the current behaviour). So my approach would be to be compatible with the existing behaviour. Future changes can be implemented when #415 is handled (the how will also be important, a valid outcome of #415 is to have an "opt-out" configuration flag for following symlinks). Footnotes
|
To be clear, I am not suggesting that we split backwards compatibility into a separate PR. I simply don't see, how the presence of symlinks affects backwards compatibility in this case. By just resolving all paths and not adding any special logic for symlinks, we are (AT WORST) not including some of these paths in the manifest by default (just like they aren't included now).
(snip)
(snip)
The only reason, why
Yes, I currently don't see any (reasonable) way for this approach to result in an error where none were raised previously. Do you have something specific in mind? Basically, this would have to be a configuration that produces valid sdists today, but that would be broken by adding some in-project file to the manifest. |
This is what I did to double check everything:
So far very good, they don't break any existing build, and I am happy with it.
We don't necessarily have to support this use case (since it is contrived), but it is a shortcoming of the implementation using
I think it is completely fine to stop at 4 (we can revert/reset the other commits). I am happy with it because we have regression tests that prove no error occurs. If we want to go the extra mile, some code is there (77a1295 to cf2bda6). We might want to review if it has other flaws. But that is not necessary. |
44d727f
to
379cbe3
Compare
Hi. Sorry to continue bikeshedding this PR, but I have just looked over all the implementation attempts so far (both yours and mine), and I am really not satisfied with any of them. This PR was supposed to be super small and simple (both to implement and to reason about), but I think that we've ended up overcomplicating it quite a bit. I would like to "start over" by explicitly listing the scope, expectations and requirements for this PR. After we agree on that, we can make sure that we have test coverage for all expected cases. This way, we can make sure that our implementation is actually sane and that we haven't overengineered or overlooked anything. Again, sorry to block this. I'll try writing up the "spec" later today. |
No problems. Thank you very much @RuRo for having a look on this. Please feel free to open a separated PR if you think that is a better way of starting from scratch. |
Okay, here goes. I think, that the main thing that I don't like about the current approaches is that we've identified a bunch of cases, where "We currently don't understand, why would anyone do this" and "This doesn't make sense" and yet we are attempting to support these "weird" cases. In my opinion, this approach is wrong because we have no way of evaluating if what we are doing makes sense. Even more importantly, by adding these files to the manifest and sdist, we are introducing new behaviour that people may start to rely on. If we later decide that we want a different behaviour for one of these "weird" cases, we would be forced to keep compatibility with current behaviour. Currently, none of the files in In particular, I'd say that "we are 100% sure" that we should include files that satisfy all of these conditions:
Note, that these are requirements for automatic inclusion in the manifest, not for any other functionality that Here are some concrete examples and my justifications for each of these restrictions:
We can revisit any of these restrictions if/when we find an actual example "in the wild" where someone wants to include such a non-compliant path in @abravalheri thoughts? |
@abravalheri I pushed an updated implementation with the described logic. I decided not to rewrite the history in case you still need it. Feel free to rebase/squash the commits as you see fit. I removed the I also added a couple of new tests that verify some of the edge cases, that I've outlined in the "spec". |
Thank you very much @RuRo. This approach should be fine. Also very happy with the extensive testing. |
Co-authored-by: Anderson Bravalheri <andersonbravalheri+github@gmail.com>
Ugh. Apparently the stdlib version of Edit: For now, I just marked the test as |
Thank you very much for the hard work and thoughtful conversations! |
if path.is_absolute(): | ||
skip(dep, "must be relative") | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question #4181
Summary of changes
Closes #2565.
According to the documentation,
Extension.depends
is a "list of files that the extension depends on". The old distutils docs expand on this slightly with a concrete example:Currently, this doesn't quite work out of the box, because the files listed in
depends
may not be included in thesdist
. This is becausebuild_ext.get_source_files
only considers theExtension.sources
attribute. I believe this to be a mistake.Consider the docstring of
get_source_files
on theSubCommand
protocol:Clearly,
Extension.depends
are also "necessary to build the distribution" and so they should be included in thesdist
.This PR extends
get_source_files
to also includeExtension.depends
.Pull Request Checklist
newsfragments/
.(See documentation for details)
P.S. Apparently, the local tests (running
tox
) fail, unless the localsetuptools
repo has anorigin
remote that useshttps
. Either not having a remote namedorigin
or using ssh (git@github.com:pypa/setuptools
) for the remote causes some weird errors inpytest-perf
. This should probably be fixed or at least documented in the Developer's Guide.