Include Extension.depends in manifests and sdists by default #4000

Merged (24 commits) on Aug 15, 2023

Conversation

RuRo
Contributor

@RuRo RuRo commented Aug 4, 2023

Summary of changes

Closes #2565.

According to the documentation, Extension.depends is a "list of files that the extension depends on". The old distutils docs expand on this slightly with a concrete example:

The depends option is a list of files that the extension depends on (for example header files). The build command will call the compiler on the sources to rebuild extension if any on this files has been modified since the previous build.

Currently, this doesn't quite work out of the box, because the files listed in depends may not be included in the sdist. This is because build_ext.get_source_files only considers the Extension.sources attribute. I believe this to be a mistake.

Consider the docstring of get_source_files on the SubCommand protocol:

Return a list of all files that are used by the command to create the expected outputs. For example, if your build command transpiles Java files into Python, you should list here all the Java files. The primary purpose of this function is to help populating the sdist with all the files necessary to build the distribution. All files should be strings relative to the project root directory.

Clearly, Extension.depends are also "necessary to build the distribution" and so they should be included in the sdist.

This PR extends get_source_files to also include Extension.depends.
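
A minimal sketch of the idea, not the actual setuptools code: `FakeExtension` and `collect_source_files` are hypothetical stand-ins for `Extension` and `build_ext.get_source_files`, used only to show the change in what gets reported as a source file.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeExtension:
    """Hypothetical stand-in for setuptools.Extension (illustration only)."""
    name: str
    sources: List[str]
    depends: List[str] = field(default_factory=list)


def collect_source_files(extensions):
    """Return every file needed to build the extensions: sources plus depends."""
    files = []
    for ext in extensions:
        files.extend(ext.sources)
        files.extend(ext.depends)  # before this change, only ext.sources was reported
    return files


ext = FakeExtension("hello", sources=["hello.c"], depends=["include/hello.h"])
print(collect_source_files([ext]))
```

With both lists reported, the header would end up in the default manifest alongside the C source.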

Pull Request Checklist

P.S. Apparently, the local tests (running tox) fail, unless the local setuptools repo has an origin remote that uses https. Either not having a remote named origin or using ssh (git@github.com:pypa/setuptools) for the remote causes some weird errors in pytest-perf. This should probably be fixed or at least documented in the Developer's Guide.

@RuRo
Contributor Author

RuRo commented Aug 4, 2023

Oooh! A nice and round PR number. Lucky!

@abravalheri
Contributor

Thank you very much @RuRo. Nice way of using the build.SubCommand protocol.

This would be mostly related to header files, right?
I haven't tested, but I imagine that the other parts of setuptools already take care if the user tries to use depends=... on a file that is outside of the project directory, right?

P.S. Apparently, the local tests (running tox) fail, unless the local setuptools repo has an origin remote that uses https. Either not having a remote named origin or using ssh (git@github.com:pypa/setuptools) for the remote causes some weird errors in pytest-perf. This should probably be fixed or at least documented in the Developer's Guide.

Yeah, pytest-perf is a bit tricky to use when contributing. I always run tox -- -p no:perf on my local computer to sidestep that. Maybe we should disable it by default and only use it in the CI...

@RuRo
Contributor Author

RuRo commented Aug 5, 2023

@abravalheri regarding absolute paths or paths outside the project root, I am honestly not sure. It is not clear to me what such a path should even signify (in either sources or depends).

When building sdists such paths make absolutely no sense. There is no way to include these paths in the source distribution. I think, that all cases where absolute paths are needed are already covered by include_dirs, library_dirs, etc. Before this PR, adding absolute paths to sources produced an error, and all paths in depends were just ignored. With the changes proposed in this PR, depends will also start giving errors:

  error: Error: setup script specifies an absolute path:

      /path/to/some/dependency.h

  setup() arguments must *always* be /-separated paths relative to the
  setup.py directory, *never* absolute paths.

If you want, I can change it so that get_source_files only adds relative paths from Extension.depends instead of all paths.


When building wheels, such paths also don't make much sense, but I am pretty sure, that build_ext.build_extension and CCompiler.compile already have their own logic for handling sources and depends, and they don't use the build_ext.get_source_files function. So the behaviour for building wheels shouldn't be affected by this PR.

@abravalheri
Contributor

abravalheri commented Aug 7, 2023

Before this PR, adding absolute paths to sources produced an error, and all paths in depends were just ignored. With the changes proposed in this PR, depends will also start giving errors:

   error: Error: setup script specifies an absolute path:

      /path/to/some/dependency.h

  setup() arguments must *always* be /-separated paths relative to the
  setup.py directory, *never* absolute paths.

Perfect, I think that as long as we have something preventing the user from adding a file from outside the repo, that should be fine. (As you said, it does not make much sense to have those things in a wheel or sdist). Thank you very much for checking.

I was in doubt, because I have never used depends myself, so I was wondering if there could be a chance that something like /usr/include/math.h ends up in there. But I did a quick search and that does not seem to be the case (see https://grep.app/search?q=depends%3D&filter%5Bpath.pattern%5D%5B0%5D=setup.py).

However, I did find a few examples that use files from outside the directory that contains setup.py:
https://github.com/python/mypy/blob/5617cdd03d12ff73622c8d4b496979e0377b1675/mypyc/lib-rt/setup.py#L39. Not sure what to do about that; ideally we should not introduce backwards-incompatible behaviour. Do you have any suggestions?

@RuRo
Contributor Author

RuRo commented Aug 8, 2023

I was in doubt, because I have never used depends myself, so I was wondering if there could be a chance that something like /usr/include/math.h ends up in there.

As far as my understanding goes, stuff like /usr/include should be specified via include_dirs. The depends argument actually gets appended to the sources list, when passed to the CCompiler and unlike include_dirs, CCompiler seems to expect that all of its sources are "user owned". For example, it will rely on timestamps of these files to determine when to rebuild stuff and then attempt to write .o files next to the provided sources.
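
To illustrate the timestamp reliance mentioned above, here is a rough sketch of an "is the output stale?" check. `needs_rebuild` is a hypothetical helper written for this example, not the distutils implementation, and the file names are made up.

```python
import os
import tempfile


def needs_rebuild(target, inputs):
    """Return True if target is missing or older than any of the input files."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(path) > target_mtime for path in inputs)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "hello.c")
    header = os.path.join(tmp, "hello.h")
    obj = os.path.join(tmp, "hello.o")
    for path in (source, header, obj):
        open(path, "w").close()

    # Pin explicit timestamps so the example does not depend on clock resolution.
    os.utime(source, (1000, 1000))
    os.utime(header, (1000, 1000))
    os.utime(obj, (2000, 2000))
    fresh = needs_rebuild(obj, [source, header])

    os.utime(header, (3000, 3000))  # the header was edited after the last build
    stale = needs_rebuild(obj, [source, header])

print(fresh, stale)
```

A check like this only works if the header is listed as an input, which is the role depends plays next to sources.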

But I did a quick search and that does not seem to be the case (see https://grep.app/search?q=depends%3D&filter%5Bpath.pattern%5D%5B0%5D=setup.py).

Actually, it seems that even absolute paths are sometimes used in depends. For example, here in scipy. Although in this case, it's a path inside the project root that was simply coerced to an absolute path via abspath for some reason.

However, I did find a few examples that use files from outside the directory that contains setup.py: https://github.com/python/mypy/blob/5617cdd03d12ff73622c8d4b496979e0377b1675/mypyc/lib-rt/setup.py#L39.

In this case, it doesn't seem like this package is intended to be distributed in a sdist on its own, so I think that this kind of light abuse is outside the scope of this PR.

Not sure what to do about that; ideally we should not introduce backwards-incompatible behaviour. Do you have any suggestions?

I think, that the most safe behaviour would be:

  1. during sdist creation (aka in get_source_files)

    • try to convert all paths to be relative to setup.py
    • include all the paths "inside" the project root in manifest and sdist
    • ignore all paths outside the project root (starting with ../)
  2. during actual build

    • keep current behaviour or in other words:
      • keep all the paths exactly as specified by user in setup.py
      • it is the users' responsibility to ensure that any paths "outside" of the project root exist

This should at the very least not break any currently working builds (even if they do some dark magic/nonsense with depends).
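
The sdist-time half of that proposal could look roughly like the sketch below. `sdist_paths_from_depends` is a hypothetical name, the handling is purely lexical (symlinks are not resolved), and relative paths are assumed to be relative to the project root.

```python
import os
from pathlib import PurePath


def sdist_paths_from_depends(depends, project_root):
    """Split depends into (included, ignored) lists for sdist purposes."""
    root = os.path.abspath(project_root)
    included, ignored = [], []
    for dep in depends:
        # os.path.join keeps `dep` unchanged when it is already absolute
        abs_path = os.path.normpath(os.path.join(root, dep))
        rel_path = os.path.relpath(abs_path, root)
        # relpath yields ".." or "../..." for anything outside of root
        if rel_path == os.pardir or rel_path.startswith(os.pardir + os.sep):
            ignored.append(dep)
        else:
            included.append(PurePath(rel_path).as_posix())
    return included, ignored


root = os.path.abspath("proj")
included, ignored = sdist_paths_from_depends(
    ["include/a.h", "../outside.h", os.path.join(root, "src", "b.h"), "src/../c.h"],
    root,
)
print(included)
print(ignored)
```

The build step itself is untouched in this sketch: the original depends list would still be passed along exactly as the user wrote it.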

P.S. Do you know, what is the canonical way to obtain the path to the project root (the directory with setup.py/setup.cfg/pyproject.toml)?

@abravalheri
Contributor

abravalheri commented Aug 8, 2023

I think, that the most safe behaviour would be:

  1. during sdist creation (aka in get_source_files)

    • try to convert all paths to be relative to setup.py
    • include all the paths "inside" the project root in manifest and sdist
    • ignore all paths outside the project root (starting with ../)

Thank you very much @RuRo, I agree overall with this approach.
For example, I was considering that build_ext.get_source_files could filter out any Extension.depends paths outside of the project root [1].

P.S. Do you know, what is the canonical way to obtain the path to the project root (the directory with setup.py/setup.cfg/pyproject.toml)?

I believe that is probably: distribution.src_root or os.curdir.

Footnotes

  1. We can just log.debug saying that they were ignored for being outside of the project folder.

@RuRo
Contributor Author

RuRo commented Aug 8, 2023

I added the slightly more relaxed logic for better backwards compatibility and warning messages as discussed. Although, I used log.warn instead of log.debug. Let me know, if you think, that I should change it to log.debug.

@RuRo RuRo force-pushed the bugfix/extension_depends_in_sdist branch from cdc0ddb to 43abba2 Compare August 9, 2023 17:27
@abravalheri
Contributor

abravalheri commented Aug 10, 2023

Thank you very much @RuRo, I added a commit that simplifies the checks a little and another one changing the changelog entry. I hope that is OK with you.

To be honest I prefer the log messages you have proposed because they give clearer indications to the user. However I am a bit worried that we can have a backlash from the community in the style "don't tell me what to do, I know what I am doing" (after all this is a feature used by power users). Another reason why I made it a oneliner is because projects that use external depends are likely to have more than one (e.g. mypyc/lib-rt has 2, scipy can also have a couple), and oneliners are OK to repeat multiple times (imho).

Trying to choose the battles to fight and save up the setuptools churn budget for other moments... In this case I don't think we have to prompt the user to change, I just want the logs to have a clear indication of what is happening so people have the chance to debug. At the end of the day, what you said previously is the important part: "building sdists such paths make absolutely no sense".

I plan to add a test that covers both "../xxxxx" paths and "/x/y/z" paths, just for the sake of completeness. But feel free to beat me to that if you have the time.

Contributor Author

@RuRo RuRo left a comment

Moreover, if we are already using pathlib here, we should probably go all the way with it:

project_root = Path(self.distribution.src_root or os.curdir).resolve()

and then

abs_path = (project_root / dep).resolve() # this handles both absolute and relative `dep`s
rel_path = abs_path.relative_to(project_root).as_posix()
yield rel_path

is_excluded = os.path.commonpath([d_abs, root]) != root
return is_absolute, is_excluded, d_rel

path = os.path.abspath(dep)
Contributor Author

If dep is not an absolute path, abspath will resolve it relative to the CWD. It should be treated as relative to project_root instead.

I am not sure, under which circumstances can project_root be different from os.curdir, but given that we are using self.distribution.src_root when it's not None, I am assuming that this can in fact happen.
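
A small demonstration of the difference, assuming a situation where the current working directory is not the project root. The directory names are made up for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp, "project").resolve()
    elsewhere = Path(tmp, "elsewhere").resolve()
    project_root.mkdir()
    elsewhere.mkdir()

    dep = "include/foo.h"
    old_cwd = os.getcwd()
    os.chdir(elsewhere)  # simulate a CWD that differs from the project root
    try:
        via_cwd = Path(os.path.abspath(dep))                  # anchored at the CWD
        via_root = Path(os.path.abspath(project_root / dep))  # anchored at the project root
    finally:
        os.chdir(old_cwd)

print(via_cwd)
print(via_root)
```

Only the second form points into the project, so a later "is it inside the project root?" check would give the wrong answer for the first one.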

Contributor

That is a good point, thank you very much for the review @RuRo


path = os.path.abspath(dep)
rel_path = str(Path(path).relative_to(project_root))
assert ".." not in rel_path # abspath should have taken care of that
Contributor Author

Extreme nitpick: dotdot../..in/path..components is a perfectly valid relative path with .. in it.
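
The nitpick can be reproduced with a pure path object, which needs no files on disk. Comparing whole components via `parts` is one possible alternative to the substring check; it is shown here as an illustration, not as what the PR ended up doing at this point in the review.

```python
from pathlib import PurePosixPath

rel_path = PurePosixPath("dotdot../..in/path..components")

substring_check = ".." in str(rel_path)   # naive check: a false positive here
component_check = ".." in rel_path.parts  # compares whole path components

print(substring_check, component_check)

# A path with a real traversal component is still detected by the component check.
has_traversal = ".." in PurePosixPath("src/../include/foo.h").parts
print(has_traversal)
```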

path = os.path.abspath(dep)
rel_path = str(Path(path).relative_to(project_root))
assert ".." not in rel_path # abspath should have taken care of that
yield rel_path.replace(os.sep, "/") # POSIX-style relative paths
Contributor Author

Similar extreme nitpick: ./filename_with_\_character is a valid POSIX filename. I think, the correct thing to do here is rel_path = whatever.as_posix() instead of str(whatever).replace(...).

Contributor

@abravalheri abravalheri Aug 10, 2023

That is a good suggestion to use .as_posix()

I have a hard time keeping track of which Python version introduced each pathlib.Path method, but I am assuming that .as_posix() should be available now for all Python versions that are still active.

Just note that in a POSIX system "./filename_with_\_character".replace(os.sep, "/") will just return "./filename_with_\_character" (that is how as_posix() is implemented anyway).
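
Both points can be checked with the pure path flavours, which behave the same on every platform. The file names below are made up for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

windows_path = PureWindowsPath(r"include\sub\foo.h")
posix_path = PurePosixPath("filename_with_\\_character")

# On the Windows flavour, backslash is the separator and gets converted.
print(windows_path.as_posix())

# On the POSIX flavour, the backslash is part of the file name and is kept.
print(posix_path.as_posix())
```

So `as_posix()` only rewrites characters that are actually separators for the path's own flavour, which is the behaviour both comments agree on.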

@RuRo
Contributor Author

RuRo commented Aug 10, 2023

Regarding log messages, I am fine with whatever level of verbosity you deem appropriate.

Regarding extra tests, I'm unfortunately a bit busy with other stuff right now, so feel free to implement these tests yourself.

@abravalheri
Contributor

Hi @RuRo, thank you very much for the review comments, I will iterate over them once I have the time.

abs_path = (project_root / dep).resolve() # this handles both absolute and relative deps

Regarding the suggestion to use resolve, the reason why I intentionally went with abspath instead is because resolve is going to expand symlinks (right?). At this stage I don't have any personal problems if people want to symlink something like /opt/x/include/y.h to project_root/y.h. Maybe later in the setuptools pipeline there is a check for that, but I don't recall this being a validation requirement (I might be wrong).

@RuRo
Contributor Author

RuRo commented Aug 10, 2023

Regarding the suggestion to use resolve, the reason why I intentionally went with abspath instead is because resolve is going to expand symlinks (right?).

Yes, and as the docs point out, attempting to do path traversal without resolving symlinks is actually invalid, since foo/symlink/.. should refer to "the parent of whatever symlink points to", not foo.
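
The `foo/symlink/..` case can be reproduced in a temporary directory. This sketch assumes a platform where unprivileged symlink creation works (for example Linux or macOS), and the directory names are invented for the demonstration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "target" / "inner").mkdir(parents=True)
    (root / "foo").mkdir()
    # foo/symlink -> target/inner
    os.symlink(root / "target" / "inner", root / "foo" / "symlink")

    path = root / "foo" / "symlink" / ".."
    lexical = Path(os.path.normpath(path))  # drops "symlink/.." textually
    physical = path.resolve()               # follows the symlink first

    lexical_rel = lexical.relative_to(root).as_posix()
    physical_rel = physical.relative_to(root).as_posix()

print(lexical_rel)
print(physical_rel)
```

The lexical result is `foo`, while the physical result is `target`, the parent of whatever the symlink points to.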

At this stage I don't have any personal problems if people want to symlink something like /opt/x/include/y.h to project_root/y.h. Maybe later in the setuptools pipeline there is a check for that, but I don't recall this being a validation requirement (I might be wrong).

Yeah, in this case, I see no reason not to include project_root/y.h in the sdist. This example is firmly in the "why would anybody do something like that" territory, but imho it's not worth introducing extra special cases for it (or at the very least outside the scope of this PR).

@abravalheri
Contributor

Yes, and as the docs point out, attempting to do path traversal without resolving symlinks is actually invalid, since foo/symlink/.. should refer to "the parent of whatever symlink points to", not foo.

That is a tricky one... In your original implementation you also used os.path.normpath and os.path.abspath that will resolve the paths using logical operations (instead of traversing the file system), right? So I suppose it is subject to the same shortcomings... (Please correct me if I am wrong).

Let me think about that... what would be the best approach here?

Probably the test case should be something like:

# rushed and probably wrong pseudo-code to express brain dump
mkdir -p /tmp/headers/dir1
touch /tmp/headers/dir1/f.h
ln -s /tmp/headers myheaders
ext_modules=Extension(..., depends=["myheaders/dir1/../dir1/f.h"])

This should be fine, because when producing the sdist we can copy /tmp/headers as myheaders inside the tar.gz. Then later (when creating the wheel from the sdist), myheaders/dir1/../dir1/f.h would translate to <sdist_root>/myheaders/dir1/f.h, which would build happily.

Maybe I am overthinking this (and that is probably true). But I am trying to be mindful to not introduce backwards incompatibilities.

@RuRo
Contributor Author

RuRo commented Aug 10, 2023

That is a tricky one... In your original implementation you also used os.path.normpath and os.path.abspath that will resolve the paths using logical operations (instead of traversing the file system), right? So I suppose it is subject to the same shortcomings... (Please correct me if I am wrong).

Let me think about that... what would be the best approach here?

Yes, my original implementation didn't resolve symlinks before path traversal, and I now believe that to be a mistake (although not a very significant one).

Probably the test case should be something like:

# rushed and probably wrong pseudo-code to express brain dump
mkdir -p /tmp/headers/dir1
touch /tmp/headers/dir1/f.h
ln -s /tmp/headers myheaders
ext_modules=Extension(..., depends=["myheaders/dir1/../dir1/f.h"])

This should be fine, because when producing the sdist we can copy /tmp/headers as myheaders inside the tar.gz. Then later (when creating the wheel from the sdist), myheaders/dir1/../dir1/f.h would translate to <sdist_root>/myheaders/dir1/f.h, which would build happily.

With the current implementation, myheaders shouldn't be included in the sdist, because /tmp/headers/dir1/f.h is not inside the project root (and I think, that this is a perfectly sane behaviour for now).

Maybe I am overthinking this (and that is probably true). But I am trying to be mindful to not introduce backwards incompatibilities.

I think that we should keep symlink handling mostly outside the scope of this PR for now. The simple rule of "completely resolve the path specified in depends, if it ends up being inside the project root, include the resolved path in the sdist" should be mostly backwards compatible, and it would cover the most common/sane use case for depends (in-project header files).

P.S. symlink handling seems to already be a known problem (#415), so I'm leaning heavily towards "don't try to be too clever with symlinks" for now

@abravalheri
Contributor

abravalheri commented Aug 10, 2023

Hi @RuRo thank you very much for the comments. Please find my take, inline below.

I think that we should keep symlink handling mostly outside the scope of this PR for now.

I understand the concerns and I agree that we can leave the scope of the PR as slim as possible. But only as long as we don't introduce backwards incompatibility in this PR.
(Unfortunately, GitHub is not great for splitting multiple dependent PRs, so splitting compatibility concerns in other PRs is not a good option).

With the current implementation, myheaders shouldn't be included in the sdist, because /tmp/headers/dir1/f.h is not inside the project root (and I think, that this is a perfectly sane behaviour for now).

People might have different opinions if it is correct or not, but this is the behaviour we observe nowadays:

> docker run --rm -it python:3.8-bullseye /bin/bash
mkdir -p /tmp/headers/dir1
touch /tmp/headers/dir1/f.h

rm -rf /tmp/headers /tmp/myproj
mkdir -p /tmp/headers/dir1
touch /tmp/headers/dir1/f.h

mkdir /tmp/myproj
cd /tmp/myproj
ln -s /tmp/headers myheaders

cat <<EOF > pyproject.toml
[build-system]
requires = ["setuptools", "Cython"]
build-backend = "setuptools.build_meta"
EOF

cat <<EOF > setup.py
from setuptools import Extension, setup

setup(
    name="myproj",
    version="42",
    ext_modules=[
        Extension("hello", sources=["hello.pyx"], depends=["myheaders/dir1/../dir1/f.h"])
    ],
)
EOF
echo 'print("hello")' > hello.pyx
cat <<EOF > MANIFEST.in
global-include *.h
EOF

python -m venv /tmp/venv
/tmp/venv/bin/python -m pip install build
/tmp/venv/bin/python -m build
# ...
# Successfully built myproj-42.tar.gz and myproj-42-cp38-cp38-linux_x86_64.whl
tar tf dist/*.tar.gz
# myproj-42/
# myproj-42/MANIFEST.in
# myproj-42/PKG-INFO
# myproj-42/hello.pyx
# myproj-42/myheaders/
# myproj-42/myheaders/dir1/
# myproj-42/myheaders/dir1/f.h
# myproj-42/myproj.egg-info/
# myproj-42/myproj.egg-info/PKG-INFO
# myproj-42/myproj.egg-info/SOURCES.txt
# myproj-42/myproj.egg-info/dependency_links.txt
# myproj-42/myproj.egg-info/top_level.txt
# myproj-42/pyproject.toml
# myproj-42/setup.cfg
# myproj-42/setup.py

So they are included without an error.
The implementation in the PR should result in a similar outcome.

The simple rule of "completely resolve the path specified in depends, if it ends up being inside the project root, include the resolved path in the sdist" should be mostly backwards compatible, and it would cover the most common/sane use case for depends (in-project header files).

I am very happy to do that, as long as when sdist runs it does not raise an error for a configuration that previously would be fine.

I think that before merging the PR, we should make sure to not introduce backwards incompatible behaviour. Maybe the approach proposed ("completely resolve the path ...") already covers that.

P.S. symlink handling seems to already be a known problem (#415), so I'm leaning heavily towards "don't try to be too clever with symlinks" for now

My take on #415 is that it is asking for a change in behaviour in setuptools. The current behaviour seems to have been implemented via a PR, so it is probably deliberate (someone thought it was better that way) [1]. Changing it might be controversial between groups of users (by Hyrum's law, it is very likely that someone depends on the current behaviour).

So my approach would be to be compatible with the existing behaviour. Future changes can be implemented when #415 is handled (the how will also be important, a valid outcome of #415 is to have an "opt-out" configuration flag for following symlinks).

Footnotes

  1. Due to the migration to GitHub, it is a bit hard to keep track of PR numbers and links.

@RuRo
Contributor Author

RuRo commented Aug 10, 2023

I understand the concerns and I agree that we can leave the scope of the PR as slim as possible. But only as long as we don't introduce backwards incompatibility in this PR. (Unfortunately, GitHub is not great for splitting multiple dependent PRs, so splitting compatibility concerns in other PRs is not a good option).

To be clear, I am not suggesting that we split backwards compatibility into a separate PR. I simply don't see, how the presence of symlinks affects backwards compatibility in this case. By just resolving all paths and not adding any special logic for symlinks, we are (AT WORST) not including some of these paths in the manifest by default (just like they aren't included now).


With the current implementation, myheaders shouldn't be included in the sdist, because /tmp/headers/dir1/f.h is not inside the project root (and I think, that this is a perfectly sane behaviour for now).

People might have different opinions if it is correct or not, but this is the behaviour we observe nowadays:

(snip)

cat <<EOF > MANIFEST.in
global-include *.h
EOF

(snip)

So they are included without an error. The implementation in the PR should result in a similar outcome.

The only reason, why myheaders is included in the sdist in your example is that you have manually added *.h to the manifest. This PR doesn't/shouldn't change the behaviour of how the manifest is interpreted, only how the "default" manifest paths are generated based on the contents of Extension.depends.


I am very happy to do that, as long as when sdist runs it does not raise an error for a configuration that previously would be fine.

I think that before merging the PR, we should make sure to not introduce backwards incompatible behaviour. Maybe the approach proposed ("completely resolve the path ...") already covers that.

Yes, I currently don't see any (reasonable) way for this approach to result in an error where none were raised previously. Do you have something specific in mind? Basically, this would have to be a configuration that produces valid sdists today, but that would be broken by adding some in-project file to the manifest.

@abravalheri
Contributor

abravalheri commented Aug 10, 2023

This is what I did to double check everything:

  1. I added the regression tests 29a00ee and tested against main unchanged (passing)
  2. I cherry-picked the original contribution from aefe73b to 43abba2 and ran the regression tests (passing)
  3. I cherry-picked the changes I added (817edb4 to 44d727f) and ran the regression tests (passing)
  4. I added the suggestions including Path.resolve (973f7f4 to 87d5e9c) and ran the regression tests (passing)

So far very good, they don't break any existing build, and I am happy with it.

  5. Out of curiosity, I decided to check what would be the outcome of automatically adding depends when using symlinks.
    So I added 77a1295 (failing).

We don't necessarily have to support this use case (since it is contrived), but it is a shortcoming of the implementation using Path.resolve (considering that the current behaviour of setuptools is being OK with people using symlinks).

  6. Then I added cf2bda6 to see what a solution would look like that adds the automatic inclusion of files even when people use symlinks.
    Running the tests again, they pass.

  7. Then I added more tests to check what happens if a given depends does not exist (no symlink here, just checking an edge case for completeness) in 379cbe3.


I think it is completely fine to stop at 4 (we can revert/reset the other commits). I am happy with it because we have regression tests that prove no error occurs.

If we want to go the extra mile, some code is there (77a1295 to cf2bda6). We might want to review if it has other flaws. But that is not necessary.

@abravalheri abravalheri force-pushed the bugfix/extension_depends_in_sdist branch from 44d727f to 379cbe3 Compare August 10, 2023 19:54
@RuRo
Contributor Author

RuRo commented Aug 11, 2023

Hi. Sorry to continue bikeshedding this PR, but I have just looked over all the implementation attempts so far (both yours and mine), and I am really not satisfied with any of them. This PR was supposed to be super small and simple (both to implement and to reason about), but I think that we've ended up overcomplicating it quite a bit.

I would like to "start over" by explicitly listing the scope, expectations and requirements for this PR. After we agree on that, we can make sure that we have test coverage for all expected cases. This way, we can make sure that our implementation is actually sane and that we haven't overengineered or overlooked anything.

Again, sorry to block this. I'll try writing up the "spec" later today.

@abravalheri
Contributor

abravalheri commented Aug 11, 2023

No problems. Thank you very much @RuRo for having a look on this. Please feel free to open a separated PR if you think that is a better way of starting from scratch.

@RuRo
Contributor Author

RuRo commented Aug 12, 2023

Okay, here goes.

I think, that the main thing that I don't like about the current approaches is that we've identified a bunch of cases, where "We currently don't understand, why would anyone do this" and "This doesn't make sense" and yet we are attempting to support these "weird" cases.

In my opinion, this approach is wrong because we have no way of evaluating if what we are doing makes sense. Even more importantly, by adding these files to the manifest and sdist, we are introducing new behaviour that people may start to rely on. If we later decide that we want a different behaviour for one of these "weird" cases, we would be forced to keep compatibility with current behaviour.


Currently, none of the files in Extension.depends are added to the manifest by default. I think that in this PR we should add to the manifest only those paths from Extension.depends for which we are 100% sure that we are doing "the right thing™". For all other paths, we do a log.info informing the user that this file will not be automatically included in the manifest (not a warning, because it might be a "valid" condition that doesn't need "fixing" by the user).

In particular, I'd say that "we are 100% sure" that we should include files that satisfy all of these conditions:

  1. the path is a relative path
  2. the path doesn't have any .. path traversal components
  3. the path physically resolves to some path that exists
  4. the resolved path is still located inside the project root

Note, that these are requirements for automatic inclusion in the manifest, not for any other functionality that depends already offers. A path that doesn't satisfy one of the above conditions would be still allowed in Extension.depends and would behave exactly as it does today.
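
The four conditions can be sketched as a single check. `skip_reason` is a hypothetical helper written for this illustration and is not the merged code; apart from "must be relative", the reason strings are invented. The symlink part of the demo assumes a platform that allows unprivileged symlink creation.

```python
import os
import tempfile
from pathlib import Path


def skip_reason(dep, project_root):
    """Return None if dep may be auto-included, else a short reason string."""
    path = Path(dep)
    if path.is_absolute():                      # condition 1
        return "must be relative"
    if ".." in path.parts:                      # condition 2
        return "can't have `..` segments"
    root = Path(project_root).resolve()
    try:
        resolved = (root / path).resolve(strict=True)  # condition 3
    except OSError:
        return "doesn't exist"
    try:
        resolved.relative_to(root)              # condition 4
    except ValueError:
        return "must be inside the project root"
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "project")
    (root / "include").mkdir(parents=True)
    (root / "include" / "a.h").touch()
    outside = Path(tmp, "outside")
    outside.mkdir()
    (outside / "b.h").touch()
    os.symlink(outside, root / "link")  # symlink pointing outside of the project

    candidates = [
        "include/a.h",
        str(outside / "b.h"),
        "include/../include/a.h",
        "missing.h",
        "link/b.h",
    ]
    results = {dep: skip_reason(dep, root) for dep in candidates}

for dep, reason in results.items():
    print(dep, "->", reason or "included")
```

Paths that return a reason would simply be left out of the default manifest and logged; they would remain valid entries in depends for the build itself.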

Here are some concrete examples and my justifications for each of these restrictions:

  1. This is already a requirement for sources.

    Note, that I am proposing to refuse all absolute paths. Previously, I wanted to allow the case where the absolute path happens to end up referring to a file inside the project. This was an attempt to accommodate cases along the lines of Extension(depends=[os.path.abspath(valid_relative_path)]), but now I think that this was a mistake. We have no way to tell if /home/user/the_project_dir/somefile.h is os.path.abspath("./somefile.h") or just a hardcoded absolute path, so "In the face of ambiguity, refuse the temptation to guess".

  2. This requirement might seem to be overly restrictive, but while testing, I have found that the current manifest logic doesn't really handle paths with .. traversal well. For example, the following example that has a valid path with .. in its manifest actually fails during sdist creation:

    mkdir foo
    touch bar
    touch pyproject.toml
    
    cat <<EOF >MANIFEST.in
    include foo/../bar
    EOF
    
    python -m build -s .

    This means, that we can't just blindly pass unnormalized paths to the manifest system, as that might lead to backwards compatibility problems (the project used to build, but now fails). Fixing such problems in the manifest system is definitely outside the scope of this PR (and I personally currently don't have the time for it).

    In addition to this, properly handling .. path traversal when symlinks are involved is surprisingly hard because symlinks are converted to regular folders during sdist creation. For example, imagine that some_symlink points to deep/subfolder/structure. Then the user tries to include some_symlink/../../bar (this file actually exists with the canonical name deep/bar). In this case, there is no way to represent this path in a sdist without symlinks, because logically some_symlink/../.. tries to traverse outside the root of the package.

  3. Adding non-existent paths to the manifest seems to cause them to be silently ignored, but I would still prefer to be conservative here and skip such paths just in case. Additionally, if the path doesn't exist, then we can't be 100% sure that we have resolved the path correctly and so without requirement 3, we can't check for requirement 4.

  4. This requirement is probably the "weakest" in my opinion (meaning that you can probably convince me to relax it, if you can articulate why this should be allowed).

    Regarding not including symlinks to absolute paths, the main motivation here is that we should avoid trying to include stuff that we are not sure about. For example, I can totally imagine a case where some_symlink is a helper symlink to something like /opt/cuda and then some dark magic is used to create that symlink at runtime (even during an isolated wheel build). This is basically like specifying an absolute path with extra steps. Why would anybody do such a thing and add it to Extension.depends is an open question, but at this point I think that automatically including these files in the manifest/sdist would most likely be wrong.

We can revisit any of these restrictions if/when we find an actual example "in the wild" where someone wants to include such a non-compliant path in depends (after they explain, what prevents them from making it compliant with the above restrictions).


@abravalheri thoughts?

@RuRo
Contributor Author

RuRo commented Aug 12, 2023

@abravalheri I pushed an updated implementation with the described logic. I decided not to rewrite the history in case you still need it. Feel free to rebase/squash the commits as you see fit.

I removed the test_auto_include_symlinked_depends test, as it attempts to assert some things that are no longer true in the new implementation. I didn't touch the tests in TestRegressions as they are basically unrelated to the changes introduced in this PR.

I also added a couple of new tests that verify some of the edge cases, that I've outlined in the "spec".

@abravalheri
Contributor

abravalheri commented Aug 14, 2023

Thank you very much @RuRo. This approach should be fine. Also very happy with the extensive testing.

Co-authored-by: Anderson Bravalheri <andersonbravalheri+github@gmail.com>
@RuRo
Contributor Author

RuRo commented Aug 14, 2023

Ugh. Apparently the stdlib version of distutils.log doesn't use the logging module, but instead just writes directly to stdout/stderr, so we can't easily capture/introspect the logs with caplog.

Edit: For now, I just marked the test as skipif when the distutils module comes from the stdlib. IMHO, it's not really worth rewriting this test to accommodate the stdlib version of distutils.log. Unless we already have some kind of mock/wrapper for testing distutils.log, we'd have to capture the raw stdout/stderr and then parse out the expected logging line or manually mock the logging API in distutils. It would be more fragile and we wouldn't be able to verify which logging level is used for the message.

@abravalheri abravalheri merged commit 3c25cdd into pypa:main Aug 15, 2023
@abravalheri
Contributor

Thank you very much for the hard work and thoughtful conversations!

Comment on lines +284 to +286
if path.is_absolute():
skip(dep, "must be relative")
continue

Question #4181


Successfully merging this pull request may close these issues.

header files not included in tar.gz file
3 participants