Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Following symlinks should be optional #415

Open
ghost opened this issue Aug 17, 2015 · 14 comments
Open

Following symlinks should be optional #415

ghost opened this issue Aug 17, 2015 · 14 comments

Comments

@ghost
Copy link

ghost commented Aug 17, 2015

Originally reported by: untitaker (Bitbucket: untitaker, GitHub: untitaker)


In #105 a change was made to follow symlinks while discovering package content. I propose that this behavior should be made optional or reverted. In my case my project directory accidentally included a symlink to another, very large project, and I ended up including a complete installation of ownCloud into a PyPI upload.

I'm willing to provide a patch for this.

This is a continuation of https://bitbucket.org/pypa/pypi/issues/293/re-uploading-of-releases-is-completely


@ghost
Copy link
Author

ghost commented Aug 23, 2015

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Is your reference to #105 correct? I don't see how changes to the terminal encoding would affect file discovery.

I'm somewhat disinclined to add complexity to setuptools to not traverse paths for inclusion simply because they're symbolically linked.

I acknowledge that this behavior might have been different in the past, but that alone is not sufficient to restore the behavior.

Nevertheless, I believe more discussion may sway my initial instinct. Let's answer these questions.

  1. Where/when was the change made that affected package discovery?
  2. A flag or other optional support would work for this one use-case (with symlinks), but wouldn't prevent accidental inclusion of other unintentionally-present content (hardlinked directories, copied trees). Why should symlinks be given special consideration?
  3. Does the existing support that uses MANIFEST.in or SCM-based file finders provide a suitable solution?

If after answering these questions, you still feel there's a good reason to alter the implementation to support your proposal, let's talk about how you would implement a solution. Describe your approach or submit a PR - whatever is easiest.

@ghost
Copy link
Author

ghost commented Aug 23, 2015

Original comment by untitaker (Bitbucket: untitaker, GitHub: untitaker):


Sorry, I meant https://bitbucket.org/pypa/setuptools/pull-requests/105/fix-277-data-files-in-symbol-link

The only rationale I can give is that mistakes like the one I made wouldn't happen if symlinks weren't followed. At the same time I totally see the usecase for following symlinks.

I'm not sure how a compromise would look like. I also briefly thought about requiring symlinks to be manually whitelisted in MANIFEST.in, but that seems more confusing.

Perhaps the real solution is to warn about overly large PyPI tarballs, or somehow show the size after upload?

@ghost
Copy link
Author

ghost commented Aug 23, 2015

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Warning about excessive uploads is something that a distutils command could easily do. If you model after the upload command in distutils, you can see how the distribution or distributions generated by previous commands are then uploaded to PyPI. Consider creating a new command "check_upload_sizes" or similar which checks that file sizes are of some reasonable size and raises an error otherwise. This command can be implemented as a third-party package, advertise a distutils.commands entry point, be included as a setup_requires requirement on your packages (or others who desire that functionality), and invoked as part of the release process, such as python setup.py sdist bdist_wheel check_upload_sizes upload.

@ghost
Copy link
Author

ghost commented Aug 24, 2015

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


I believe the existing MANIFEST.in templates and SCM-based file finders provide adequate techniques for including/excluding files from the prepared distribution and more custom behavior can be vetted first as a third-party plugin as described above.

@ghost
Copy link
Author

ghost commented Oct 24, 2015

Original comment by benoit-pierre (Bitbucket: benoit-pierre, GitHub: benoit-pierre):


The problem is not just about including unwanted files: traversing those symlinked directories can be an issue too. In my case, I have Wine prefixes under my source tree, configured to use Windows specific tools, and since Wine creates various symlinks for its dos devices emulation (e.g.: e: -> /media/storage, or z: -> //), this result in setuptools traversing my whole file system...

That's why symlinks (and mountpoints) are special: they can point outside the source tree (and create loops too).

@ghost
Copy link
Author

ghost commented Oct 26, 2015

Original comment by avdd (Bitbucket: avdd, GitHub: avdd):


+1
Blindly following symlinks is very bad.
MANIFEST.in is insufficient because egg_info will do a full walk before pruning per the template.

@abadger
Copy link
Contributor

abadger commented Jul 20, 2017

‎We have an additional use case in our project. We have our own plugin system there. Sometimes we want to rename a plugin but leave the old name as an alias for a few releases as a deprecation period. To do that, we rename the file with the actual code and then we symlink to that file with the old filename. Whenever our plugin system runs across a symlink, it knows that it's dealing with an alias and can do things like show the alias in the docs, display a deprecation warning when the plugin is used via the old name, etc.

If we had a way to mark these symlinks as not to be expanded or a way to toggle that behaviour for all files in the run we'd be able to use that to enable our use-case.

distutils currently hardcodes non-expansion of symlinks in sdists for tarfiles. zip files expand the symlinks but it looks like a bug: https://bugs.python.org/issue12585 it looks like it's something people desire to make the same but currently there isn't an obvious fix for the bug so the behaviour is asymmetric between tars and zips.

It looks like even distutils expands the symlinks when installing the package. So I'd probably still have to do more work to make it do what we need at install-time.

Our software requires a *nix like system so symlinks are supported all the places we care about. This might not be the case for other projects. being togglable in some manner would allow projects to choose but of course, then setuptools has to support both code paths.

@abadger
Copy link
Contributor

abadger commented Sep 10, 2017

Found a second use-case for including symlinks without dereferencing. We have some code that copies files and has to handle symlinks appropriately. So we have some integration tests that use a circular symlink structure to test that our code behaves properly.

distutils/setuptools is expanding the symlinks in our integration tests during sdist build time which wastes a bit of space in the tarball and also renders the test useless for consumers of the tarball.

@abadger
Copy link
Contributor

abadger commented Sep 10, 2017

/cc @jaraco

@gtristan
Copy link

gtristan commented Mar 8, 2018

I am curious about this as it presents a bug in my project:
https://gitlab.com/BuildStream/buildstream/issues/284

Can I have a clear message here ? Are we allowed to package symlinks in setuptools source distributions or not ? Do we have to work around this permanently in downstream projects ?

In our case, we have a lot of code which deals with staging directories from one place to another, and we started working around this when this broke in setuptools; we don't actually need to install symlinks as of yet, but we like to include them in our git repository for our test suite; which ensures that our code which handles symlinks works correctly.

The side effect is that running the test suite only works from git, but not from a source distribution.

There are certainly other use cases for "having symlinks", and setuptools ignoring that is not very nice, there is no reason as far as I can see to follow them, either:

  • When packaging the tarball recursively, as long as the symlink refers to something inside the tarball and is not a broken symlink, then one can rest assured that the symlink's target will also be in the tarball
  • When the symlink is a broken symlink, this can either be:
    • Intentional on behalf of the maintainer, I may very well need to package a symlink which is broken
    • An error on behalf of the maintainer, which the maintainer should correct by ensuring the symlink's target is present at sdist time and in the MANIFEST

@gtristan
Copy link

gtristan commented Mar 8, 2018

Sorry for re-commenting, It seems to me that #764 may have fixed this, I'm unsure reading the comments and really would just like to know; is this fixed ? And if so, what version of setuptools should I require in order to start using symlinks again ?

@jaraco
Copy link
Member

jaraco commented Mar 8, 2018

#764 leads me to Setuptools 28.5.0 release, so my guess is it's not fixed. But if it is, that'd be great.

I unfortunately won't have time to develop a solution, but I'll happily review a PR that addresses this issue.

@abadger
Copy link
Contributor

abadger commented Mar 8, 2018 via email

@jaraco jaraco reopened this Apr 10, 2018
@Conan-Kudo
Copy link

This actually broke the release tarballs for Pagure 5.0 as well, since we relied on identifying symlinks for some migration procedures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants
@abadger @Conan-Kudo @jaraco @gtristan and others