-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not scan whole file tree when making MANIFEST #764
Conversation
7de66bb
to
5c655a8
Compare
These tests seem to be failing on CI for unrelated reasons. The tests fail even on |
Oh, |
5c655a8
to
d568f4a
Compare
OK, test pass again. I've rebased this on top of #765, which excludes |
This patch is great, and fix my issue with following symlinks when building the manifest. I however get a test failure on my system: self = <setuptools.tests.test_manifest.TestFileListTest object at 0x2b1c7b96d850>
def test_include_pattern(self):
# return False if no match
file_list = FileList()
self.make_files([])
assert not file_list.include_pattern('*.py')
# return True if files match
file_list = FileList()
self.make_files(['a.py', 'b.txt'])
assert file_list.include_pattern('*.py')
# test * matches all files
file_list = FileList()
self.make_files(['a.py', 'b.txt'])
file_list.include_pattern('*')
> assert file_list.files == ['a.py', 'b.txt']
E assert ['b.txt', 'a.py'] == ['a.py', 'b.txt']
E At index 0 diff: 'b.txt' != 'a.py'
E Use -v to get the full diff This is both with Python 2.7.12 and Python 3.5.2 on Arch Linux. |
Thanks for checking this. The order of files in the file list appears to depend on the underlying file system. My system reported the files in the order they were created, but apparently yours does not. I've wrapped the file list in a There is already a separate test that checks that |
The code for this patch is complete, barring any feedback. Is there anything more I can do to help this patch along? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good. I would simply accept it except for the questions I raise above. Please have a look. Thanks.
return found | ||
|
||
def include(self, pattern): | ||
"""Include files that match 'pattera'n.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment looks like it has a typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -239,7 +322,151 @@ def check_broken_egg_info(self): | |||
|
|||
|
|||
class FileList(_FileList): | |||
"""File list that accepts only existing, platform-independent paths""" | |||
# Implementations of the various MANIFEST.in commands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this class was a subclass of the implementation in distutils. It's not clear from this diff if there's still value in subclassing _FileList
. Is this a complete re-implementation or does it also re-use some of the upstream implementation? Do you expect that these changes could be merged upstream at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it for backwards compatibility. Packages or plugins may depend on calling FileList.include_pattern
, the old way of adding packages, and I did not want to break these.
self._add_egg_info(cmd=ei_cmd) | ||
self.filelist.include_pattern("*", prefix=ei_cmd.egg_info) | ||
|
||
def _add_egg_info(self, cmd): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, cmd.egg_base
was processed, but now it appears not to be. What new behavior obviates the need for this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behaviour is still required, but is now as simple as self.filelist.graft(ei_cmd.egg_info)
, assuming I understand what is going in _add_egg_info()
. I've replaced that whole function with that one-liner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Great!
""" | ||
Filename globbing utility. Mostly a copy of `glob` from Python 3.5. | ||
|
||
Changes include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file adds a big amount of technical debt to the project. Although I believe it will work, it's a fork of the functionality in Python 3.5 without a merge strategy. Moreover, it has already diverged from the upstream implementation. Thanks, by the way for describing the divergence (I worry other contributors might not be so diligent).
But I wonder - if you could rely on Python 3.5, would the implementation have been different, or would you have had to override all of this behavior not to ignore hidden files? For example, if you look at the recent setuptools.sdist.py36compat.py
, you'll see functionality that's a backport of code in Python 3.7, which can be removed when Python 3.7 is the minimum supported version. Could you do something similar here, where the forward-compatibility implementation and the divergent behavior are maintained separately (even in separate files)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would not be possible to ship the glob library from Py3.5 without any modifications, as yield from
is not valid syntax in older Pythons, so at least that needs to change regardless.
To make the library compatible with Py2, I also had to use six
for dealing with strings/bytes. These changes require modifications to internal function code, so are not easily patched from "outside". I notice that I did not mention this in the list of changes, I will add it now.
The same problem exists with including hidden files - is_hidden
is called from a number of functions, so these functions need to be changed to not call it. Alternatively, is_hidden
could be changed to always return False
, which would have the same effect.
Finally, all the functions in the glob library call each other directly. There is no good way of doing dependency injection or something similar to override the specific functions that need patching.
I completely agree with you that this adds some undesirable technical debt. If you can think of any good solutions to the problems above I am very happy to implement them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also worth noting that a diff -U5 /usr/lib/python3.5/glob.py setuptools/glob.py
shows a very nice summary of the changes.
d31f12e
to
44630f1
Compare
44630f1
to
bb45468
Compare
Thanks for the merge! |
After pypa#764, `global-exclude .pyc` no longer excluded `.pyc` files. This fixes that regression, and adds a test for this behaviour.
@timheap thanks for taking this on and succeeding where I ran out of steam! and thanks @jaraco for the merge. This has made a world of difference to my use case - tox-dev/tox#293 (comment) |
Fixed in setuptools (quite some time ago) in pypa/setuptools#764
When building a MANIFEST from a MANIFEST.in, setuptools previously
scanned the whole directory tree in to a list, and then picked matching
files based on MANIFEST.in commands from this list.
Now, files are found using the
glob
library from Python 3.5. This onlyexplores directories that need to be scanned, resulting in a large speed
improvement for projects with large file trees. A modified
glob
module has been included. It has been changed to support back to
Python 2.6, and to include
.hidden
files in its matches. The previousfunctionality included
.hidden
files in its glob matches. It isunclear if this behaviour is desired and required, or accidental and not
required, but for strict backwards-compatibility, this behaviour is
kept.
Each command in the MANIFEST.in is now represented by its own function
on the FileList (
include
,exclude
,graft
, etc.) to allow for anefficient implementation. The previous commands
FileList.include_pattern
andFileList.exclude_pattern
stillexist for backwards compatibility, but these use the slow 'scan all
files' method, so are discouraged.
global_include
by its nature mustscan all directories in the project to work, so this does not receive
any speed improvements.
The changes will speed up creating packages for the vast majority of
users. There are a few unusual corner cases, such as multiple
graft
commands operating on the same set of directories, that will be slower.
These can be solved by consolidating the overlapping
graft
commands into one command.
Note that this is a competing PR to #610. The approach in #610 made it hard to correctly implement
graft
.