-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ez_setup.py failing sporadically when setuptools is already installed #168
Comments
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): I've had this same issue. The issue can also occur outside of running ez_setup, but also when simply easy_installing setuptools or running setup.py from a tarball or zipball. I know I've had a conversation about this before, but I forget what the outcome is. I'll look around more later. |
Original comment by pje (Bitbucket: pje, GitHub: pje): The cause of the error is usually a corrupt/invalid zipimport cache entry. If you replace the zipfile without removing the corresponding entry from the cache, then you're going to get an error if the zipfile's contents are not in the same locations as they were in the old zipfile. Basically, any time you move, rename, delete, or overwrite a zipfile that could have previously been imported (or inspected via a zipimporter), you need to delete the corresponding entry in the zipimport cache. |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): Have you looked at It seems to be a dictionary with zip file paths as keys.
zipimport module's docstring description mentions it:
and it seems to be implemented in CPython's Hope this helps. Best regards, |
Original comment by pje (Bitbucket: pje, GitHub: pje): That's one cache. The other is Just to make things even more fun, there is the fact that If the relevant zipimport entry is being removed and the importer cache is being cleared, that leaves stale |
Original comment by matthewbrett (Bitbucket: matthewbrett, GitHub: matthewbrett): As a datapoint, I have this failure reliably on OSX. It happening reliably on our automated builds starting on March 17 : http://nipy.bic.berkeley.edu/builders/numpy-bdist-whl-osx-2.7-downloaded I can cause it reliably on my laptop with current tip of setuptools with:
Setting the ez_setup.py version to 3.2 makes the error less likely, but I rerun ez_setup.py a few times, I get the error again. Running again with no other change most often works. |
Original comment by mtimmsj (Bitbucket: mtimmsj, GitHub: Unknown): I see this consistently if:
This always fails in pkg_resources.py at |
Original comment by tal_weiss (Bitbucket: tal_weiss, GitHub: Unknown): Why is this issue classified as Minor? Install needs to work perfectly, every time. It is part of our server deployment scripts. |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): When I ran into it and saw no one else reported it already, and that it must have been present for a long time now, I assumed it is not all that important... And on the other hand, I figured it is quite easy to manually check if and what version of setuptools is already installed so a manual workaround (uninstall by deleting a few files) is not all that difficult. Hmmm... I might try and take a look at this now. If anyone more knowledgeable about setuptools internals wants to join me - I'm available on Skype (username: jurkog). Best regards, |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): Ok, so far I've managed to get it reproduced reliably in the following scenario:
Steps to reproduce:
Still working on it, but now at least I should be able to debug into the relevant zipimport C code... If anyone more experienced with setuptools/zipimport internals wants to chime in - I'd be more than happy to accept any useful tips/thoughts/ideas... 😄 Best regards, |
Original comment by pje (Bitbucket: pje, GitHub: pje): Does it make a difference what Python version is used? i.e. Do any older versions (e.g. 3.1, 3.2, 2.x) work correctly? As I mentioned earlier, there is code in |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): Ok, just tested it and the described scenario fails consistently with:
With Python 3.1.3 setuptools'
|
Original comment by pje (Bitbucket: pje, GitHub: pje): I've reviewed the code and the tracebacks; it seems to me the next critical point is to determine whether If there are entries, then there is something wrong with the uncache functions or they aren't being called. If there aren't entries corresponding to the cache, then something else is wrong -- like maybe there's some other cache not being considered. (Which seems unlikely.) I also wonder what's the oldest version of setuptools itself that manifests this bug, since the uncaching code was added to fix this problem quite a long time ago. (Suggesting that this is a regression due to some other change in the setuptools code base, or perhaps a change in ez_setup.) |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): Just did some bisecting over the old repo commits, and the problematic commit has been found on the
The commit in question was done by @pje with commit description:
All testing done using CPython 2.7.6. x64 on Windows 7 SP1 x64. The behaviour so far has been 100% consistent, so this should at least give us a starting point. I'll continue looking into this a bit later on when I get back home. Best regards, |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): @pje - I've been debugging the issue using pdb and so far I think your cleanup hook is getting called correctly. Although I am not yet sure what should really be stored in those caches. I'll some more debugging tonight and hopefully have some more exact results for you. Best regards, |
Original comment by pje (Bitbucket: pje, GitHub: pje): Based on that commit, it looks as though the problem was exposed because of the addition of a requirements file to setuptools' metadata. Before the commit, the file wouldn't have existed, so it wouldn't have called At this point, it seems to me that the
The value of egg_path must not have a corresponding entry in the zipimport cache at the time this executes, or it will see stale data (for the previous version of the zipfile), and get the bad local header error. I suspect the problem here is that somehow that entry isn't being removed. The init function for a zipimporter checks for a cache entry, and if found, uses it. Otherwise it opens the zipfile on disk, reads its directory, and caches it. So this is the critical moment for whether bad data will be seen later, in the code that actually gets the error, as far as I can tell. |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): I found time to do some more debugging and I can now confirm that the reason is in fact an old I can now reproduce the behaviour and inspect the buggy I inspected the The problem is somewhere inside setuptools. As I see it, the stale During the installation procedure the following occurs in order (none of them recursively): [1] [2] [3] The original egg file is removed. Output:
[4] [5] The new egg is copies into the old one's location. Output:
[6] [7]
[8]
[9]
Hope this helps. Best regards, P.S. I'd appreciate some help from someone more knowledgeable about the internal setuptools workings, e.g.:
|
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): Here's some more info... The problem seems to come from the I'll work on tracking this down some more tomorrow... but I have a nagging feeling someone more familiar with the code-base would find all this trivial now that we know that this stale Hope this helps. Best regards, |
Original comment by pje (Bitbucket: pje, GitHub: pje): Yes, it's probably as simple as changing the code here to read:
In other words, add the new distribution to the working set used to resolve the requirements, so that it takes precedence over the stale distribution in self.local_index. In truth, that is actually just a workaround. Really, the problem here is that the earlier line here:
...is broken in the case where there was already a matching distribution, because adding a new distribution with the same version, location, etc. is ignored. Probably the correct fix is to do:
in order to force the existing copy (if any) to be removed. That should get rid of the stale distribution along with its stale metadata+loader... though it still won't fix the global pkg_resources.working_set entry if there is one. But it should be enough to fix the current bug, anyway. I'm a bit surprised this whole problem hasn't surfaced before with anything besides setuptools, but I suppose it can really only happen when it's an egg that's loaded by the build/install process, so it's probably limited to self-updates of build tools, and only ones that have requirements of their own defined. So, if this is fixed, it should also solve any similar problems encountered by tools that build on top of setuptools. |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): Yup, I can confirm it. The stale I'll see how to implement this cleanly and hopefully have a patch for you soon. It seems better to me to 'clear all the old cached data' when the egg gets replaced (i.e. as has already been done for Best regards, |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): I have prepared some debugging code that uses the CPython garbage collector to detect and report remaining stale So far I've used this code to track down all remaining stale Basically they are all held in the
I have not yet tracked down exactly who uses them, what for, and whether that usage may end up using a stale Any hints/ideas on whether they should be cleared + when & how to do that best? Also, I have not found any problematic references in the 'master working set'. What does that entity represent anyway? It's global and may contain Hope this helps. Best regards, |
Original comment by mtimmsj (Bitbucket: mtimmsj, GitHub: Unknown): I quickly added:
The failure I was seeing when installing an egg with the -Z option over a previous egg that was installed without unzipping previously without an uninstall first went away. |
Original comment by pje (Bitbucket: pje, GitHub: pje): Actively removing them from the environments sounds like a good idea. The global working set is pkg_resources.working_set, and the stale egg would be in there if it was previously imported from, or had its metadata queried previously. Replacing a distribution in a working set is trickier than removing it from an Environment, though: you have to do something like:
However, this won't address the problem of what happens if this is an install run in a All of this is, in my opinion, optional at this point. The simple fix for now is just to cleanse the environment objects, which can be handled by making a function for removing a dist from an environment, and then calling it on the various environments, at the same point where the other uncaching is taking place. Until there's a set of steps to actually generate any problems with the working set and/or sandbox situation, the more advanced fixes are probably overkill. Heck, it might be a good idea to submit a bug report for zipimport, so that in the event of bad local data, it first checked to see if the stat info had changed and fix its internal cache automatically... oh wait... I just thought of something. A Way To Fix zipimport's Internal CachesSuppose that, when we remove the entries from the zip cache, we actually set them aside for a moment, then create a new zipimporter for the file. Then, we clear the old entry, and update it with the new entry. Because the cache entries are shared between zipimporters, updating the old entry would automatically fix every existing zipimporter for that zipfile. Something like:
(But done for each path in the cache that is the "same" path as path_to_zip, not just the one entry.) This approach should fix all the use cases, without needing to track down and replace individual distribution objects in every environment and working set, because the way the cache works, the dictionary representing a given zipfile is shared between all existing zipimporter objects. So rewriting the old entry should fix the old zipimporters. It would require a bit more elaborate change to the uncaching code, as it would need to first pull out any matching entries for the zipfile, saving them in a list, then it would need to create a zipimporter instance for the path, then pull out the new entry and update the old one(s). All in all, this is probably the "right" way to do it, as it should take care of all updating-an-egg-in-place scenarios properly. (But it's still not that terribly high of a priority, and I'd still be satisfied with the approach of, "take it out of the local_index and wait for actual bug reports for the rest".) |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): pull request #48 includes a quick-fix for this problem + improves the currently implemented zipimporter cache clearing a bit (catches some more stale zipimporter instances to clear). I hope to look into the suggested cleaner solution soon... Best regards, |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): For anyone interested - the debug_references branch in the jurko/setuptools repo contains some additional debugging code that makes setuptools display some more information on all the remaining stale zipimporter instances detected in the system. Hope this helps. Best regards, |
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Be sure to do a double check with pypy\win32. Its implementation of _zip_directory_cache is subtly different and it has to do with paths. That and http://bugs.python.org/issue2953 were the primary reasons EggMetaData and ZipProvider does not use it for directory manifests anymore. Looking that this, it doesn't appear to be a python import, the get_data appears to bypassing the manifest and uses the zipimporter loader. Also, https://bitbucket.org/tarek/distribute/issue/27. Correction NullProvider._get() not get_data |
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Ugh, sorry about that previous post, I've corrected it. After refreshing my memory, pypy uses "/" internally for path separation and cpython contains the results of os.normpath. |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): Why sorry? It gave us additional background info on the problem, so I think we should be thankful. 😄 And.... it's all a mess... and all because zipimporters cache their zip archive directory information, do so as a performance optimization and provide no official way to clear this cache... yuch... |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):
If that's true, and that blocks this issue, I suggest:
I'll take a look at the pull request 48 later. Do I understand correctly that it implements pje's suggestion and perhaps a bit more? |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): As for the upstream ticket - I have a feeling no one is actively maintaining the zipimport.c module there. I've seen several related issue open already:
Yup. Commit 041d281 is the original minimalistic quick-fix that corrects the reproducible failing use-case we had in this issue. The same pull request also contains two other commits:
Each of them should be easy enough to review by itself, but if having them all in a single pull request presents a problem, I can easily refactor them into separate pull requests. Hope this helps. Best regards, |
Original comment by jurko (Bitbucket: jurko, GitHub: jurko): An updated zipimport module implementation would allow us to use
|
Original comment by philip_thiem (Bitbucket: philip_thiem, GitHub: Unknown): Yeah these issues are pretty old. Will need the workaround while 2.6 is supported (when fixed and backported to 2.7+ and 3.2+), if zip-importer cache must be fiddled with. On a side note, I've been working on memoizing build_zipmanifest from #154. I've had some success just cutting the loader out of the equation. It seems that EggMetaData is only used when the is not a directory, if I use the associated filename in the zipmanifest I was able to simply read from the zipfile. After fixing 154, should be able to use the zipinfo objects directly. However, if one was replacing a dependency in a chain, wouldn't one would still need to update the cache or would there be a process boundary? So I'm not sure that would be desirable in the first place (or if thirdparties would be using it) as a more unified attack approach. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): quick-fix #168: avoid using stale cached zipped egg dist info in easy_install When installing a zipped egg, into a Python environment with a same named zipped This can occur if setuptools loads the original egg for some reason and the two The mismatch between the two archives can be reproduced by installing the same The underlying reason for this problem is setuptools using zipimporter instances This patch cleans up only one such old zipimporter instance - one referenced via A clean solution needs to make certain there are no more zipimporter instances Further debugging & development note: A complete list of all the currently |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): Following merging of Pull Request 48, I've created #202 to follow up with the suggestions of a more robust cache invalidation. |
Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco): It was fixed in 3.5.2. See the history for other details about released versions. |
Originally reported by: jurko (Bitbucket: jurko, GitHub: jurko)
When you run
ez_setup.py
to install setuptools into a Python environment where setuptools are not already installed - everything works great. But if you do that, and then rerunez_setup.py
, you sometimes (ranging from occasionally to almost always, depending on the exact Python & setuptools version used) get an uglyZipImportError
about the setuptools zipped egg file having a bad local header.I encountered & tested this on Windows 7 SP1 x64 with different clean CPython installations and different setuptools versions.
Here are some of the collected test results:
Here's a Windows batch script I used to test this behaviour:
Here are some captured console outputs containing the actual displayed exception tracebacks.
CPython 2.6.6 x86 / setuptools 3.3
CPython 3.4 x86 / setuptools 2.0
CPython 3.4 x64 / setuptools 3.3
Hope this helps.
Best regards,
Jurko Gospodnetić
The text was updated successfully, but these errors were encountered: