importlib: _find_and_load() race condition on sys.modules[name] check #75074
1:36:32 [224/405/1] test_import failed -- running: test_mmap (2236 sec), test_io (2529 sec), test_venv (1437 sec)
beginning 6 repetitions
123456
.....test test_import failed -- Traceback (most recent call last):
File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\__init__.py", line 377, in test_concurrency
raise exc
File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\__init__.py", line 362, in run
import package
File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\data\package\__init__.py", line 2, in <module>
package.submodule
AttributeError: module 'package' has no attribute 'submodule' |
I was able to *easily* (100% failure rate) reproduce the failure with Python 3.6 and 3.7 (master) on Windows using this command: ./python -m test -R 3:100 -m test_concurrency test_import |
I don't understand if a test_concurrency() failure is a bug in importlib, or a bug in the test :-) |
Note: "./python -m test -R 3:100 -m test_concurrency test_import" is just a convenient way to run test_concurrency() "in a loop", in the same process, 100 times. I'm unable to reproduce the bug on Linux, even when I write 3 into /proc/sys/vm/drop_caches while the test is running to drop filesystem caches. I also tried to use a temporary directory in my home (TMPDIR=$HOME/tmp) to use temporary files in ext4 rather than tmpfs. But I don't know if importlib uses temporary files or not :-) |
Oh, the unit test was added 4 days ago: commit b4baace |
Indeed, Serhiy's been stress-testing some bits of the import system, and finding a few holes made possible by the shift away from a single global import lock to per-module locks. (bpo-30876 is another example of that) Nothing immediately comes to mind as a platform-dependent code path that could explain why that error is showing up on Windows and not on Linux, though. |
Some parts of the import machinery are platform dependent (see bpo-30873). I have no idea what is wrong on Windows; I'm puzzled. test_concurrency itself runs a test in a loop. The only difference between one and several consecutive runs of test_concurrency is running setUp() and tearDown() between tests. Victor, try to add importlib.invalidate_caches() inside the loop. Does it cause a failure in a single test_concurrency run? |
I added an "importlib.invalidate_caches()" call in the "for i in range(10):" loop. With this change, a single run doesn't fail. It doesn't fix the test either: with -R 3:100, the command still fails.
I don't think that bpo-30873 or bpo-30626 is platform dependent.
IMHO it's just that my Windows VM is slow. It's a race condition. If I run "./python -m test -m test_concurrency test_import test_import", the command passes, but when I run it again, it fails. It's purely random :-) |
I added print(id(package)) in Lib/test/test_import/data/package/__init__.py: when the bug occurs on my Windows VM, I see two prints with two different identifiers. Moreover, when the bug occurs, id(sys.modules['package']) is first the first identifier, and then the module is replaced in sys.modules by a new 'package' module (the second identifier).

There is something I don't understand: isn't sys.modules supposed to be protected by a lock? I checked in _installed_safely.__enter__() whether the _imp lock is held: it isn't held when _bootstrap._setup() is called, nor when _imp._install_external_importers() is called. But even if I patch these two functions to acquire the lock, Python startup fails on importing a codec module, because the _imp lock is not held in _installed_safely.__enter__(). So maybe the sys.modules dict is supposed to be protected by a different lock, perhaps indirectly?

I hacked _bootstrap._find_and_load_unlocked() to sleep 10 ms if name == 'package'. With this change, test_concurrency() fails immediately on my Linux box.
By the way, importlib_sleep.patch changes test_concurrency() to run the test exactly once (it changes the loop to a single iteration, instead of 10). Is it normal that a sleep makes the import fail? Aren't imports supposed to be atomic? -- I tried a similar hack before commit b4baace:
While I'm not sure of my hack, it *seems* like sys.modules was somehow atomic before this commit? |
No, there's no global lock on the import system these days. Instead, imports are supposed to be atomic on a per-module basis, but bpo-30814 showed that some of the accesses to the shared state management (which *is* still supposed to be protected by the import lock) could get into trouble given concurrent imports from different threads. However, the new test exercises an interesting sequence of per-module lock acquisition, since each thread effectively runs through this sequence:
That moves back-and-forth between C & Python in a few different places, and hence may have been relying on the GIL to keep the parts executed in C atomic. So now I'm wondering whether or not this new problem might be due to moving the _imp.acquire_lock() calls into Python code as part of bpo-30814, as that creates a window near the start of _find_and_load() where the thread may end up holding neither the GIL *nor* the import lock. By contrast, when the import lock was acquired in C, then by the time the Python code starts running (potentially releasing the GIL), the import lock has already been acquired. |
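The sequence Nick lists isn't reproduced above, but a simplified model of _find_and_load() makes the window visible (a sketch of my own, not CPython's actual code; the real logic lives in importlib._bootstrap):

```python
import sys
import threading

_module_locks = {}                 # stand-in for importlib's per-module lock table
_locks_guard = threading.Lock()

def _get_module_lock(name):
    with _locks_guard:
        return _module_locks.setdefault(name, threading.RLock())

def find_and_load(name, load_unlocked):
    # Window: no lock is held here.  Between this check and acquiring the
    # per-module lock below, another thread may be half-way through running
    # the package's __init__.py, or may finish and then be overwritten by us.
    module = sys.modules.get(name)
    if module is not None:
        return module
    with _get_module_lock(name):
        # A fixed version has to check sys.modules again while holding the
        # per-module lock; without that re-check, two racing threads both
        # execute the module body and the loser's module object replaces
        # the winner's in sys.modules.
        return load_unlocked(name)
```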
Oops, please ignore this comment, it was a mistake. After clearing my repository (removing __pycache__ directories), the test failed. I forgot to copy the Lib/test/test_import/data/package/ directory. |
Using attached importlib_sleep.patch, you can *see* the race condition with:

diff --git a/Lib/test/test_import/data/package/__init__.py b/Lib/test/test_import/data/package/__init__.py
index a4f2bc3..f07eb11 100644
--- a/Lib/test/test_import/data/package/__init__.py
+++ b/Lib/test/test_import/data/package/__init__.py
@@ -1,2 +1,3 @@
 import package.submodule
+print("package: id=%#x" % id(package))
 package.submodule

Example of output: Lib/test/test_import/data/package/__init__.py is run twice instead of being run once. A lock is missing somewhere on the package module. |
Victor tested test_concurrency on 3.5, and it failed on a slow Windows VM. Therefore the bug exists in 3.5 and is not related to recent changes. |
I'm about to head to bed so I'll get back to this discussion tomorrow, but in the meantime: Victor, maybe it would make sense to reformulate your reproducer as a monkeypatch for _find_and_load_unlocked() that injects the 10 ms sleep from the test case if it finds _frozen_importlib in sys.modules? That is, temporarily replacing it with something like:

orig_find_and_load_unlocked = _frozen_importlib._find_and_load_unlocked

def _slow_find_and_load_unlocked(name, import_):
    time.sleep(0.010)
    return orig_find_and_load_unlocked(name, import_)

That way, we'd hopefully be able to reproduce the failure independently of the slower buildbot. |
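For completeness, a self-contained variant of that monkeypatch (the imports, the patch/restore scaffolding and the placeholder test call are my additions; the function names and the 10 ms sleep come from the snippet above):

```python
import time
import _frozen_importlib

orig_find_and_load_unlocked = _frozen_importlib._find_and_load_unlocked

def _slow_find_and_load_unlocked(name, import_):
    # Widen the window between the sys.modules check and the actual load.
    time.sleep(0.010)
    return orig_find_and_load_unlocked(name, import_)

_frozen_importlib._find_and_load_unlocked = _slow_find_and_load_unlocked
try:
    pass  # run the concurrent imports (e.g. test_concurrency) here
finally:
    _frozen_importlib._find_and_load_unlocked = orig_find_and_load_unlocked
```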
Backporting the fix to Python 3.5 doesn't seem trivial; 3.5 lacks many of the relevant changes.
With Python 3.5.4rc1 released (3.5.4 will be the last 3.5 release accepting bug fixes), I'm not sure it's still worth trying to backport such changes to Python 3.5. importlib is critical, and any regression there could be serious. I'm also not sure that backporting the unit test is useful: if the test fails, we will be unable to fix it. |
Victor:
Serhiy closed his PR, so now I close this issue. |
If you're looking at this issue because it still happens to you, you are probably using zipimport. Stop using zipimport. |
Łukasz Langa: "If you're looking at this issue because it still happens to you, you are probably using zipimport. Stop using zipimport." Hum, do you mean that zipimport also has an issue? Maybe open a new issue? |
Yeah, what's wrong with zipimport? |
There's a bunch of things wrong with zipimport, just ask Thomas. ;) |
In the context of this issue: zipimport doesn't properly lock on imports happening in multiple threads at the same time. This causes one thread to receive a partially imported module while the other thread is still performing the import. zipimport also doesn't properly check __pycache__ directories but *does* look for old-style .pyc files next to the corresponding .py file. IIRC Brett chooses not to deal with zipimport's warts and has no time to rewrite it in Python. Since the rest of the core team is less qualified, they steer clear, too. So, don't use zipimport. |
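A rough way to stress that scenario (my own probe sketch, not something from this issue; whether it actually fails depends on timing and on how the interpreter's import locks interact with zipimport in a given version):

```python
import os
import sys
import tempfile
import threading
import zipfile

def make_archive():
    path = os.path.join(tempfile.mkdtemp(), "lib.zip")
    with zipfile.ZipFile(path, "w") as zf:
        # A module whose body takes a while, to widen any race window.
        zf.writestr("slowmod.py", "import time\ntime.sleep(0.1)\nvalue = 42\n")
    return path

def worker(errors, barrier):
    barrier.wait()
    try:
        import slowmod
        slowmod.value          # missing if we saw a partially initialized module
    except Exception as exc:
        errors.append(exc)

sys.path.insert(0, make_archive())
errors = []
barrier = threading.Barrier(8)
threads = [threading.Thread(target=worker, args=(errors, barrier))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(errors or "no failure observed on this run")
```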
That's rather disturbing, and none of that is reflected in the zipimport docs. Is there at least a bpo issue about it? I may be the last person in the world to still recommend putting a zipfile on sys.path -- I used to believe it's faster than directory searches, is that no longer the case? |
I recommend and use it without trouble, but then I rarely defer imports to worker threads. Barry may have some ideas, but I think this really shouldn't be difficult to do. importlib is pretty extensible, and it's not difficult to read from a zip file. |
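To illustrate the "importlib is pretty extensible" point: a minimal meta path finder that loads top-level pure-Python modules straight out of a zip archive (a simplified sketch of my own; no packages, no bytecode caching, and the class and archive names are invented for the example):

```python
import sys
import zipfile
import importlib.abc
import importlib.util

class ZipSourceFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def __init__(self, archive_path):
        self.archive = zipfile.ZipFile(archive_path)

    def find_spec(self, fullname, path=None, target=None):
        # Only handle top-level modules stored as "<name>.py" in the archive.
        if fullname + ".py" in self.archive.namelist():
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        source = self.archive.read(module.__spec__.name + ".py")
        exec(compile(source, module.__spec__.name, "exec"), module.__dict__)

# Usage: sys.meta_path.insert(0, ZipSourceFinder("bundle.zip"))
```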
The other problem is zipimport is written in C and no one wants to deal with that headache if possible. ;) We actually tweaked the importlib.resources API specifically to avoid having to muck with zipimport's C code, as it would have required major surgery otherwise. Thomas has also been saying he was planning to upstream the improvements he has made to zipimport internally at Google, which also demotivates touching the code knowing that changes are planned.

As for Steve's comment, the tricky bit is zipfile has a bunch of dependencies, so it's not so simple to just freeze that one module and wrap it with importlib to be able to load the stdlib from a zip file (which zipimport supports). Basically, zip-reading code needs to be written that can be bootstrapped (C or Python), and then used in importlib to be able to import the stdlib from a zip file.

I have also been toying with the idea of a zippath module that would take pathlib's API and apply it to zip files, acting like a self-contained file system, but I have never bothered as zipfile's API is messy.

The "all the time in the world" project would be making all of this come together in a release of Python by building off of each other: new code to read zip files written in freezable Python, use importlib to replace zipimport, and provide a pathlib API to work with zip files (which would probably be partially built for use by importlib). If I drop my idea of putting a package on PyPI to do lazy importing, I could look at tackling this for 3.8. |
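(As a side note on the zippath idea: a pathlib-style view of zip archives did later land in the stdlib as zipfile.Path in Python 3.8. A short usage sketch, assuming an archive bundle.zip containing a mod.py exists:)

```python
from zipfile import Path  # Python 3.8+

root = Path("bundle.zip")
for entry in root.iterdir():
    print(entry.name, entry.is_dir())
print((root / "mod.py").read_text())
```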
Take a look at https://bugs.python.org/issue25711 which wants to rewrite zipimport from scratch. I took Serhiy's C->Python patch and turned it into a PR, but it would need a lot of love to get it into shape. (Let's not even talk about https://bugs.python.org/issue32866 which caused me some pain during importlib.resources work.) Pretty sure Thomas's rumored rewrite is still in C, and it solves problems like support for zip64 and comments, which we've run into also.

zipimport is crucial for single-file executable solutions like pex and our own soon-to-be-open-sourced tool, so it really can't go away. Both of those still unpack wheels/eggs/sdists to a file system cache, not so much for speed but because shared libraries generally can't be imported from anything but a physical path (thanks to dlopen()). I think Google has a patch for glibc that would solve that, but there's some hesitation from the glibc maintainers to merge it (and then what about other platforms?). |
OK, this sounds much better: zipimport has problems and we should fix them, |
And mostly for completeness, I know Thomas has maligned the fact that zipimport has tests for some crazy things in its semantics which made it difficult for him to improve the module. |
If there are overspecified tests we can debate them of course. |