importlib lock race issue in deadlock handling code #91351
Comments
We've seen tracebacks in production like: File "<frozen importlib._bootstrap>", line 1004, in _find_and_load(name='oe.gpg_sign', import_=<built-in function __import__>) and File "<frozen importlib._bootstrap>", line 1004, in _find_and_load(name='oe.path', import_=<built-in function __import__>) I've attached a reproduction script. It shows that if an import XXX is in progress and waiting at the wrong point when an interrupt arrives (in this case a signal) and triggers its own import YYY, _blocking_on[tid] in importlib/_bootstrap.py gets overwritten and lost, which triggers the traceback we see above upon exit from the second import. I'm using a signal handler as the interrupt here. I don't know yet what the source is in production, but this reproducer proves it is possible.
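The overwrite described above can be modelled in a few lines. This is only a sketch of the failure pattern, not the importlib code: `blocking_on`, `acquire`, `nested_import` and `run_demo` are hypothetical names, and the `interrupt` callback stands in for the signal handler that starts a second import on the same thread.

```python
import threading

# Hypothetical stand-in for per-thread "what am I waiting on" bookkeeping.
# This is NOT the real importlib._bootstrap implementation.
blocking_on = {}

def acquire(name, interrupt=None):
    tid = threading.get_ident()
    blocking_on[tid] = name      # one slot per thread; a nested call overwrites it
    try:
        if interrupt is not None:
            interrupt()          # simulates a signal handler running an import here
    finally:
        del blocking_on[tid]     # KeyError if a nested call already removed the slot

def nested_import():
    # The "import YYY" triggered from inside the interrupt.
    acquire("YYY")

def run_demo():
    """Return True if the outer cleanup fails with KeyError(tid)."""
    try:
        acquire("XXX", interrupt=nested_import)
    except KeyError as exc:
        return exc.args[0] == threading.get_ident()
    return False
```

In this model the inner call cleans up the single per-thread slot, so the outer call's cleanup fails with a `KeyError` whose argument is the thread id, which matches the shape of the tracebacks reported here.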
This is a production backtrace captured after I inserted code to print a traceback if tid was already in _blocking_on. It is triggered by a warning about an unclosed asyncio event loop and confirms my theory about nested imports; in the production case I'd guess the trigger is gc, given the __del__. File "/home/pokybuild/yocto-worker/oe-selftest-fedora/build/meta/classes/base.bbclass", line 26, in oe_import
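For anyone wanting to add a similar check, a diagnostic of this kind could look roughly like the following. It is a hypothetical helper, not the code that was actually inserted: `OverwriteTracingDict` and its `reports` attribute are made-up names, and it only catches plain `d[key] = value` assignments.

```python
import traceback

class OverwriteTracingDict(dict):
    """Hypothetical diagnostic dict: records a stack trace when a key is overwritten."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reports = []   # (key, old value, new value, formatted stack)

    def __setitem__(self, key, value):
        if key in self:
            # The key is already present, so this assignment would lose the old value.
            stack = "".join(traceback.format_stack())
            self.reports.append((key, self[key], value, stack))
        super().__setitem__(key, value)
```

Each entry in `reports` carries the stack at the moment of the overwrite, which is enough to see which nested call clobbered the slot. Note that `update()` and `setdefault()` bypass `__setitem__` on a `dict` subclass, so this only observes direct item assignment.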
I also encountered this case, triggered by the resource warning emitted by an unclosed
Here's the
This basically shows the same thing as the original reproducer attached above but with a patch instead of a monkey-patch and no sleep/signals. Seeing how these reproducers work, the cause seems pretty straightforward to me.
I haven't fully understood what this part of _bootstrap.py is trying to accomplish, so I'm not going to suggest a fix yet. |
@exarkun opened a PR against 3.9, which is security only. I've closed it, though a PR against HEAD would be welcome. cc also @brettcannon as importlib expert. A |
If the failure is due to My guess is that wouldn't actually break anything as there isn't something to unlock, so it's really just stale caching at that point instead of a semantic failure. |
Hi @brettcannon . Did you have a chance to look at my PR? |
@exarkun which PR? The only one I see referenced in this issue was closed for being targeted at the wrong branch. Do you have a PR for |
Hi @brettcannon. I meant the PR that was closed for being targeted at the wrong branch, yes. I am happy to fix that process mistake if the code in the branch is useful. If it is not going to be accepted then I'd prefer not to sink more effort into it. I've looked at the recent history of importlib/_bootstrap.py and I don't think there are any changes that make a difference to this bug or to the changes in the PR so I think the current branch is quite representative of what the changes will be after they are retargeted at main. Thanks again for your attention. |
Quite so. The cases it is trying to handle are not simple and the implementation strategy taken is not simple and the problems that arise as a result are not simple. :( One thing I'll point out is that the commits in the PR should be sequentially reviewable, with each commit focusing on a different and (I hope) internally consistent part of the change.
Unfortunately I don't think so. It doesn't solve the problem with deadlock detection during a re-entrant import. That problem goes like this:
It also doesn't solve the single-threaded re-entrant deadlock. That problem goes like this:
In this case the outer import never even gets to cleaning up _blocking_on so there is no KeyError to handle. |
Thanks a lot @exarkun for tackling this. I would agree with using a RLock if it makes things simpler (RLock is written in C nowadays so should be usable from the import bootstrap). |
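To illustrate why an RLock is attractive for the re-entrant case, here is a small sketch of the difference in behaviour when the owning thread tries to take the lock a second time. `can_reacquire` is a hypothetical helper written for this comment, and it uses a non-blocking acquire so the plain `Lock` case reports failure instead of hanging.

```python
import threading

def can_reacquire(lock):
    """Return True if the thread already holding `lock` can acquire it again."""
    lock.acquire()
    try:
        # Non-blocking so that a plain Lock reports failure rather than deadlocking.
        got_it = bool(lock.acquire(blocking=False))
        if got_it:
            lock.release()   # undo the nested acquisition
        return got_it
    finally:
        lock.release()       # undo the outer acquisition
```

With a plain `threading.Lock` the second acquire fails (and a blocking acquire would hang the thread on itself), while `threading.RLock` lets the owning thread re-enter. That only covers lock ownership; the per-thread bookkeeping discussed above still has to cope with nesting.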
Thanks @pitrou. I do think RLock will help. If I adapt the implementation to that approach then should I also re-target it at main? |
Yes. Backports get cherry-picked from main afterwards. |
I submitted a new PR against main and with the RLock simplification - #94504. |
Hello. Could someone (@brettcannon @pitrou @ericsnowcurrently ?) take a look at #94504? Thanks. |
Yeah, I have it on my TODO list, but not high priority, sorry :-) |
It's in my review queue, but it's 12/15. |
…H-94504) Co-authored-by: Brett Cannon <brett@python.org>
…imports

Summary:
upstream issue: python/cpython#91351
upstream PR: python/cpython#94504
upstream merge commit: python/cpython@3325f05

symptom:
```
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load(name='oe.gpg_sign', import_=<built-in function __import__>)
  File "<frozen importlib._bootstrap>", line 158, in _ModuleLockManager.__enter__()
  File "<frozen importlib._bootstrap>", line 110, in _ModuleLock.acquire()
KeyError: 139622474778432

and

  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load(name='oe.path', import_=<built-in function __import__>)
  File "<frozen importlib._bootstrap>", line 158, in _ModuleLockManager.__enter__()
  File "<frozen importlib._bootstrap>", line 110, in _ModuleLock.acquire()
KeyError: 140438942700992
```

Reviewed By: carljm
Differential Revision: D53641441
fbshipit-source-id: e142eb17442da370861cd3a3398b0eef9930d041