Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-101766: Fix refleak for _BlockingOnManager resources #101942

Merged
merged 2 commits into from
Feb 17, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Lib/importlib/_bootstrap.py
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,11 @@ def __enter__(self):
def __exit__(self, *args, **kwargs):
"""Remove self.lock from this thread's _blocking_on list."""
self.blocked_on.remove(self.lock)
if len(self.blocked_on) == 0:
# gh-101766: glboal cache should be cleaned-up
# if there is no more _blocking_on for this thread.
del _blocking_on[self.thread_id]
corona10 marked this conversation as resolved.
Show resolved Hide resolved
del self.blocked_on
Comment on lines +88 to +92
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corona10 @gvanrossum @sobolevn

This change re-introduces the re-entrancy bug that _BlockingOnManager was introduced to fix.

Consider: A thread has executed __enter__ and begins executing __exit__. It executes line 87 and the length of self.blocked_on becomes 0. The check on the new line 88 passes and the thread enters the block starting with the new line 89. Call this "point x". Now the garbage collector is triggered and an object with a __del__ is collected. The __del__ runs and imports a module (as is very common now that there are ResourceWarnings). As part of satisfying the import, the thread runs __enter__. It executes line 82 and receives the list that is already in _blocking_on (it uses the same key because it is the same thread - this is a single-threaded case). It executes line 83 to add its lock to the list. The import eventually finishes and it executes __exit__. It removes its lock from the list (line 87) and removes the list from _blocking_on (new line 91). Garbage collection proceeds and we get back to the original import being resolved. Execution returns to "point x". At point x we have already decided there are no locks in self.blocked_on and we're going to clean up the global dictionary. New line 91 runs and finds no item at key self.thread_id. This raises a KeyError which goes unhandled all the way up to some application code that was doing import somemodule.

I recommend reverting this change until a re-entrant safe version of the fix can be determined. The resource leak is not great but if I understand correctly, it is also more or less academic right now (it causes a very small amount of memory to be wasted while the test suite is running). The re-entrancy bug is real and breaks many real applications in extremely confusing ways (consider that the original bug reports were open for years before anyone tracked down the problem, at great expense, and it took more than half a year to get the fix reviewed and merged).

Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exarkun cc @gvanrossum @sobolevn
Okay, I understand your intention.
I will try to submit the patch if the resource can be cleaned up at the test suite level including reverting the current change..

And also would you like to submit the test code to prevent regression?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to submit the patch if the resource can be cleaned up at the test suite level including reverting the current change..

Thanks you very much.

And also would you like to submit the test code to prevent regression?

That would be ideal, certainly. There is a reproducer on #94504 but it is CPU intensive and non-deterministic without additional instrumentation inside _bootstrap.py itself. It's not clear to me exactly how to turn this into a regression test that can be included in the automated suite. I'd be happy to discuss it further though.



class _DeadlockError(RuntimeError):
Expand Down