Avoid deadlock caused by generational GC and object finalizers #720

Merged

Conversation

jacksmith15
Contributor

@jacksmith15 jacksmith15 commented Oct 7, 2022

This PR implements a solution to #712 as discussed here.

If a re-entrant logging call is performed (a thread emits a message while already in the process of emitting another), the log will not be emitted; instead a RuntimeError is raised and handled by the error interceptor. This differs from the current behaviour, in which such a call would deadlock.

If this occurs during garbage collection (i.e. a log emitted from a __del__ method), the RuntimeError will not propagate; it is instead printed to stderr (see the Python docs).
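For illustration, a rough sketch of the failure mode from #712 that this change targets. The class and sink names are invented here, and whether the collector actually fires mid-emit depends on gc thresholds, so this is an illustration rather than a guaranteed reproduction:

import gc
from loguru import logger

class CyclicResource:
    def __init__(self):
        self.self_ref = self  # reference cycle: only the gc module can reclaim this

    def __del__(self):
        # Logging from a finalizer is re-entrant if collection is triggered
        # while this thread is already inside the handler's emit().
        logger.info("CyclicResource finalized")

def allocating_sink(message):
    # Allocations here can trigger a generational collection while the
    # handler lock is held by the current thread.
    _ = [dict() for _ in range(20)]
    print(message, end="")

logger.add(allocating_sink)
CyclicResource()      # immediately becomes unreachable garbage
logger.info("hello")  # may collect the cycle mid-emit; previously this could deadlock
gc.collect()          # with this PR, the nested call raises RuntimeError instead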

Resolves #712

@@ -175,7 +175,8 @@ def emit(self, record, level_id, from_decorator, is_raw, colored_message):
self._queue.put(str_record)
else:
self._sink.write(str_record)

except HandlerLockNotReentrant:
Contributor Author

I think this shouldn't be re-raised.

If we swallow it here, we don't emit the log, but the object finalizer can complete whatever else it should do. If we raise it, it will likely still be swallowed (as is the case with finalizers), and the finalizer will stop early.

Owner

So, I've been wondering about this, and I think we should not catch HandlerLockNotReentrant specifically. Instead, it can be handled by the except Exception: block below, with conditional re-raising depending on the catch argument.

It's true that it can cause a __del__ implementation to fail if the error is re-raised (because it won't be able to complete), but that's the expected behavior. Using the logger with catch=False means it can fail for various reasons, for example because the lock is already in use or because the custom sink contains a bug.

Although the __del__ documentation specifies that raised exceptions are ignored by CPython, they do cause a warning to be displayed, so I don't think they will go unnoticed or be difficult to debug.

Finally, I think that this new safety measure will be triggered mostly when a user reuses the logger inside a sink. In that case, the application should fail if the user specified catch=False.
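As a toy model of the flow described above (the class and attribute names are illustrative assumptions, not Loguru's actual internals):

import sys

class SketchHandler:
    def __init__(self, sink, catch=True):
        self._sink = sink
        self._catch = catch

    def emit(self, message):
        try:
            # In the real handler this block also acquires the (now protected)
            # lock, which may itself raise RuntimeError on re-entrant use.
            self._sink(message)
        except Exception:
            if not self._catch:
                raise  # catch=False: the error propagates, even out of a __del__
            print("--- Logging error ---", repr(sys.exc_info()[1]), file=sys.stderr)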

Contributor Author

👍

if not hasattr(os, "register_at_fork"):

def create_logger_lock():
return threading.Lock()

def create_handler_lock():
return threading.Lock()
return _HandlerLockWrapper(threading.Lock())
Contributor Author

The approach here changes the behaviour of the handler lock in general, so it applies not only to Handler.emit but also to Handler.stop and Handler.complete_async.

I'm not sure whether that is correct. If it is, do we need additional handling in those methods?

If this behaviour should only apply to emit, I think the solution shouldn't touch this file at all, and I will move it to _handler.py.

Owner

@Delgan Delgan Oct 9, 2022

Indeed, we should also protect the other methods using the lock; this is a very good idea!

However, I'm thinking it might be better to explicitly wrap the lock on use instead of on creation. I prefer to differentiate the context manager from the lock itself, because in some cases the protection is not required. In particular, I'm thinking of the lock inside the _queued_writer thread, which does not need to be protected.

I'm thinking of replacing `with self._lock:` with `with self._protected_lock():`, for example. There is certainly a risk that we forget to use it later, but I'll keep an eye on it and I think it should be fine.

Contributor Author

👍


def __enter__(self):
if getattr(self._local, "acquired", False):
raise HandlerLockNotReentrant(
Owner

Given my other comment about not handling HandlerLockNotReentrant specifically, I think we can just raise a RuntimeError here.

Also, given that you suggested protecting all methods, I would make the message a bit more generic (because the error won't necessarily be raised when we "tried to emit a log"). For example:

raise RuntimeError(
    "Could not acquire internal lock because it was already in use (deadlock avoided). "
    "This likely happened because the logger was re-used inside a sink, a signal handler "
    "or a '__del__' method. This is not permitted because the logger and its handlers are "
    "not re-entrant."
)

Contributor Author

👍

pass


class _HandlerLockWrapper:
Owner

Can we use @contextlib.contextmanager for simplification?

Contributor Author

@jacksmith15 jacksmith15 Oct 10, 2022

Yes, we can, but only if we separate the Lock from this logic, e.g.:

with _acquire_protected(self._lock):
    ...

This works given your suggestion to avoid this behaviour on the queued writer.

Another alternative is to keep the wrapper object and have a context-manager interface like this:

with self._lock.acquire(raise_on_reenter=True):
    ...

Although acquire is a poor choice of method name given its meaning on threading.Lock, I can't think of a better one.
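For reference, a minimal sketch of the first option, assuming the re-entrancy flag is a per-handler threading.local passed in alongside the lock. The helper name and error message are illustrative, not necessarily what was merged:

import threading
from contextlib import contextmanager

@contextmanager
def _acquire_protected(lock, acquired_flag):
    """Acquire `lock`, but raise instead of deadlocking if the current
    thread already holds it. `acquired_flag` is a threading.local() stored
    alongside the lock (one per handler)."""
    if getattr(acquired_flag, "acquired", False):
        raise RuntimeError(
            "Could not acquire internal lock because it was already in use "
            "(deadlock avoided)."
        )
    acquired_flag.acquired = True
    try:
        with lock:
            yield
    finally:
        acquired_flag.acquired = False

# A call site could then read:
#     with _acquire_protected(self._lock, self._lock_acquired):
#         ...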

# This simulates allocating some memory as part of log handling. Eventually this will
# trigger the generational garbage collector to run here.
# You could replace it with `json.dumps(message.record)` to get the same effect.
_ = [dict() for _ in range(20)]
Owner

@Delgan Delgan Oct 9, 2022

I actually don't understand why we need to trigger the garbage collector this way instead of calling gc.collect().

Contributor Author

@jacksmith15 jacksmith15 Oct 10, 2022

Functionally, that works!

I based this on the example in the issue, and I avoided explicitly calling gc.collect there to demonstrate how this could be triggered reliably by seemingly innocuous operations.

In short, I think this demonstrates how easy it would be to trigger GC at this point, for readers who are not familiar with how the generational GC works.

Happy to switch to gc.collect if you'd prefer.

Owner

Given the unpredictable nature of the garbage collector, I would indeed prefer to call gc.collect() to guarantee the test behaves as expected regardless of the gc configuration and possible changes to the gc implementation in the future.

It's a good idea to give a concrete example, though. I would suggest leaving it as a comment and/or linking to the related ticket you created.
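The deterministic variant being requested might look something like this in the test sink (the sink name and comments are assumptions, not the final test code):

import gc

def deterministic_sink(message):
    # Force a collection while the handler lock is held, instead of relying
    # on allocation thresholds to trigger the generational GC.
    # (See the linked issue for how innocuous allocations can trigger it.)
    gc.collect()
    print(message, end="")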

Owner

@Delgan Delgan left a comment

Hey @jacksmith15, thanks a lot for your responsiveness and willingness to open a new PR!

I have left a few comments. If you don't have time to address them, I can do it myself, no problem, just let me know. 👍

@@ -297,6 +317,7 @@ def _queued_writer(self):
def __getstate__(self):
state = self.__dict__.copy()
state["_lock"] = None
state["_lock_acquired"] = None
Owner

@Delgan Delgan Oct 10, 2022

Very good catch. I assumed threading.local() state was cleared when the handler is cloned into a new process, but that's actually not the case.
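A minimal sketch of the pickling pattern under discussion; the __setstate__ half is an assumption about how fresh objects would be recreated, not necessarily Loguru's exact code:

import threading

class SketchHandler:
    def __init__(self):
        self._lock = threading.Lock()
        self._lock_acquired = threading.local()

    def __getstate__(self):
        state = self.__dict__.copy()
        # Locks and thread-local objects cannot be pickled (and must not be
        # shared with a clone anyway), so drop them from the state.
        state["_lock"] = None
        state["_lock_acquired"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate fresh synchronization objects in the new process/copy.
        self._lock = threading.Lock()
        self._lock_acquired = threading.local()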

Contributor Author

It was well covered by the tests 💪

"handlers are not re-entrant."
)
self._lock_acquired.acquired = True
try:
Owner

I would replace this block with:

self._lock_acquired.acquired = True
with self._lock:
    yield
self._lock_acquired.acquired = False

Is there any reason to use try / finally?

Contributor Author

I think so. If an exception is raised, the lock would be released, but self._lock_acquired.acquired would still be True.

Try this fake example:

from contextlib import contextmanager

@contextmanager
def fake_lock():
    try:
        print("lock acquired")
        yield
    finally:
        print("lock release")


@contextmanager
def fake_protected_lock():
    print("lock state set to True")
    with fake_lock():
        yield
    print("lock state set to False")


with fake_protected_lock():
    raise ValueError

Owner

My bad, I don't know why I assumed it was automatically handled; I'm probably too used to pytest fixtures. The try / finally is indeed required, as explained in the docs I shared myself... 🤦‍♂️

Thanks for the example. 👍

@Delgan Delgan merged commit ea483a4 into Delgan:master Oct 10, 2022
@Delgan
Owner

Delgan commented Oct 10, 2022

Everything looks fine, PR merged! 🎉

Thanks a lot for your time and your work. This is a problem that has existed since the beginning of Loguru and for which I could not think of an acceptable solution. I'm sure this new explicit RuntimeError will be of tremendous help to the many users who have faced tricky deadlocks.

@jacksmith15 jacksmith15 deleted the bug/712/handler-error-instead-of-deadlock branch October 11, 2022 08:27
Successfully merging this pull request may close these issues.

Deadlock when objects emit messages at garbage collection