Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fsspec.asyn.sync shoud check the liveness of IO thread to avoid deadlocks #1723

Open
sugibuchi opened this issue Oct 13, 2024 · 2 comments
Open

Comments

@sugibuchi
Copy link

fsspec.asyn creates and runs an event loop used by async file system implementations as the default event loop.

However, this module does not explicitly close the event loop. As a result, when a Python interpreter enters the shutdown sequence, we experience a specific period during which the event loop is still marked as "running", but the IO thread running the loop has already stopped.

import fsspec.asyn

class Dummy:
    def __init__(self):
        self.loop = fsspec.asyn.get_loop()

    def __del__(self):
        print(fsspec.asyn.loop[0])
        print(fsspec.asyn.iothread[0])

dummy = None

if __name__ == "__main__":
    dummy = Dummy()
<_UnixSelectorEventLoop running=True closed=False debug=False> # Event loop is marked as "running"
<Thread(fsspecIO, stopped daemon 140283887175232)> # IO thread has stopped -> call of sync() with the loop will lead to a deadlock

This period is dangerous as it can lead to unexpected deadlocks, particularly if there are unclosed files that potentially trigger file system access when they are garbage-collected at the final moment of the interpreter shutdown, as I reported in #1685.

To mitigate this risk, we should add a liveness check of the IO thread in fsspec.asyn.sync() widely used by async file system implementations to run async functions synchronously.

def sync(loop, func, *args, timeout=None, **kwargs):

    if loop is None or loop.is_closed():
        raise RuntimeError("Loop is not running")
    if iothread[0] and loop == get_loop() and not iothread[0].is_alive():  # New
        raise RuntimeError("fsspec IO thread has already stopped.")
@sugibuchi
Copy link
Author

Two additional remarks regarding this change.

This mitigation is not perfect

This check is only applied when fsspec event loop is used with fsspec.asyn.sync(). It's important to note that the same check is not applied if a file system implementation uses the event loop directly. Example:

https://github.com/fsspec/adlfs/blob/2024.7.0/adlfs/spec.py#L2021

When we check the readiness of an event loop, we use is_running() of the loop. However, this does not check the liveness of the thread running an event loop. Moreover, knowing the thread running an event loop is usually impossible.

As fsspec event loop has a certain unhealthy period potentially leading to deadlocks, it would be better to encourage developers to use the fsspec event loop only in conjunction with fsspec.asyn.sync().

Errors produced by this check may be confusing

This check raises the exception only in a specific case (fsspec.asyn.sync() is called after the IO thread has stopped). This behaviour is generally better than before, as a program falls into a deadlock if we don't raise this exception in this case.

However, this error can confuse users as it depends on the garbage collection of unclosed files, the timing of which is difficult to predict and control. It might be better to add an explanation about this error in the documentation.

@martindurant
Copy link
Member

I might suggest wrapping the exceptions with if sys.is_finalizing(); yes, we want to avoid the deadlock, but we don't really want garbage tracebacks to the user.

We might want to avoid any sync calls during shutdown, but some connection pools really like to be shut down before exit, and produce various warnings of their own if not allowed to do so.

Another reasonable check that might be done in sync(), is that the process ID is still what we started with (which will not be true following fork). That's a separate topic, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants