Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[asyncio][FreeBSD] _UnixWritePipeTransport.write() may raise BrokenPipeError on FreeBSD #109757

Open
sorcio opened this issue Sep 22, 2023 · 5 comments
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes OS-freebsd stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@sorcio
Copy link
Contributor

sorcio commented Sep 22, 2023

Bug report

Bug description:

Earlier discussion: #109710 (comment)

This issue is specific to some systems with kqueue. I can reproduce on FreeBSD, and according to the internet it should be reproduced on older macOS versions (10.11) and on OpenBSD/NetBSD (with a different error code). A current version of macOS (13) is not affected.

Minimal snippet to explain the root issue:

import os, selectors
sel = selectors.DefaultSelector()
read_fd, write_fd = os.pipe()
# Close one end of the pipe
os.close(read_fd)
# Register the other end of the pipe
sel.register(write_fd, selectors.EVENT_WRITE)

Running this code on Linux or macOS will not raise any exception. The last call to sel.register() will return a list with an event, signaling that a write on the FD is possible. A subsequent write will, of course, raise a BrokenPipeError. On FreeBSD, sel.register() will raise directly.

asyncio does not account for this platform difference. This reproducer is a bit contrived because it needs to trigger a race condition1:

import asyncio
from itertools import count
import os
from threading import Thread


class MyPipeProtocol(asyncio.BaseProtocol):
    def __init__(self):
        self.is_connection_lost = False

    def connection_lost(self, exc):
        self.is_connection_lost = True


async def broken_pipe_repro():
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    write_file = open(write_fd, "wb")
    loop = asyncio.get_running_loop()
    transport, proto = await loop.connect_write_pipe(MyPipeProtocol, write_file)

    # Pass one end of the pipe to another thread that will eventually close it
    t = Thread(target=lambda: os.close(read_fd))
    t.start()

    try:
        # This line will never fail on Linux/macOS, but might on other BSDs:
        transport.write(b"ping" * 65536)
        await asyncio.sleep(0)
        assert proto.is_connection_lost
    finally:
        t.join()
        transport.close()


async def amain():
    for i in count():
        if i % 1000 == 0:
            print(i)
        await broken_pipe_repro()


if __name__ == "__main__":
    asyncio.run(amain())

On most platform this will run indefinitely. On FreeBSD, after a few run I get a traceback:

Traceback (most recent call last):
...
  File "asyncio-broken-pipe.py", line 28, in broken_pipe_repro
    transport.write(b"ping" * 65536)
  File "cpython/Lib/asyncio/unix_events.py", line 713, in write
    self._loop._add_writer(self._fileno, self._write_ready)
  File "cpython/Lib/asyncio/selector_events.py", line 317, in _add_writer
    self._selector.modify(fd, mask | selectors.EVENT_WRITE,
  File "cpython/Lib/selectors.py", line 265, in modify
    key = self.register(fileobj, events, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cpython/Lib/selectors.py", line 508, in register
    self._selector.control([kev], 0, 0)
BrokenPipeError: [Errno 32] Broken pipe

The issue here is that transport.write() is not supposed to raise, as far as I understand.

The current implementation of _UnixWritePipeTransport.write() catches all exceptions on os.write() but not on self._loop._add_writer(). The same happens switching read and write ends. Probably the issue was not detected before because it's a rare condition and it doesn't happen on Linux. gh-109709 showed this occurring on a subprocess test case. The fix applied in that case works by wrapping the call to write() in an exception handler, but I think in general user code can't be expected to always catch that error.

I found prior discussion of the equivalent issue in Tokio. In their case, they decided to solve this at an abstraction level that is closer to KqueueSelector.register(), by ignoring the EPIPE and instead reporting the fd as readable/writable (which is what users expect in other selectors, and what happens in modern macOS). libevent also does something similar. I wonder if this could be a valid solution for Python, because the actual work is done in selectmodule which is lower level. By contrast, to the best of my understanding, libuv does not have special handling for this.

It can also be caught in asyncio (at some layer, either in selector event loops, or in code that calls add_reader/add_writer). Given that this is only seems to happen with pipes, it could make sense to handle this in pipe-specific code.

I can make a PR if this is accepted as a bug.

cc @vstinner

Edit note: my mistake, the error is not raised when registering the read end of a pipe, only the write end.

CPython versions tested on:

3.12, CPython main branch

Operating systems tested on:

Other

Footnotes

  1. the other end of the pipe needs to be closed after _UnixWritePipeTransport.write() calls os.write() but before it tries to register a writer.

@sorcio sorcio added the type-bug An unexpected behavior, bug, or error label Sep 22, 2023
@AA-Turner AA-Turner added OS-freebsd 3.12 bugs and security fixes 3.13 bugs and security fixes stdlib Python modules in the Lib dir labels Sep 22, 2023
@vstinner
Copy link
Member

I can make a PR if this is accepted as a bug.

I think that it makes sense to catch exceptions on _add_write(). I suppose that it would look like something like that:

            except Exception as exc:
                self._conn_lost += 1
                self._fatal_error(exc, 'Fatal write error on pipe transport')
                return

It looks like calling self._loop._remove_reader() in _close() is fine, since it should do nothing if _add_writer() failed.

@sorcio
Copy link
Contributor Author

sorcio commented Sep 25, 2023

Related discussion from ten years ago in bpo-19293 (search for "EPIPE") and bpo-19294. The decision at the time was to use a socket pair instead of a pipe for stdin because some platforms (AIX and Darwin primarily) did not behave as expected, and it was deemed as a platform bug that Python could not solve. This was partially reverted recently (bpo-46364) by switching back to using pipes instead of socket pairs for non-AIX platforms.

Darwin changed behavior (since macOS 10.15) and now they say that registering a kqueue event on a half-closed pipe should not be an error.

As shown in the reproducer, the error does not only happen with a subprocess, because a pipe created elsewhere can be attached to asyncio. Going back to using a socketpair for FreeBSD instead of a pipe (like it used to be until before Python 3.11) could be an easy way out for subprocesses, but does not solve the problem for pipes.

Knowing that Apple fixed this, I'm more convinced that this shouldn't be handled in higher-level code. I will take a moment to consider the impact of changing this in KqueueSelector. Otherwise I will prepare a fix for asyncio.

@sorcio
Copy link
Contributor Author

sorcio commented Sep 27, 2023

While looking into this I found out something that looks like a bug in KqueueSelector. When a fd is registered for both read and write, kqueue requires to register two filters; but KqueueSelector.select() will count the fd as only one (because it's one key) and call kqueue.control() with the smaller number. Therefore some events might not be returned, which could lead to worsened latency and possibly deadlock.

@vstinner
Copy link
Member

While looking into this I found out something that looks like a bug in KqueueSelector. When a fd is registered for both read and write, kqueue requires to register two filters; but KqueueSelector.select() will count the fd as only one (because it's one key) and call kqueue.control() with the smaller number. Therefore some events might not be returned, which could lead to worsened latency and possibly deadlock.

Oh wow, that sounds bad!

@sorcio
Copy link
Contributor Author

sorcio commented Sep 28, 2023

Open a separate issue and opened a PR for this #110038

@vstinner vstinner changed the title _UnixWritePipeTransport.write() may raise BrokenPipeError on FreeBSD [asyncio][FreeBSD] _UnixWritePipeTransport.write() may raise BrokenPipeError on FreeBSD Jun 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes OS-freebsd stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

3 participants