Skip to content

asyncio.ProactorEventLoop mishandles signal wakeup file descriptor #87079

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hidmic mannequin opened this issue Jan 12, 2021 · 30 comments
Open

asyncio.ProactorEventLoop mishandles signal wakeup file descriptor #87079

hidmic mannequin opened this issue Jan 12, 2021 · 30 comments
Labels
OS-windows topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@hidmic
Copy link
Mannequin

hidmic mannequin commented Jan 12, 2021

BPO 42913
Nosy @gvanrossum, @pfmoore, @tjguk, @asvetlov, @zware, @1st1, @zooba, @hidmic

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-01-12.21:43:53.586>
labels = ['type-bug', '3.8', 'OS-windows', 'expert-asyncio']
title = 'asyncio.ProactorEventLoop mishandles signal wakeup file descriptor'
updated_at = <Date 2021-07-29.05:27:36.588>
user = 'https://github.com/hidmic'

bugs.python.org fields:

activity = <Date 2021-07-29.05:27:36.588>
actor = 'gvanrossum'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Windows', 'asyncio']
creation = <Date 2021-01-12.21:43:53.586>
creator = 'hidmic'
dependencies = []
files = []
hgrepos = []
issue_num = 42913
keywords = []
message_count = 6.0
messages = ['384979', '384998', '385198', '385211', '398445', '398457']
nosy_count = 8.0
nosy_names = ['gvanrossum', 'paul.moore', 'tim.golden', 'asvetlov', 'zach.ware', 'yselivanov', 'steve.dower', 'hidmic']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue42913'
versions = ['Python 3.8']

@hidmic
Copy link
Mannequin Author

hidmic mannequin commented Jan 12, 2021

asyncio.ProactorEventLoop uses a socket.socketpair and signal.set_wakeup_fd to wake up a loop that's polling I/O. However it does so with no consideration for file descriptors previously set (i.e. no signal number forwarding). Either by user code or by another instance of asyncio.ProactorEventLoop.

The following snippet is enough for the above to cause the loop to hang forever:

import asyncio
import gc

asyncio.set_event_loop(asyncio.ProactorEventLoop())
asyncio.set_event_loop(asyncio.ProactorEventLoop())
gc.collect()
asyncio.get_event_loop().run_forever()

The first asyncio.ProactorEventLoop instance sets a signal wakeup file descriptor on construction (see

signal.set_wakeup_fd(self._csock.fileno())
). The second instances does the same, dropping the file descriptor set by the first one (not good, not necessarily bad). When the garbage collector purges the first instance, signal wakeups are disabled completely (see
signal.set_wakeup_fd(-1)
). The loop cannot be interrupted with Ctrl+C anymore (bad).

@hidmic hidmic mannequin added 3.8 (EOL) end of life OS-windows topic-asyncio type-bug An unexpected behavior, bug, or error labels Jan 12, 2021
@gvanrossum
Copy link
Member

Looks this was not tested or thought through with multiple loops and signals (admittedly, using signals is never fun, and even less so on Windows).

Can you send a PR with a fix?

@hidmic
Copy link
Mannequin Author

hidmic mannequin commented Jan 18, 2021

Sorry for taking so long to reply.

Sure, I would be happy to contribute. We should probably take care of Unix event loops -- since I opened this ticket I found out those tend to not cooperate with other signal wakeup file descriptor users either. I am a bit short on time right now though, so it will take some time.

@gvanrossum
Copy link
Member

No pressure. If there's an API change needed you will have until 3.10 beta 1. Without API changes this can be fixed any time.

@hidmic
Copy link
Mannequin Author

hidmic mannequin commented Jul 28, 2021

Circling back. I've been giving some thought to this. While this could be fixed within asyncio.proactor_events.ProactorEventLoop and asyncio.unix_events._UnixSelectorEventLoop implementations (e.g. calling signal.set_wakeup_fd once and forwarding signal numbers to subsequent instances through file descriptors or direct reference), I can't help thinking I've encountered this issue of contending signal API calls too many times already. Rarely two call sites cooperate with each other.

So... how do people feel about changing the signal API module to enable multiple file descriptors and multiple signal handlers per signal? It'd be much simpler to use, and not only in asyncio. Implementation-wise, a lock-free singly linked list could be used to store them (may need to expose compare-and-swap atomic primitives in pycore_atomic.h).

@gvanrossum
Copy link
Member

Have you investigated how that could be implemented and what the new API would look like? Personally I don't really like signals (they're not portable and code using them is super hard to get right) and I hesitate to add to their API. OTOH maybe you'll be able to find a champion for such a change among other core devs. I doubt that anyone is going to put any time in this based on a few messages from you though -- you must do the work.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@kumaraditya303
Copy link
Contributor

What is the use case of creating two loops and then deleting them? When asyncio is used on windows, signals cannot be used because windows does not supports them. asyncio uses self-pipe to wake event loop from blocking calls of select. The reproducer makes no sense in real code as usually you don't have instantiate loop object.

@kumaraditya303 kumaraditya303 added pending The issue will be closed if no feedback is provided and removed 3.8 (EOL) end of life labels Jul 9, 2022
@gvanrossum
Copy link
Member

IIUC the Windows libc emulation does support some limited signal operations (else how would the code work at all?). Presumably the demo program is meant to represent a more realistic scenario where this bug was discovered by the OP.

That said, I don't recall any of the details so maybe someone else can help decide how this could be done better (maybe @zooba?).

@kumaraditya303
Copy link
Contributor

Presumably the demo program is meant to represent a more realistic scenario where this bug was discovered by the OP.

The bug can only happen if you manually install a wakeup fd or you are creating two loops on the same thread both of which are unsupported by asyncio.

@gvanrossum
Copy link
Member

From the wording of the initial comment I think the OP had a manually installed wakeup fd that was getting clobbered by asyncio. I think it's reasonable to see if there's a way asyncio can somehow support this scenario better.

@kumaraditya303
Copy link
Contributor

I think it's reasonable to see if there's a way asyncio can somehow support this scenario better.

The only way I can think of to support this would be to first allow multiple signal wakeup handlers as currently only 1 can be added. I expect the change to be non trivial.

@gvanrossum
Copy link
Member

Is there a way to ask whether a fd has a signal handler? Is an asyncio loop without a wakeup fd totally crippled, or merely slightly less performant? I am wondering if an async loop could detect whether a signal wakeup fd is already in use, and then refrain from setting up another one. (And I am reluctant to read the corresponding source code. :-) I don't think changing the signal module to support multiple handlers is the way to go, so we agree there.

@kumaraditya303
Copy link
Contributor

Is there a way to ask whether a fd has a signal handler?

Not that I am aware of.

Is an asyncio loop without a wakeup fd totally crippled, or merely slightly less performant?

It doesn't matter for performance but only for signal handling which on Linux include subprocesses which use SIGCHILD, so subprocess would be broken.

I am wondering if an async loop could detect whether a signal wakeup fd is already in use, and then refrain from setting up another one.

It would break subprocesses on Linux using child watcher which use SIGCHILD, and there is no API to get the wakeup fd, it can only be set or removed with -1.

@gvanrossum
Copy link
Member

So maybe the minimal solution would be to add something to signal to ask for the wakeup fd (if the OS allows it?) and then if a wakeup fd is already set, refrain from clobbering it, and instead disable the subprocess functionality of the event loop. Note that some folks are debating to get rid of the current child watcher machinery (#82772).

@kumaraditya303
Copy link
Contributor

So maybe the minimal solution would be to add something to signal to ask for the wakeup fd (if the OS allows it?) and then if a wakeup fd is already set, refrain from clobbering it,...

I think there is some confusion here, wakeup fd is not dependent on the OS, it is not some kernel space thing but rather a file descriptor to which ceval writes the signal number to wakeup from blocking select or other blocking calls on Python side.

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Jul 12, 2022

if a wakeup fd is already set, refrain from clobbering it, and instead disable the subprocess functionality of the event loop

That seems rather complicated for little gain, even if the entire child watcher machinery ought to be removed still SIGCHILD signal would be used for subprocesses, not every platform and kernel version supports pidfd.

@gvanrossum
Copy link
Member

So maybe the minimal solution would be to add something to signal to ask for the wakeup fd (if the OS allows it?) and then if a wakeup fd is already set, refrain from clobbering it,...

I think there is some confusion here, wakeup fd is not dependent on the OS, it is not some kernel space thing but rather a file descriptor to which ceval writes the signal number to wakeup from blocking select or other blocking calls on Python side.

You got me there. But nevertheless this means we could easily add a getter API so asyncio can avoid clobbering it.

@gvanrossum
Copy link
Member

if a wakeup fd is already set, refrain from clobbering it, and instead disable the subprocess functionality of the event loop

That seems rather complicated for little gain, even if the entire child watcher machinery ought to be removed still SIGCHILD signal would be used for subprocesses, not every platform and kernel version supports pidfd.

Yes, it's little gain, but I think it's a legitimate bug that asyncio clobbers the wakeup fd when it is already in use by something else, and this is the smallest fix I can think of. (Maybe smaller still would be to pass an explicit flag to new_event_loop() that means "don't use wakeup fd", maybe the OP would be okay with that? @hidmic)

@hidmic
Copy link
Contributor

hidmic commented Jul 12, 2022

Thanks for the bump @gvanrossum, and my apologies for not (ever) submitting a patch. It ended up in cold storage.

The reproducer makes no sense in real code as usually you don't have instantiate loop object.

you are creating two loops on the same thread both of which are unsupported by asyncio.

@kumaraditya303 Well, there's set_event_loop() and new_event_loop() APIs, both public and documented, and I see no mention of potential breakage if one were to attempt calling new_event_loop() more than once in the same thread.

I give you that the repro is artificial though, and as @gvanrossum correctly pointed out, my main issue was with asyncio clobbering an existing signal wakeup file descriptor.

The only way I can think of to support this would be to first allow multiple signal wakeup handlers as currently only 1 can be added. I expect the change to be non trivial.

I don't think changing the signal module to support multiple handlers is the way to go, so we agree there.

I understand where you're coming from, but I will point out that it is not the first time I've had to deal with two or more libraries or subsystems not playing along when it comes to signal handling. I think it would be great if Python could manage multiple handlers and wake up file descriptors, precluding this kind of issues entirely.

Yes, it's little gain, but I think it's a legitimate bug that asyncio clobbers the wakeup fd when it is already in use by something else, and this is the smallest fix I can think of.

Alternatively, a warning in ProactorEventLoop documentation and an exception if another wake up file descriptor is getting clobbered would do. It's harsh, but IMHO it's best to force a conscious call on the user instead of silently breaking/disabling some functionality.


In any case, thank you both for Python 🚀

@gvanrossum
Copy link
Member

The perverse thing here is that your use case is on Windows, which doesn't even have real signals.

There's one glimmer of hope -- signal.set_wakeup_fd() returns the old fd, so we can use it to raise that exception, like so:

old_fd = signal.set_wakeup_fd(self._csock.fileno())
if old_fd != -1:
    signal.set_wakeup_fd(old_fd)
    raise OSError(...)

This has two flaws: it doesn't restore the warn_on_full_buffer flag, and it's possible that a signal arrived and wrote a byte to fd before we reset it. But it's better than we currently have.

@kumaraditya303 kumaraditya303 removed the pending The issue will be closed if no feedback is provided label Jul 13, 2022
@ezio-melotti ezio-melotti moved this to Todo in asyncio Jul 17, 2022
@kumaraditya303
Copy link
Contributor

A PR raising RuntimeError if wakeup fd is already set would be better than status quo. Adding multiple wakeup fd would require some discussion as it would requires changes to the eval loop and signal handling.

@gvanrossum
Copy link
Member

Let's see a PR for the error if it's already set then. There could still be scenarios where this breaks "working" code, but maybe the breakage will do something useful (e.g. alert people that they have multiple subsystems fighting over the wakeup fd).

I agree that expanding the API to support multiple wakeup fds or multiple handlers would require more discussion -- for example what would we do to users of the old API?

FWIW the actual C code for the wakeup fd is a bit gruesome, I wouldn't want to be the one to make changes there (nearly everything's done one way for Windows and another way for UNIX).

@kumaraditya303
Copy link
Contributor

@hidmic Interested in creating a PR for raising exception ?

@hidmic
Copy link
Contributor

hidmic commented Sep 13, 2022

@kumaraditya303 sure

@hidmic
Copy link
Contributor

hidmic commented Sep 13, 2022

@gvanrossum @kumaraditya303 see #96807

gvanrossum pushed a commit that referenced this issue Sep 17, 2022
…96807)

Warn on loop initialization, when setting the wakeup fd disturbs a previously set wakeup fd, and on loop closing, when upon resetting the wakeup fd, we find it has been changed by someone else.
gvanrossum added a commit that referenced this issue Sep 17, 2022
gvanrossum added a commit that referenced this issue Sep 17, 2022
…yncio` (#96807)" (#96898)

This reverts commit 0587810.
Reason: This broke buildbots (some warnings added by that commit are turned to errors in the SSL buildbot).
Repro:  ./python Lib/test/ssltests.py
@gvanrossum
Copy link
Member

@hidmic Alas, this broke the SSL tests, which are only run on a buildbot. Do you need help reproducing the failures?

@hidmic
Copy link
Contributor

hidmic commented Sep 17, 2022

@gvanrossum I can reproduce. It's ssltests promoting warnings to exceptions (see here) and asyncio tests running into inconsistent signal handling setups over and over e.g. using a signal.set_wakeup_fd mock on proactor event loop construction but not on close.

I'll try to fix it.

@gvanrossum
Copy link
Member

@hidmic Any progress on fixing this?

@willingc
Copy link
Contributor

@hidmic or @kumaraditya303 Do either of you wish to make a PR for this issue? Thanks!

@willingc
Copy link
Contributor

Next action: If there is no further activity on this issue by the September core dev sprint, let's review there and consider closing as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-windows topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

4 participants