defer signal handling to a singleton thread #605
Signed-off-by: William Woodall <william@osrfoundation.org>
@ros-pull-request-builder retest this please
Looks like macOS may have failed due to network issues, then failed again on
Another network failure. I don't see any errors though.
The summary line explicitly says: 24522 tests, 0 errors, 0 failures, 50 skipped 👍
If this patch makes CI happy, then this looks good to me. I don't have enough context to review this as deeply as I wanted to, but I have a few minor comments on it.
void
deferred_signal_handler()
{
  while (true) {
`while (is_installed())`?
I don't think this would work because the location of the `if (!is_installed()) {` is important, in that it comes after the loop over the contexts. So I'd pass on changing this logic unless @hidmic thinks it's a good idea to do so.
As @wjwwood said, the location of the statement is important. Changing it may cause the process to lock forever if signal handlers are uninstalled as a result of context shutdown. Keeping both may prevent a signal from being handled if it arrives while the handler is about to be uninstalled.
That's fine with me. I was just wondering if we could somehow change the `while (true)`, but I guess in this case it's all right.
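To make the ordering argument above concrete, here is a minimal sketch of a loop that checks installation only after the loop over the contexts. It is not the code from this patch: `DeferredHandlerSketch`, its members, and `run_with_uninstalling_context` are hypothetical names, and the contexts are reduced to plain callbacks. It only traces the case where a context shutdown uninstalls the handler, and it does not address how the thread gets woken from a real signal.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the handler state; none of these names come from the patch.
struct DeferredHandlerSketch
{
  std::atomic<bool> installed{true};
  std::atomic<bool> signal_received{false};
  std::mutex mutex;
  std::condition_variable wakeup;
  std::vector<std::function<void()>> context_shutdowns;  // one callback per context

  void run()
  {
    while (true) {
      // 1. Act on a pending signal first: shut down every registered context.
      if (signal_received.exchange(false)) {
        for (auto & shutdown : context_shutdowns) {
          shutdown();
        }
      }
      // 2. Check installation only after the loop over contexts, so an
      //    uninstall triggered by a shutdown above is seen before waiting.
      if (!installed.load()) {
        return;
      }
      // 3. Otherwise sleep until a signal or an uninstall has been flagged.
      std::unique_lock<std::mutex> lock(mutex);
      wakeup.wait(lock, [this] {return signal_received.load() || !installed.load();});
    }
  }
};

// Usage: one context whose shutdown uninstalls the handler.
inline int run_with_uninstalling_context()
{
  DeferredHandlerSketch handler;
  int shutdown_calls = 0;
  handler.context_shutdowns.push_back([&] {
    ++shutdown_calls;
    handler.installed.store(false);
  });
  handler.signal_received.store(true);  // pretend a signal already arrived
  handler.run();                        // returns instead of blocking in step 3
  assert(!handler.installed.load());
  return shutdown_calls;
}
```

In this trace the shutdown callback runs once and `run()` returns at step 2, because the uninstall is observed before the thread would go back to waiting.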
#endif
  signal_received.exchange(true);
  events_condition_variable.notify_one();
I think @hidmic mentioned this before: calling `notify_one()` from a signal handler is undefined behavior. Is there anything else that can be used to wake the thread?
I asked him about this, but I don't remember his answer. Do you know of an alternative? Only thing I can think of is a busy wait in the signal handler loop and an atomic, but that doesn't sound very nice.
Searching only turns up things that are platform specific.
`sem_post()` for POSIX platforms: http://pubs.opengroup.org/onlinepubs/009695399/functions/sem_post.html
I'm not sure about Windows. I can't tell from the documentation whether Windows semaphores are signal safe: https://docs.microsoft.com/en-us/windows/desktop/sync/using-semaphore-objects
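As a rough illustration of the `sem_post()` idea for POSIX platforms, the sketch below has the signal handler post an unnamed semaphore while a separate thread blocks in `sem_wait()`. Everything here is an assumption for illustration: the names `g_wakeup`, `g_last_signal`, `post_from_signal`, and `wait_for_one_signal` are hypothetical, `SIGUSR1` is used only so the example can raise a signal itself, and it assumes a Linux-like platform (macOS does not implement unnamed semaphores via `sem_init`, so this would not work there as written).

```cpp
#include <semaphore.h>
#include <signal.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <thread>

// Hypothetical globals: a semaphore to wake the deferred thread and the last signal seen.
sem_t g_wakeup;
std::atomic<int> g_last_signal{0};

// Signal handler: only a lock-free atomic store and sem_post(), which POSIX
// lists as async-signal-safe.
extern "C" void post_from_signal(int signum)
{
  g_last_signal.store(signum);
  sem_post(&g_wakeup);
}

// Installs the handler, raises SIGUSR1 once, and returns the signal number
// observed by the waiting thread (or -1 if setup failed).
int wait_for_one_signal()
{
  if (sem_init(&g_wakeup, 0, 0) != 0) {
    return -1;
  }
  struct sigaction action {};
  action.sa_handler = post_from_signal;
  sigemptyset(&action.sa_mask);
  if (sigaction(SIGUSR1, &action, nullptr) != 0) {
    sem_destroy(&g_wakeup);
    return -1;
  }

  int handled = 0;
  std::thread deferred([&handled] {
    // sem_wait() may be interrupted by a signal; retry on EINTR.
    while (sem_wait(&g_wakeup) == -1 && errno == EINTR) {
    }
    handled = g_last_signal.load();
  });

  raise(SIGUSR1);   // the handler runs here and posts the semaphore
  deferred.join();
  assert(handled != 0);  // the handler stored the signal number before posting
  sem_destroy(&g_wakeup);
  return handled;
}
```

Because the semaphore keeps a count, the post is not lost even if the signal arrives before the thread reaches `sem_wait()`.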
Looks like `notify_one()` in libcxx uses pthread:
Which I gather isn't safe enough? Is that your conclusion as well, @sloretz?
On Windows it uses `WakeConditionVariable()`:
We're not using libcxx on Windows, but I imagine `std::condition_variable` is also implemented with that function in MSVC.
What do you (or others) think we should do?
@sloretz Yeah, that's the thing with signal handlers 😅
AFAIK there's no completely safe cross-platform synchronization mechanism that can be used from a signal handler (if only Windows was POSIX compliant...). Busy waits, with or without a timeout, are a portable option. Not a very neat one though. We can also do some more research on how to do this right for each platform.
I'd expect the current implementation to work in almost all cases though, unless a signal arrives right when the thread is about to do something with the condition variable, and that only takes place during a negligible fraction of the process lifetime.
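For comparison, the portable busy-wait option mentioned above could look roughly like the following sketch: the handler only sets a lock-free atomic flag, and the deferred thread polls it with a short sleep. The names `g_signal_received`, `polling_signal_handler`, and `poll_for_signal`, as well as the polling period and deadline, are hypothetical choices for illustration and not part of this patch.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>

// Hypothetical flag set by the handler. A lock-free atomic store is among the
// few operations that are safe inside a signal handler.
std::atomic<bool> g_signal_received{false};

extern "C" void polling_signal_handler(int)
{
  g_signal_received.store(true);
}

// Polls the flag every `period` until it is set or `deadline` elapses.
// Returns true (and clears the flag) if a signal was seen in time.
bool poll_for_signal(std::chrono::milliseconds period, std::chrono::milliseconds deadline)
{
  assert(period.count() > 0);
  const auto give_up = std::chrono::steady_clock::now() + deadline;
  while (std::chrono::steady_clock::now() < give_up) {
    if (g_signal_received.exchange(false)) {
      return true;
    }
    std::this_thread::sleep_for(period);
  }
  return false;
}
```

A caller would register the handler with `std::signal(SIGTERM, polling_signal_handler)` and run `poll_for_signal` in the deferred thread. The cost is the trade-off named above: wake-up latency is bounded by the polling period, and the thread wakes periodically even when no signal arrives.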
> Which I gather isn't safe enough? Is that your conclusion as well @sloretz?

It uses `pthread_cond_signal`, which this page says:

> It is not safe to use the pthread_cond_signal() function in a signal handler that is invoked asynchronously. Even if it were safe, there would still be a race between the test of the Boolean pthread_cond_wait() that could not be efficiently eliminated.
Ok, so I was wrong, this is not safe for pthread, see:

> It is not safe to use the pthread_cond_signal() function in a signal handler that is invoked asynchronously. Even if it were safe, there would still be a race between the test of the Boolean pthread_cond_wait() that could not be efficiently eliminated.
>
> Mutexes and condition variables are thus not suitable for releasing a waiting thread by signaling from code running in a signal handler.

-- https://linux.die.net/man/3/pthread_cond_signal
I addressed some of the feedback, but I still need to address the locking of the install/uninstall functions, as well as the safety of notifying from a signal handler.
Ok, please re-review this. I'm working on a solution for the unsafe notification issue, but I think we should merge it as-is on the basis that it is an incremental improvement (we were doing much more than using condition variables in the signal handler before). Can someone do CI and stuff for this while I work on the improved signal notification solution? We can do that in a follow-up PR, since it should be API and ABI neutral.
lgtm with green CI. Just to make sure, the flaky test is addressed with the CI configuration?
Edit: oops ros2/build_farmer#152 (comment)
It used to happen roughly once every 10 times or so for me, so I agree here.
* [WIP] Refactor signal handling.
* fix deadlock
* finished fixing signal handling and removing more global state
* add missing include of <condition_variable>
* use unordered map in signal handling class
* use consistent terminology
* use emplace in map
* avoid throwing in destructor
* words
* avoid throwing from destructors in a few places
* make install/uninstall thread-safe

Signed-off-by: William Woodall <william@osrfoundation.org>
Fixes #604