-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incoming service calls #669
Fix incoming service calls #669
Conversation
Hi, thanks for your contributions! I've just refactored some of the test code (#675) to allow us to more easily write tests which use the WebSocket server. Could you please update this PR to add a test — ideally one which fails without the proposed fixes, and passes with them? Check out the tests I'm adding in #666 for an example. |
+1 Please fix this, rosbridge is currently unusable (at least in Foxy) with this bug :( I've spent some time debugging why Foxglove Studio tool does not display anything when connecting to our ROS2 instance through rosbridge, then I found this... Maybe I will find some time later to write that test, but not right now. |
228586d
to
d67f891
Compare
Just pushed a commit that contains a test for the service call fix. With the patch from the first commit reverted, the test stalls until it times out due to the previously described issue. With the proposed fixes the test passes successfully. I borrowed the |
Oh, perfect, you were faster than me :) Thank you ... Now let's get it merged and released! |
I'll take a look at this after the holiday weekend! |
Thanks for pointing that out, I might not have noticed it otherwise. I was going to say it looks like the service call is happening in a background thread anyway, so changing it back to synchronous is fine, but now I see it's not actually running in a separate thread anymore (and the fact that it inherits from Thread is just tech debt now).
I haven't followed all the IncomingQueue changes and am still learning this codebase, but I would generally say that no blocking calls should be happening in the websocket message handler if possible. It seems rclpy provides a decent number of APIs that are async-friendly, which is how I fixed #650 without adding more threads. I wonder if we used this async approach consistently for all message handlers, perhaps we wouldn't need the IncomingQueue thread at all? In the meantime this small fix seems useful and is something we can iterate on. |
If you enable allow edits from maintainers then I can help resolve the merge conflict. It's a pretty minimal conflict though. |
I'd be glad to do so, except unfortunately I don't see the option available on the PR. Might be a parent org setting preventing me from enabling that option. Regardless, I'll do a quick rebase to resolve the merge conflict in just a minute. |
* After porting `IncomingQueue` from the ros1 branch, service calls from rosbridge clients are not receiving any response. * Traced the problem back to trying to call `spin_until_future_complete()` from a background thread. * Previously this function would have been called from the same thread that `rclpy.spin()` is called on. Add a websocket test for service call
d67f891
to
17cc7bd
Compare
Public API Changes
N/A
Description
Not to be confused with #650 (although possibly related?).
After porting the
IncomingQueue
class from the ROS1 branch in #663, I discovered that incoming service calls from rosbridge clients were no longer working. The rosbridge server would receive the service call and run until this line in thecall_service()
function where it would get stuck:rosbridge_suite/rosbridge_library/src/rosbridge_library/internal/services.py
Line 127 in 181e039
On the client side the connection would sit open ("pending" according to the Chrome debugger) waiting for a response that would never arrive.
After doing some digging I believe the problem is that we are now attempting to call
rclpy.spin_until_future_complete()
on a background thread instead of the main thread. I'm not super familiar with the details, but doing some quick research seems to indicate that at least historically trying to callspin()
from multiple threads can cause problems. For example:MultiThreadedExecutor:spin_until_future_complete
can block when the future is ready ros2/rclpy#585(These links are not directly related to the issue at hand but are semi-related problems I've seen regarding spinning and multiple threads.)
Since this code is running in the
IncomingQueue
thread instead of the main thread, it should be safe to use a synchronous service call here without blocking the main event loop. Making this change fixed the service calls and the client received responses again.One question I have is if we are going to switch back to a synchronous service call, do we want to run it in the
IncomingQueue
thread? Or do we want to create a new thread for each service call so that theIncomingQueue
is not blocked? TheServiceCaller
class inherits fromthreading.Thread
and historically under the ROS1 branch it looks like it was run with.start()
instead of.run()
like it is now.Perhaps someone with a better understanding of Tornado and the event loop and threads used in rosbridge has a better idea of how this situation could be handled without adding more threads?