XMLRPC HTTP/1.1 causes performance degradation in rosmaster between Kinetic and Melodic #2118
Comments
thanks for the information, this really helps 👍
totally agree.
But this affects the entire ROS system, which I think is not really acceptable. By the way, do you happen to know which kernel patch fixes this problem?
Sorry, I haven't had time to narrow it down; all I know is that it works on 5.4.
Addressing performance issues described in #2118 Signed-off-by: Jesse Ikawa <jikawa@amazon.com> Co-authored-by: Emerson Knapp <537409+emersonknapp@users.noreply.github.com>
Can we close this? Resolved in #2132.
Yes, this is resolved! It would be good to get a ros_comm release to Melodic with the patch.
I'll do a Noetic release first, let that soak, then I can do a backport to Melodic.
@jacobperron Any updates for the backport? Thanks!
I tested Further info:
Related to #371
Introduced in #1287
Environment
Description
I am able to reliably cause the `rosmaster` to stop responding to service calls with `unable to contact master` - as raised by https://github.com/ros/ros_comm/blob/noetic-devel/clients/rospy/src/rospy/impl/tcpros_service.py#L467 (triggered specifically by the call to `master.lookupService`).

The reproduction workflow involves starting a single `rospy.Service` in one node (serving `std_srvs/SetBool`), and 11 `rospy.ServiceProxy` instances each in separate nodes. Each of these clients calls the service at 200Hz. After about 30 seconds, the `ServiceException: unable to contact master` starts to occur. The master does not crash, but is unreachable for several seconds. If the stress is stopped without stopping the master, then the situation is reproducible, suggesting there is no lasting damage done to the master process - just a temporary hang of some kind.

Repro Instructions
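The load pattern described above (many clients, each repeatedly asking the master to look up a service) can be approximated without ROS. The following is only a minimal sketch using Python's standard library: `start_stand_in_master`, `run_clients`, and the toy `lookupService` handler are hypothetical names for illustration, the server is not `rosmaster`, and the sketch assumes that every service call opens a fresh connection for its lookup. It is not expected to reproduce the hang by itself; it only shows the shape of the workload.

```python
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def start_stand_in_master():
    """Start a toy XML-RPC server on a free localhost port (NOT rosmaster)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    # Hypothetical handler that mimics a (code, message, value) style reply.
    server.register_function(
        lambda caller_id, service: [1, "ok", "rosrpc://127.0.0.1:0"],
        "lookupService",
    )
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_clients(port, n_clients=11, rate_hz=200.0, calls_per_client=50):
    """Run n_clients threads, each issuing lookups at roughly rate_hz."""
    results = {"ok": 0, "failed": 0}
    lock = threading.Lock()
    period = 1.0 / rate_hz

    def client(index):
        for _ in range(calls_per_client):
            try:
                # Assumption: one new connection per lookup.
                url = "http://127.0.0.1:%d" % port
                with xmlrpc.client.ServerProxy(url) as proxy:
                    code, _msg, _uri = proxy.lookupService(
                        "/client_%d" % index, "/set_bool"
                    )
                key = "ok" if code == 1 else "failed"
            except (OSError, xmlrpc.client.Error):
                # Connection-level failures are counted, not raised.
                key = "failed"
            with lock:
                results[key] += 1
            time.sleep(period)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    master = start_stand_in_master()
    print(run_clients(master.server_address[1]))
    master.shutdown()
    master.server_close()
```

Counting failures instead of raising makes it easy to compare runs under different server settings or kernels.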
I am running in a container; the image was built using the following Dockerfile
The test application sources are testpkg.tar.gz
I run the following workflow
`testapp.launch` - it fails out in under a minute, meaning the rosmaster was unreachable even after several tries.
`from osrf/ros:kinetic-desktop` for the docker image)
`ros_comm` at `melodic-devel` into the workspace, revert Use HTTP/1.1 in XMLRPC Server #1287, build and run, then the app will run indefinitely

This is of course a toy stress example, but it reproduces an error we have seen in more complex applications being run in a production environment.
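For context on what reverting #1287 toggles at the protocol level, the sketch below uses only Python's standard `xmlrpc.server` and `http.client`. It assumes, from the PR title alone, that the change amounts to serving XML-RPC over HTTP/1.1 instead of HTTP/1.0; it is not taken from the ros_comm sources, and `Http11Handler`, `serve`, and `probe` are hypothetical names. It shows that an HTTP/1.0 server closes the connection after each reply, while an HTTP/1.1 server keeps it open by default.

```python
import http.client
import threading
from xmlrpc.client import dumps
from xmlrpc.server import SimpleXMLRPCRequestHandler, SimpleXMLRPCServer


class Http11Handler(SimpleXMLRPCRequestHandler):
    # Opt in to HTTP/1.1; the standard-library default is "HTTP/1.0".
    protocol_version = "HTTP/1.1"


def serve(handler_cls):
    """Start a toy XML-RPC server with the given handler on a free port."""
    server = SimpleXMLRPCServer(
        ("127.0.0.1", 0), requestHandler=handler_cls, logRequests=False
    )
    server.register_function(lambda: "pong", "ping")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server):
    """POST one XML-RPC call; return (HTTP version, server closes connection)."""
    port = server.server_address[1]
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    body = dumps((), methodname="ping").encode("utf-8")
    conn.request("POST", "/RPC2", body, {"Content-Type": "text/xml"})
    resp = conn.getresponse()
    resp.read()
    # version is 10 or 11; will_close reflects the keep-alive behaviour.
    result = (resp.version, resp.will_close)
    conn.close()
    return result


if __name__ == "__main__":
    for handler in (SimpleXMLRPCRequestHandler, Http11Handler):
        srv = serve(handler)
        print(handler.__name__, probe(srv))
        srv.shutdown()
        srv.server_close()
```

With persistent connections, a handler stays tied to its socket until the client disconnects, which is one place where the two protocol versions can behave differently under load. Whether that is the mechanism behind the hang reported here is not established by this sketch.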
Next Steps
I see the following options:
ros_comm maintainers, what do you think would be best? We will probably be able to solve this problem for our specific case by providing an environment running an upgraded kernel, but it likely affects other users, perhaps who have spent less time trying to debug its root cause. Given the 2023 EOL for Melodic, I would think we should take action rather than just wait it out.