
spawner.py in controller_manager is a bit unstable #475

Closed
samiamlabs opened this issue Jul 27, 2021 · 7 comments

Comments

@samiamlabs

Hi folks!

Thank you for this project.
It feels like a big step up from the original ros_control framework in terms of design and architecture.

I started having some issues with loading the controllers reliably after I moved my code from my laptop to a Jetson Xavier and the launch file got bigger.
The spawner.py script sometimes throws an unhandled RuntimeError on this line:

controllers = list_controllers(node, controller_manager).controller

RuntimeError: Could not contact service /controller_manager/list_controllers

If I increase the sleep here to 1.0 seconds, everything works fine:

It's probably better to catch the exception and retry until the list_controllers service responds, right?

Do you want me to make an attempt at a fix and submit it as a PR or do you prefer if someone in the project team looks at it?
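A minimal sketch of the retry idea in plain Python (the helper name `call_with_retries` and the commented usage are hypothetical, not part of spawner.py):

```python
import time

def call_with_retries(call, attempts=5, delay=0.2):
    """Retry `call` until it succeeds or `attempts` are exhausted.

    `call` is any zero-argument callable that raises RuntimeError while
    the target service is still unavailable.
    """
    for i in range(attempts):
        try:
            return call()
        except RuntimeError:
            if i == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(delay)  # back off before trying again

# Hypothetical usage inside spawner.py:
# controllers = call_with_retries(
#     lambda: list_controllers(node, controller_manager).controller)
```

A bounded retry like this keeps the spawner from hanging forever while still riding out a slow controller_manager startup.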

@samiamlabs
Author

samiamlabs commented Jul 27, 2021

Very strange that this happens though.

One would think that cli.service_is_ready() should return False when the service is not ready to be called, resulting in wait_for_service(2.0) being called here:

if not cli.wait_for_service(2.0):

Maybe I'm missing something...
On second thought, just retrying does not really sit right with me; we should probably find and address the root cause.

I'm using a custom hardware_interface::SystemInterface that is in the middle of its pretty long start() function when this happens. Maybe that could prevent controller_manager from processing service calls temporarily?

@samiamlabs
Author

Update:
I'm still getting the RuntimeError occasionally even after increasing the delay, just not as often; it simply didn't show up for a while after I increased the sleep.

@samiamlabs
Author

I needed to be able to start my robot base reliably, so I added this to my fork for now:
DynoRobotics@e8d13b7

@destogl
Member

destogl commented Aug 9, 2021

@samiamlabs thank you for your interest in the library.

Indeed, this seems to be a "bug" specific to your platform. We currently test everything on PCs/laptops, which are probably faster at spawning services after starting the controller_manager node, so everything works with sleep(0.2).

Before we decide to increase this wait time to sleep(1.0), we should probably check whether there is a bug in ros2controlcli. It seems that wait_for_service is not working properly.

Do I understand your problem correctly?

@samiamlabs
Author

samiamlabs commented Aug 9, 2021

I don't completely understand the problem myself at the moment. Increasing the sleep to 1.0 only made the issue less frequent, but it still crashed sometimes for me.

It's possible that list_controllers tries to make a service request before the controller_manager action server has started and something related to "wait_for_service" is not working as it should.

I think it is more likely that controller_manager is somehow being temporarily blocked from processing service-call callbacks during hardware_interface initialization. I have not looked at the relevant parts of the code, though, and could be wrong about that.

My current understanding is that wait_for_service is only supposed to check whether the relevant topics are available, not whether, for example, something is blocking a single-threaded executor from processing callbacks in the action server node.
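This distinction can be illustrated without ROS at all. The following plain-Python sketch (a toy analogy, not rclpy code) models a "service" that becomes discoverable immediately while its single worker thread is stuck in a long start(), so a short-timeout call still gets no answer:

```python
import queue
import threading
import time

# A toy single-threaded "executor": one worker drains a callback queue.
callbacks = queue.Queue()
service_registered = threading.Event()  # roughly what wait_for_service checks

def worker():
    service_registered.set()  # the service is discoverable...
    time.sleep(0.5)           # ...but the thread is stuck in a long start(),
                              # like the hardware interface described above
    while True:
        cb = callbacks.get()
        if cb is None:
            break
        cb()

threading.Thread(target=worker, daemon=True).start()

service_registered.wait()     # the analogue of wait_for_service succeeds

done = threading.Event()
callbacks.put(done.set)       # the actual service request
answered = done.wait(timeout=0.1)  # short client-side timeout
print(answered)               # False: registered, yet unresponsive
```

In this model, "discovery" passing says nothing about whether the executor on the other side is free to run the service callback, which matches the behavior described above.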

@destogl
Member

destogl commented Aug 17, 2021

@samiamlabs I think you understood this correctly. There is basically a race between the spawner.py script's timeout and the controller_manager services becoming available. Depending on the hardware used, this can take more or less time.

I am currently working on controlling the hardware's lifecycle, and this may lead to us starting hardware asynchronously from the main CM thread. Even then, spawners would probably make successful service calls, but their execution would fail if the hardware is not ready. In that case, we would probably need a "retry_times" parameter in the spawner.

For now, it should be safe to use a timeout even longer than 10 seconds. This should also help (if wait_for_service is working properly).
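A sketch of how such a hypothetical retry parameter could be exposed on the spawner command line (the flag names below are illustrative only, not actual spawner.py options):

```python
import argparse

# Hypothetical CLI flags for spawner.py; names are illustrative only.
parser = argparse.ArgumentParser(description="spawner (sketch)")
parser.add_argument("--retry-times", type=int, default=5,
                    help="how many times to retry contacting the "
                         "controller_manager services")
parser.add_argument("--retry-delay", type=float, default=1.0,
                    help="seconds to sleep between retries")

# Parsing an example command line instead of sys.argv, for demonstration:
args = parser.parse_args(["--retry-times", "10", "--retry-delay", "2.0"])
print(args.retry_times, args.retry_delay)  # prints: 10 2.0
```

Exposing the retry count and delay as flags would let slower platforms (such as the Jetson Xavier mentioned above) tune the behavior without patching the script.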

@bmagyar
Member

bmagyar commented Jul 16, 2022

I believe I have solved this issue now. If you find any new occurrences, please feel free to reopen this issue or create a new one 👍

@bmagyar bmagyar closed this as completed Jul 16, 2022
pac48 pushed a commit to pac48/ros2_control that referenced this issue Jan 26, 2024