Controller_server and planner_server dies during execution #1727
Please provide the actual commit hash of your last build. Also, please get a traceback and post it here; the crash message on its own is pretty nondescript. Navigation2 master is built on the ROS2 master branch. If you're using Eloquent binaries, you should use the eloquent branch, but you will lose certain new functionality.
Also, how soon after starting did these happen for you? Please attach a file with the full terminal output next time it happens. I want to see if something happened on setup. I see a couple of things in your config file that make me suspicious. Do you see crashing with the default profile?
Any update?
Hi, thank you for the follow-up. Due to the COVID-19 situation we have limited access to the robot laboratory and have not been able to test any further. We will continue the work on Monday and get back to you with the full terminal output.
Got it. I suspect it has to do with your install, mismatched versions, using a really old version, or an error on startup that you're just not noticing. If this were happening in general, we'd have CI crashing and other users reporting it (we run 10 minutes of full simulation in CI and we have about 20 developers running master). There might also be a real issue; in that case we should figure out where it is happening with a traceback.
Hi, I got the same "double free or corruption" error (errors 1 and 3) with Eloquent and not with Foxy. I have never gotten error 2, but it could be related to I hope my experience could help.
I run Eloquent binaries locally (or rather, I did until a couple of weeks ago when API-breaking changes happened) and never saw this in any of my local development, so I'd be surprised if it was only due to that. Checking with master/Foxy would be a good second step, but running it under GDB and figuring out what is crashing is a good starting point, to know whether it's something we can change in ROS2-ecosystem-land or something in ROS2-land.
Hi, sorry for the delay. The problem occurs after an initial pose and a goal pose are set, while the robot is navigating towards the goal. It doesn't happen on every execution, but often enough. The errors are observed both in simulation and in real execution. We tried, as you suggested, installing the Eloquent binaries, but the problems still occur. Below are three terminal outputs of the commonly observed errors. The errors from the master-branch installation are shown in 1. Output and 2. Output. The terminal output for the Eloquent branch is shown in 3. Output (next comment).
1. Output:
2. Output:
3. Output:
@charlotteheggem we need a traceback; I don't see anything in the logs that immediately strikes me as a problem. Given that you see 3 different crashes with several of the same setups that I (and all of the developers on this project) test with, I think the issue is likely to be in your system rather than in this codebase, but let's work to figure out why, just to make sure. The first step is to run this under GDB so that when it crashes you can get a traceback of what happened and where. Once we have that, we can go from there. I suspect you'll find that it's something unrelated to Navigation2. I suspect TF, since the only thing I can think of that links those 3 servers together is TF. It may be that your master builds are old and you need to pull in new ones. In fact, if you're working with our current master branch, you couldn't compile it without updates because of some Also, what computer are you running this on (general specs on CPU, memory, cores)?
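For anyone following along, one way to get the traceback requested above is to start the crashing server directly under GDB. The sketch below only assembles the command line; the relative executable path and `params.yaml` are placeholders (assumptions, not taken from this setup), and the GDB options used (`-batch`, `-ex`, `--args`) are standard ones.

```python
import shlex


def gdb_command(executable, node_args=()):
    """Build a gdb invocation that runs a program and prints a backtrace when it stops."""
    return [
        "gdb",
        "-batch",      # leave gdb once the commands below have been executed
        "-ex", "run",  # start the program right away
        "-ex", "bt",   # after it stops (for example on SIGABRT), print the backtrace
        "--args", executable, *node_args,
    ]


cmd = gdb_command(
    # Hypothetical path relative to the workspace; adjust to your own install.
    "install/nav2_controller/lib/nav2_controller/controller_server",
    # Hypothetical parameter file name.
    ["--ros-args", "--params-file", "params.yaml"],
)
print(shlex.join(cmd))
```

Running the printed command in a sourced terminal and waiting for the crash should leave the backtrace at the end of the output, which is the part worth posting.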
Hi, from what I have seen in other issues, the output might not be in the format that you wanted. Please correct me if this is not the right way to do it. Both attached outputs were obtained with the Eloquent installation, but on two different computers.
Both the ROS computer and the mobile robot are running over WiFi on an ASUS 5G network (https://www.asus.com/Networking/RTAC51U/).
GDB output 1
GDB output 2
What was the traceback, though? Please see https://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_42.html. In the GDB session, what I'm looking for is the actual traceback of the error, to see where it's coming from. From your second console output:
It's relatively clear, though we can't know 100% for sure without the traceback, that your issue is deeper than Navigation2. It looks like an issue with your clock jumping in rcl (ros2/rcl#293). Maybe you need to update Eloquent, or you built from source and the rcl / rclcpp versions didn't match? Are you resetting clocks / simulations or something? 0.8.4 is 4 months old.
Hi, here are some of the tracebacks from today:
traceback 1:
traceback 2:
traceback 3:
traceback 4:
traceback 5:
We agree that this does not seem to be due to Navigation2. We suspect that the TF data is published too slowly compared to the sensor data. What rates should the scan data and odometry data ideally be published at?
You have a bunch of issues with TF2 and clocks. My guess is that your issue is actually with the clocks and it's just manifesting in TF2, since TF2 relies heavily on timing. Publishing TF too slowly would not cause a crash or a segfault. You'd get navigation-level warnings about it, but that's all. I think you should file tickets in the appropriate places from these tracebacks and see what they say. I don't know how you've installed ROS2, but you might have something out of sync and need to pull in compatible versions.
Thank you so much for taking the time to look into this, it is really appreciated! :) We will hopefully sort this out soon. I am closing this issue as it is not related to Navigation2.
If someone else happens to run into the same problem, we solved it by using a different branch of rclcpp (ros2/rclcpp#1144) :-)
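As a rough illustration of what a "clock jump" means here, the sketch below scans a sequence of timestamps and reports every place where time goes backwards. It is a generic check written for this thread, not how rcl detects jumps; the function name and the sample values are made up.

```python
def find_backward_jumps(stamps):
    """Return (index, previous, current) for every stamp earlier than its predecessor."""
    jumps = []
    for i in range(1, len(stamps)):
        if stamps[i] < stamps[i - 1]:
            jumps.append((i, stamps[i - 1], stamps[i]))
    return jumps


# Made-up timestamps in seconds: the clock resets between the third and fourth sample.
stamps = [10.0, 10.1, 10.2, 3.5, 3.6]
print(find_backward_jumps(stamps))  # [(3, 10.2, 3.5)]
```

Running a check like this over recorded message stamps (for example from a simulation that gets reset) is one way to confirm whether time really moves backwards before filing a ticket.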
Hi. When launching Navigation2 with the attached parameter file, the controller_server and planner_server die during execution due to the following errors:
Error 1:
[controller_server-4] double free or corruption (fasttop)
[ERROR] [controller_server-4]: process has died [pid 19044, exit code -6, cmd '/home/ninamwa/navigation2/install/nav2_controller/lib/nav2_controller/controller_server --ros-args --params-file /tmp/tmprvckenvo -r /tf:=tf -r /tf_static:=tf_static'].
Error 2:
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] [INFO] [controller_server]: Passing new path to controller.
[controller_server-4] malloc_consolidate(): invalid chunk size
[ERROR] [controller_server-4]: process has died [pid 21673, exit code -6, cmd '/home/ninamwa/navigation2/install/nav2_controller/lib/nav2_controller/controller_server --ros-args --params-file /tmp/tmpqtefwy86 -r /tf:=tf -r /tf_static:=tf_static'].
Error 3:
[planner_server-5] double free or corruption (fasttop)
[ERROR] [planner_server-5]: process has died [pid 25477, exit code -6, cmd '/home/ninamwa/navigation2/install/nav2_planner/lib/nav2_planner/planner_server --ros-args -r __node:=planner_server --params-file /tmp/tmpagidbqgn -r /tf:=tf -r /tf_static:=tf_static'].
This happens almost every time, and the robot almost never reaches the goal because one of these errors occurs (not the same one every time). It happens both with the real robot and in simulation (Gazebo).
Is there anything in the parameters that could overload the system, or what else could cause the problem? It is really frustrating to have to restart Navigation2 all the time. I tried downloading the master branch from source yesterday, but got a build error.
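A note on reading "exit code -6" in the logs above: assuming the launcher follows Python's subprocess convention, where a negative return code means the process was terminated by that signal number, the value can be decoded as in the small sketch below. The helper name is made up for illustration.

```python
import signal


def describe_exit(returncode):
    """Translate a subprocess-style return code into a readable description."""
    if returncode < 0:
        # Negative values mean "terminated by signal number -returncode".
        return signal.Signals(-returncode).name
    return f"exit status {returncode}"


print(describe_exit(-6))
```

On Linux, signal 6 is SIGABRT, which is what glibc raises via abort() after printing messages such as "double free or corruption" when it detects heap corruption. That means the exit code itself only says that the allocator noticed the damage, not which code caused it.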
Thanks.
Parameters: