`move_base_interruptable_server` seems to be causing `move_base` to crash #87

warnellg · 2017-07-19T17:06:54Z

The crash seems to happen when move_base_interruptable_server tries to call clear_costmap_service, but only intermittently.

I'm in the process of investigating why this happens.

warnellg · 2017-08-01T18:53:58Z

I did a little more investigating this morning, including capturing a core dump (thanks for the tip, @jack-oquin!).

First, I'm not actually sure that this is a problem with move_base_interruptable_server. I'll leave the title of this PR as-is for now, but it may need to be updated going forward. I say this because the problem persisted after I commented out a large portion of move_base_interruptable_server.cpp and rebuilt.

I'm now thinking that this could possibly be a bug within the ROS navigation stack itself. I ran our move_base node in GDB in a separate terminal, and, after a long while (I'm still unsure as to how to reliably reproduce this bug, which is worrisome), found Leela motionless at a door with the following output in the move_base terminal (entire core dump is available on Leela at /home/users/warnellg/Desktop/core.3472):

Obviously, the backtrace here wouldn't be able to implicate any of our BWI code even if it were at fault, but the fact that the error occurs so many levels deep is why I wonder if this is some kind of bug in the ROS navigation stack. Specifically, I wonder about the global_planner code, which is implicated in the lowest-level frames.

I'm not exactly sure what the path forward is at this point, so I'm open to suggestions. Perhaps next I'll try switching our move_base's base_global_planner from global_planner/GlobalPlanner back to the default navfn/NavfnROS to see if that makes a difference.
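As a rough illustration of the planner swap described above, here is a minimal Python sketch that builds a `rosrun` command overriding `move_base`'s private `base_global_planner` parameter for a single run. The helper name `move_base_command` is hypothetical, and the sketch assumes move_base is started directly with `rosrun`; in our actual setup the parameter would more likely be changed in the launch or YAML files.

```python
import shlex

# Planner plugin names taken from the discussion above.
CURRENT_PLANNER = "global_planner/GlobalPlanner"
DEFAULT_PLANNER = "navfn/NavfnROS"


def move_base_command(planner):
    """Build a rosrun command line (hypothetical helper) that sets
    move_base's private base_global_planner parameter for one run.

    ROS 1 nodes accept private parameters on the command line in the
    form `_name:=value`.
    """
    return ["rosrun", "move_base", "move_base",
            "_base_global_planner:={}".format(planner)]


if __name__ == "__main__":
    # Print the command rather than executing it, so the sketch can be
    # inspected on a machine without ROS installed.
    print(shlex.join(move_base_command(DEFAULT_PLANNER)))
```

Printing the command instead of running it keeps the sketch side-effect free; any other parameters the node needs would still have to be loaded separately.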

jack-oquin · 2017-08-02T14:42:41Z

At first glance, there seems to be enough information to open an issue in https://github.com/ros-planning/navigation/issues

It's probably worth checking whether ros-planning/navigation#584 is related.

warnellg · 2017-08-02T16:47:11Z

@jack-oquin, that does seem like it could be very related! Though we don't seem to get that exact warning message. I suppose to test this, we'd have to compile our own navigation stack from source and use that.

Another theory I've had is that this is somehow related to the way we (BWI) deal with the map. Specifically, when monitoring Leela during visit_door_list, I've noticed that the map seems to refresh after every completed goal. Further, these crashes are only occurring at the goal locations themselves (doors). Is there perhaps some kind of weird race condition cropping up here where the planner is trying to use a costmap that has temporarily been deleted because it's being replaced by some other BWI process?
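To make the hypothesized race concrete, here is a minimal Python sketch of the pattern that would avoid it: one thread replaces a shared map while another reads it, and a lock ensures the reader never sees a missing or half-built map. `CostmapHolder` and `stress` are hypothetical stand-ins, not ROS or BWI classes, and this says nothing about whether such a race actually exists in our code or in the navigation stack.

```python
import threading


class CostmapHolder:
    """Hypothetical stand-in for a shared costmap; not a ROS class."""

    def __init__(self, cells):
        self._lock = threading.Lock()
        self._cells = list(cells)

    def replace(self, cells):
        # Build the new data first, then swap it in under the lock so a
        # reader never observes a missing or partially built map.
        new_cells = list(cells)
        with self._lock:
            self._cells = new_cells

    def plan(self):
        # Hold the lock for the whole read so the map cannot change
        # in the middle of "planning" (here, just a sum over cells).
        with self._lock:
            return sum(self._cells)


def stress(iterations=1000):
    """Run a replacer thread and a planner thread concurrently."""
    holder = CostmapHolder([1] * 100)
    results = []

    def replacer():
        for _ in range(iterations):
            holder.replace([1] * 100)

    def planner():
        for _ in range(iterations):
            results.append(holder.plan())

    threads = [threading.Thread(target=replacer),
               threading.Thread(target=planner)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Every plan should see a complete 100-cell map.
    assert all(r == 100 for r in stress())
```

If the crash really were caused by something like this, the failing case would correspond to a reader using the map while another process tears it down without any such guard.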

@piyushk, are you able to comment on this?

warnellg · 2017-08-03T21:42:46Z

OK, I tried out switching the global planner to navfn/NavfnROS today, and Leela ran without issue until the battery gave out.

Assuming this passes further testing, this would seem to indicate that our issue might be specific to the global_planner/GlobalPlanner code, or at least the particular way we interact with it.

I want to test this out a few more times in order to make sure this has really resolved our issue.

I also want the extra test runs in order to make sure Leela actually navigates smoothly in the hallways: I already found, for instance, that I needed to remove the obstacle_layer/footprint_clearing_enabled=false line in costmap_common_params.yaml in order for it to not get "stuck" sometimes.

warnellg · 2017-08-08T19:23:53Z

I tested this for a few more hours today and all seems well: Leela ran smoothly and without crashing.

So I'm going to go ahead and merge this into master.

I'm also going to open another pull request, perhaps destined to last a very long time, to figure out how we can switch back to global_planner/GlobalPlanner. There, we could, for example, try building the latest navigation stack from source and see if recent changes have fixed this or if we can track down the bug ourselves.

marking where crash appears to happen sometimes

0714625

warnellg self-assigned this Jul 19, 2017

Merge branch 'master' into fix-move-base-crash

d9751e3

warnellg requested review from piyushk, shihyunlo, jack-oquin and justinhart August 1, 2017 18:54

jack-oquin force-pushed the master branch from 9d4ec99 to e8505b4 Compare August 2, 2017 15:37

warnellg added 3 commits August 2, 2017 11:52

Merge branch 'master' into fix-move-base-crash

1cd64e2

switched global planner to navfn

903be38

Merge branch 'fix-move-base-crash' of https://github.com/utexas-bwi/s…

04b78b4

…egbot into fix-move-base-crash

warnellg merged commit a82dae3 into master Aug 8, 2017

warnellg deleted the fix-move-base-crash branch August 8, 2017 19:24

warnellg mentioned this pull request Aug 8, 2017

debug move_base crash when using global_planner/GlobalPlanner #91

Open
