bugfix (#3109) deadlock when costmap receives new map #3145
Merged: SteveMacenski merged 3 commits into ros-navigation:main from CMU-cabot:main-costmap-deadlock-fix on Aug 31, 2022.
This will probably never be a problem, but please note that part of the `processMap` method is not thread-safe if we don't also lock the Costmap2D mutex here. There is also a second lock on this mutex within `processMap`, which might be able to produce the same deadlock that we are fixing. The best approach would probably be to find a way to move `processMap` to the main thread completely.

Again, this will probably never be a problem, and I might be wrong about this, but I'm mentioning it just in case.
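One way to move `processMap` to the main thread completely is to have the map callback do nothing except hand the message over, and let the update loop do all the processing. The C++ sketch below shows that pattern under stated assumptions: `MapMsg`, `DeferredStaticLayer`, `pending_map_` and `map_processed_` are hypothetical names, not the actual Nav2 `StaticLayer` API, and `MapMsg` is a simplified stand-in for the real occupancy-grid message.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an occupancy-grid message (not the real ROS type).
struct MapMsg {
  unsigned int width = 0;
  unsigned int height = 0;
  std::vector<std::int8_t> data;
};

// Hypothetical layer: the callback only stores the message; all map
// processing happens on the main (update) thread.
class DeferredStaticLayer {
public:
  // Subscription thread: constant-time work under a small, dedicated mutex.
  void incomingMap(std::shared_ptr<const MapMsg> msg) {
    std::lock_guard<std::mutex> guard(pending_mutex_);
    pending_map_ = std::move(msg);
  }

  // Main thread: take the newest pending map (if any) and process it here.
  void updateBounds() {
    std::shared_ptr<const MapMsg> msg;
    {
      std::lock_guard<std::mutex> guard(pending_mutex_);
      msg = std::move(pending_map_);  // moved-from shared_ptr is left empty
    }
    if (msg) {
      processMap(*msg);  // runs on the main thread, so it no longer races with the update loop
    }
    if (!map_processed_) {
      return;  // first map not processed yet: nothing to contribute this cycle
    }
    // ... compute bounds from the processed map ...
  }

  bool mapProcessed() const { return map_processed_; }
  unsigned int width() const { return width_; }

private:
  void processMap(const MapMsg& msg) {
    width_ = msg.width;
    height_ = msg.height;
    costs_.assign(msg.data.begin(), msg.data.end());
    map_processed_ = true;
  }

  std::mutex pending_mutex_;
  std::shared_ptr<const MapMsg> pending_map_;
  // Touched only by the main thread, so no synchronization is needed.
  bool map_processed_ = false;
  unsigned int width_ = 0;
  unsigned int height_ = 0;
  std::vector<std::int8_t> costs_;
};
```

Because `map_processed_` and the map data are only touched on the main thread, the only state shared with the subscription thread is `pending_map_`, guarded by a mutex that is held just long enough to swap a pointer.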
Actually, I removed it once, but I got some errors in the tests. These tests use `LayeredCostmap` and `StaticLayer`, but do not call `updateMap` and wait for the costmap to be ready, so they timed out. That is why I needed to keep this call here and keep it mutex-free: the mutex can cause a deadlock at the very beginning.

Do you want me to remove the call and fix the tests too?
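If the call is removed, tests like these would have to drive the update loop themselves rather than wait for a costmap that never becomes ready. A minimal C++ sketch of such a helper, assuming a test can call the costmap's update function directly and query a readiness predicate; `spinUntilReady` is a hypothetical name, not an existing Nav2 test utility.

```cpp
#include <functional>

// Hypothetical test helper: call the update step until the costmap reports
// it is ready, instead of waiting on an update that never happens.
// Returns true if `is_ready` became true within `max_cycles` update calls.
bool spinUntilReady(const std::function<void()>& update_once,
                    const std::function<bool()>& is_ready,
                    int max_cycles) {
  for (int cycle = 0; cycle < max_cycles; ++cycle) {
    if (is_ready()) {
      return true;
    }
    update_once();  // e.g. a call to LayeredCostmap::updateMap(...) in a real test
  }
  return is_ready();
}
```

Bounding the loop by `max_cycles` makes a broken test fail quickly instead of hanging until the framework's timeout.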
I don't think you can simply remove it. If we want to remove it from here, we will need to make some other changes. There might be several places where the main loop attempts to use the first map after it was received but before it was processed. We need to ensure that every place where the processed map is used is aware that the map may not be initialised yet.

Also, in my opinion it is good that the test fails, and it should, because you are changing the behaviour of this plugin. However, failing tests are not a reason to avoid moving the processing fully into the main loop. If we want to move that last call to `processMap`, then we should fix these tests and/or write new ones that fit the current implementation. I understand that you are focused on fixing the deadlock, but that doesn't mean you can simply leave unprotected code behind.
There are three points I want to make here, which I'm not sure have been considered yet:

First, in line 225 of the static layer we are locking the same mutex. This means you might not have fixed the deadlock entirely; you just don't see it anymore, because it would only happen on subsequent calls to `incomingMap`.

Second, all this code retrieves data from and performs operations on the layered costmap without protection, which sounds like a very bad idea to me.

Lastly, setting `map_received_` without protection might cause undesired behaviour when it is checked from the main loop. For example, the main thread could call `updateBounds` with `map_received_` set to false, followed by `updateCosts` with `map_received_` set to true.

As a rule of thumb, never leave a variable shared between threads unprotected. I am sorry, I know I am going back and forth on this. I sometimes feel like it's okay to break that rule, but things here look a bit too complex to leave the code unprotected. I don't think I was thorough enough about the threading issues in my previous review; I apologize.
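For the last point, a minimal C++ sketch of making the flag itself race-free with `std::atomic<bool>`, assuming the callback sets the flag only after it has finished writing the map data. `MapFlag` and its methods are hypothetical names. This removes the data race on the flag, but on its own it does not stop `updateBounds` and `updateCosts` from observing different values within one cycle.

```cpp
#include <atomic>

// Hypothetical sketch: a flag written by the subscription thread and read by
// the main loop. std::atomic makes the concurrent read and write well defined.
class MapFlag {
public:
  // Subscription thread: call only after the map data is fully written.
  void markReceived() { map_received_.store(true, std::memory_order_release); }

  // Main thread: the acquire load pairs with the release store, so writes made
  // before markReceived() are visible once this returns true.
  bool received() const { return map_received_.load(std::memory_order_acquire); }

private:
  std::atomic<bool> map_received_{false};
};
```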
To sum up, you should move the lock back to the beginning of this method. I don't think you will see the deadlock if you do so. Then it's up to you whether to remove the last `processMap` call here and fix the tests.

Any thoughts, @SteveMacenski? I think you know better whether it's best to remove that last call to `processMap` or keep it.
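Moving the lock back to the beginning of the method could look roughly like the C++ sketch below. `LockedLayer`, its members, and the choice of `std::recursive_mutex` (so that a nested helper may re-lock on the same thread) are assumptions for illustration, not the actual `StaticLayer` code. Holding one lock for the whole callback only rules out the deadlock if no other thread acquires a second lock in the opposite order while waiting for this one.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: the map callback takes the lock once, at the top, and
// holds it until every shared member has been updated.
class LockedLayer {
public:
  using mutex_t = std::recursive_mutex;  // assumption: recursive, so nested helpers may re-lock

  void incomingMap(const std::vector<unsigned char>& data) {
    std::lock_guard<mutex_t> guard(mutex_);  // held for the whole callback
    processMap(data);
    map_received_ = true;  // written under the same lock the main loop uses
  }

  // Main thread: reads the flag under the same lock that guards costs_.
  bool updateBounds() {
    std::lock_guard<mutex_t> guard(mutex_);
    return map_received_;
  }

  std::size_t size() {
    std::lock_guard<mutex_t> guard(mutex_);
    return costs_.size();
  }

private:
  void processMap(const std::vector<unsigned char>& data) {
    std::lock_guard<mutex_t> guard(mutex_);  // re-locking on the same thread is allowed
    costs_ = data;
  }

  mutex_t mutex_;
  std::vector<unsigned char> costs_;
  bool map_received_ = false;
};
```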
I don't quite understand the issue. In steady state, `processMap` is only called in the `updateBounds` function, so we're really only talking about the very first map, when the current costmap is not yet set to any particular valid size or content.

In the situation we're talking about with the first map, the `map_received_` boolean checked at the start of both update functions makes them return immediately without doing anything (if it is false), and that check happens before any map is processed. So, if no map is processed before `updateBounds`, it's OK. If it's processed before that point in the map callback, that's also OK.

If no map is processed before `updateCosts` and `updateBounds`, it's OK. If it's processed after `updateBounds` but before `updateCosts`, that's where it could potentially be not OK. I think my comment just below lays out a solution to that: maintain the "updatedness" state from `updateBounds` so it can also be used in `updateCosts`.

But in terms of leaving things unprotected, this seems fine, doesn't it? That is because of the special case that `map_received_` makes the functions return immediately without doing anything.
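The "updatedness" idea can be sketched as follows: take a snapshot of `map_received_` once in `updateBounds` and have `updateCosts` use that snapshot instead of the live flag, so both calls in one cycle agree. This is a minimal C++ sketch under that assumption; `CycleConsistentLayer` and `active_this_cycle_` are hypothetical names, not the code from the follow-up comment.

```cpp
#include <atomic>

// Hypothetical sketch: decide once per cycle, in updateBounds(), whether the
// layer participates, and reuse that decision in updateCosts() even if the
// callback sets map_received_ in between.
class CycleConsistentLayer {
public:
  void onMapReceived() { map_received_.store(true); }  // subscription thread

  void updateBounds() {
    active_this_cycle_ = map_received_.load();  // snapshot taken once per cycle
    if (!active_this_cycle_) {
      return;
    }
    ++bounds_updates_;
  }

  void updateCosts() {
    if (!active_this_cycle_) {  // use the snapshot, not the live flag
      return;
    }
    ++cost_updates_;
  }

  int boundsUpdates() const { return bounds_updates_; }
  int costUpdates() const { return cost_updates_; }

private:
  std::atomic<bool> map_received_{false};
  bool active_this_cycle_ = false;  // main thread only
  int bounds_updates_ = 0;
  int cost_updates_ = 0;
};
```

If the map arrives between `updateBounds` and `updateCosts`, the layer sits out that whole cycle and participates from the next one, instead of running `updateCosts` with bounds it never computed.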