Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix race conditions when updating PlanningScene #350

Merged
merged 8 commits into from
Jan 5, 2017
Merged

fix race conditions when updating PlanningScene #350

merged 8 commits into from
Jan 5, 2017

Conversation

rhaschke
Copy link
Contributor

This a new attempt to resolve #232. The core new idea is to wait until the robot has stopped moving before claiming the execution of a trajectory has finished.

…ue_)

Due to an upstream bug, it's not possible to start multiple AsyncSpinners from different threads.
Filed PR: ros/ros_comm#867

The spinner is now only needed to serve our own callback_queue_ for
scene updates, which is only required for syncSceneUpdates() that
syncs all kind of scene updates, not only the robot state.
... because we might wait up to 1s for a robot state update
@davetcoleman
Copy link
Member

If I'm not mistaken, this PR is the same as #232 except you addressed your last comment in that issue, correct?

Copy link
Member

@davetcoleman davetcoleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - seems like a great addition, though I haven't tested it in runtime

ros::WallTime start = ros::WallTime::now();
ros::WallDuration timeout(wait_time);

ROS_DEBUG_NAMED("planning_scene_monitor", "sync robot state to: %.3fs", fmod(t.toSec(), 10.));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"planning_scene_monitor" can just be LOGNAME for consistency

@@ -1430,6 +1437,61 @@ bool TrajectoryExecutionManager::executePart(std::size_t part_index)
}
}

bool TrajectoryExecutionManager::waitForRobotToStop(const TrajectoryExecutionContext &context, double wait_time)
{
if (allowed_start_tolerance_ == 0) // skip validation on this magic number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< epsilon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is not a comparison against a small value.
Rather, we agreed that a value of zero would mean to skip validation completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowed_start_tolerance_ is a double, which as I understand it you should never compare directly to 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true if the result is an outcome of some computation. However, in this case, allowed_start_tolerance_ should be set from parameter server to be exactly zero.

prev_state->enforceBounds();

// assume robot stopped when 3 consecutive checks yield the same robot state
unsigned int no_motion_count = 0; // count iterations with no motion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class (and most of moveit) uses std::size_t rather than unsigned int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::size_t is required for array access. Here we count up to 3 only. No need for a 64bit integer ;-)


// assume robot stopped when 3 consecutive checks yield the same robot state
unsigned int no_motion_count = 0; // count iterations with no motion
while (time_remaining > 0. && no_motion_count < 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> epsilon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to compare against epsilon here. time_remaining is not affected by precision issues. The resolution is defined by the clock only.


// check for motion in effected joints of execution context
bool moved = false;
for (const auto &trajectory : context.trajectory_parts_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you plan on picking this to previous ROS versions? if so we should avoid auto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should exploit C++ features in new code as much as possible. If we decide for backporting, this needs to be adapted indeed. But I mentioned that in the initial comment.

"but the full state is not known.");
if (!current_state_monitor_->waitForCurrentState(ros::Time::now(), wait_seconds))
{
ROS_ERROR("Failed to fetch current robot state");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_NAMED

@@ -5,4 +5,5 @@ if (CATKIN_ENABLE_TESTING)
target_link_libraries(test_cleanup moveit_move_group_interface)

add_rostest(python_move_group.test)
add_rostest(robot_state_update.test)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, thanks!

rospy.init_node(NODENAME)
rostest.rosrun(PKGNAME, NODENAME, RobotStateUpdateTest)

# suppress cleanup segfault
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand this comment to explain when this can be removed? this isn't necessary in kinetic right?

robot_state::RobotState start = *planning_display_->getQueryStartState();
updateQueryStateHelper(start, ui_->start_state_selection->currentText().toStdString());
planning_display_->setQueryStartState(start);
}

void MotionPlanningFrame::useGoalStateButtonClicked()
{
planning_display_->addBackgroundJob(boost::bind(&MotionPlanningFrame::useGoalStateButtonExec, this), "update goal "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment on why a background job is used for this click?

}

void MotionPlanningFrame::useStartStateButtonClicked()
{
planning_display_->addBackgroundJob(boost::bind(&MotionPlanningFrame::useStartStateButtonExec, this), "update start "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment on why a background job is used for this click?

@v4hn
Copy link
Contributor

v4hn commented Nov 14, 2016

format checking failed.

@rhaschke
Copy link
Contributor Author

Damn formatting. clang essentially reformats long code line every time again...

@davetcoleman
Copy link
Member

@v4hn can someone in your lab test this at runtime?

@BrettHemes
Copy link

@davetcoleman what exactly do you want tested?

I am in Indigo and now have this issue after recently reinstalling from source. I can try to pick this into indigo-devel and test on a real robot if it will help.

@davetcoleman
Copy link
Member

@BrettHemes I'm not exactly sure - that updating the planning scene with collision objects still works and waiting until the robot trajectory stops executing before doing the next command? @rhaschke could better answer this question

Note that a few small changes are required for indigo

@rhaschke
Copy link
Contributor Author

@BrettHemes It would be cool, if you could repeat the unit test (moveit_ros/planning_interface/test/robot_state_update.py) on a real robot setup. This test performs some random walk in joint space and should ensure, that the robot states reported after execution of a planned trajectory match the actual robot state. Additionally, as @davetcoleman suggested, it would be cool to test that collision objects obtained from depth sensors are correctly updated in the collision scene.

@davetcoleman
Copy link
Member

@BrettHemes have you gotten a chance to test this on real hardware?

@davetcoleman
Copy link
Member

I have run the test script locally without issue. Given the elapsed time of this PR I think we should merge this.

@davetcoleman davetcoleman merged commit a77b3da into moveit:kinetic-devel Jan 5, 2017
@rhaschke rhaschke deleted the fix-442 branch January 11, 2017 17:06
k-okada added a commit to k-okada/moveit that referenced this pull request Dec 10, 2017
…ate(1.0) in TrajectoryExecutionManager::validate, cherry pick part of moveit#350
@rhaschke rhaschke mentioned this pull request Dec 12, 2017
5 tasks
davetcoleman pushed a commit that referenced this pull request Jan 15, 2018
…ate(1.0) in TrajectoryExecutionManager::validate, cherry pick part of #350
rhaschke added a commit to ubi-agni/moveit that referenced this pull request Jan 25, 2018
@rhaschke rhaschke mentioned this pull request Jan 25, 2018
v4hn added a commit that referenced this pull request Jan 26, 2018
@mmoerdijk
Copy link

I'm not sure if this issue is completely resolved. I'm still getting "Invalid Trajectory: start point deviates from current robot state" message. When using the moveitcommander and the group.go() command without delays to move the robot. The speed of the robot is quite high, this seems to increase the frequency of the problem.

I'm on kinetic and build move it from source using the 245c7b5 commit.

I spent quite some time at debugging and I can say the following things:

  • Joint state updates on the /joint_state topic are correct / no delay.
  • The waitForCurrentRobotState method actually waits for the next joint message
  • current_state_time_ and joint values received correctly in the jointStateCallback
  • Planning starts after waitForCurrentRobotState but still somehow uses old values

I found out that if I set update = true just before line 352 I can plan and execute in rapid succession.

So this probably has something to do with the callbacks that are called on line 405 of the current_state_monitor. But I don't understand why @rhaschke you seem to be the expert on this can you explain why always executing the callbacks makes a difference...?

@rhaschke
Copy link
Contributor Author

rhaschke commented Apr 3, 2018

@mmoerdijk Can you please verify that the unittest passes without issues.
Can you also list the callbacks called in your situation?
I don't think, semantically the callbacks should make a difference. However, they take some extra time and thus may affect the timing.
It would be perfect if you could provide a new or modify the existing unit test to reproduce your problem.

@mmoerdijk
Copy link

Just to be sure I removed my old moveit repo and freshly cloned the latest kinetic branch.

The unittest you mention passed without errors using catkin run_tests --no-deps moveit_ros_planning_interface. But running the unit test with my setup (directly running the python file) it still generates the same errors, with the modification it works without problems.

Obviously my setup has something to do with it. I'm using moveit in combination with the joint_trajectory_controller from ros_control and a custom hardware interface. The hardware interface is currently just mirroring the values from the joint_trajectory_controller. And looking at the joint_states topic doing it correctly. And the weird thing is, I added some debug code in jointStateCallback to output the joint values during the call to waitForCurrentState and there it outputs the correct joint values before the planning starts. But still the plan starts at older joint values.

I will try to make a minimal example setup but i don't have much time until the beginning of may.

As for the callbacks, there is only one callback and that is the call to planning_scene_monitor::PlanningSceneMonitor::onStateUpdate

@rhaschke
Copy link
Contributor Author

rhaschke commented Apr 4, 2018

planning_scene_monitor::PlanningSceneMonitor::onStateUpdate

This function actually performs the scene update, i.e. applies changes to the current state within the planning scene. Looks like that not all scene updates are triggered as necessary.

@mmoerdijk, could you please check whether this fix solves your issue.

@mmoerdijk
Copy link

@rhaschke I tried out your fix and now and it seems to solve my problem. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants