Premature controller initialization #206

miguelprada · 2018-10-03T12:57:14Z

This is a follow up to zagitta/ur_modern_driver#29.

There is a potential race condition in the current driver where sometimes the ros_control controllers are initialized before the joint handles contain proper information about the current robot position. This in turn can (depending on the specific controller) produce very fast and unexpected motions towards whichever garbage is found in the uninitialized handle's position memory locations. This usually happens when the controllers are spawned immediately after starting the driver, which is the default behavior when using the provided urX_ros_control.launch launchfiles.

I think I tracked down the problem to this line in Zagitta#6. Calling ROSController::update on consumer timeouts in turn calls ControllerManager::update, which potentially starts controllers if requested to do so, even if no RT packets have yet been received from the robot. I'd like to do more testing to see whether this has other side effects, but I can confirm that removing this controller update removes this issue for me.

According to the commit message by @v4hn, "The controllers need to update at a rate of at least 125Hz", but there's no further evidence for this affirmation. I'm not sure I agree with it and would suggest removing that call altogether, but I will gladly stand corrected.

Thoughts?

gavanderhoorn · 2018-10-16T09:02:45Z

Hi @miguelprada. We're not ignoring you, but I haven't been able to find the time to reproduce this.

What you write makes a lot of sense. Could you perhaps submit a PR with your suggested change as that would facilitate testing?

miguelprada · 2018-10-16T15:33:09Z

I can quite reliably reproduce the described behavior by just launching the ur10_ros_control.launch in the kinetic-devel branch.

The effect in most trials is that the robot will start moving towards some random position contained in the uninitialized joint handles. To avoid jump scares, I recommend that before starting the above launchfile, the robot's velocity is limited using the speed slider in PolyScope's Move tab to somewhere below 50%.

One quick and (very) dirty way to confirm what's going on is to add joint_trajectory_controller to the test setup and introduce some logging statements here to print out the values used to initialize the joint positions.

Conversely, with the changes in #213 I can safely launch ur10_ros_control.launch.

v4hn · 2018-10-22T14:27:21Z

> According to the commit message by v4hn, "The controllers need to update at a rate of at least 125Hz", but there's no further evidence for this affirmation. I'm not sure I agree with it and would suggest removing that call altogether, but I will gladly stand corrected.

I am not 100% sure about the current situation, but when we tested the ros_control integration with this driver we needed the 125Hz updates, because the URScript would stop the arm abruptly (do_brake = True) if it did not receive a new message in that cycle.

I did not verify whether this is still the case, but I guess it is.

I believe the better way to resolve your misbehavior is to wait until the first data is available before controllers can be loaded.
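
As a rough illustration of "wait until the first data is available", the sketch below blocks a startup thread until the packet-consuming thread signals that the joint state has been written at least once. `FirstPacketGate` and its method names are hypothetical and do not exist in ur_modern_driver or ros_control; only the standard library is used.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not part of ur_modern_driver or ros_control.
class FirstPacketGate
{
public:
  // Call from the thread that consumes RT packets, after the joint
  // state has been copied into the hardware interface.
  void notifyPacketReceived()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      received_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until the first packet has been seen or the timeout expires.
  // Returns true if a packet was seen, false on timeout.
  bool waitForFirstPacket(std::chrono::milliseconds timeout)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return received_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool received_ = false;
};
```

A driver could call `waitForFirstPacket` before it lets the controller manager run, and report an error on timeout instead of starting controllers on unknown state.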

miguelprada · 2018-10-22T19:46:31Z

> I am not 100% sure about the current situation, but when we tested the ros_control integration with this driver we needed the 125Hz updates, because the URScript would stop the arm abruptly (do_brake = True) if it did not receive a new message in that cycle.

I see. It's not so much that the controllers need to be updated at that rate, but rather that the URScript running in the robot when using the position interface needs to receive commands at least at that rate. I failed to see this since I haven't used the position interface in a while (I even thought it was still based on streaming servoj commands, as this driver did before the refactor). I will certainly do some tests with a position interface-based controller and report back.

> I believe the better way to resolve your misbehavior is to wait until the first data is available before controllers can be loaded.

This is already achieved with the change proposed in #213, since the controller manager is only updated when receiving RT packets. However, it is very likely it makes position interface-based controllers misbehave (again).

In any case, and without having had a thorough look at how this works, I would find it more reasonable to relax the requirement in URScript to have a new command every 8ms rather than artificially updating the controllers with stale state. Do you remember having tried this?

v4hn · 2018-10-23T09:44:12Z

> #213 [...] However, it is very likely it makes position interface-based controllers misbehave (again).

Yes, this pull request would break this logic again and result in abrupt stops in position control.

> In any case, and without having had a thorough look at how this works, I would find it more reasonable to relax the requirement in URScript to have a new command every 8ms rather than artificially updating the controllers with _stale_ state. Do you remember having tried this?

I guess both are (crude) workarounds and the correct way to handle it is a realtime-safe system. This requires quite a lot of insight, especially for newcomers, though. It's basically a matter of perspective:

- Do we want the robot to continue moving if we did not send a control?
- Or do we want to explicitly send "blind" commands if we don't know the state of the system?

Back then our group opted for the latter because it leaves control with the host system. Feel free to change that decision for upstream, as long as the package works in the end.

miguelprada · 2018-10-23T14:17:04Z

Some non-exhaustive tests with nothing but this driver running a position_interface/JointTrajectoryController show me that do_brake is never set to true. However, when I artificially stress the system I can reproduce the abrupt stops.

I just updated #213 with an alternative solution, which I believe should avoid this issue while keeping the current behavior on pipeline timeouts. Could you please have a look @v4hn?
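
For readers following along, here is one way such a guard could look. It is only a sketch and is not taken from #213: the timeout path keeps updating the controllers with the last known state, but does nothing until at least one packet has been read. `FakeControllerManager`, `GuardedController`, `onPacket` and `onTimeout` are hypothetical stand-ins, not the driver's real classes.

```cpp
#include <cassert>

// Hypothetical stand-in for the real controller manager; it only
// counts how often update() is called.
struct FakeControllerManager
{
  int updates = 0;
  void update() { ++updates; }
};

// Hypothetical sketch of a controller wrapper with a "state seen" guard.
class GuardedController
{
public:
  explicit GuardedController(FakeControllerManager& cm) : cm_(cm) {}

  // Packet path: store the fresh joint state, then update controllers.
  void onPacket(double joint_position)
  {
    joint_position_ = joint_position;
    have_state_ = true;
    cm_.update();
  }

  // Timeout path: update with the last known (stale) state so commands
  // keep flowing, but never before any state has been received.
  void onTimeout()
  {
    if (!have_state_)
      return;
    cm_.update();
  }

  double jointPosition() const { return joint_position_; }

private:
  FakeControllerManager& cm_;
  double joint_position_ = 0.0;
  bool have_state_ = false;
};
```

With this shape, timeouts that occur before the first packet leave the controller manager untouched, while later timeouts still produce an update.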

carlosjoserg · 2018-10-23T18:17:57Z

I've been following the issue, as I believe I experienced the same thing about a year ago using the master branch version. I don't know if anyone has had it before, but I can assure you that having a UR10 suddenly jump to all joints at zero from anywhere, in the middle of a factory installation, is more than scary, it is unforgettable. That matches the description given in https://github.com/Zagitta/ur_modern_driver/issues/29 and one of the videos in #220. I don't remember the CBx box version though, but it happened when the ROS driver and the CBx box were started close together in time.

However, I've handled it differently from #213, and without touching the ur driver code base. But I'm still trying to understand what's on the kinetic-devel branch, to see whether what I did can help here, whether it's already done, or whether this is a different problem.

So, a couple of questions:

1. Does the received packet already have good values for the joints at this point? For instance, in the master branch version there is no guarantee of that at startup. There, the driver starts as soon as there is a socket connection on both the secondary and real-time ports and it is able to unpack a message from the secondary port to get a valid firmware version, without knowing whether the CBx box controller is ready to provide the real state of the arm (we are talking about very short times here).
2. @miguelprada When you say you printed the initialized values of the controller here, are they zero or nonsense values?

miguelprada · 2018-10-23T21:08:00Z

> So, a couple of questions:
>
> 1. Does the received packet already have good values for the joints at this point? For instance, in the master branch version there is no guarantee of that at startup. There, the driver starts as soon as there is a socket connection on both the secondary and real-time ports and it is able to unpack a message from the secondary port to get a valid firmware version, without knowing whether the CBx box controller is ready to provide the real state of the arm (we are talking about very short times here).
> 2. @miguelprada When you say you printed the initialized values of the controller here, are they zero or nonsense values?

In the current kinetic-devel, one of three situations might happen:

1. Nonsense values occur when a timeout in the RT pipeline ends up calling ROSController::update before a packet has been received and consumed with ROSController::read. I believe this is a race condition specific to the refactored code in kinetic-devel, and it is the situation that #213 (Resolve premature controller initialization) hopefully addresses.
2. Zero values can occur when an RT packet has actually been received, containing all zeros. I believe this happens when the CB has been powered on but the arm servos have not yet been powered on. I think this used to be a problem with the version in master if you had your ROS controllers started at this time (e.g. when launching the default ros_control launchfiles before having, at least once, powered on the arm servos). I probably need to triple check this, but I believe the refactor solves this particular case by resetting the controllers (i.e. !service_enabled) while the robot isn't enabled.
3. The correct values are received.
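
The three situations above can be written down as a small decision function. This is only an illustrative sketch: `DriverStatus`, `JointStateQuality`, `classify` and `controllersMayRun` are hypothetical names, and the two flags are assumed to be tracked by the driver (one set after the first successful read, one derived from the robot's reported state).

```cpp
#include <cassert>

// Hypothetical labels for the three situations described above.
enum class JointStateQuality
{
  NoPacketYet,      // handles still hold uninitialized memory
  RobotNotEnabled,  // a packet arrived, but values may be all zeros
  Usable            // a packet arrived while the robot is enabled
};

// Hypothetical status flags; both are assumptions for this sketch.
struct DriverStatus
{
  bool packet_consumed;  // true once a packet has been read at least once
  bool robot_enabled;    // true once the arm servos are powered
};

JointStateQuality classify(const DriverStatus& status)
{
  if (!status.packet_consumed)
    return JointStateQuality::NoPacketYet;
  if (!status.robot_enabled)
    return JointStateQuality::RobotNotEnabled;
  return JointStateQuality::Usable;
}

// Controllers should only act on joint state in the third situation.
bool controllersMayRun(const DriverStatus& status)
{
  return classify(status) == JointStateQuality::Usable;
}
```

Checking the packet flag first matters: until a packet has been consumed, nothing else in the status can be trusted.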

It seems to me that you may be referring to the second of those situations, @carlosjoserg.

And yes, it's really scary.

carlosjoserg · 2018-10-23T23:44:20Z

Great summary!

The first item seems to occur only on kinetic-devel. That race condition doesn't look good though; it reminds me of why the KUKA Fast Research Interface WaitForKRCTick() method was key to keeping everything in sync. Perhaps something like that could be added? Because #213 only cares about initialization, but still doesn't prevent more than one ROSController::update from being called later on between two ROSController::read calls, if I understood correctly, right?

Regarding the second item, it might already be tackled, since the robot state and controller state can be read. My two cents here anyway: what I did was implement a dashboard client, mainly to perform the CB initialization sequence (power up, brake release, changing user roles to avoid someone messing with the Move tab, etc.) so that the arm is ready before starting the ros_control loop. I allowed reasonable times for the CB initialization to happen. My poor-but-effective implementation is here in our fork, in case anyone is interested in doing it better. Since then, I have never touched the teach pendant again (except to power it on), and I haven't seen any initialization jumps (so far) in our applications.
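
A minimal sketch of what such an initialization sequence could look like as data, separate from any socket code. The command strings `power on` and `brake release` are text commands of the Universal Robots Dashboard Server (TCP port 29999); the `DashboardStep` type, the `buildBringUpSequence` function and the settle times are hypothetical and are not taken from the fork mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// One step of a hypothetical bring-up sequence: a Dashboard Server
// command plus how long to wait before sending the next one.
struct DashboardStep
{
  std::string command;               // newline-terminated text command
  std::chrono::seconds settle_time;  // assumed value, tune per installation
};

// Returns the commands to send, in order, over a TCP connection to the
// controller's Dashboard Server (port 29999). Sending is left to the caller.
std::vector<DashboardStep> buildBringUpSequence()
{
  return {
    { "power on\n", std::chrono::seconds(10) },
    { "brake release\n", std::chrono::seconds(10) },
  };
}
```

A client would send each command, wait for the settle time (or better, poll the robot mode), and only then start the ros_control loop.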

For tracking purposes, adding a dashboard client was already mentioned some time ago in #5 for safety reasons and recently in #165, and was also considered for #99.

miguelprada · 2018-10-24T07:56:07Z

> Because #213 only cares about initialization, but still doesn't prevent more than one ROSController::update from being called later on between two ROSController::read calls, if I understood correctly, right?

That's by design, to force controller updates at a rate of at least 125Hz and avoid hitting this line in the URScript running on the CB when using the position interface.

I'm not really on board with this decision, but it only causes the controllers to be updated with old state. The issue discussed here causes the controllers to be initialized and updated with uninitialized memory as their input, which is way worse and should be fixed ASAP in my opinion.

happygaoxiao · 2018-10-30T02:09:54Z

@miguelprada Hello, I have used this driver on UR5 and UR3 with urX_ros_control.launch and controlled movement with a client of pos_based_pos_traj_controller/follow_joint_trajectory at 125Hz. It works fine and the movement is smooth, although there is also a time delay of 100-150ms, which is mentioned in Thomas Timm's report. Now I am using the UR5e robot. The controller parameters are not optimal.

gavanderhoorn · 2018-11-05T13:16:49Z

Closing this as #213 was merged.

@miguelprada: if you still see reason to keep this open please re-open it.

gavanderhoorn added the kinetic label (Issues with the refactor in Kinetic) Oct 5, 2018

miguelprada mentioned this issue Oct 16, 2018

Resolve premature controller initialization #213

Merged

miguelprada mentioned this issue Oct 23, 2018

kinetic-devel branch: erratic motion with ur3_ros_control launch file #220

Closed

miguelprada mentioned this issue Oct 25, 2018

URe Support #216

Closed

gavanderhoorn closed this as completed Nov 5, 2018

gavanderhoorn mentioned this issue Nov 5, 2018

Add timeout on receiving first robot state in ros_control codepaths #227

Open

miguelprada mentioned this issue Jul 3, 2019

Notify users when no packets are received #322

Closed
