Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Premature controller initialization #206

Closed
miguelprada opened this issue Oct 3, 2018 · 12 comments
Closed

Premature controller initialization #206

miguelprada opened this issue Oct 3, 2018 · 12 comments
Labels
kinetic Issues with the refactor in Kinetic

Comments

@miguelprada
Copy link
Contributor

miguelprada commented Oct 3, 2018

This is a follow up to zagitta/ur_modern_driver#29.

There is a potential race condition in the current driver where sometimes the ros_control controllers are initialized before the joint handles contain proper information about the current robot position. This in turn can (depending on the specific controller) produce very fast and unexpected motions towards whichever garbage is found in the uninitialized handle's position memory locations. This usually happens when the controllers are spawned immediately after starting the driver, which is the default behavior when using the provided urX_ros_control.launch launchfiles.

I think I tracked down the problem to this line in Zagitta#6. Calling ROSController::update on consumer timeouts in turn calls ControllerManager::update, which potentially starts controllers if requested to do so, even if no RT packets from the robot have been yet received. I'd like to do more testing to see whether this has other side effects, but I can confirm that removing this controller update removes this issue for me.

According to the commit message by @v4hn, The controllers need to update at a rate of at least 125Hz, but there's no further evidence for this affirmation. I'm not sure I agree with it and would suggest removing that call altogether, but I will gladly stand corrected.

Thoughts?

@gavanderhoorn gavanderhoorn added the kinetic Issues with the refactor in Kinetic label Oct 5, 2018
@gavanderhoorn
Copy link
Member

Hi @miguelprada. We're not ignoring you, but I can't find some time to reproduce this.

What you write makes a lot of sense. Could you perhaps submit a PR with your suggested change as that would facilitate testing?

@miguelprada
Copy link
Contributor Author

I can quite reliably reproduce the described behavior by just launching the ur10_ros_control.launch in the kinetic-devel branch.

The effect in most trials is that the robot will start moving towards some random position contained in the uninitialized joint handles. To avoid jump scares, I recommend that before starting the above launchfile, the robot's velocity is limited using the speed slider in PolyScope's Move tab to somewhere below 50%.

One quick and (very) dirty way to confirm what's going on is to add joint_trajectory_controller to the test setup and introduce some logging statements here to print out the values used to initialize the joint positions.

Conversely, with the changes in #213 I can safely launch ur10_ros_control.launch.

@v4hn
Copy link

v4hn commented Oct 22, 2018

According to the commit message by v4hn, The controllers need to update at a rate of at least 125Hz, but there's no further evidence for this affirmation. I'm not sure I agree with it and would suggest removing that call altogether, but I will gladly stand corrected.

I am not 100% sure about the current situation, but when we tested the ros_control integration with this driver we needed the 125Hz updates, because the URScript would stop the arm abruptly if it did not receive a new message in that cycle. do_brake = True

I did not verify whether this is still the case, but I guess it is.

I believe the better way to resolve your misbehavior is to wait until the first data is available before controllers can be loaded.

@miguelprada
Copy link
Contributor Author

I am not 100% sure about the current situation, but when we tested the ros_control integration with this driver we needed the 125Hz updates, because the URScript would stop the arm abruptly if it did not receive a new message in that cycle. do_brake = True

I see. It's not so much that the controllers need to be updated at that rate, but rather that the URScript running in the robot when using the position interface needs to receive commands at least at that rate. I failed to see this since I haven't used the position interface in a while (I even thought it was still based on streaming servoj commands, as this driver did before the refactor). I will certainly do some tests with a position interface-based controller and report back.

I believe the better way to resolve your misbehavior is to wait until the first data is available before controllers can be loaded.

This is already achieved with the change proposed in #213, since the controller manager is only updated when receiving RT packets. However, it is very likely it makes position interface-based controllers misbehave (again).

In any case, and without having had a thorough look at how this works, I would find it more reasonable to relax the requirement in URScript to have a new command every 8ms rather than artificially updating the controllers with stale state. Do you remember having tried this?

@v4hn
Copy link

v4hn commented Oct 23, 2018 via email

@miguelprada
Copy link
Contributor Author

Some non exhaustive tests with nothing but this driver running a position_interface/JointTrajectoryController show me that do_brake is never set to true. However, when I artificially stress the system I can reproduce the abrupt stops.

I just updated #213 with an alternative solution, which I believe should avoid this issue while keeping the current behavior on pipeline timeouts. Could you please have a look @v4hn?

@carlosjoserg
Copy link

I've been following the issue, as I believe I experienced the same thing about a year ago using the master branch version. Don't know if anyone had it before, but I can assure having an UR10 jumping all of a sudden to all joints to zero from anywhere, in the middle of a factory installation, is more than scary, it is unforgettable. And that matches the description given in https://github.com/Zagitta/ur_modern_driver/issues/29 and one of the videos in #220.. don't remember the CBx box version though, but it happened when the ros-driver and the CBx box were started near in time.

However, I've handled it differently from #213, and without touching the ur driver code base. But I'm still understading what's on the kinetic-devel branch to see if what I did can help here or it's already done and conclude this is different.

So, couple of questions:

  • Does the received packet has already good values for the joints at this point?
    For instance, in the master branch version there is no guarantee of that at starting. In there, the driver starts as long as there is a socket connection on both secondary and real-time ports, and able to unpack a message to get a valid firmware from the secondary port, without knowing if the CBx box controller is ready to provide the real state of the arm (we are talking very small times here).

  • @miguelprada When you say you printed the initialized values of the controller here, they are zero or non-sense values?

@miguelprada
Copy link
Contributor Author

So, couple of questions:

  • Does the received packet has already good values for the joints at this point?
    For instance, in the master branch version there is no guarantee of that at starting. In there, the driver starts as long as there is a socket connection on both secondary and real-time ports, and able to unpack a message to get a valid firmware from the secondary port, without knowing if the CBx box controller is ready to provide the real state of the arm (we are talking very small times here).

  • @miguelprada When you say you printed the initialized values of the controller here, they are zero or non-sense values?

In the current kinetic-devel, one of three situations might happen:

  • Non-sense values occur when a timeout in the RT pipeline ends up calling ROSController::update before a packet has been received and consumed with ROSController::read. I believe this is a race condition specific to the refactored code in kinetic-devel, and is the situation that Resolve premature controller initialization #213 hopefully addresses.
  • Zero values can occur when a RT packet has been actually received, containing all zeros. I believe this happens when the CB has been powered on but the arm servos have not been yet powered on. I think this used to be a problem with the version in master if you had your ros controllers started at this time (e.g. when launching the default ros_control launchfiles before having, at least once, powered on the arm servos). I probably need to triple check this, but I believe the refactor solves this particular case by resetting the controllers (i.e. !service_enabled) while the robot isn't enabled.
  • The correct values are received.

It seems to me that you may be referring to the second of those situations, @carlosjoserg.

And yes, it's really scary.

@carlosjoserg
Copy link

Great summary!

The first item seems to be only on kinetic-devel. That race condition doesn't look good though, it reminds me of why the KUKA Fast Research Interaface WaitForKRCTick() method was key to keep all in sync. Perhaps something like that could be added? Because #213 only cares about initialization, but still doesn't prevent that, later on, more than one ROSController::update be called in between two ROSController::read calls, if I understood correctly, right?

Regarding the second item, it might be that it's already tackled since the robot state and controller state can be read. My two cents here anyway.. What I did was to implement a dashboard client mainly to perform the CB initialization sequence (power up, brake release, changing user roles to avoid someone messing with the move tab, etc.) so the the arm is ready before starting the ros-control loop. I gave reasonable times for the CB initialization to happen. My poor-but-effective implementation is here in our fork in case of interest of doing it better. Since then, I never touch the teach pendant again (only to give electricity), and neither have seen any initialization jumps (so far) in our applications.


For tracking purposes, adding a dashboard client has already been mentioned time ago in #5 for safety reasons and recently in #165, and also considered for #99

@miguelprada
Copy link
Contributor Author

Because #213 only cares about initialization, but still doesn't prevent that, later on, more than one ROSController::update be called in between two ROSController::read calls, if I understood correctly, right?

That's by design, to force controller updates at at least 125Hz and avoid hitting this line in the URScript running on the CB when using the position interface.

I'm not really onboard with this decission, but it only causes the controllers to be updated with old state. The issue discussed here causes the controllers to be initialized and updated with uninitialized memory as their input, which is way worse and should be fixed ASAP in my opinion.

@happygaoxiao
Copy link

@miguelprada Hello, I have used this driver in UR5 and UR3 with urX_ros_control.launch and controlled movement with the client of pos_based_pos_traj_controller/follow_joint_trajectory at 125Hz. It works fine and the movement is smooth. Although There is also a time delay of 100-150ms, which is mentioned in Thomas Timm's report. Now I am using the UR5e robot. The controller parameters is not the Optimal.

@gavanderhoorn
Copy link
Member

Closing this as #213 was merged.

@miguelprada: if you still see reason to keep this open please re-open it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kinetic Issues with the refactor in Kinetic
Projects
None yet
Development

No branches or pull requests

5 participants