Fetch navigation performs poorly in Melodic simulation #36
Comments
Thanks, we'll take a look. FYI: |
@nickswalker sorry for the delay; everyone I originally tagged has been busy. I've just spoken with @safrimus and he'll investigate. |
@nickswalker to clarify, this only happens with Melodic and Gazebo 9? |
Yes, I have only observed this happening in Melodic with Gazebo 9. |
To me, it looks like it's related to localization. Usually when you first localize the robot, its particle cloud is pretty spread out, and the estimated robot position jumps around a bit while the robot figures out where it is.
However, the particle cloud usually converges to the correct position, and this jumpiness stops. Take a look at the AMCL particle cloud output in RViz (PoseArray type; I can't remember what the topic is).
-Derek
|
Here are some clips with AMCL and the localization transforms visualized: https://www.youtube.com/watch?v=uNb0pJbObHA https://www.youtube.com/watch?v=sk4ANbCywUk I pulled in all the recent Fetch changes and bumped to the latest Melodic sync. As far as I can tell, neither the AMCL config, the Fetch Gazebo model, nor any other component that might obviously cause localization to drift so quickly was changed between the Indigo and Melodic releases. But it's eminently reproducible for me. I have a couple of machines now where I can start a fresh workspace, clone everything, run the launch files and observe this behavior. Let me know if bags would help. |
@nickswalker thanks. @safrimus was also able to reproduce immediately in the simulator following your steps in the original issue. I haven't seen this on the actual hardware; can you confirm that navigation is working on your Fetch running Melodic? |
Yes, navigation has been working fine on the real robot |
This issue was previously: https://github.com/fetchrobotics/fetch_ros/issues/102 |
@nickswalker can you test this again? And should we close this ticket as a duplicate of 30? I tagged and released 0.9.0 of this package for Melodic. It was "good enough" though still not perfect, but we needed at least one released version in Melodic in order to set up the ros-pull-request-build jobs on the build farm. |
@nickswalker I'll add More Info Needed, and Help Wanted to this ticket. More Info Needed: because I'd like to know how it's performing now. |
@moriarty We also have this issue on Ubuntu 18.04/Gazebo 9. I pulled latest master, which is the same as 0.9.0. Here is the video: https://youtu.be/lLUQtOjqFnM. After I recorded this, it took about 15 seconds to move to the last goal I set. To reproduce:
We also tested in 14.04 and Gazebo 2, and it works very well. |
I was able to reproduce this issue using the code in #101 and the same steps as before. I don't think the problem is the inflation radius. Something about the simulation is going wrong, causing drift during rotation. Given this, no amount of tuning navigation parameters is going to make it localize well enough to go through doors. |
@nickswalker check #101 not for the code but for the comment from @mikeferguson
|
ZebraDevs/fetch_ros@09db2ce file are likely causing the difference :( unfortunately the |
@@ -143,8 +144,8 @@ void FetchDepthLayer::onInitialize()
camera_info_topic, 10, &FetchDepthLayer::cameraInfoCallback, this);
depth_image_sub_.reset(new message_filters::Subscriber<sensor_msgs::Image>(private_nh, camera_depth_topic, 10));
- depth_image_filter_ = boost::shared_ptr< tf::MessageFilter<sensor_msgs::Image> >(
- new tf::MessageFilter<sensor_msgs::Image>(*depth_image_sub_, *tf_, global_frame_, 10));
+ depth_image_filter_ = boost::shared_ptr< tf2_ros::MessageFilter<sensor_msgs::Image> >(
+ new tf2_ros::MessageFilter<sensor_msgs::Image>(*depth_image_sub_, *tf_, global_frame_, 10, private_nh));
depth_image_filter_->registerCallback(boost::bind(&FetchDepthLayer::depthImageCallback, this, _1));
observation_subscribers_.push_back(depth_image_sub_);
observation_notifiers_.push_back(depth_image_filter_);
@@ -275,16 +276,26 @@ void FetchDepthLayer::depthImageCallback(
{
// find ground plane in camera coordinates using tf
// transform normal axis
- tf::Stamped<tf::Vector3> vector(tf::Vector3(0, 0, 1), ros::Time(0), "base_link");
- tf_->transformVector(msg->header.frame_id, vector, vector);
- ground_plane[0] = vector.getX();
- ground_plane[1] = vector.getY();
- ground_plane[2] = vector.getZ();
+ geometry_msgs::Vector3Stamped vector;
+ vector.vector.x = 0;
+ vector.vector.y = 0;
+ vector.vector.z = 1;
+ vector.header.frame_id = "base_link";
+ vector.header.stamp = ros::Time();
+ tf_->transform(vector, vector, msg->header.frame_id);
+ ground_plane[0] = vector.vector.x;
+ ground_plane[1] = vector.vector.y;
+ ground_plane[2] = vector.vector.z;
// find offset
- tf::StampedTransform transform;
- tf_->lookupTransform("base_link", msg->header.frame_id, ros::Time(0), transform);
- ground_plane[3] = transform.getOrigin().getZ();
+ geometry_msgs::TransformStamped transform;
+ try {
+ transform = tf_->lookupTransform("base_link", msg->header.frame_id, msg->header.stamp);
+ ground_plane[3] = transform.transform.translation.z;
+ } catch (tf2::TransformException){
+ ROS_WARN("Failed to lookup transform!");
+ return;
+ }
}
// check that ground plane actually exists, so it doesn't count as marking observations |
I confirmed that doing a release build had no impact. I looked at reverting FetchDepthLayer to tf but stopped when I realized it would've required also changing the upstream DepthLayer code back as well. I tried bypassing localization using The local cost map still streaks on rotation, so it definitely seems related to the depth layer somehow not catching the correct transform. As soon as the robot starts rotating, the extra noise in the costmap makes it impossible to navigate through doorways. |
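For a sense of scale on the streaking described above, here is a rough, standalone sketch of how far a depth point lands from its true position when it is projected with a transform whose timestamp does not match the image while the robot rotates. This is not taken from the Fetch or navigation code: `smearDistance` and its parameters are hypothetical names, and it assumes pure rotation about the sensor origin at a constant yaw rate.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of fetch_ros or the navigation stack).
// Returns the distance in meters between where a point at `range_m` really is
// and where it is marked when the transform used is `stamp_mismatch_s` seconds
// away from the image stamp, while the robot yaws at `yaw_rate_rad_s`.
// Assumes pure rotation about the sensor origin at a constant rate.
double smearDistance(double range_m, double yaw_rate_rad_s, double stamp_mismatch_s) {
  assert(range_m >= 0.0);
  // Yaw error accumulated over the timestamp mismatch.
  const double yaw_error = yaw_rate_rad_s * stamp_mismatch_s;
  // Chord length between the true and the mis-projected point.
  return 2.0 * range_m * std::fabs(std::sin(yaw_error / 2.0));
}
```

Under these assumptions, a point 3 m away seen while rotating at 1 rad/s with a 0.05 s mismatch is displaced by about 0.15 m, and the displacement is zero when the robot is not rotating, which would match costmap noise that appears only during rotation.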
@nickswalker - did you ever resolve this? I am still seeing it on the latest release. I'd be interested in knowing if you root caused this or had other updates? |
No resolution and no updates from the previous comment |
OK, thanks for the update, I'm looking into it |
So I see the same issue when using fake_localization instead of AMCL, and it appears the "odom->base_link" TF is moving around quite a bit. So I suspect it's either a problem with the libfetch plugin or the friction of the wheels. The wheel friction was increased by #59, did you ever see the problem before then? I can try reverting that change to see if it makes a difference. |
This is still an issue. Ubuntu 18.04.5, all of my fetch and ros packages are up to date. The odom transform actually reaches points where it is so far off that it's off the map. So something is wrong with the odometry. |
I'm not sure what the root cause of this is yet, and it may have more than one. However, here's what I think. I still have this problem when using 'fake_localization', so I don't think that the odometry, wheel friction, or localization are the cause, although it is strange how much the odom transform drifts. With fake_localization the odom drift shouldn't matter, which is why I don't think that's the problem.
I'm more concerned with the local_costmap, which seems to be getting cleared incorrectly. Maybe @mikeferguson, @DLu, @SteveMacenski or someone with a deeper knowledge of the costmap clearing can take a look at that. If you see my screenshot above, you'll see that as the robot rotates, it seems to cause the costmap to 'smear' previous and current observations. I think that is causing the local planner to get "trapped" and unable to find a path forward. I observe that sometimes after the "clear costmap" recovery, it's able to move again, but not every time, as the doorways are also very narrow compared to the inflation radius of 0.7m.
So, I have experimented with a few parameter changes and have a few that seem to at least work around this issue. With these changes I can navigate room to room mostly fine, occasionally getting stuck temporarily before proceeding. Not perfect, but much better (at least for me). In the
Also, in the I started digging into the local_costmap clearing code, but didn't see anything that seemed to be causing the problem. I might look at this some more but wanted to pass along my learnings so far to see if others have ideas / suggestions etc. |
So I also tried switching out the Fetch depth layer for the standard navigation obstacle layer, and I don't see any noticeable improvement. I also tried changing the amcl alpha1 param to 0.5 per this comment: #101 (comment) from @mikeferguson and don't see much difference there either. I see I can navigate pretty well between the two tables, but navigating into the empty room is sometimes unsuccessful. The robot gets stuck in the doorway often. One thing I may try, per the comment mentioned above, is changing to the DWA planner to see if that improves things. But right now I'm guessing a little bit, which isn't a good debug strategy. If anyone else has time to look into this and has ideas what could be wrong I'm open to collaborating. |
I also forgot to mention, I have also tried setting the conservative and aggressive reset distances to 0.0 to clear the local costmaps as cleanly as possible. |
I also tried running on an Ubuntu 16 / Kinetic system to see how well that works, hoping to use git bisect to get to the changes that broke this, but I can't get that to run at all. If I run the simulation using playground.launch and then start navigation with fetch_nav.launch, Gazebo crashes:
Does anyone else see this issue using Kinetic? If anyone has a 'working' version with Kinetic, can you post a video of the Rviz view with the map, laserscan, robot and local costmap? I'd like to see this working as a point of comparison against the current behavior. |
I was using that Dockerfile to quickly switch version... but it’s out of date, the OSRF base images have changed locations, and the Nvidia docker stuff is different/no longer required... but as I recall it was possible to see this stop working when switching back and forth |
@moriarty - thanks for the reply. I just now was able to get this same thing running on an Ubuntu 16 system. It turned out the problem above was a Gazebo 7.0.0 bug that was later fixed. I upgraded to 7.16.1 (the latest) and that fixed it. However, I still see the same problems in Ubuntu 16 using the 'apt' released fetch packages. Here's a screenshot where the robot is stuck trying to get through the door to table 2. |
@moriarty or anyone really, can someone point me to a version that worked, preferably a release tag (like 0.7.0)? I'm now able to build and test on a Ubuntu 16 system, but some dependencies have since been upgraded so I'm not sure how far back I can go. |
@mkhansenbot if you don’t mind, navigation works in Ubuntu 14.04, which is end of life and may have security issues. Fetch Robotics should really try to solve this issue, but the research platform is of low priority from what I can tell. |
Unfortunately all my systems are 18.04 so I don't know off the top of my head if there's a version of Ubuntu 16 + Gazebo 7 that doesn't have this issue. Like @umhan35 said, I believe it does work on 14.04 but that might be too far back to easily compare changes. |
FWIW, I tried it out and as far as I can tell, it's something wonky with the odometry/localization, not the costmaps/recovery behaviors, with a very small chance of it being the local planner. Tested with Melodic/18.04/Gazebo 9. I also tried |
16.04 & Kinetic was skipped on Fetch Hardware, I only released Kinetic quietly after releasing 18.04 & Melodic... because of many requests from users who wanted it. |
Thanks everyone for the replies. I can confirm that it doesn't work on the released binaries for Ubuntu 16. I haven't tested on an Ubuntu 14 system. I would have to pull a docker image and install ROS on it, if that's even possible anymore; I'm not sure the apt package servers are still alive. |
@mkhansenbot I just tested this on an Indigo docker.
Start a docker (my insanely overkill command is probably not necessary):
I also do the totally unsafe thing as described in http://wiki.ros.org/docker/Tutorials/GUI#The_simple_way:
Install fetch packages in the docker (had to add the keys and stuff as described here):
Then launch playground and fetch_nav:
Then I run rviz outside the docker. 1404_nav.mp4 |
@velveteenrobot - thanks Sarah I'll try that too! |
Update - I was able to get the Ubuntu 14 / Indigo container running with simulation and it does work better (not perfect but noticeably better). The package versions being used are:
fetch_navigation: 0.7.15, fetch_gazebo: 0.7.3, robot_controllers: 0.5.4, control_toolbox: 1.13.3
On Ubuntu 16, where the robot is failing, the versions are:
fetch_navigation: 0.7.15, fetch_gazebo: 0.8.2, robot_controllers: 0.5.2, control_toolbox: 1.17.0
Based on that, I was able to find a combination that works on Ubuntu 14 but fails on Ubuntu 16:
fetch_navigation: 0.7.15, fetch_gazebo: 0.8.2, robot_controllers: 0.5.4, control_toolbox: 1.13.3
So I don't think the problem is a change in any of those packages, which means some dependency, such as gazebo_plugins or the Gazebo physics, changed between Gazebo 5 / Indigo and Gazebo 7 / Kinetic. So many other things changed between those versions that it's hard to know where to look next. I'm open to suggestions. |
I'd be curious to see what happens if you play the same sequence of velocity commands in each and see what the resulting odometry looks like. |
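One way to make that comparison concrete is to compute where an ideal robot would end up after the same command sequence and measure how far each simulator's reported odometry is from it. The following is a minimal sketch under stated assumptions: an ideal unicycle model with each command held constant for its duration and no wheel slip. `Pose2D`, `VelocityCommand`, `integrate`, `replay`, and `positionError` are hypothetical names for illustration, not part of any Fetch or ROS package.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical types for illustration only.
struct Pose2D { double x; double y; double theta; };
struct VelocityCommand { double linear; double angular; double duration; };

// Advance an ideal unicycle-model robot by one command, assuming the linear
// and angular velocities are held constant for the whole duration.
Pose2D integrate(const Pose2D& start, const VelocityCommand& cmd) {
  assert(cmd.duration >= 0.0);
  Pose2D next = start;
  const double dtheta = cmd.angular * cmd.duration;
  if (std::fabs(cmd.angular) < 1e-9) {
    // Straight-line motion along the current heading.
    next.x += cmd.linear * cmd.duration * std::cos(start.theta);
    next.y += cmd.linear * cmd.duration * std::sin(start.theta);
  } else {
    // Exact arc of radius linear / angular.
    const double radius = cmd.linear / cmd.angular;
    next.x += radius * (std::sin(start.theta + dtheta) - std::sin(start.theta));
    next.y -= radius * (std::cos(start.theta + dtheta) - std::cos(start.theta));
  }
  next.theta = start.theta + dtheta;
  return next;
}

// Replay a whole command sequence starting from the origin.
Pose2D replay(const std::vector<VelocityCommand>& commands) {
  Pose2D pose{0.0, 0.0, 0.0};
  for (const VelocityCommand& cmd : commands) {
    pose = integrate(pose, cmd);
  }
  return pose;
}

// Planar distance between two poses, e.g. ideal versus reported odometry.
double positionError(const Pose2D& a, const Pose2D& b) {
  return std::hypot(a.x - b.x, a.y - b.y);
}
```

Feeding the same recorded commands to `replay` and comparing the result against the final odometry pose from each simulator with `positionError` would show whether one setup drifts noticeably more than the other, particularly for sequences that include in-place rotation.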
I haven't done that but I did use |
Steps
With up to date versions of fetch_ros and fetch_gazebo
And
Behavior
When given a nav goal, the robot's localization drifts quickly (seems like it happens during rotation). The robot is never able to reach the goal.
https://youtu.be/w1y0b5aI3o8
Nothing jumps out from the standard move_base configurations, so I'm not sure what's going on.