Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS1 lidar outputs a single points packet instead of full PointCloud2 #63

Closed
GuillaumeHauss opened this issue Dec 3, 2020 · 15 comments
Closed
Labels
question Further information is requested

Comments

@GuillaumeHauss
Copy link

Hello,
My team and I have been using 3 OS1 lidars without much issues in ROS1, until we recently decided to migrate to ROS2. We are using ROS2 Eloquent, on Ubuntu 18.04, with the default DDS implementation (Fast RTPS).
When we launch the ouster node, it publishes the full point cloud for a few seconds, but afterwards it publishes what seems to be only a part of the full point cloud at a time. We suspect it to be publishing only a single packet (about 300 points), but we are not sure about this. We tried using CycloneDDS, but we have the same results.
We used ros2 topic hz to get the publish rate, it was between 180 Hz and 400 Hz, when it should be 10 Hz.

Here is a gif with what we see on Rviz :
VID_20201203_113527

@SteveMacenski
Copy link
Member

Have you tried first with a single sensor?

@GuillaumeHauss
Copy link
Author

Hi Steve,
First, thank you for your quick response!
Yes, we did. We tried all of them one by one and the result is the same. The first few seconds seem fine (full point cloud) and then it snaps back to this flickering point cloud. We also tested both QoS profiles (system default and sensor) and they both give the same result.
Finally, we see this issue on your repo, and tried the same fix, without much more success.
Do you have other tests in mind that we can perform to narrow down the plausible cause?

@SteveMacenski
Copy link
Member

Not off hand, while I'm a user and supporter of the sensors, by no means am I an expert in their firmware / failure modes. I just tested on my sensor a few minutes ago without any issues. Mhm.

Any other information I should know? Non-default DDS configs, firmware versions on the sensors, or playing with source code?

I agree that since you see a lot of messages with little information, that the buffering doesn't seem to be working quite right.

@GuillaumeHauss
Copy link
Author

Here are the news:

  • We tested the sensor with the same driver, same branch, on a new PC. And it works fine. The pointcloud is published completely at 10Hz --> This means that the issue lies with the configuration of the ethernet card on the prototype computer.
  • We compared both configurations (prototype and laptop) : MTU (1500 bytes), speed (1 Gbit/s), clock frequency (33MHz), ... and there are identical.
  • We noticed there was an option to activate/deactitvate fragmentation the UDP frames. However, this option is deactivated on the laptop where the pointcloud is fine. We'll check today if it's also deactivated on the prototype computer

What we plan to do is:

  • Check the size of the buffer the ros driver is receiving, in order to know if the OS firmware is outputting the correct data
  • If there is an issue there, we'll try to flash the new firmware (Rev C, v2.0 released on Dec 1st) as we still have the v1.13 on the sensor.
  • try to find new ideas...

As per your questions, we didn't mess with DDS-configs as we kept the default settings. And we didn't change the source code either.
We'll keep you posted
Cheers

@SteveMacenski
Copy link
Member

This means that the issue lies with the configuration of the ethernet card on the prototype computer.

Exactly what I like to hear: not Steve's fault 😉

@SteveMacenski SteveMacenski added the question Further information is requested label Dec 8, 2020
@GuillaumeHauss
Copy link
Author

Hello Steve,

We did some tests to try to figure out what's happening, here are the results :

OS type ROS version Firmware version Result
OS1-32 1 2.0 Works as intended
OS1-32 1 1.13 Works as intended
OS1-32 2 2.0 Driver crashes
OS1-32 2 1.13 Works as intended when following the ouster tutorial for the dhcp server setup.
OS1-64 1 2.0 Works as intended
OS1-64 1 1.13 Works as intended
OS1-64 2 2.0 Flashing bug (the same as we've described it in the past repliesm
OS1-64 2 1.13 Cannot test without rolling back the firmware version, but was not working last time with our network configuration

As you can see we tried multiple things:

  • Upgrading the sensor firmware: this turned out to be quite problematic, since your ros2 driver doesn't seem to support the new firmware. We can't seem to rollback to the previous driver version for now (we don't have an image of the 1.13 version), so we coud only do tests for the OS-32 lidars (we have 3 of those, one is still on 1.13). The main issue we noticed is that the packet size for lidar data changed for OS-32 lidars in the new firmware, so that has to be taken into account in the driver if it's going to support this new version.
  • We used Wireshark to check what is going on with the packets, and basically whether the ros2 driver works as intended or not. Wireshark can see the packets being fragmented and reassembled correctly all the time (even when the driver crashes).
  • We undid the network configuration we had (based on /etc/network/interfaces and isc-dhcp-server) and used what is described in the ouster tutorial, and the OS-32 on the 1.13 firmware works! So maybe the initial problem was due to the network config
  • The OS-64 still has the same bug, we're unsure if it's due to do the new firmware (for this lidar the packet size did not change) or the network config

At any rate, our flashing bug was happening already before we upgraded to the new firmware, so we're not entirely sure if it happened due to our network configuration, the firmware, or the driver (or all 3 of them). For the time being, we're going to keep using them with ROS1 and use ros1_bridge to relay the topics to ROS2. We'll try contacting ouster to downgrade the firmware and test on the OS-64 if it works as intended if we follow the tutorial for the network config.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 9, 2020

My setup is a 64 layer on ROS2, so I think that last entry is "works". I'm not sure what my exact version is at the moment, but 1.3 sounds about right from memory (definitely not 2.0).

So it seems like there are 3 issues you're running into:

  • For the issue that you've reported above, that's probably to do with your networking configs that needed to be changed to work properly. While this could also be caused by firmware mismatches, I think your experiments show that is not the case. This is an area that I can't help you with since its particular to your own needs and setup, but suffice it to say that it's definitely solvable. You don't have to use my instructions for networking setup, those are just the "getting started" instructions from Ouster that I translated into something more human readable and tutorial-like.
  • Issue that this driver does not support 2.0 version of the firmware. I was chatting with folks from Ouster in a couple of channels and they seem to think that it should work OK with the existing client (though from your experience and the sheer amount changed, I begin to suspect if that's actually the case)
  • Issue with 32 layer lidar packets, which is probably also just another manifestation of the 2.0 version incompatibility, though we don't know for sure, I suspect that's the case since it would be awfully weird for there to be 2 ghosts in the machine.

The new firmware also was only released just days ago, sorry that this hasn't yet been updated to support (even the ROS 1 drivers were only updated to enable this on Monday). I have not had the opportunity to address these changes and to be honest with year-end closing in, I'm not sure when I will have that time either. I'd appreciate the help if you're able to help fix this issue.

The os1.hpp OS1_packet.hpp and OS1_util.hpp files are near-exact carbon copies of those in the ouster_client repo that we pulled in. We made extremely surface level changes in order to enable the multiple time source features in the readme.

https://github.com/ros-drivers/ros2_ouster_drivers/pull/36/files This PR is the only PR I can find that made changes from the default client. That is supported by looking at the blames for https://github.com/ros-drivers/ros2_ouster_drivers/blame/foxy-devel/ros2_ouster/include/ros2_ouster/OS1/OS1.hpp and https://github.com/ros-drivers/ros2_ouster_drivers/blame/foxy-devel/ros2_ouster/include/ros2_ouster/OS1/OS1_util.hpp which are the only 2 that have changed since my first commit client merge (minus some reformatting from linters, which weren't functional code changes).

So it leads me to believe that as long as the general API of the client object is the same or similar, you may be able to just remove those 3 files, copy over the new ones, and check if there's any changes in the client (inside os1.hpp) functions we use in os1_sensor.hpp/.cpp to make those small updates. That should get you something that works. After, then its just a matter of translating those updates from that 1 PR to the new files (if necessary, maybe they implement that for us in 2.0) and presto! Even if you just get the copied files over and got it working, that would be a big step towards V2 support. The only hiccup I think you might run into is the lidar / imu packet sizes. In the os1 sensor we set a packet of size size to capture incoming data. If that size has changed or is dynamic, we may need to adjust that variable's size.

My initial plan when I heard about the v2 was to carefully cherry pick changes, but they lumped all the changes into a single commit that's impossible to merge in & changes so much in the ouster_client that its not just a couple of tweaks.

@anderwm
Copy link

anderwm commented Jan 14, 2021

The files mentioned above (os1.hpp, OS1_packet.hpp, OS1_util.hpp) don't exist in their latest code. I think the code from OS1_packet.hpp winds up in parsing.h in their new code.

@SteveMacenski
Copy link
Member

Yes, you'd remove those files and replace them with the new ones

@anderwm
Copy link

anderwm commented Jan 15, 2021

OK, so I think that entails:

  1. Removing OS1.hpp, OS1_packet.hpp, OS1_util.hpp from includes here
  2. Adding all of the new header files from ouster
  3. Finding all the places where those things were used in your code and changing the #include or the interface to the new style

3 being the long pole in that tent...sound about right?

@SteveMacenski
Copy link
Member

Yes, that would be correct. There's a small number of other edits that would have to be made to the new headers to enable the different time modes unique to this driver (that improves performance for a number of common uses) that I link to in a response above with git blame.

That's kind of the reason I made the os1_sensor abstract class, so as these APIs change or new Ouster sensors are made, the driver doesn't need to change, just the implementations of configure and such.

@RFRIEDM-Trimble
Copy link
Contributor

RFRIEDM-Trimble commented May 13, 2021

I see this issue too. OS0-128 at a few different settings of lidar_mode, ROS2, and FW2.0.0. I'm going to dig into it a bit.
With proc_mask of PCL|IMU, I see 7.5Hz.

@RFRIEDM-Trimble
Copy link
Contributor

It goes away running in release mode.

@SteveMacenski
Copy link
Member

@RFRIEDM-Trimble A PR to just make this always build in release mode would be appreciated. That way no one runs into this in the future.

@grischi
Copy link

grischi commented Jun 10, 2021

Side-note: the crucial thing seems to be compiler optimization.

I observed the same problem like @GuillaumeHauss and was able to solve it on linux/gcc using the -O3 compiler optimization level, which is selected implicitly when choosing CMake build type Release. Still I agree that setting the default CMake build type is preferable as it is compiler-agnostic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants