Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[io] Add option to strip padding bytes while dumping and add padding while reading PCLPointCLoud2 #1566

Open
stefanbuettner opened this issue Mar 8, 2016 · 5 comments
Labels

Comments

@stefanbuettner
Copy link
Contributor

Hey there,

during investigating the test failures of Set default alpha value 255 I was mainly wondering why there is so much code which appears to be duplicate and stumbled over the following:
Point clouds saved and loaded with different functions lead to different internal representations.

If creating a PointCloud<PointT> cloud, converting it to a PCLPointCloud2 object, saving that to a file, loading it back into another object, then the data vector of the two don't have the same length.
Code for reproducing this behavior can be found here.
Actually one of them loads the data with and the other without padding bytes (like the w value of the position). This makes sense to me in order to save disk space, but after loading the cloud again the data should be equal to the original, shouldn't it?
So I don't think this behavior is intended.

Also I have the impression that there is some duplicate code.
For example there are 4 implementations of generateHeader which basically all do the same and could be merged into one.
There are also 8 versions of writing pcd data compared to effectivly 1 read method (read(PCLPointCloud2)). The read method reads a PCLPointCloud2 and the others just use that along with from PCLPointCloud2 to read the templated types. Of course this read function contains reading ASCII, binary and compressed binary files which is split for the writing methods.
in addition there are methods for saving subsets of PointCloud<PointT> types but none PCLPointCloud2 types. Probably it doesn't make much sense for the latter ones. But if they existed one could reduce the implementation to

  • writeASCII(PCLPointCloud2, vector<int>[, bool useIndices])
  • writeBinary(PCLPointCloud2, vector<int>[, bool useIndices])
  • writeBinaryCompressed(PCLPointCloud2, vector<int>[, bool useIndices])

What do you think?

Cheers,
Stefan

@SergioRAgostinho
Copy link
Member

Started to dig into this

If creating a PointCloud cloud, converting it to a PCLPointCloud2 object, saving that to a file, loading it back into another object, then the data vector of the two don't have the same length.
Code for reproducing this behavior can be found here.
Actually one of them loads the data with and the other without padding bytes (like the w value of the position). This makes sense to me in order to save disk space, but after loading the cloud again the data should be equal to the original, shouldn't it?
So I don't think this behavior is intended.

I could confirm everything you mentioned and the different data size is being caused by the different padding/alignment byte(s) that are inserted in each PointT fields. When you convert from PointCloud<PointT> to PCLPointCloud2 those extra bytes are kept because it allows to simply copy data using a single memcpy instruction, which is fast. Nevertheless, for the PointXYZI we do this at the expense of having data of twice the size that is actually needed. These extra bytes are ignored when saving to a PCD file except when you ask to save a PCLPointCloud2 to a binary file.
Upon reading a PCD file to a PCLPointCloud2, these extra byte paddings are not added which leads to the size mismatch.

In this situation I think that best solution might arguably be to introduce the padding upon reading the PCD files.

@jefft255
Copy link

jefft255 commented Aug 8, 2018

Hi,

I'm commenting on the quite old issue because it think it highlights another problem: the padding bytes are saved when saving a PCLPointCloud2 as binary. I noticed the problem while using CloudCompare. I get that it is a lot faster and simpler to dump the binary data array in the file when saving but with big point clouds this can lead to a significant waste of disk space. For example, one of my point cloud goes from 400MB to 800MB after opening it and saving it back from PCLPointCloud2.

@SergioRAgostinho
Copy link
Member

@taketwo What's your feeling on providing an extra parameter during saving operations to allow the user to decide on the trade-off speed vs file size during a save procedure?

@SergioRAgostinho SergioRAgostinho added module: io changelog: enhancement Meta-information for changelog generation labels Aug 31, 2018
@taketwo
Copy link
Member

taketwo commented Aug 31, 2018

Actually I'm quite skeptical about speed benefits. Disk I/O is a true bottleneck and I doubt that single memcpy vs. multiple will change the total saving time in any noticeable way. Thus I'd go all-in for stripping padding bytes. (But of course I'm ready to abolish my skepticism if shown otherwise.)

@taketwo taketwo removed the changelog: enhancement Meta-information for changelog generation label Feb 22, 2020
@stale
Copy link

stale bot commented May 19, 2020

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

@stale stale bot added the status: stale label May 19, 2020
@kunaltyagi kunaltyagi added the kind: todo Type of issue label May 20, 2020
@stale stale bot removed the status: stale label May 20, 2020
@kunaltyagi kunaltyagi changed the title PCD IO [io] Add option to strip padding bytes while dumping and add padding while reading PCLPointCLoud2 May 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants