
Add OpenMP to radius outlier removal #6182

Open - larshg wants to merge 7 commits into master from AddOpenMPRadiusOutlierRemoval
Conversation

larshg (Contributor) commented Nov 26, 2024

See #6180 for reference

larshg (Contributor, Author) commented Nov 26, 2024

@alexeygritsenko if you can test this on your point cloud it would be nice.

larshg (Contributor, Author) commented Nov 26, 2024

On the test clouds, with 387 points and about 300,000 points (measured together in the test), I get something like this:

| threads | time (ms) |
| ------- | --------- |
| 1       | 84        |
| 2       | 55        |
| 3       | 46        |
| 4       | 41        |
| 5       | 46        |

It will also depend a lot on the number of neighbors.

alexeygritsenko commented Nov 26, 2024

@larshg, I use C++ at a primitive level (and I don't understand anything about compilation), so unfortunately I won't be able to test this until it becomes available through the vcpkg update command. Anyway, thanks for such a prompt improvement!

larshg added the module: filters and changelog: enhancement labels on Nov 26, 2024
mvieth (Member) commented Nov 26, 2024

Adding multi-threading to the class is a nice idea. However, I would not use `omp critical`, since these mutex locks can hinder parallelization, especially with a higher number of threads.
Here is an alternative idea: set up a `std::vector<std::uint8_t> to_keep(indices_->size())` which contains a 1 if the point should be kept (enough neighbors within the radius), else a 0. Fill this in parallel (no mutex lock necessary). Then, in a second for loop, transfer the results to `indices` and `removed_indices_` (not parallel this time).
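A minimal sketch of that two-phase pattern (not the PR's actual code; it assumes the class's existing members `searcher_`, `search_radius_`, `min_pts_radius_`, `indices_`, `removed_indices_`, and `extract_removed_indices_`, plus a hypothetical `num_threads_` field):

```cpp
// Sketch of the two-phase idea inside applyFilter (Indices& indices).
std::vector<std::uint8_t> to_keep (indices_->size ());

// Phase 1 (parallel): each iteration writes only to_keep[i], so no lock is needed.
#pragma omp parallel for num_threads(num_threads_)
for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t> (indices_->size ()); ++i)
{
  pcl::Indices nn_indices;      // thread-local scratch buffers
  std::vector<float> nn_dists;
  const int k = searcher_->radiusSearch ((*indices_)[i], search_radius_,
                                         nn_indices, nn_dists);
  // radiusSearch counts the query point itself among the results
  to_keep[i] = (k > min_pts_radius_) ? 1 : 0;
}

// Phase 2 (sequential): transfer the results, preserving index order.
for (std::size_t i = 0; i < to_keep.size (); ++i)
{
  if (to_keep[i])
    indices.push_back ((*indices_)[i]);
  else if (extract_removed_indices_)
    removed_indices_->push_back ((*indices_)[i]);
}
```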

larshg (Contributor, Author) commented Nov 26, 2024

> Adding multi-threading to the class is a nice idea. However, I would not use `omp critical`, since these mutex locks can hinder parallelization, especially with a higher number of threads. […] Then, in a second for loop, transfer the results to `indices` and `removed_indices_` (not parallel this time).

Yes, I reasoned that as well. I'll try your suggestion 👍

larshg force-pushed the AddOpenMPRadiusOutlierRemoval branch from c3caed5 to 91ae53b on November 26, 2024 at 13:25
larshg (Contributor, Author) commented Nov 26, 2024

Looks promising - down to about 20 ms now with 6+ threads, but it doesn't seem to improve beyond that.

…removed in an array.

Do a second loop, to assign indexes to indices and removed_indices_.
larshg force-pushed the AddOpenMPRadiusOutlierRemoval branch from 91ae53b to fc3af6a on November 26, 2024 at 15:34
mvieth (Member) left a comment

Mostly minor things ...

> Looks promising - down to about 20 ms now with 6+ threads, but it doesn't seem to improve beyond that.

I could imagine that at some point, memory access is the limiting factor instead of computing power. Either way, 20 ms is really good.

Two review threads on filters/include/pcl/filters/radius_outlier_removal.h (outdated, resolved).
```cpp
applyFilter (indices);
pcl::copyPointCloud (*input_, indices, output);
output.is_dense = true;
```
mvieth (Member):
Why is it necessary to put this last?

larshg (Contributor, Author):

When we copy from the original point cloud, it also copies is_dense from the original cloud, which could be false, but after this filtering it should be true.
But I'm not sure whether all filters should remove NaN values if they don't have to keep the output cloud organized.

Set those to remove to 0 in the to_keep array.
Fix indexing and minor things.
larshg (Contributor, Author) commented Nov 30, 2024

Hmm, aren't negative_ and extract_removed_indices_ kind of the same function?
Maybe copyPointCloud (with indices) should update is_dense properly, instead of just copying it from the source cloud?

larshg (Contributor, Author) commented Dec 2, 2024

Ah, it's either:

- negative_ = false, extract_removed_indices_ = false -> get those that fulfil the condition(s).
- negative_ = true, extract_removed_indices_ = false -> get those that don't fulfil the condition(s).
- negative_ = false, extract_removed_indices_ = true -> get those that fulfil the condition(s); removed_indices_ -> NaN and those that don't fulfil the condition(s).
- negative_ = true, extract_removed_indices_ = true -> get those that don't fulfil the condition(s); removed_indices_ -> NaN and those that fulfil the condition(s).
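In code, those four cases collapse to an exclusive-or between the condition and negative_; a sketch with hypothetical names (routeIndex is not part of the PR):

```cpp
#include <pcl/types.h>

// Sketch: where an index ends up, given the two flags.
// A NaN point never "fulfils", so is_finite covers the NaN rows above.
inline void
routeIndex (pcl::index_t index, bool fulfils, bool is_finite,
            bool negative, bool extract_removed,
            pcl::Indices& indices, pcl::Indices& removed_indices)
{
  const bool keep = is_finite && (fulfils != negative); // negative flips the test
  if (keep)
    indices.push_back (index);
  else if (extract_removed) // NaN indices always land here, regardless of negative
    removed_indices.push_back (index);
}
```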

mvieth (Member) commented Dec 2, 2024

Okay, so in other words: indices belonging to invalid points are always put in the removed_indices_ list (if the user asked for this list), no matter whether negative_ is true or false.

I find the logic in the final for-loop a bit too complicated, though. Currently, `to_keep[i]==0` can mean to put the index into either `removed_indices_` or `indices`, depending on the value of `negative_`. Similarly for `to_keep[i]==1`. Perhaps it would be clearer to just say `to_keep[i]==0` means the index is put into `removed_indices_` (if the user requested this list), and `to_keep[i]==1` means the index is put into the `indices` list? Everything else would then be handled in the parallel for-loop, including respecting the value of `negative_`. So, for example, for an invalid point, `to_keep[i]=0`. If the point has enough neighbours and `negative_==false`, set `to_keep[i]=1`. If the point does not have enough neighbours and `negative_==true`, also set `to_keep[i]=1`. Otherwise `to_keep[i]=0`. Keeping the logic in the final (sequential) for-loop simple would also help parallelism.
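A sketch of that restructuring (same hypothetical members as in the earlier sketch; the sequential transfer loop would stay a plain read of to_keep):

```cpp
// Parallel phase: to_keep[i] already encodes validity and negative_,
// so the final sequential loop only has to test the flag.
#pragma omp parallel for num_threads(num_threads_)
for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t> (indices_->size ()); ++i)
{
  if (!pcl::isFinite ((*input_)[(*indices_)[i]]))
  {
    to_keep[i] = 0; // invalid points are always removed
    continue;
  }
  pcl::Indices nn_indices; // thread-local scratch
  std::vector<float> nn_dists;
  const bool enough = searcher_->radiusSearch ((*indices_)[i], search_radius_,
                                               nn_indices, nn_dists) > min_pts_radius_;
  to_keep[i] = (enough != negative_) ? 1 : 0; // negative_ resolved here
}
```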

larshg (Contributor, Author) commented Dec 3, 2024

> Okay, so in other words: indices belonging to invalid points are always put in the removed_indices_ list (if the user asked for this list), no matter whether negative_ is true or false.
>
> I find the logic in the final for-loop a bit too complicated, though. […] Keeping the logic in the final (sequential) for-loop simple would also help parallelism.

I think I have managed to get what you suggested, though slightly differently: I took inspiration from the implementation in the passthrough filter.
