Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Directed Polygon Point Distance #251

Closed

Conversation

cwharris
Copy link
Contributor

@cwharris cwharris commented Jul 7, 2020

Contributes to #231. Depends on #250.

@cwharris cwharris requested review from a team as code owners July 7, 2020 00:50
@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@cwharris cwharris requested a review from a team as a code owner July 21, 2020 02:03
@cwharris cwharris force-pushed the cusp-231-polygon-separation branch from 36607c1 to 0679517 Compare July 21, 2020 02:10
@cwharris cwharris changed the title [WIP] Euclidean Distance [WIP] Directed Polygon Point Distance Jul 21, 2020
@cwharris cwharris force-pushed the cusp-231-polygon-separation branch from 923c6cd to 8258698 Compare July 21, 2020 16:07
@cwharris cwharris changed the title [WIP] Directed Polygon Point Distance [REVIEW] Directed Polygon Point Distance Jul 21, 2020
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. Some questions.

cpp/include/cuspatial/polygon_distance.hpp Outdated Show resolved Hide resolved
cpp/src/spatial/polygon_distance.cu Outdated Show resolved Hide resolved
cpp/src/spatial/polygon_distance.cu Outdated Show resolved Hide resolved
cpp/src/spatial/polygon_distance.cu Outdated Show resolved Hide resolved
* calculates the segment-point distance of a line segment in group `a` and a point in group `b`.
*
* If group `a` contains two or more points, calculate segment-point distance.
* If group `a` contains only a single point, calculate point-point distance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If group `a` contains only a single point, calculate point-point distance.
* If group `a` contains only a single point, calculate point-to-point distance.

cpp/src/spatial/polygon_distance.cu Outdated Show resolved Hide resolved

} // namespace
} // namespace detail

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically we have a detail version of API functions that take a stream, like we do in libcudf, don't we?

Copy link
Contributor Author

@cwharris cwharris Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. I'll fix this PR and then ensure it's being done elsewhere in a followup (if needed).

auto space_offsets = fixed_width_column_wrapper<size_type>({0});

EXPECT_THROW(cuspatial::directed_polygon_distance(x, y, space_offsets), cuspatial::logic_error);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is the perennial discussion, but we should have some tests of non-trivial size (large enough to launch multiple thread blocks). You could generate a thousand squares on a regular grid (for example) which would make it easy to generate a reference solution analytically.

Or maybe we should look at having gtests run a Python script to generate input and reference data using another library (if that's easier).

Copy link
Contributor Author

@cwharris cwharris Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue for good measure. I'll think on how we could best approach this.

#259

* @param[in] mr: Device memory resource used to allocate the returned memory
* @return std::unique_ptr<cudf::column>
*/
std::unique_ptr<cudf::column> directed_polygon_distance(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a directed distance? Suggest documenting that.

result : cudf.DataFrame
The pairwise directed distance matrix with one row and one column per
input polygon; the value at row `i`, column `j` represents the
polygon-point distance from polygon i to any vertex in polygon j.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Suggested change
polygon-point distance from polygon i to any vertex in polygon j.
minimum polygon-point distance from polygon i to any vertex in polygon j.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From which point in polygon i is the distance measured? Is it really "the minimum distance from any vertex in i to any vertex in j"?

If so, why would the distance from i to j be different from the distance from j to i?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the question is "why would..." the answer is: because it was more computationally efficient and required less redundant reads.
If the question had been "why should..." the answer is not so simple.

That said, this is the pattern that Hausdorff follows right now, and it's because its more efficient on the computation side, but might be less efficient on the consumers end, depending on how it's used. we should probably approach all/most distance metric APIs similarly, and Hausdorff set the precedence.

This requires further discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the question was another way of asking "why is this a directed distance?". And the reason is because it's the distance from vertices of A to edges of B, not from edges of A to edges of B. If it were the latter, it would be a symmetric distance. This is just asking for clarity in the docs of the geometric reason for having two different distances per pair.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that in order to make it symmetric either 1. the two directed distances would need to be computed simultaneously in the same kernel then stored in the upper triangular matrix entry or 2. the two directed distances need to be computed in the existing form, then the minimum of the reflected values along the diagonal moved into the upper triangular.

I don't think (I could be wrong) that there is an edge-to-edge distance that can be computed in a form other than min( dist( points_A, segment_B ), dist( points_B, segment_A )). The complexity of the operation can be computed in the kernel symmetrically, or after the fact with another kernel. The difference could be benchmarked, but that seems like premature optimization to me at this time.

The main issue I'm seeing with making this a symmetric operation is that it allows us to use (N^2)/2 storage for the final result instead of N^2, except that we don't have any facility for sparse or triangular matrices in cudf or cuspatial at this time. That is, in order to take advantage of the storage complexity reduction you'll have to write some more array offset abstraction for dist(i,j) and return that to the user.

This perhaps makes a good basis for what needs to be added to the docs, describing our justification for returning directed distances in an N^2 matrix.

@github-actions
Copy link

This PR has been marked rotten due to no recent activity in the past 90d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

@github-actions
Copy link

This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d.

@isVoid
Copy link
Contributor

isVoid commented Apr 14, 2023

I believe this is superseded by #988

@isVoid isVoid closed this Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants