-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Directed Polygon Point Distance #251
[REVIEW] Directed Polygon Point Distance #251
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
…-231-polygon-separation
…-231-polygon-separation
…-231-polygon-separation
36607c1
to
0679517
Compare
923c6cd
to
8258698
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. Some questions.
* calculates the segment-point distance of a line segment in group `a` and a point in group `b`. | ||
* | ||
* If group `a` contains two or more points, calculate segment-point distance. | ||
* If group `a` contains only a single point, calculate point-point distance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If group `a` contains only a single point, calculate point-point distance. | |
* If group `a` contains only a single point, calculate point-to-point distance. |
|
||
} // namespace | ||
} // namespace detail | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically we have a detail version of API functions that take a stream, like we do in libcudf, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. I'll fix this PR and then ensure it's being done elsewhere in a followup (if needed).
auto space_offsets = fixed_width_column_wrapper<size_type>({0}); | ||
|
||
EXPECT_THROW(cuspatial::directed_polygon_distance(x, y, space_offsets), cuspatial::logic_error); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is the perennial discussion, but we should have some tests of non-trivial size (large enough to launch multiple thread blocks). You could generate a thousand squares on a regular grid (for example) which would make it easy to generate a reference solution analytically.
Or maybe we should look at having gtests run a Python script to generate input and reference data using another library (if that's easier).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened an issue for good measure. I'll think on how we could best approach this.
* @param[in] mr: Device memory resource used to allocate the returned memory | ||
* @return std::unique_ptr<cudf::column> | ||
*/ | ||
std::unique_ptr<cudf::column> directed_polygon_distance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a directed distance? Suggest documenting that.
result : cudf.DataFrame | ||
The pairwise directed distance matrix with one row and one column per | ||
input polygon; the value at row `i`, column `j` represents the | ||
polygon-point distance from polygon i to any vertex in polygon j. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
polygon-point distance from polygon i to any vertex in polygon j. | |
minimum polygon-point distance from polygon i to any vertex in polygon j. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From which point in polygon i is the distance measured? Is it really "the minimum distance from any vertex in i to any vertex in j"?
If so, why would the distance from i to j be different from the distance from j to i?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the question is "why would..." the answer is: because it was more computationally efficient and required less redundant reads.
If the question had been "why should..." the answer is not so simple.
That said, this is the pattern that Hausdorff follows right now, and it's because its more efficient on the computation side, but might be less efficient on the consumers end, depending on how it's used. we should probably approach all/most distance metric APIs similarly, and Hausdorff set the precedence.
This requires further discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the question was another way of asking "why is this a directed distance?". And the reason is because it's the distance from vertices of A to edges of B, not from edges of A to edges of B. If it were the latter, it would be a symmetric distance. This is just asking for clarity in the docs of the geometric reason for having two different distances per pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that in order to make it symmetric either 1. the two directed distances would need to be computed simultaneously in the same kernel then stored in the upper triangular matrix entry or 2. the two directed distances need to be computed in the existing form, then the minimum of the reflected values along the diagonal moved into the upper triangular.
I don't think (I could be wrong) that there is an edge-to-edge distance that can be computed in a form other than min( dist( points_A, segment_B ), dist( points_B, segment_A )). The complexity of the operation can be computed in the kernel symmetrically, or after the fact with another kernel. The difference could be benchmarked, but that seems like premature optimization to me at this time.
The main issue I'm seeing with making this a symmetric operation is that it allows us to use (N^2)/2 storage for the final result instead of N^2, except that we don't have any facility for sparse or triangular matrices in cudf or cuspatial at this time. That is, in order to take advantage of the storage complexity reduction you'll have to write some more array offset abstraction for dist(i,j) and return that to the user.
This perhaps makes a good basis for what needs to be added to the docs, describing our justification for returning directed distances in an N^2 matrix.
This PR has been marked rotten due to no recent activity in the past 90d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. |
This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d. |
I believe this is superseded by #988 |
Contributes to #231. Depends on #250.