Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Polygon.contains(LineString) predicate logic. #1186

Conversation

thomcom
Copy link
Contributor

@thomcom thomcom commented Jun 7, 2023

Fixes #1185

Description

The previous logic for handling the case of a LineString that shares boundary points with a Polygon and has no interior points was wrong, it was just finding the center point of such a LineString and testing that for containment. The correct logic is as follows:

A LineString is contained in a polygon if:
The sum of the points of the LineString that are contained in the Polygon, plus the number of vertices of the linestring that are on the polygon boundary, is equal to the length of the LineString less the number of points in the LineString that are contained in the boundary of the Polygon.

polygon.contains(linestring) = contains + point_intersection_count == rhs.sizes - linestring_intersection_count

The new logic counts the number of point intersections and linestring intersection that the LineString shares with the polygon. Point intersections occur when the LineString shares an edge point with the Polygon exterior, implying either a LineString segment that approaches from the exterior or the interior. An interior point will be counted, an exterior point will be left in the size of the LineString during comparison, with the final count being less than the size of the LineString.

The hardest part of this implementation was in writing _pli_features_rebuild_offsets that takes the results of a pairwise_linestring_intersection result and splits them into row-appropriate points in one set and linestrings in another set. This is potentially a good place to move the logic into C++, though I don't think it will be a major profiling issue initially.

Also improves touches and crosses where the predicate had been written too tightly to the test cases.


In addition, a bug with pairwise_multipoint_equals_count is fixed when lhs contains more than 1 multipoints, but all multipoints are empty. The bug causes the API to raise a cuda invalid configuration error.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

thomcom and others added 27 commits May 31, 2023 19:05
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
…erly.py

Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
@thomcom thomcom self-assigned this Jun 7, 2023
@thomcom thomcom requested a review from a team as a code owner June 7, 2023 17:05
@thomcom thomcom requested a review from trxcllnt June 7, 2023 17:05
@thomcom thomcom requested review from harrism and removed request for trxcllnt June 12, 2023 18:03
@thomcom thomcom added bug Something isn't working 4 - Needs Reviewer Waiting for reviewer to review or respond non-breaking Non-breaking change labels Jun 12, 2023
Comment on lines +124 to +126
rhs_sizes_less_line_intersection_size[
rhs_sizes_less_line_intersection_size <= 0
] = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why is it possible that there can be rows that has fewer vertices in the linestring to the number of intersection points?

  2. Why do you need to explicitly set these rows to 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating this logic. This step is important to cover the tests of polygon.contains(linestring) where the LineString is contained only by the border off the polygon. In the above line, the number of points in the linestring is reduced by the number of points in the border-overlap region:

rhs_sizes_less_line_intersection_count = rhs.sizes - linestring_intersects_count

This is because LineString segments that overlap with the boundary of a polygon are not used in the counting of the vertices that may be contained in the polygon. Consider the following three examples:

image

Note that each shape has examples of the LineString overlapping the boundary of the polygon. In the first case, an edge overlaps from the corner, then goes inside to an interior point, then returns to the edge, then overlaps to the corner. The length of the LineString is 5. The first overlapping segment subtracts 2 from the length of the linestring, and the second overlapping segment subtracts another 2, leaving a linestring of length 1. 1 point is contained, and therefor the linestring is contained.

In the second example the same pattern occurs except the middle point goes out of the polygon instead of in. The same points are subtracted from the length of the linestring, but the number of contained points is 0, less than the (remaining) length of the linestring of 1, so it is not contained.

Finally, the LineString only overlaps with the edge of the polygon, on the left and on the bottom. This returns a LineString from pairwise-linestring-intersection of length 3, which is subtracted from the length of the LineString, also 3. rhs_sizes_less_line_intersection_size is now 0 for this example LineString, so contains + point_intersects_count == 0, which would return contains = True. Therefore I set the minimum LineString length to 1, so that some LineString point must be contained in order for .contains to evaluate to true.

Happy to chat this through with you in person if you can think of a more parsimonious approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that each shape has examples of the LineString overlapping the boundary of the polygon. In the first case, an edge overlaps from the corner, then goes inside to an interior point, then returns to the edge, then overlaps to the corner. The length of the LineString is 5. The first overlapping segment subtracts 2 from the length of the linestring, and the second overlapping segment subtracts another 2, leaving a linestring of length 1. 1 point is contained, and therefor the linestring is contained.

In the second example the same pattern occurs except the middle point goes out of the polygon instead of in. The same points are subtracted from the length of the linestring, but the number of contained points is 0, less than the (remaining) length of the linestring of 1, so it is not contained.

Pardon me from not following the reasoning here. I took linestring_intersects_count meaning the number of vertices of the linestring that intersects with the ring of the polygon. Isn't for both cases, the value is 4? The subtracted result is 1 right?

Finally, the LineString only overlaps with the edge of the polygon, on the left and on the bottom. This returns a LineString from pairwise-linestring-intersection of length 3, which is subtracted from the length of the LineString, also 3. rhs_sizes_less_line_intersection_size is now 0 for this example LineString, so contains + point_intersects_count == 0, which would return contains = True. Therefore I set the minimum LineString length to 1, so that some LineString point must be contained in order for .contains to evaluate to true.

This sounds a bit hacky to me. Also, at least the comparison should be == not <= right?

Comment on lines +127 to +128
final_result = contains + point_intersects_count == (
rhs_sizes_less_line_intersection_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems differ from your statement in the PR description, your description basically says:

final_result = contains == rhs.sizes - linestring_points_on_boundary

To make this equal to the formula here, it seems like we need:

linestring_points_on_boundary == linestring_intersects_counts - point_intersects_count

Though I don't see the connection here. Can you show me some hints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the description I believe I misspoke about the number of points that are contained in the polygon as demonstrated in the code:

final_result = contains + point_intersects_count == rhs.sizes - linestring_points_on_boundary

IIRC, you are concerned that linestring_points_on_boundary in your example doesn't take into account the number of point intersections, which are ignored in the description of this PR but not in _compute_polygon_linestring_contains.

contains equals the number of the LineString vertices that are contained by the polygon.

point_intersects_count equals the number of LineString vertices that intersect with the polygon: that is, the places where the LineString does not share an overlapping boundary with the polygon, but instead has a vertex that is colinear with the boundary of the polygon, but each segment that shares the vertex is not colinear. When a point_intersects_count intersection occurs, there must either: exist a point that is inside the polygon that will be counted by contains or a point outside of the polygon that will not be counted.

When a vertex is colinear with the boundary of the polygon this is counted as "in" the polygon for a reason related to our above conversation, where the minimum LineString length must be 1. Consider the following example:

image

In the first case, point_intersects_count == 2, as the LineString vertices are colinear with the boundary of the polygon. contains + point_intersects_count == rhs.sizes and the LineString is contained.

In the second case, point_intersects_count == 0, as the LineString vertices are not colinear- it is a genuine intersection. contains + point_intersects_count < rhs.sizes and the LineString is not contained.

linestring_intersects_count is the number of LineString vertices that are from segments that are colinear with the polygon boundary. They are used only to reduce the number of points in the LineString that should be tested for .contains.

The final equation would be
final_result = contains + point_intersects_count == rhs.sizes - linestring_intersects_count

equal to

final_result = contains == rhs.sizes - linestring_intersects_count - point_intersects_count as I think you are alluding to above, except that the LineString must maintain a size of at least 0 as you asked about above. Maybe I'm missing something here that would allow me to subtract point_intersects_count instead of using the line that sets the minimum rhs_sizes_less_line_intersection_size to 1. I'll be thinking of that.

Consider the final case in this response,

image

In this case point_intersects_count = 1, and due to the overlapping segment, rhs_sizes_less_line_intersection_count == 1, and the LineString is contained. I could do contains == rhs.sizes - linestring_intersects_count - point_intersects_count, which in this case would be (0 == 0), but that would break the third case in my earlier response, where the LineString is entirely colinear with the boundary of the polygon. In those cases the LineString is not contained, and as such it must be that rhs_sizes_less_line_intersection_count > 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you arrive at the same equality as I did above.

final_result = contains == rhs.sizes - linestring_intersects_count - point_intersects_count

Great. I'm still confused with the discrepency between this and code. But perhaps focus on Mark's question below first.

@isVoid
Copy link
Contributor

isVoid commented Jun 13, 2023

The sum of the points of the LineString that are contained in the Polygon is equal to the length of the LineString less the number of points in the LineString that are contained in the boundary of the Polygon.

I suggest we use stricter mathematical language here.

  • "sum of the points", there are infinitie points on a linestring, I believe you mean vertices.
  • "length of the linestring", you probably mean the total number of vertices rather than the measurement of the linestring.

aka:

LineString is contained in the Polygon if:

The number of linestring vertices contained in the polygon equals the number of linestring vertices subtracts the number of vertices that falls on the polygon boundary.

aka:

LineString is contained in the Polygon if:

All vertices of the linestring either falls on the boundary of the polygon or is contained in the polygon.

Right?

@harrism
Copy link
Member

harrism commented Jun 14, 2023

LineString is contained in the Polygon if:

All vertices of the linestring either falls on the boundary of the polygon or is contained in the polygon.

I don't think that's sufficient (but neither is the one written by Thomson that you quoted). Because of concavities and holes, even if all vertices of a linestring fall on the boundary or are contained by the polygon, intermediate points in the edges could fall outside of the polygon where it is concave or has holes.

I think a more correct definition is:

LineString is contained in the Polygon if:

All vertices of the linestring either fall on the boundary of the polygon or are contained in the polygon, and no segment of the linestring passes outside of the polygon.

This latter bit is tricky, because the linestring can pass completely outside and then back inside within a single segment. I think you must check whether all intersection points are also vertices of the linestring, and all intersection segments are colinear with polygon segments.

@thomcom
Copy link
Contributor Author

thomcom commented Jul 17, 2023

Resolves #1185

Comment on lines +331 to +335
x x x
|\\ | /|
| xx- |
| | |
---x---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not right? Why the double backslash? Also missing x vertices on the bottom edge of the poly. I think this is more like it.

    x     o   x
    |\    |  /|
    |  \  |/  |
    |   x/*   |
    |     |   |
    x-----o---x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double backslash is required for python inline comments like this, unfortunately. I like your diagram.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is similar to C style backslash escape.

Comment on lines +392 to +396
x------
| |
| ---x
| --- |
x------
Copy link
Member

@harrism harrism Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    ox------x
     |      |
     |    /-*--o
     | _/   |
    ox------x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not ideal for ASCII...


See the docs for `_pli_points_to_multipoints` and
`_pli_lines_to_multipoints` for the rest of the explanation.
"""
in_sizes = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this whole function needed because of the output format of pairwise_linestring_intersection? Perhaps we need to reconsider the output format of that function for efficiency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.

@isVoid isVoid requested a review from a team as a code owner August 2, 2023 11:03
@github-actions github-actions bot added the libcuspatial Relates to the cuSpatial C++ library label Aug 2, 2023
@isVoid
Copy link
Contributor

isVoid commented Aug 2, 2023

/merge

@rapids-bot rapids-bot bot merged commit 01a1078 into rapidsai:branch-23.08 Aug 2, 2023
@isVoid isVoid mentioned this pull request Aug 2, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Needs Reviewer Waiting for reviewer to review or respond bug Something isn't working libcuspatial Relates to the cuSpatial C++ library non-breaking Non-breaking change Python Related to Python code
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

[BUG]: Fix issue with Polygon.contains(LineString)
3 participants