Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add containspredicate. #1086

Merged
merged 45 commits into from
May 4, 2023

Conversation

thomcom
Copy link
Contributor

@thomcom thomcom commented Apr 19, 2023

Closes #1077
Depends on #1022

This PR is a precursor to #1064.

Description

This PR creates a new .contains predicate that uses the sum of the result of the .contains_properly predicate and the .intersects predicate to compute the .contains relationship, boundary inclusive.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@thomcom thomcom added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 19, 2023
@thomcom thomcom self-assigned this Apr 19, 2023
@thomcom thomcom requested a review from a team as a code owner April 19, 2023 20:10
@thomcom thomcom requested review from trxcllnt and harrism April 19, 2023 20:10
@github-actions github-actions bot added the Python Related to Python code label Apr 19, 2023
@thomcom thomcom requested review from isVoid and removed request for trxcllnt April 19, 2023 20:18
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attention span cutting off. Part 1 review. Will continue after some rest.

def _convert_quadtree_result_from_part_to_polygon_indices(
self, lhs, point_result
):
"""Convert the result of a quadtree contains_properly call from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter if this internal function is used for contains_properly? It seems like it can be applied generically to all other predicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only used in contains_properly. I will move it to binpred_utils.py if another predicate depends on it.

@harrism
Copy link
Member

harrism commented Apr 26, 2023

Perhaps use a more descriptive name for this PR. After about 6 of these PRs, my eyes have started glazing over whenever I see "refactor" and "contains" in a PR title. And if anyone reads our changelog, they are going to wonder why the heck we refactor contains so frequently.

@thomcom thomcom changed the title Refactor of contains and contains_properly. Add contains function. Apr 27, 2023
result = pairwise_multipoint_equals_count(lhs, rhs)
return result

def _basic_intersects_pli(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pli? Please avoid unnecessary abbreviation.

It also seems to me that the other intersection should be called _basic_point_intersects since that computes point intersections while this is _basic_intersects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pli: pairwise_linestring_intersection. This returns the unmodified results of the pairwise_linestring_intersection call, hence pli. I'll think about some renamings.

python/cuspatial/cuspatial/core/geoseries.py Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to separate this file into test_[name_of_binpreds] for each tested binpreds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file only covers an extremely small range of within, overlaps, and intersects tests. I'm happy to write more tests in specific files once test dispatching with passing tests is merged in #1064.

return result


def _linestrings_is_degenerate(geoseries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this now that #1093 is merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I'd like to leave that for a future issue/PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue to track this #1113

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly docs, a few structural questions.

python/cuspatial/cuspatial/core/geoseries.py Outdated Show resolved Hide resolved
The size of a multipoint is the number of points in its single point.
The size of a point is 1.
"""
if contains_only_polygons(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be a lot simpler? Any geometry, regardless of complexity, stores a flat array of coordinates. The number of points is just the length of the xy points array, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this logic is because sizes returns the size of each row of the GeoSeries. For example

x = cuspatial.GeoSeries([
    LineString([(0, 0), (1, 1)]),
    LineString([(0, 0), (1, 1), (2, 2)]),
    LineString([(0, 0), (1, 1), (2, 2), (3, 3)]),
])
print(x.sizes)
0    2
1    3
2    4
dtype: int32

The .xy array in this case contains all of the points for all of the features in the GeoSeries:

x.lines.xy
0     0.0
1     0.0
2     1.0
3     1.0
4     0.0
5     0.0
6     1.0
7     1.0
8     2.0
9     2.0
10    0.0
11    0.0
12    1.0
13    1.0
14    2.0
15    2.0
16    3.0
17    3.0
dtype: float64

But has no information about the lengths of the individual geometries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sum(x.sizes) == 9 and len(x.lines.xy) / 2 == 9. So why doesn't the latter serve the purpose?

Copy link
Contributor Author

@thomcom thomcom May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the sizes are used for each row of the GeoSeries to determine if the .contains (boundary inclusive) predicate has been met:

return contains + intersects >= rhs.sizes

Where contains is a cudf.Series of the number of points that satisfied .contains_properly for each row in the GeoSeries, and intersects is a cudf.Series of the number of points that satisfied .intersects for those same rows in the GeoSeries. So if I had a GeoSeries of a length-three polygon and a length 4 polygon, sizes would be

0    3
1    4
dtype: int32

and in the cast of a self-contains test contains would be

0    0
1    0
dtype: int32

and intersects would be

0    3
0    4
dtype: int32

so the predicate result of .contains is
`contains_properly + intersects >= rhs.sizes ==

0    True
1    True
dtype: bool

Does that properly answer your question? I'm not using the sum of the sizes in the GeoSeries, but each individually.

python/cuspatial/cuspatial/core/geoseries.py Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/core/geoseries.py Outdated Show resolved Hide resolved
python/cuspatial/cuspatial/core/geoseries.py Outdated Show resolved Hide resolved
thomcom and others added 7 commits May 2, 2023 15:21
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two small comments relating to documentation above.

@harrism
Copy link
Member

harrism commented May 3, 2023

Also please address my two remaining open comments.

@thomcom
Copy link
Contributor Author

thomcom commented May 4, 2023

/merge

@rapids-bot rapids-bot bot merged commit 8b97a72 into rapidsai:branch-23.06 May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Related to Python code
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

Refactor .contains_properly into .contains with near-final architecture.
3 participants