Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Strange behavior with quadtree point in polygon #890

Closed
thomcom opened this issue Feb 2, 2023 · 2 comments · Fixed by #1381
Closed

[BUG]: Strange behavior with quadtree point in polygon #890

thomcom opened this issue Feb 2, 2023 · 2 comments · Fixed by #1381
Assignees
Labels
bug Something isn't working Needs Triage Need team to review and classify

Comments

@thomcom
Copy link
Contributor

thomcom commented Feb 2, 2023

Version

23.04

On which installation method(s) does this occur?

Rapids-Compose

Describe the issue

quadtree pip seems to experience a silent OOM or other error in certain larger data cases. This particular example requires the availability of our demo datasets taxi2015.csv and taxi_zones.zip. I'm not sure of another way to reproduce it, so I'm including the full example here. It doesn't always appear so I'm just remembering the issue here.

As you can see from the Relevant log output, quadtree returns 3.7m samples when only the first 120 polygons in zones are used. If the entire zones polygons list (263 polygons) is used, something happens and only 16314 rows are returned. This is unexpected behavior and needs investigation.

Minimum reproducible example

# https://github.com/rapidsai/cuspatial/issues/890
import cuspatial
import cudf
import geopandas
host_zones = geopandas.read_file('taxi_zones.zip')
host_lonlat = host_zones.to_crs(epsg=4326)
zones = cuspatial.from_geopandas(host_lonlat)
taxi2015 = cudf.read_csv('taxi2015.csv')
def quadtree(polygons, points):
    poly_points_x = polygons.polygons.x
    poly_points_y = polygons.polygons.y
    poly_offsets = polygons.polygons.part_offset
    poly_ring_offsets = polygons.polygons.ring_offset
    test_points_x = points.points.x
    test_points_y = points.points.y
    scale = 50
    max_depth = 7
    min_size = 125
    x_max = poly_points_x.max()
    x_min = poly_points_x.min()
    y_max = poly_points_y.max()
    y_min = poly_points_y.min()
    point_indices, quadtree = cuspatial.quadtree_on_points(
        test_points_x,
        test_points_y,
        x_min,
        x_max,
        y_min,
        y_max,
        scale,
        max_depth,
        min_size,
    )
    poly_bboxes = cuspatial.polygon_bounding_boxes(
        poly_offsets, poly_ring_offsets, poly_points_x, poly_points_y
    )
    intersections = cuspatial.join_quadtree_and_bounding_boxes(
        quadtree, poly_bboxes, x_min, x_max, y_min, y_max, scale, max_depth
    )
    polygons_and_points = cuspatial.quadtree_point_in_polygon(
        intersections,
        quadtree,
        point_indices,
        test_points_x,
        test_points_y,
        poly_offsets,
        poly_ring_offsets,
        poly_points_x,
        poly_points_y,
    )
    return polygons_and_points
def make_geoseries_from_lon_lat(lon, lat):
    # Scatter the two columns into one column
    assert len(lon) == len(lat)
    xy = cudf.Series(cp.zeros(len(lon) * 2))
    xy[::2] = lon
    xy[1::2] = lat

    return cuspatial.GeoSeries(cuspatial.core._column.geocolumn.GeoColumn._from_points_xy(xy._column))
dropoffs = make_geoseries_from_lon_lat(
    taxi2015['dropoff_longitude'],
    taxi2015['dropoff_latitude']
)
print(quadtree(zones['geometry'].iloc[0:120], dropoffs))
print(quadtree(zones['geometry'], dropoffs))

Relevant log output

polygon_index  point_index
0                    0          116
1                    0          387
2                    0          685
3                    0         2607
4                    0         3141
...                ...          ...
3686177            167     12304018
3686178            167     12323531
3686179            167     12351800
3686180            167     12444884
3686181            167     12484251

[3686182 rows x 2 columns]
       polygon_index  point_index
0                  0          116
1                  0          387
2                  0          685
3                  0         2608
4                  0         3142
...              ...          ...
16309              0     12504645
16310              1      7456107
16311              1      7530752
16312              1      7704910
16313              1     11938181

[16314 rows x 2 columns]

Environment details

rapids-compose 23.04

Other/Misc.

No response

@thomcom thomcom added bug Something isn't working Needs Triage Need team to review and classify labels Feb 2, 2023
@zhangjianting
Copy link
Contributor

If the point coordiantes are also using projection EPSG:4326 (lon/lat), it does not make sense to use a scale of 50. The width/height of NYC bounding boxes is no more than 1 degree in lon/lat.

@thomcom
Copy link
Contributor Author

thomcom commented Feb 7, 2023

The issue I'm asking about has to do with the hidden error that causes the full dataset to return fewer points than taking a partial slice of it.

print(quadtree(zones['geometry'], dropoffs)) returns only 16313 points
print(quadtree(zones['geometry'].iloc[0:120], dropoffs)) returns 3686181 points
print(quadtree(zones['geometry'].iloc[0:160], dropoffs)) returns an OOM error
print(quadtree(zones['geometry'].iloc[0:len(zones)-9] also quietly returns a smaller number of points, or any number down to 1. If I use len(zones)-10 I get an OOM error.

@jarmak-nv jarmak-nv moved this to Todo in cuSpatial Jun 2, 2023
rapids-bot bot pushed a commit that referenced this issue May 24, 2024
)

Followup to #1346.

* Fixes some typos/omissions in types and CMake.
* Adds a new test that OOMs when quadtree_point_in_polygon is passed too many input polygons.
* Fixes quadtree spatial join to handle overflow while counting and more conservatively allocate output buffers.

Fixes #890.

* [Failing test run](https://github.com/rapidsai/cuspatial/actions/runs/8979838628/job/24662981350#step:7:840)
* [Passing test run](https://github.com/rapidsai/cuspatial/actions/runs/8981106226/job/24666403165#step:7:840)

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Michael Wang (https://github.com/isVoid)

URL: #1381
@github-project-automation github-project-automation bot moved this from Todo to Done in cuSpatial May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Needs Triage Need team to review and classify
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants