Have Cython methods accept pylibcudf.Column instead of cudf._lib.column.Column #1514

Merged (2 commits) on Jan 23, 2025

60 changes: 33 additions & 27 deletions python/cuspatial/cuspatial/_lib/distance.pyx
@@ -1,10 +1,10 @@
# Copyright (c) 2022-2024, NVIDIA CORPORATION.
# Copyright (c) 2022-2025, NVIDIA CORPORATION.

from libcpp.memory cimport make_shared, shared_ptr, unique_ptr
from libcpp.utility cimport move, pair

from cudf._lib.column cimport Column
from pylibcudf cimport Table as plc_Table
from cudf.core.column.column import Column
from pylibcudf cimport Column as plc_Column, Table as plc_Table
from pylibcudf.libcudf.column.column cimport column
from pylibcudf.libcudf.column.column_view cimport column_view
from pylibcudf.libcudf.table.table_view cimport table_view
@@ -26,7 +26,12 @@ from cuspatial._lib.cpp.types cimport collection_type_id, geometry_type_id
from cuspatial._lib.types cimport collection_type_py_to_c


cpdef haversine_distance(Column x1, Column y1, Column x2, Column y2):
cpdef haversine_distance(
plc_Column x1,
plc_Column y1,
Member

Can these just be imported as Column?

Contributor Author

We'll have a namespace shadowing issue because we need both cudf.core.column.Column (for the output) and pylibcudf.Column (to type these inputs). I decided to alias pylibcudf.Column to plc_Column, but I'm happy to use another alias if you'd like.
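For readers unfamiliar with the shadowing concern, a minimal pure-Python sketch follows. The two classes and the `fake_cudf` / `fake_pylibcudf` namespaces are hypothetical stand-ins, not the real cudf or pylibcudf objects; only the names `Column` and `plc_Column` come from the diff.

```python
from types import SimpleNamespace


class CudfColumnStandIn:
    """Hypothetical stand-in for the cudf Python column (the output type)."""


class PlcColumnStandIn:
    """Hypothetical stand-in for pylibcudf.Column (the input type)."""


# Hypothetical namespaces playing the role of the two packages.
fake_cudf = SimpleNamespace(Column=CudfColumnStandIn)
fake_pylibcudf = SimpleNamespace(Column=PlcColumnStandIn)

# Binding both classes to the same name: the second binding replaces the first.
Column = fake_cudf.Column
Column = fake_pylibcudf.Column
shadowed_is_plc = Column is PlcColumnStandIn  # the cudf class is no longer reachable as `Column`

# Aliasing one of them keeps both reachable, which is what plc_Column does in the diff.
Column = fake_cudf.Column
plc_Column = fake_pylibcudf.Column
both_reachable = Column is not plc_Column
```

Any alias would work equally well here; the only requirement is that the two classes end up bound to different names in the module.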

Contributor

Why are we mixing the use of cudf column and pylibcudf column here?

Contributor Author

@mroeschke, Jan 23, 2025

The primary motivation for this PR is removing cudf._lib in favor of pylibcudf (rapidsai/cudf#17317), so cudf._lib.column.Column is going away. The libcudf-like APIs (in cuspatial, mainly accessing a column_view) will now be available on pylibcudf.Column, but to stay compatible with the cudf Python APIs we still need to return a cudf Python column.

Currently, we also have to do this cudf column -> pylibcudf column -> cudf column conversion in cudf itself before calling a Cythonized function. If one day we can connect a cudf column to a pylibcudf column via subclassing, or make cudf work entirely with pylibcudf columns, we could avoid this conversion.
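The round trip described above can be sketched in plain Python. Everything here is a hypothetical stand-in: `PlcColumn` and `CudfColumn` only wrap lists, and `elementwise_sum` is an invented function playing the role of a Cython entry point. Only the method names `to_pylibcudf(mode="read")` and `from_pylibcudf` are taken from the diff.

```python
class PlcColumn:
    """Hypothetical stand-in for pylibcudf.Column; it only wraps a list."""

    def __init__(self, values):
        self.values = list(values)


class CudfColumn:
    """Hypothetical stand-in for the cudf Python column."""

    def __init__(self, values):
        self._values = list(values)

    def to_pylibcudf(self, mode="read"):
        # Name and `mode` argument mirror the call in the diff; the body is illustrative.
        return PlcColumn(self._values)

    @classmethod
    def from_pylibcudf(cls, plc_col):
        # Name mirrors Column.from_pylibcudf in the diff; the body is illustrative.
        return cls(plc_col.values)

    def to_list(self):
        return list(self._values)


def elementwise_sum(x, y):
    """Plays the role of a Cython function whose inputs are typed as plc_Column."""
    result = PlcColumn(a + b for a, b in zip(x.values, y.values))
    # Wrap the low-level result back into the cudf-level column for the caller.
    return CudfColumn.from_pylibcudf(result)


# cudf column -> pylibcudf column (inputs) -> cudf column (output)
lhs = CudfColumn([1.0, 2.0])
rhs = CudfColumn([0.5, 0.25])
out = elementwise_sum(lhs.to_pylibcudf(mode="read"), rhs.to_pylibcudf(mode="read"))
```

In this sketch the caller converts the inputs and the low-level function converts the output, which matches the shape of the changes in this PR.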

Contributor

Yes, in an ideal world I'd like to see if we can implement most of the features with just pylibcudf, without having to do the round trip. I think the strong typing in Cython functions, with no support for protocols or duck typing, makes that a bit harder to achieve. However, I wonder if cuspatial has a smaller API surface that can help make directly using pylibcudf a bit easier?

As always, thanks for your consistent work on removing cudf internals in cuspatial!
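The point about strong typing can be illustrated with a small pure-Python sketch. It only roughly emulates the nominal type check that Cython applies to an argument declared with an extension type; all class and function names below are hypothetical and none of them belong to cudf, pylibcudf, or cuspatial.

```python
class PlcColumn:
    """Hypothetical stand-in for pylibcudf.Column."""

    def view(self):
        return "column_view"


class LooksLikeAColumn:
    """Unrelated class that exposes the same method (duck typing)."""

    def view(self):
        return "column_view"


class SubclassedColumn(PlcColumn):
    """Hypothetical subclass; a nominal type check accepts it."""


def typed_entry_point(col):
    # Rough emulation of the check applied to an argument declared as `plc_Column col`.
    if not isinstance(col, PlcColumn):
        raise TypeError(f"expected PlcColumn, got {type(col).__name__}")
    return col.view()


def duck_typed_entry_point(col):
    # Accepts any object that provides a .view() method.
    return col.view()


def accepts(func, obj):
    """Return True if func(obj) does not raise TypeError."""
    try:
        func(obj)
    except TypeError:
        return False
    return True
```

Under these assumptions, `typed_entry_point` rejects `LooksLikeAColumn()` even though it has the right method, while a subclass instance passes. That is why connecting the two column types via subclassing is one possible way to drop the conversion.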

Contributor Author

However, I wonder if cuspatial has a smaller API surface that can help make directly using pylibcudf a bit easier?

Yes, definitely. I could do a follow-up PR to move the pylibcudf column -> cudf column conversion from the Cython layer to the Python layer. (I could have done that in this PR, but there would have been more visual noise.)

plc_Column x2,
plc_Column y2,
):
cdef column_view c_x1 = x1.view()
cdef column_view c_y1 = y1.view()
cdef column_view c_x2 = x2.view()
@@ -37,13 +42,13 @@ cpdef haversine_distance(Column x1, Column y1, Column x2, Column y2):
with nogil:
c_result = move(cpp_haversine_distance(c_x1, c_y1, c_x2, c_y2))

return Column.from_unique_ptr(move(c_result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(c_result)))


def directed_hausdorff_distance(
Column xs,
Column ys,
Column space_offsets,
plc_Column xs,
plc_Column ys,
plc_Column space_offsets,
):
cdef column_view c_xs = xs.view()
cdef column_view c_ys = ys.view()
@@ -60,8 +65,9 @@ def directed_hausdorff_distance(
)
)

owner_col = Column.from_unique_ptr(
move(result.first), data_ptr_exposed=True
owner_col = Column.from_pylibcudf(
plc_Column.from_libcudf(move(result.first)),
data_ptr_exposed=True
)
cdef plc_Table plc_owner_table = plc_Table(
[owner_col.to_pylibcudf(mode="read")] * result.second.num_columns()
@@ -75,8 +81,8 @@ def directed_hausdorff_distance(
def pairwise_point_distance(
lhs_point_collection_type,
rhs_point_collection_type,
Column points1,
Column points2,
plc_Column points1,
plc_Column points2,
):
cdef collection_type_id lhs_point_multi_type = collection_type_py_to_c(
lhs_point_collection_type
@@ -102,12 +108,12 @@ def pairwise_point_distance(
c_multipoints_lhs.get()[0],
c_multipoints_rhs.get()[0],
))
return Column.from_unique_ptr(move(c_result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(c_result)))


def pairwise_linestring_distance(
Column multilinestrings1,
Column multilinestrings2
plc_Column multilinestrings1,
plc_Column multilinestrings2
):
cdef shared_ptr[geometry_column_view] c_multilinestring_lhs = \
make_shared[geometry_column_view](
@@ -128,13 +134,13 @@ def pairwise_linestring_distance(
c_multilinestring_rhs.get()[0],
))

return Column.from_unique_ptr(move(c_result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(c_result)))


def pairwise_point_linestring_distance(
point_collection_type,
Column points,
Column linestrings,
plc_Column points,
plc_Column linestrings,
):
cdef collection_type_id points_multi_type = collection_type_py_to_c(
point_collection_type
@@ -158,13 +164,13 @@ def pairwise_point_linestring_distance(
c_multilinestrings.get()[0],
))

return Column.from_unique_ptr(move(c_result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(c_result)))


def pairwise_point_polygon_distance(
point_collection_type,
Column multipoints,
Column multipolygons
plc_Column multipoints,
plc_Column multipolygons
):
cdef collection_type_id points_multi_type = collection_type_py_to_c(
point_collection_type
@@ -189,12 +195,12 @@ def pairwise_point_polygon_distance(
c_multipoints.get()[0], c_multipolygons.get()[0]
))

return Column.from_unique_ptr(move(c_result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(c_result)))


def pairwise_linestring_polygon_distance(
Column multilinestrings,
Column multipolygons
plc_Column multilinestrings,
plc_Column multipolygons
):
cdef shared_ptr[geometry_column_view] c_multilinestrings = \
make_shared[geometry_column_view](
@@ -215,10 +221,10 @@ def pairwise_linestring_polygon_distance(
c_multilinestrings.get()[0], c_multipolygons.get()[0]
))

return Column.from_unique_ptr(move(c_result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(c_result)))


def pairwise_polygon_distance(Column lhs, Column rhs):
def pairwise_polygon_distance(plc_Column lhs, plc_Column rhs):
cdef shared_ptr[geometry_column_view] c_lhs = \
make_shared[geometry_column_view](
lhs.view(),
@@ -238,4 +244,4 @@ def pairwise_polygon_distance(Column lhs, Column rhs):
c_lhs.get()[0], c_rhs.get()[0]
))

return Column.from_unique_ptr(move(c_result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(c_result)))
43 changes: 28 additions & 15 deletions python/cuspatial/cuspatial/_lib/intersection.pyx
@@ -1,9 +1,10 @@
# Copyright (c) 2023, NVIDIA CORPORATION.
# Copyright (c) 2023-2025, NVIDIA CORPORATION.

from libcpp.memory cimport make_shared, shared_ptr
from libcpp.utility cimport move

from cudf._lib.column cimport Column
from cudf.core.column.column import Column
from pylibcudf cimport Column as plc_Column

from cuspatial._lib.types import CollectionType, GeometryType

@@ -21,7 +22,7 @@ from cuspatial._lib.types cimport (
)


def pairwise_linestring_intersection(Column lhs, Column rhs):
def pairwise_linestring_intersection(plc_Column lhs, plc_Column rhs):
"""
Compute the intersection of two (multi)linestrings.
"""
@@ -51,22 +52,34 @@ def pairwise_linestring_intersection(Column lhs, Column rhs):
c_lhs.get()[0], c_rhs.get()[0]
))

geometry_collection_offset = Column.from_unique_ptr(
move(c_result.geometry_collection_offset)
geometry_collection_offset = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.geometry_collection_offset))
)

types_buffer = Column.from_unique_ptr(move(c_result.types_buffer))
offset_buffer = Column.from_unique_ptr(move(c_result.offset_buffer))
points = Column.from_unique_ptr(move(c_result.points))
segments = Column.from_unique_ptr(move(c_result.segments))
lhs_linestring_id = Column.from_unique_ptr(
move(c_result.lhs_linestring_id)
types_buffer = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.types_buffer))
)
lhs_segment_id = Column.from_unique_ptr(move(c_result.lhs_segment_id))
rhs_linestring_id = Column.from_unique_ptr(
move(c_result.rhs_linestring_id)
offset_buffer = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.offset_buffer))
)
points = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.points))
)
segments = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.segments))
)
lhs_linestring_id = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.lhs_linestring_id))
)
lhs_segment_id = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.lhs_segment_id))
)
rhs_linestring_id = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.rhs_linestring_id))
)
rhs_segment_id = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.rhs_segment_id))
)
rhs_segment_id = Column.from_unique_ptr(move(c_result.rhs_segment_id))

# Map linestring type codes from libcuspatial to cuspatial
types_buffer[types_buffer == GeometryType.LINESTRING.value] = (
10 changes: 5 additions & 5 deletions python/cuspatial/cuspatial/_lib/linestring_bounding_boxes.pyx
@@ -1,10 +1,10 @@
# Copyright (c) 2020-2024, NVIDIA CORPORATION.
# Copyright (c) 2020-2025, NVIDIA CORPORATION.

from libcpp.memory cimport unique_ptr
from libcpp.utility cimport move

from cudf._lib.column cimport Column
from pylibcudf cimport Table as plc_Table
from cudf.core.column.column import Column
from pylibcudf cimport Column as plc_Column, Table as plc_Table
from pylibcudf.libcudf.column.column_view cimport column_view
from pylibcudf.libcudf.table.table cimport table

@@ -13,8 +13,8 @@ from cuspatial._lib.cpp.linestring_bounding_boxes cimport (
)


cpdef linestring_bounding_boxes(Column poly_offsets,
Column x, Column y,
cpdef linestring_bounding_boxes(plc_Column poly_offsets,
plc_Column x, plc_Column y,
double R):
cdef column_view c_poly_offsets = poly_offsets.view()
cdef column_view c_x = x.view()
30 changes: 18 additions & 12 deletions python/cuspatial/cuspatial/_lib/nearest_points.pyx
@@ -1,8 +1,9 @@
# Copyright (c) 2022-2024, NVIDIA CORPORATION.
# Copyright (c) 2022-2025, NVIDIA CORPORATION.

from libcpp.utility cimport move

from cudf._lib.column cimport Column
from cudf.core.column.column import Column
from pylibcudf cimport Column as plc_Column
from pylibcudf.libcudf.column.column_view cimport column_view

from cuspatial._lib.cpp.nearest_points cimport (
@@ -14,9 +15,9 @@ from cuspatial._lib.utils cimport unwrap_pyoptcol


def pairwise_point_linestring_nearest_points(
Column points_xy,
Column linestring_part_offsets,
Column linestring_points_xy,
plc_Column points_xy,
plc_Column linestring_part_offsets,
plc_Column linestring_points_xy,
multipoint_geometry_offset=None,
multilinestring_geometry_offset=None,
):
@@ -41,17 +42,22 @@ def pairwise_point_linestring_nearest_points(

multipoint_geometry_id = None
if multipoint_geometry_offset is not None:
multipoint_geometry_id = Column.from_unique_ptr(
move(c_result.nearest_point_geometry_id.value()))
multipoint_geometry_id = Column.from_pylibcudf(plc_Column.from_libcudf(
move(c_result.nearest_point_geometry_id.value())))

multilinestring_geometry_id = None
if multilinestring_geometry_offset is not None:
multilinestring_geometry_id = Column.from_unique_ptr(
move(c_result.nearest_linestring_geometry_id.value()))
multilinestring_geometry_id = Column.from_pylibcudf(
plc_Column.from_libcudf(
move(c_result.nearest_linestring_geometry_id.value())
)
)

segment_id = Column.from_unique_ptr(move(c_result.nearest_segment_id))
point_on_linestring_xy = Column.from_unique_ptr(
move(c_result.nearest_point_on_linestring_xy))
segment_id = Column.from_pylibcudf(
plc_Column.from_libcudf(move(c_result.nearest_segment_id))
)
point_on_linestring_xy = Column.from_pylibcudf(plc_Column.from_libcudf(
move(c_result.nearest_point_on_linestring_xy)))

return (
multipoint_geometry_id,
@@ -1,9 +1,10 @@
# Copyright (c) 2023-2024, NVIDIA CORPORATION.
# Copyright (c) 2023-2025, NVIDIA CORPORATION.

from libcpp.memory cimport make_shared, shared_ptr, unique_ptr
from libcpp.utility cimport move

from cudf._lib.column cimport Column
from cudf.core.column.column import Column
from pylibcudf cimport Column as plc_Column
from pylibcudf.libcudf.column.column cimport column

from cuspatial._lib.cpp.column.geometry_column_view cimport (
@@ -16,8 +17,8 @@ from cuspatial._lib.cpp.types cimport collection_type_id, geometry_type_id


def pairwise_multipoint_equals_count(
Column _lhs,
Column _rhs,
plc_Column _lhs,
plc_Column _rhs,
):
cdef shared_ptr[geometry_column_view] lhs = \
make_shared[geometry_column_view](
@@ -41,4 +42,4 @@ def pairwise_multipoint_equals_count(
)
)

return Column.from_unique_ptr(move(result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(result)))
19 changes: 10 additions & 9 deletions python/cuspatial/cuspatial/_lib/pairwise_point_in_polygon.pyx
@@ -1,9 +1,10 @@
# Copyright (c) 2022-2024, NVIDIA CORPORATION.
# Copyright (c) 2022-2025, NVIDIA CORPORATION.

from libcpp.memory cimport unique_ptr
from libcpp.utility cimport move

from cudf._lib.column cimport Column
from cudf.core.column.column import Column
from pylibcudf cimport Column as plc_Column
from pylibcudf.libcudf.column.column cimport column
from pylibcudf.libcudf.column.column_view cimport column_view

@@ -13,12 +14,12 @@ from cuspatial._lib.cpp.pairwise_point_in_polygon cimport (


def pairwise_point_in_polygon(
Column test_points_x,
Column test_points_y,
Column poly_offsets,
Column poly_ring_offsets,
Column poly_points_x,
Column poly_points_y
plc_Column test_points_x,
plc_Column test_points_y,
plc_Column poly_offsets,
plc_Column poly_ring_offsets,
plc_Column poly_points_x,
plc_Column poly_points_y
):
cdef column_view c_test_points_x = test_points_x.view()
cdef column_view c_test_points_y = test_points_y.view()
@@ -41,4 +42,4 @@ def pairwise_point_in_polygon(
)
)

return Column.from_unique_ptr(move(result))
return Column.from_pylibcudf(plc_Column.from_libcudf(move(result)))