Add transpose API to pylibcudf #16749

mroeschke · 2024-09-04T23:07:10Z

Description

Contributes to #15162

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

mroeschke · 2024-09-05T23:16:47Z

Looks like this test expects the data pointer to be exposed after transpose:

______________________________ test_df_transpose _______________________________
[gw7] linux -- Python 3.10.14 /opt/conda/envs/test/bin/python3.10

manager = <SpillManager device_memory_limit=N/A | 0B spilled | 57B (28%) unspilled (unspillable)>

    def test_df_transpose(manager: SpillManager):
        df1 = cudf.DataFrame({"a": [1, 2]})
        df2 = df1.transpose()
        # For now, all buffers are marked as exposed
        assert df1._data._data["a"].data.owner.exposed
>       assert df2._data._data[0].data.owner.exposed
E       assert False
E        +  where False = <cudf.core.buffer.spillable_buffer.SpillableBufferOwner object at 0x7f6b538952d0>.exposed
E        +    where <cudf.core.buffer.spillable_buffer.SpillableBufferOwner object at 0x7f6b538952d0> = SpillableBuffer(owner=<cudf.core.buffer.spillable_buffer.SpillableBufferOwner object at 0x7f6b538952d0>, offset=0, size=8).owner
E        +      where SpillableBuffer(owner=<cudf.core.buffer.spillable_buffer.SpillableBufferOwner object at 0x7f6b538952d0>, offset=0, size=8) = <cudf.core.column.numerical.NumericalColumn object at 0x7f6b53886830>\n[\n  1\n]\ndtype: int64.data

tests/test_spilling.py:580: AssertionError

Would this require cudf._lib.column.Column.from_pylibcudf to implement the data_ptr_exposed keyword?

https://github.com/rapidsai/cudf/blob/branch-24.10/python/cudf/cudf/_lib/column.pyx#L602-L605

python/pylibcudf/pylibcudf/tests/test_transpose.py

python/pylibcudf/pylibcudf/transpose.pyx

wence- · 2024-09-06T10:26:15Z

python/cudf/cudf/_lib/transpose.pyx

-    # Notice, the data pointer of `result_owner` has been exposed
-    # through `c_result.second` at this point.
-    result_owner = Column.from_unique_ptr(
-        move(c_result.first), data_ptr_exposed=True
-    )
-    return columns_from_table_view(
-        c_result.second,
-        owners=[result_owner] * c_result.second.num_columns()
+    input_table = plc.table.Table(
+        [col.to_pylibcudf(mode="read") for col in source_columns]
    )
+    _, result_table = plc.transpose.transpose(input_table)
+    return [Column.from_pylibcudf(col) for col in result_table.columns()]
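
For readers who want the semantics of the operation without a GPU, here is a minimal pure-Python sketch of what a table transpose computes, assuming a table is modeled as a list of equal-length columns. `transpose_columns` is a hypothetical helper written for illustration only; it is not the pylibcudf API and says nothing about how the library implements the operation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def transpose_columns(columns: Sequence[Sequence[T]]) -> List[List[T]]:
    """Transpose a column-major table (hypothetical helper, not pylibcudf).

    The value at input column j, row i ends up at output column i, row j.
    """
    if not columns:
        return []
    n_rows = len(columns[0])
    # Assumption of this sketch: the table is rectangular, so every
    # column must have the same number of rows.
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # Output column i collects row i of every input column.
    return [[col[i] for col in columns] for i in range(n_rows)]


table = [[1, 2, 3], [4, 5, 6]]  # 2 columns, 3 rows
result = transpose_columns(table)  # 3 columns, 2 rows
```

Transposing twice returns the original table, which is a convenient property to check in a unit test.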


@madsbk: Can you remind me what it means that the result_owner is exposed through the table (c_result.second)?

Is it that we have, now, two Buffers that point to the same data, and therefore if we were to spill one, we would need to spill the other?

I think this is right, and so yes, I think we do need (@mroeschke) to have a way of marking a column's data as exposed when we import it from pylibcudf.

> Is it that we have, now, two Buffers that point to the same data, and therefore if we were to spill one, we would need to spill the other?

Yes

> I think this is right, and so yes, I think we do need (@mroeschke) to have a way of marking a column's data as exposed when we import it from pylibcudf.

There is a data_ptr_exposed keyword in from_pylibcudf that currently isn't implemented. I think we need to pass that parameter through to the exposed keyword in as_buffer?

Yes, sounds right

It looks like this was addressed in #16760

Yes, it looks like the necessary parameter was handled there so this PR should be safe to merge now.
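
The thread above can be illustrated with a toy model of the "exposed" flag and of forwarding a `data_ptr_exposed` argument to an `exposed` argument. This is only a sketch under stated assumptions: `ToyOwner`, `toy_as_buffer`, and `toy_from_external` are hypothetical names invented here, and the rule that an exposed owner refuses to spill is a simplification, not cudf's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyOwner:
    """Toy stand-in for a spillable buffer owner (not cudf's class)."""

    data: bytes
    exposed: bool = False
    spilled: bool = False

    def mark_exposed(self) -> None:
        # Once a raw pointer may be held by someone else, moving the
        # memory would invalidate that pointer, so the owner is pinned.
        self.exposed = True

    def spill(self) -> bool:
        """Try to spill; refuse if the data pointer has been exposed."""
        if self.exposed:
            return False
        self.spilled = True
        return True


def toy_as_buffer(data: bytes, exposed: bool = False) -> ToyOwner:
    """Hypothetical analogue of an as_buffer-style factory."""
    owner = ToyOwner(data)
    if exposed:
        owner.mark_exposed()
    return owner


def toy_from_external(data: bytes, data_ptr_exposed: bool = False) -> ToyOwner:
    """Hypothetical import function that forwards the flag unchanged."""
    return toy_as_buffer(data, exposed=data_ptr_exposed)


safe = toy_from_external(b"\x01\x02", data_ptr_exposed=False)
shared = toy_from_external(b"\x01\x02", data_ptr_exposed=True)
```

In this model, `safe.spill()` succeeds while `shared.spill()` is refused, which mirrors the failing assertion in `test_df_transpose`: if the flag is dropped on import, the result looks spillable even though another object may still point at the same memory.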

galipremsagar · 2024-09-25T22:10:59Z

/merge

Add docs and initial implementation for transpose

2a11f73

mroeschke added the improvement, non-breaking, and pylibcudf labels Sep 4, 2024

github-actions bot added the Python and CMake labels and removed the pylibcudf label Sep 4, 2024

mroeschke added 5 commits September 4, 2024 16:07

Wrong cdef variable syntax

44a503f

Style add small test

7b77ab4

Parameterize on more cases

28ef562

Fix doc title

0ce3edf

More unit tests, incorporate change into cudf._libs

e3b7a40

mroeschke marked this pull request as ready for review September 5, 2024 20:19

mroeschke requested a review from a team as a code owner September 5, 2024 20:19

mroeschke requested review from wence- and Matt711 September 5, 2024 20:19

Matt711 reviewed Sep 6, 2024

python/pylibcudf/pylibcudf/tests/test_transpose.py (outdated, resolved)

wence- reviewed Sep 6, 2024

mroeschke added 8 commits September 6, 2024 16:36

Merge remote-tracking branch 'upstream/branch-24.10' into pylibcudf/transpose

d01c4c1

use pandas transpose as comparison

7f8484f

Have function just return Table

9c065aa

Merge remote-tracking branch 'upstream/branch-24.10' into pylibcudf/transpose

49a0680

Pass through data_pointer_exposed, type owner_table

023da03

Ensure exposed is passed to children and null_mask too

fe0bcc3

typo

4c17577

Add version skip

8d70df7

mroeschke mentioned this pull request Sep 10, 2024

[FEA] Implement all libcudf modules required by cuDF Python in pylibcudf #15162

Open

Matt711 added the pylibcudf label Sep 19, 2024

Merge branch 'branch-24.10' into pylibcudf/transpose

48a1e15

Matt711 approved these changes Sep 25, 2024

galipremsagar approved these changes Sep 25, 2024

rapids-bot bot merged commit 503ce03 into rapidsai:branch-24.10 Sep 25, 2024
99 checks passed

mroeschke deleted the pylibcudf/transpose branch September 25, 2024 22:13

wence- Sep 6, 2024

madsbk Sep 6, 2024

mroeschke Sep 7, 2024

madsbk Sep 9, 2024

Matt711 Sep 25, 2024

vyasr Sep 25, 2024
