Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap custom iterator result #17251

Merged
merged 6 commits into from
Nov 8, 2024
Merged

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Nov 6, 2024

Description

Fixes: #17165
Fixes: #14481

This PR properly wraps the result of custom iterator.

In [2]: import pandas as pd

In [3]: s = pd.Series([10, 1, 2, 3, 4, 5]*1000000)


# Without custom_iter:

In [4]: %timeit for i in s: True
6.34 s ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# This PR:

In [4]: %timeit for i in s: True
6.16 s ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# On `branch-24.12`:
1.53 s ± 6.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I think custom_iter has to exist. Here is why, invoking any sort of iteration on GPU objects will raise errors and thus in the end we fall-back to CPU. Instead of trying to move the objects from host to device memory (if the object is on host memory only), we will avoid a CPU-to-GPU transfer.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf.pandas Issues specific to cudf.pandas labels Nov 6, 2024
@galipremsagar galipremsagar added bug Something isn't working non-breaking Non-breaking change labels Nov 6, 2024
@galipremsagar galipremsagar marked this pull request as ready for review November 6, 2024 04:56
@galipremsagar galipremsagar requested a review from a team as a code owner November 6, 2024 04:56
@galipremsagar galipremsagar changed the title properly wrap custom iterator result Wrap custom iterator result Nov 6, 2024
@vyasr
Copy link
Contributor

vyasr commented Nov 6, 2024

@galipremsagar could you add a test that shows the undesirable fallback behavior of iter? You should be able to do this by monkey-patching os.environ with CUDF_PANDAS_FAIL_ON_FALLBACK and showing that fallback is triggered undesirably. That ought to be possible without a subprocess, but you might have to define a new custom proxy type without custom_iter to show it.

If that turns out to be too much work we can just document it, but I want to make it as obvious as possible to future devs why this is necessary.

@galipremsagar
Copy link
Contributor Author

@galipremsagar could you add a test that shows the undesirable fallback behavior of iter? You should be able to do this by monkey-patching os.environ with CUDF_PANDAS_FAIL_ON_FALLBACK and showing that fallback is triggered undesirably. That ought to be possible without a subprocess, but you might have to define a new custom proxy type without custom_iter to show it.

If that turns out to be too much work we can just document it, but I want to make it as obvious as possible to future devs why this is necessary.

I agree, I'll add a test and document the reason behind the custom iterator existence.

@vyasr
Copy link
Contributor

vyasr commented Nov 7, 2024

This PR also closes #14481 right?

@galipremsagar
Copy link
Contributor Author

I agree, I'll add a test and document the reason behind the custom iterator existence.

Done.

This PR also closes #14481 right?

Yes, this also closes #14481

@galipremsagar
Copy link
Contributor Author

@vyasr @Matt711 This is now ready for review.

Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question and tiny suggestions

python/cudf/cudf/pandas/fast_slow_proxy.py Show resolved Hide resolved
python/cudf/cudf_pandas_tests/test_cudf_pandas.py Outdated Show resolved Hide resolved
python/cudf/cudf_pandas_tests/test_cudf_pandas.py Outdated Show resolved Hide resolved
Co-authored-by: Matthew Murray <41342305+Matt711@users.noreply.github.com>
@galipremsagar
Copy link
Contributor Author

Going ahead and merging this PR. @vyasr incase you have additional comments let me know I can address in a follow-up.

@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 0f1ae26 into rapidsai:branch-24.12 Nov 8, 2024
102 checks passed
@vyasr
Copy link
Contributor

vyasr commented Nov 8, 2024

No worries, this looks great thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cudf.pandas Issues specific to cudf.pandas non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
3 participants