[FEA] Expose stream-ordered APIs in pylibcudf #15163

Open
vyasr opened this issue Feb 27, 2024 · 2 comments
Labels: feature request (New feature or request), pylibcudf (Issues specific to the pylibcudf package)

Comments

@vyasr
Contributor

vyasr commented Feb 27, 2024

Is your feature request related to a problem? Please describe.
There is currently no way to run cuDF operations in a stream-ordered manner. Since cuDF is deeply tied to the pandas API, there are also limits to how much stream-ordering may be exposed in the public API. pylibcudf has no such restrictions and should allow complete control over streams.

Describe the solution you'd like
libcudf APIs are being incrementally modified to support stream-ordering. Once #13744 is complete, pylibcudf should expose the same functionality in its APIs.
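To make "stream-ordered" concrete, here is a toy pure-Python model of the idea. It is only a sketch and does not use pylibcudf, libcudf, or CUDA: `ToyStream`, `Deferred`, and `add_columns` are hypothetical names invented for illustration, and the trailing `stream` argument is an assumption about what such an API could look like, not the actual signature. Work submitted to one stream runs in submission order, and nothing is guaranteed to be finished until that stream is synchronized.

```python
from collections import deque


class ToyStream:
    """Hypothetical stand-in for a CUDA stream: a FIFO queue of deferred work."""

    def __init__(self, name):
        self.name = name
        self._pending = deque()

    def enqueue(self, fn):
        # Work is only recorded here; the call returns immediately.
        self._pending.append(fn)

    def synchronize(self):
        # Run everything queued so far, in submission order.
        while self._pending:
            self._pending.popleft()()


class Deferred:
    """Result slot that is filled in once the owning stream runs the work."""

    def __init__(self):
        self.value = None


def _resolve(x):
    # Accept either a plain list or the Deferred result of an earlier call.
    return x.value if isinstance(x, Deferred) else x


def add_columns(lhs, rhs, stream):
    """Hypothetical stream-ordered operation: element-wise add on `stream`."""
    out = Deferred()

    def work():
        out.value = [a + b for a, b in zip(_resolve(lhs), _resolve(rhs))]

    stream.enqueue(work)
    return out


s1 = ToyStream("s1")
s2 = ToyStream("s2")

a = add_columns([1, 2], [10, 20], s1)
b = add_columns(a, [100, 200], s1)  # same stream, so it runs after `a`
c = add_columns([5], [5], s2)       # independent work on another stream

pending_before_sync = b.value       # still None: nothing has run yet
s1.synchronize()
s2.synchronize()
print(b.value, c.value)
```

In this model, `b` can safely consume `a` without any explicit wait because both were submitted to `s1`. Consuming `a` from `s2` would need an explicit synchronization point first, which is the kind of control the issue asks pylibcudf to expose.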

Additional context
#13087 and #13509 are examples of where stream-ordered Python APIs could be useful.

@bdice
Contributor

bdice commented Jan 30, 2025

Streams are now available in all libcudf public APIs! #13744 is now complete! 🎉

We also have streams in the public API of RMM: rapidsai/rmm#1770

Adding streams to pylibcudf should now be fully unblocked.

@JigaoLuo

JigaoLuo commented Feb 3, 2025

Thanks for adding stream support in libcudf. I came across this while exploring Parquet reading in libcudf and have also submitted a request to expose stream support in Python for a similar performance boost.

FEA:
Parquet reading in libcudf now supports CUDA streams, but this functionality is not yet exposed in Python via cudf or pylibcudf. This omission limits potential performance gains for Python users. Adding multi-stream support for Parquet reading in Python would greatly benefit users by improving I/O and computation pipelining, enabling better GPU utilization.

With stream support now available across all libcudf public APIs, it would be valuable to bring this feature to Python, allowing users to take full advantage of optimized Parquet reading within cudf.
