Performance improvement for strings::slice for wide strings #16574

Merged: 10 commits, Sep 5, 2024

davidwendt (Contributor)

Description

Improves the performance of cudf::strings::slice_strings on wide strings (average length > 64 bytes).
Addresses some of the concerns raised in issue #15924.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the labels 2 - In Progress, libcudf, strings, improvement, and non-breaking on Aug 15, 2024
@davidwendt self-assigned this on Aug 15, 2024
@github-actions bot added the CMake label on Aug 15, 2024
@davidwendt (Contributor, Author)

Using the code from the notebook in #15924, this change reduces the slice time significantly, from

1m_data.txt: last position slice time = 0.6162 seconds

to

1m_data.txt: last position slice time = 0.0345 seconds
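
For scale, the two timings above can be turned into a speedup factor. This is only a quick arithmetic sketch on the two reported numbers; the variable names are illustrative and it does not rerun the notebook benchmark.

```python
# Timings copied from the two log lines above (seconds).
before_s = 0.6162
after_s = 0.0345

# Speedup factor and relative reduction in wall time.
speedup = before_s / after_s
reduction_pct = (1 - after_s / before_s) * 100

print(f"speedup: {speedup:.1f}x, time reduced by {reduction_pct:.1f}%")
```

That works out to roughly an 18x speedup for this single measurement on 1m_data.txt; other inputs may behave differently.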

@github-actions bot removed the CMake label on Sep 3, 2024
@davidwendt added the label 3 - Ready for Review and removed the label 2 - In Progress on Sep 3, 2024
@davidwendt marked this pull request as ready for review on September 3, 2024 14:22
@davidwendt requested a review from a team as a code owner on September 3, 2024 14:22
@bdice (Contributor) left a comment

Very nice work.

Review thread on cpp/src/strings/slice.cu (outdated, resolved)
@mhaseeb123 (Member) left a comment

Minor comments. Looks good otherwise!

@davidwendt (Contributor, Author)

/merge

@rapids-bot bot merged commit 949f171 into rapidsai:branch-24.10 on Sep 5, 2024
86 checks passed
@davidwendt deleted the perf-strings-slice branch on September 5, 2024 13:52
rjzamora pushed a commit to rjzamora/cudf that referenced this pull request Sep 6, 2024
…#16574)

Improves performance of wide strings (avg > 64 bytes) when using `cudf::strings::slice_strings`.
Addresses some concerns from issue rapidsai#15924

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: rapidsai#16574
res-life pushed a commit to res-life/cudf that referenced this pull request Sep 11, 2024
Labels: 3 - Ready for Review, improvement, libcudf, non-breaking, strings