[FEA] Add option to disable measuring buffer copy time during serialization. #11995

Open
liurenjie1024 opened this issue Jan 22, 2025 · 0 comments
Labels
feature request New feature or request

Comments

@liurenjie1024
Collaborator

Is your feature request related to a problem? Please describe.
During our benchmarks, we found that measuring buffer copy time adds significant overhead during serialization; disabling it could improve throughput by about 25-30% for large partitions.

Describe the solution you'd like
Add an option to disable it. The work will be split into three parts:

  1. Add an option in spark-rapids to disable it. This only shows or hides the metric; it does not actually disable the measurement in the kudo serializer.
  2. Add an option to disable the measurement in the kudo serializer.
  3. Have spark-rapids actually enable or disable the measurement when it calls the kudo serializer.
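
To make parts 2 and 3 concrete, here is a minimal sketch of a writer that times the buffer copy only when a flag is set. It is not the kudo serializer's actual API: the class `CopyTimingSketch`, the `WriteMetrics` type, and the `measureCopyBufferTime` flag are hypothetical names chosen for illustration, and a plain `OutputStream` stands in for the real output path.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch only; names and structure are assumptions, not the kudo serializer API.
final class CopyTimingSketch {

    /** Result of one write call. copyBufferTimeNs stays 0 when timing is disabled. */
    static final class WriteMetrics {
        final long writtenBytes;
        final long copyBufferTimeNs;

        WriteMetrics(long writtenBytes, long copyBufferTimeNs) {
            this.writtenBytes = writtenBytes;
            this.copyBufferTimeNs = copyBufferTimeNs;
        }
    }

    // Assumed to be set by the caller (part 3) from a user-facing option (part 1).
    private final boolean measureCopyBufferTime;

    CopyTimingSketch(boolean measureCopyBufferTime) {
        this.measureCopyBufferTime = measureCopyBufferTime;
    }

    /** Copies every buffer to the stream, timing each copy only if the flag is on. */
    WriteMetrics write(OutputStream out, byte[][] buffers) throws IOException {
        long bytes = 0;
        long copyNs = 0;
        for (byte[] buf : buffers) {
            if (measureCopyBufferTime) {
                // Two clock reads per buffer: this is the overhead the option avoids.
                long start = System.nanoTime();
                out.write(buf, 0, buf.length);
                copyNs += System.nanoTime() - start;
            } else {
                out.write(buf, 0, buf.length);
            }
            bytes += buf.length;
        }
        return new WriteMetrics(bytes, copyNs);
    }
}
```

With the flag off, the copy path performs no clock reads at all, which is the behavior parts 2 and 3 aim for. Part 1 alone would correspond to leaving the flag on and simply not reporting `copyBufferTimeNs`.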
@liurenjie1024 liurenjie1024 added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels Jan 22, 2025
@liurenjie1024 liurenjie1024 self-assigned this Jan 22, 2025
@mattahrens mattahrens removed the "? - Needs Triage" (Need team to review and classify) label Jan 29, 2025
liurenjie1024 added a commit that referenced this issue Feb 6, 2025

This is the first step of #11995.

It adds an option in spark-rapids to disable measuring buffer copy time.
It does not yet disable the measurement in the kudo serializer itself,
but it can hide the metric.

---------

Signed-off-by: Ray Liu <liurenjie2008@gmail.com>