Add throughput metrics for REDUCTION_BENCH/REDUCTION_NVBENCH benchmarks #16126
Conversation
…s or GlobalMem BW for nvbench, for reduction benchmarks
Thanks for working on this @jihoonson
Thanks @davidwendt, I will fix the copyrights. Do you think it's a good idea to fix them in the reduction benchmark files as well in this PR? I can do that if so.
Any existing file you change should have 2024 added/updated to its range if it does not already have it.
Thanks @davidwendt. Fixed the copyrights as suggested.
Looks good. 👍
Minor updates with new function argument usage.
cpp/benchmarks/reduction/anyall.cpp (Outdated)

```cpp
// The benchmark takes a column and produces one scalar.
set_items_processed(state, column_size + 1);
set_bytes_processed(state, estimate_size(std::move(values)) + cudf::size_of(output_dtype));
```
Suggested change:

```diff
- set_bytes_processed(state, estimate_size(std::move(values)) + cudf::size_of(output_dtype));
+ set_bytes_processed(state, estimate_size(*values) + cudf::size_of(output_dtype));
```

Similarly at other places.
I think `values->view()` is clearer, but I'll leave it up to you if you'd rather use `*values`.
Hi @karthikeyann, thanks for the review. I just want to better understand your comment. You seem to be suggesting passing a `column_view` instead of moving the column. This has been done in 40804e2. Or are you suggesting using the `*`?
Just saw David's comment above. I also find `values->view()` more explicit and clear, so I'd like to keep this pattern unless you feel strongly about it.
/ok to test
/ok to test
@davidwendt @karthikeyann thanks for the review! This PR seems to have passed all checks. What will be the next step?
Nice one!
/ok to test
Hmm, I'm not sure why the job
Looks like something got stuck. I kicked off a re-run.
Thanks! It's all green now 🙂
/merge
Description

This PR addresses #13735 for reduction benchmarks. Three new utilities are added:

- `int64_t estimate_size(cudf::table_view)` returns a size estimate for the given table. "Add `bytes_per_second` to groupby max benchmark" (#13984) was a previous attempt to add a similar utility, but this implementation uses `cudf::row_bit_count()` as suggested in a #13984 review comment instead of manually estimating the size.
- `void set_items_processed(State& state, int64_t items_processed_per_iteration)` is a thin wrapper around `State.SetItemsProcessed()`. It takes `items_processed_per_iteration` as a parameter instead of `total_items_processed`, which avoids repeating `state.iterations() * items_processed_per_iteration` in each benchmark class.
- `void set_throughputs(nvbench::state& state)` is added as a workaround for "Throughput statistics are not calculated when reads/writes are declared after `state.exec()`" (NVIDIA/nvbench#175). We sometimes want to set throughput statistics after the `state.exec()` call, especially when it is hard to estimate the result size upfront.

Here are snippets of reduction benchmarks after this change.
Note that when the data type is a 1-byte-wide type, `bytes_per_second` appears smaller than `items_per_second` in the Google Benchmark result summary. This is because the former is scaled in multiples of 1024 whereas the latter is scaled in multiples of 1000; the underlying counts are in fact the same number.

Implementation-wise, here are the decisions I'm not sure were the best.
Checklist