[FEA] Add `bytes_per_second` to all libcudf benchmarks #13735

GregoryKimball · 2023-07-23T21:11:18Z

Many libcudf benchmarks report the bytes_per_second processed as part of the output data. This value is useful for comparing benchmarks because it normalizes the increasing execution time from processing more data. Also, bytes_per_second and real_time_s together let us compute bytes_processed values which serve as a useful Y-axis.

As of the end of 23.12 development, many benchmarks still do not report bytes_per_second in the output data. Here is a figure summarizing the metrics reported by the benchmark suite.

benchmark	status	notes
`APPLY_BOOLEAN_MASK`		see #13937
`BINARYOP`		see #13938
`COPY_IF_ELSE`		see #13960
`GROUPBY`		see #13984
`GROUPBY_NV`
`HASHING`		see #13967, #13965
`JOIN`
`JOIN_NV`
`QUANTILES`
`REDUCTION`		see #16126
`REPLACE`
`SEARCH`
`SEARCH_NV`
`SET_OPS_NV`
`SHIFT`		see #13950
`SORT`
`SORT_NV`
`STREAM_COMPACTION_NV`		see #14172
`TRANSPOSE`		see #14170

Note: For this tracking list, cuIO benchmarks are omitted because "encoded file size" serves a similar purpose.

The text was updated successfully, but these errors were encountered:

Blonck · 2023-08-22T13:44:45Z

Hi, I want to start contributing to cudf and would like to work on this issue.

Should I create one PR per benchmark or put all in one PR?

davidwendt · 2023-08-23T12:52:33Z

Should I create one PR per benchmark or put all in one PR?

Thanks for looking into this. Smaller PRs are easier to review so I would prefer one PR per benchmark.

Due to missing parentheses in APPLY_BOOLEAN_MASK benchmark the number of written bytes were not multiplied with the number of iterations of this benchmark. This patches relates to rapidsai#13735.

Due to missing parentheses in APPLY_BOOLEAN_MASK benchmark the number of written bytes were not multiplied by the number of iterations of this benchmark. This patch relates to rapidsai#13735.

To add `bytes_per_second` a call to `SetBytesProcessed()` with number of written and read bytes is added to the benchmark. This patch relates to rapidsai#13735.

To add `bytes_per_second`, a call to `SetBytesProcessed()` with the number of written and read bytes is added to the benchmark. This patch relates to rapidsai#13735.

…13937) Due to missing parentheses in APPLY_BOOLEAN_MASK benchmark, the number of written bytes were not multiplied by the number of iterations of this benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #13937

This patch relates to rapidsai#13735.

To add `bytes_per_second`, a call to `SetBytesProcessed()` with the number of written and read bytes is added to the benchmark. This patch relates to rapidsai#13735.

To add `bytes_per_second`, a call to `.SetBytesProcessed()` with the number of written and read bytes is added to the benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Yunsong Wang (https://github.com/PointKernel) URL: #13938

Adds `bytes_per_second to `COPY_IF_ELSE_BENCH`. This patch relates to rapidsai#13735.

Adds `bytes_per_second` to the `PARTITION_BENCH` benchmark. This patch relates to rapidsai#13735.

In the past, the HASING_NVBENCH benchmark treated the nulls parameter as a boolean. Any value other than 0.0 resulted in a null probability of 1.0. Now, the nulls parameter directly determines the null probability. For instance, a value of 0.1 will generate 10% of the data as null. Moreover, setting nulls to 0.0 produces data without a null bitmask. Additionally, `bytes_per_second` are added to the benchmark. This patch relates to rapidsai#13735.

In the past, the HASING_NVBENCH benchmark treated the nulls parameter as a boolean. Any value other than 0.0 resulted in a null probability of 100% for the generated data. Now, the nulls parameter directly determines the null probability. For instance, a value of 0.1 will generate 10% of the data as null. Moreover, setting nulls to 0.0 produces data without a null bitmask. Additionally, `bytes_per_second` are added to the benchmark. This patch relates to rapidsai#13735.

In the past, the `HASING_NVBENCH` benchmark treated the `nulls` parameter as a boolean. Any value other than 0.0 resulted in a null probability of 100% for the generated data. Now, the `nulls` parameter directly determines the null probability. For instance, a value of 0.1 will generate 10% of the data as null. Moreover, setting nulls to 0.0 produces data without a null bitmask. Additionally, `bytes_per_second` are added to the benchmark. This patch relates to rapidsai#13735.

To calculate the number of bytes written and read by the benchmark a few helper function are introduced which calculate the payload size of a column, a table, and the results of a groupby. This patch relates to rapidsai#13735.

Adds `bytes_per_second` to the `COPY_IF_ELSE_BENCH` benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #13960

Adds `bytes_per_second` to the `PARTITION_BENCH` benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) - Mark Harris (https://github.com/harrism) Approvers: - David Wendt (https://github.com/davidwendt) - Mark Harris (https://github.com/harrism) URL: #13965

This patch adds memory statistics for the GROUPBY_NVBENCH benchmarks. For this purpose helper functions are introduced to compute the payload size for: - Column - Table - Groupby execution results This patch relates to rapidsai#13735.

In the past, the HASING_NVBENCH benchmark treated the `nulls` parameter as a boolean. Any value other than 0.0 resulted in a null probability of 1.0. Now, the `nulls` parameter directly determines the null probability. For instance, a value of 0.1 will generate 10% of the data as null. Moreover, setting nulls to 0.0 produces data without a null bitmask. Additionally, `bytes_per_second` are added to the benchmark. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) - Yunsong Wang (https://github.com/PointKernel) Approvers: - Yunsong Wang (https://github.com/PointKernel) - David Wendt (https://github.com/davidwendt) URL: #13967

This patch adds memory statistics for the GROUPBY_NVBENCH benchmarks. For this purpose helper functions are introduced to compute the payload size for: - Column - Table - Groupby execution results This patch relates to rapidsai#13735.

This patch relates to rapidsai#13735.

#14172) This patch relates to #13735. Benchmark: [benchmark_distinct_count.txt](https://github.com/rapidsai/cudf/files/12700496/benchmark_distinct_count.txt) Authors: - Martin Marenz (https://github.com/Blonck) - Mark Harris (https://github.com/harrism) Approvers: - David Wendt (https://github.com/davidwendt) - Karthikeyan (https://github.com/karthikeyann) - Mark Harris (https://github.com/harrism) URL: #14172

This patch relates to #13735. Benchmark: [transpose_benchmark.txt](https://github.com/rapidsai/cudf/files/12699834/transpose_benchmark.txt) Authors: - Martin Marenz (https://github.com/Blonck) - Mark Harris (https://github.com/harrism) Approvers: - Mark Harris (https://github.com/harrism) - Yunsong Wang (https://github.com/PointKernel) URL: #14170

Adds `bytes_per_second` to `SHIFT_BENCH`. This patch relates to #13735. Authors: - Martin Marenz (https://github.com/Blonck) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Nghia Truong (https://github.com/ttnghia) - Mark Harris (https://github.com/harrism) URL: #13950

jihoonson · 2024-06-26T21:28:25Z

It is a good idea to have some metrics for data size. I wonder whether it would be even better if we have separate metrics for input and output data. For example, we could add custom metrics like input_items_per_second, output_items_per_second, and input_bytes_per_second and output_bytes_per_second. The first two are the number of items (usually rows), and the last two are the byte sizes of input and output.

…ks (#16126) This PR addresses #13735 for reduction benchmarks. There are 3 new utils added. - `int64_t estimate_size(cudf::table_view)` returns a size estimate for the given table. #13984 was a previous attempt to add a similar utility, but this implementation uses `cudf::row_bit_count()` as suggested in #13984 (comment) instead of manually estimating the size. - `void set_items_processed(State& state, int64_t items_processed_per_iteration)` is a thin wrapper of `State.SetItemsProcessed()`. This wrapper takes `items_processed_per_iteration` as a parameter instead of `total_items_processed`. This could be useful to avoid repeating `State.iterations() * items_processed_per_iteration` in each benchmark class. - `void set_throughputs(nvbench::state& state)` is added as a workaround for NVIDIA/nvbench#175. We sometimes want to set throughput statistics after `state.exec()` calls especially when it is hard to estimate the result size upfront. Here are snippets of reduction benchmarks after this change. ``` $ cpp/build/benchmarks/REDUCTION_BENCH ... ----------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... ----------------------------------------------------------------------------------------------------------------- Reduction/bool_all/10000/manual_time 10257 ns 26845 ns 68185 bytes_per_second=929.907M/s items_per_second=975.078M/s Reduction/bool_all/100000/manual_time 11000 ns 27454 ns 63634 bytes_per_second=8.46642G/s items_per_second=9.09075G/s Reduction/bool_all/1000000/manual_time 12671 ns 28658 ns 55261 bytes_per_second=73.5018G/s items_per_second=78.922G/s ... $ cpp/build/benchmarks/REDUCTION_NVBENCH ... ## rank_scan ### [0] NVIDIA RTX A5500 | T | null_probability | data_size | Samples | CPU Time | Noise | GPU Time | Noise | Elem/s | GlobalMem BW | BWUtil | |-----------------|------------------|-----------|---------|------------|--------|------------|-------|----------|--------------|-----------| | I32 | 0 | 10000 | 16992x | 33.544 us | 14.95% | 29.446 us | 5.58% | 82.321M | 5.596 TB/s | 728.54% | | I32 | 0.1 | 10000 | 16512x | 34.358 us | 13.66% | 30.292 us | 2.87% | 80.020M | 5.286 TB/s | 688.17% | | I32 | 0.5 | 10000 | 16736x | 34.058 us | 14.31% | 29.890 us | 3.40% | 81.097M | 5.430 TB/s | 706.89% | ... ``` Note that, when the data type is a 1-byte-width type in the google benchmark result summary, `bytes_per_second` appears to be smaller than `items_per_second`. This is because the former is a multiple of 1000 whereas the latter is a multiple of 1024. They are in fact the same number. Implementation-wise, these are what I'm not sure if I made a best decision. - Each of new utils above is declared and defined in different files. I did this because I could not find a good place to have them all, and they seem to belong to different utilities. Please let me know if there is a better place for them. - All the new utils are defined in the global namespace since other util functions seem to have been defined in the same way. Please let me know if this is not the convention. Authors: - Jihoon Son (https://github.com/jihoonson) Approvers: - Mark Harris (https://github.com/harrism) - David Wendt (https://github.com/davidwendt) URL: #16126

GregoryKimball · 2024-07-03T16:48:07Z

Thank you @jihoonson for your suggestion. I agree that we need better and more granular throughput metrics. I would like to start with an input+output bytes per second metric that is consistently implemented in all benchmarks. From there we can add the separate input/output and bytes/items metrics as you suggest.

jihoonson · 2024-07-03T20:46:38Z

@GregoryKimball sounds good 🙂 As my first PR has been merged, I will continue working on the remaining benchmarks.

jihoonson · 2024-07-03T21:06:08Z

I would like to start with an input+output bytes per second metric that is consistently implemented in all benchmarks.

I started thinking that perhaps we should enforce all benchmarks to add such metrics. One way to do it is having new benchmark fixtures, which would explode at the end of the benchmark if those metrics are not set. Then we could update every benchmark to use the new fixtures. @GregoryKimball do you think this is a good idea?

GregoryKimball · 2024-08-01T23:25:20Z

we should enforce all benchmarks to add such metrics

Thank you @jihoonson for this suggestion. We might go this route at some point, but for now I think converting the rest of the benchmarks from gbench to nvbench and adding the missing metrics is higher priority.

GregoryKimball added feature request New feature or request 0 - Backlog In queue waiting for assignment good first issue Good for newcomers libcudf Affects libcudf (C++/CUDA) code. labels Jul 23, 2023

GregoryKimball added this to the Benchmarking milestone Jul 23, 2023

GregoryKimball added this to libcudf Jul 23, 2023

GregoryKimball moved this to Pairing in libcudf Aug 8, 2023

GregoryKimball removed the status in libcudf Aug 8, 2023

Blonck mentioned this issue Aug 23, 2023

Fixed processed bytes calculation in APPLY_BOOLEAN_MASK benchmark. #13937

Merged

1 task

Blonck mentioned this issue Aug 23, 2023

Add bytes_per_second to compiled binaryop benchmark #13938

Merged

1 task

Blonck pushed a commit to Blonck/cudf that referenced this issue Aug 24, 2023

Add bytes_per_second to shift benchmark

b7f705e

This patch relates to rapidsai#13735.

Blonck mentioned this issue Aug 24, 2023

Add bytes_per_second to shift benchmark #13950

Merged

1 task

Blonck pushed a commit to Blonck/cudf that referenced this issue Aug 25, 2023

Add bytes_per_second to copy_if_else benchmark

808fdc1

Adds `bytes_per_second to `COPY_IF_ELSE_BENCH`. This patch relates to rapidsai#13735.

Blonck mentioned this issue Aug 25, 2023

Add bytes_per_second to copy_if_else benchmark #13960

Merged

1 task

Blonck pushed a commit to Blonck/cudf that referenced this issue Aug 25, 2023

Add bytes_per_second to hash_partition benchmark

17fe3c7

Adds `bytes_per_second` to the `PARTITION_BENCH` benchmark. This patch relates to rapidsai#13735.

Blonck mentioned this issue Aug 25, 2023

Add bytes_per_second to hash_partition benchmark #13965

Merged

1 task

Blonck mentioned this issue Aug 25, 2023

Enable fractional null probability for hashing benchmark #13967

Merged

1 task

Blonck mentioned this issue Aug 28, 2023

Add bytes_per_second to groupby max benchmark. #13984

Draft

1 task

Blonck added a commit to Blonck/cudf that referenced this issue Sep 22, 2023

Add bytes_per_second to transpose benchmark

e25536e

This patch relates to rapidsai#13735.

Blonck added a commit to Blonck/cudf that referenced this issue Sep 22, 2023

Add bytes_per_second to transpose benchmark

cc743eb

This patch relates to rapidsai#13735.

Blonck added a commit to Blonck/cudf that referenced this issue Sep 22, 2023

Add bytes_per_second to transpose benchmark

ad49fe4

This patch relates to rapidsai#13735.

Blonck added a commit to Blonck/cudf that referenced this issue Sep 22, 2023

Add bytes_per_second to transpose benchmark

a56aadb

This patch relates to rapidsai#13735.

Blonck mentioned this issue Sep 22, 2023

Add bytes_per_second to transpose benchmark #14170

Merged

1 task

Blonck added a commit to Blonck/cudf that referenced this issue Sep 22, 2023

Add bytes_per_second to distinct_count of stream_compaction nvbench.

676e82d

This patch relates to rapidsai#13735.

Blonck mentioned this issue Sep 22, 2023

Add bytes_per_second to distinct_count of stream_compaction nvbench. #14172

Merged

1 task

Blonck added a commit to Blonck/cudf that referenced this issue Sep 26, 2023

Add bytes_per_second to distinct_count of stream_compaction nvbench.

a6323f2

This patch relates to rapidsai#13735.

Blonck added a commit to Blonck/cudf that referenced this issue Sep 26, 2023

Add bytes_per_second to transpose benchmark

66f43e4

This patch relates to rapidsai#13735.

Blonck pushed a commit to Blonck/cudf that referenced this issue Sep 26, 2023

Add bytes_per_second to shift benchmark

0664c8b

This patch relates to rapidsai#13735.

GregoryKimball removed this from libcudf Oct 26, 2023

GregoryKimball added this to libcudf Nov 3, 2023

jihoonson mentioned this issue Jun 28, 2024

Add throughput metrics for REDUCTION_BENCH/REDUCTION_NVBENCH benchmarks #16126

Merged

3 tasks

kingcrimsontianyu self-assigned this Sep 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Add `bytes_per_second` to all libcudf benchmarks #13735

[FEA] Add `bytes_per_second` to all libcudf benchmarks #13735

GregoryKimball commented Jul 23, 2023 •

edited by harrism

Loading

Blonck commented Aug 22, 2023

davidwendt commented Aug 23, 2023

jihoonson commented Jun 26, 2024

GregoryKimball commented Jul 3, 2024

jihoonson commented Jul 3, 2024

jihoonson commented Jul 3, 2024

GregoryKimball commented Aug 1, 2024 •

edited

Loading

[FEA] Add bytes_per_second to all libcudf benchmarks #13735

[FEA] Add bytes_per_second to all libcudf benchmarks #13735

Comments

GregoryKimball commented Jul 23, 2023 • edited by harrism Loading

Blonck commented Aug 22, 2023

davidwendt commented Aug 23, 2023

jihoonson commented Jun 26, 2024

GregoryKimball commented Jul 3, 2024

jihoonson commented Jul 3, 2024

jihoonson commented Jul 3, 2024

GregoryKimball commented Aug 1, 2024 • edited Loading

[FEA] Add `bytes_per_second` to all libcudf benchmarks #13735

[FEA] Add `bytes_per_second` to all libcudf benchmarks #13735

GregoryKimball commented Jul 23, 2023 •

edited by harrism

Loading

GregoryKimball commented Aug 1, 2024 •

edited

Loading