Add `bytes_per_second` to transpose benchmark #14170

Blonck · 2023-09-22T10:34:59Z

This patch relates to #13735.

Benchmark: transpose_benchmark.txt

Checklist

I am familiar with the Contributing Guidelines.

copy-pr-bot · 2023-09-22T10:35:02Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

davidwendt · 2023-09-22T12:26:59Z

/ok to test

harrism

Thanks for contributing this. Just one request for maintainability.

harrism · 2023-09-26T21:12:56Z

cpp/benchmarks/transpose/transpose.cpp

@@ -40,16 +40,29 @@ static void BM_transpose(benchmark::State& state)
    cuda_event_timer raii(state, true);
    auto output = cudf::transpose(input);
  }
+
+  // Collect memory statistics.
+  auto const bytes_read    = input.num_columns() * input.num_rows() * (sizeof(int32_t));


I would like to avoid potential future type mismatches that result in wrong bytes/s reports. So I think you should stash the type_id in a variable above:

constexpr auto column_type = cudf::type_id::INT32;

And then here use CUDF's id_to_type utility:

Suggested change

-  auto const bytes_read = input.num_columns() * input.num_rows() * (sizeof(int32_t));
+  auto const bytes_read = input.num_columns() * input.num_rows() * (sizeof(cudf::id_to_type<column_type>));

See https://docs.rapids.ai/api/libcudf/stable/group__utility__dispatcher#gad7e12b8accf60e7c0e500294e1ee8536
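
For readers without libcudf at hand, here is a minimal standalone sketch of the idea behind this suggestion: keep the type id in one constant and derive the element size from it at compile time. The `type_id` enum, the `id_to_type` alias, and the `bytes_read` helper below are hypothetical stand-ins written for illustration; they are not the libcudf definitions, which cover many more types.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical three-member subset standing in for cudf::type_id.
enum class type_id { INT32, INT64, FLOAT64 };

// Hypothetical stand-in for cudf::id_to_type: maps an id to a C++ type.
template <type_id Id> struct id_to_type_impl;
template <> struct id_to_type_impl<type_id::INT32>   { using type = std::int32_t; };
template <> struct id_to_type_impl<type_id::INT64>   { using type = std::int64_t; };
template <> struct id_to_type_impl<type_id::FLOAT64> { using type = double; };

template <type_id Id>
using id_to_type = typename id_to_type_impl<Id>::type;

// Single source of truth for the benchmark's element type.
constexpr auto column_type = type_id::INT32;

static_assert(std::is_same<id_to_type<column_type>, std::int32_t>::value,
              "INT32 maps to a 32-bit integer");

// Bytes read if every element of a num_columns x num_rows table is loaded once.
// The counts are widened first so the product is computed in std::size_t.
constexpr std::size_t bytes_read(std::int32_t num_columns, std::int32_t num_rows)
{
  return static_cast<std::size_t>(num_columns) * static_cast<std::size_t>(num_rows) *
         sizeof(id_to_type<column_type>);
}
```

With this shape, changing `column_type` to another id updates the byte count automatically, which is the maintainability point being made here.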

harrism · 2023-09-27T22:42:26Z

cpp/benchmarks/transpose/transpose.cpp

@@ -42,7 +44,7 @@ static void BM_transpose(benchmark::State& state)
  }

  // Collect memory statistics.
-  auto const bytes_read    = input.num_columns() * input.num_rows() * (sizeof(int32_t));
+  auto const bytes_read    = input.num_columns() * input.num_rows() * cudf::size_of(column_type);


💡 suggestion: This is one way to do it. But I think the way I suggested is a bit better because it all happens at compile time, whereas cudf::size_of() invokes the type dispatcher at run time. Not that it will affect benchmarks, but it just seems cleaner to use sizeof(cudf::id_to_type<column_type>).
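
The compile-time versus run-time distinction can be sketched in isolation. In the snippet below, `size_of_runtime` branches on an ordinary value each time it is called, while `size_of_static` takes the id as a template argument and yields a constant. Both names and the three-member `type_id` enum are assumptions made for this illustration; the snippet only mimics the spirit of `cudf::size_of()` and `cudf::id_to_type` and does not reproduce their implementations.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical three-member subset standing in for cudf::type_id.
enum class type_id { INT32, INT64, FLOAT64 };

// Run-time lookup: the id is an ordinary value, so a branch executes per call.
inline std::size_t size_of_runtime(type_id id)
{
  switch (id) {
    case type_id::INT32: return sizeof(std::int32_t);
    case type_id::INT64: return sizeof(std::int64_t);
    case type_id::FLOAT64: return sizeof(double);
  }
  throw std::invalid_argument("unsupported type_id");
}

// Compile-time lookup: the id is a template argument, so the size is a constant.
template <type_id Id> struct size_of_static;
template <> struct size_of_static<type_id::INT32>
  : std::integral_constant<std::size_t, sizeof(std::int32_t)> {};
template <> struct size_of_static<type_id::INT64>
  : std::integral_constant<std::size_t, sizeof(std::int64_t)> {};
template <> struct size_of_static<type_id::FLOAT64>
  : std::integral_constant<std::size_t, sizeof(double)> {};

constexpr auto column_type = type_id::INT32;

// The static form can be used where a constant expression is required;
// size_of_runtime() cannot, because it is not constexpr.
static_assert(size_of_static<column_type>::value == 4, "INT32 elements are 4 bytes");
```

Both forms return the same number for a given id, so the choice is about where the lookup happens rather than about the result.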

harrism

Thanks @Blonck !

harrism · 2023-09-28T20:52:06Z

/ok to test

PointKernel · 2023-09-28T23:21:02Z

@Blonck Can you please rebase with the latest branch-23.12 and fix the formatting issues?

wence- · 2023-10-02T08:07:11Z

cpp/benchmarks/transpose/transpose.cpp

+  auto const bytes_written = bytes_read;
+  // Account for nullability in input and output.
+  auto const null_bytes =
+    2 * input.num_columns() * cudf::bitmask_allocation_size_bytes(input.num_rows());


suggestion: This one could also overflow, I think. Perhaps the following?

Suggested change

-    2 * input.num_columns() * cudf::bitmask_allocation_size_bytes(input.num_rows());
+    2 * static_cast<uint64_t>(input.num_columns()) * cudf::bitmask_allocation_size_bytes(input.num_rows());

Blonck

Are you sure about this one? The return type of cudf::bitmask_allocation_size_bytes is std::size_t, which is either unsigned long or unsigned long long, so for reasonable input sizes the integer type promotion will avoid the overflow (https://cppinsights.io/s/26f977cb).

That said, just having this discussion indicates I should have included an explicit cast up front to clear up any potential confusion.

wence-

Left-to-right associativity means that this is evaluated as (2 * ncol) * nrow. The first multiplication is performed in size_type (i.e., int32_t), so that could overflow, no? Although I think these benchmarks are generally run with fewer than $2^{30}$ rows, there's in general no reason why they couldn't be run with more (although the transpose performance would be terrible, I grant you).

harrism

Why not just always put the thing that returns size_t (sizeof or bitmask_allocation_size_bytes) first in the arithmetic in all of these PRs?

Blonck

Personally, I would keep the cast explicit to make visible what is happening, but I don't have a strong stance on it.
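
The arithmetic under discussion can be checked in a small standalone sketch. It assumes `size_type` is `int32_t`, as stated in the thread, and uses a hypothetical `bitmask_bytes` helper in place of `cudf::bitmask_allocation_size_bytes` (one bit per row, padded to a 64-byte boundary; the padding value is an assumption). With $2^{30}$ columns, `2 * num_columns` equals $2^{31}$, which exceeds the `int32_t` maximum of 2147483647, so that sub-expression must not be evaluated in `int`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

using size_type = std::int32_t;  // mirrors cudf::size_type as described in the thread

// Hypothetical stand-in for cudf::bitmask_allocation_size_bytes:
// one bit per row, rounded up to a multiple of `padding` bytes (assumed 64).
constexpr std::size_t bitmask_bytes(size_type num_rows, std::size_t padding = 64)
{
  std::size_t const needed = (static_cast<std::size_t>(num_rows) + 7) / 8;
  return ((needed + padding - 1) / padding) * padding;
}

// On common platforms int32_t is int, so `2 * num_columns` is computed in int;
// promotion to size_t would only happen at the second multiplication.
static_assert(std::is_same<decltype(2 * size_type{}), int>::value,
              "the first product is evaluated in int");

// Option A: the explicit cast from the suggested change widens before multiplying.
constexpr std::uint64_t null_bytes_cast(size_type num_columns, size_type num_rows)
{
  return 2 * static_cast<std::uint64_t>(num_columns) * bitmask_bytes(num_rows);
}

// Option B: lead with the size_t-returning call so every product is in size_t.
constexpr std::size_t null_bytes_size_t_first(size_type num_columns, size_type num_rows)
{
  return bitmask_bytes(num_rows) * 2 * num_columns;
}
```

Both options give the same result for non-negative counts on a platform with a 64-bit `std::size_t`; option A makes the widening visible at the call site, while option B relies on operand order.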

wence- · 2023-10-02T08:07:34Z

/ok to test

harrism · 2023-10-02T20:18:45Z

/ok to test

harrism · 2023-10-03T20:57:59Z

/ok to test

harrism · 2023-10-03T21:00:49Z

/ok to test

harrism · 2023-10-03T21:02:57Z

/merge

harrism · 2023-10-04T20:54:38Z

/ok to test

harrism · 2023-10-04T20:54:43Z

/merge

ttnghia · 2023-10-10T22:04:53Z

/merge

harrism · 2023-10-11T00:22:53Z

Wonder why my merge commands weren't accepted.

ttnghia · 2023-10-11T04:03:25Z

It means GitHub is starting to realize that you are no longer a cudf developer 😆

Blonck requested a review from a team as a code owner September 22, 2023 10:34

Blonck requested review from bdice and vuule September 22, 2023 10:35

github-actions bot added the libcudf label Sep 22, 2023

davidwendt added the 3 - Ready for Review, improvement, and non-breaking labels Sep 22, 2023

Blonck added 2 commits September 26, 2023 15:29

Add bytes_per_second to transpose benchmark

66f43e4

This patch relates to rapidsai#13735.

Run clang-format on transpose benchmark

d2676b5

Blonck force-pushed the processed_bytes_transpose_bench branch from ded8fa9 to d2676b5 Compare September 26, 2023 13:46

Blonck requested review from a team as code owners September 26, 2023 13:46

Blonck requested a review from charlesbluca September 26, 2023 13:46

Blonck changed the base branch from branch-23.10 to branch-23.12 September 26, 2023 13:46

wence- removed request for a team and charlesbluca September 26, 2023 14:00

harrism requested changes Sep 26, 2023

harrism mentioned this pull request Sep 26, 2023

Add bytes_per_second to shift benchmark #13950

Merged

1 task

Blonck added 3 commits September 27, 2023 08:21

Refactor column type in transpose benchmark

f594a81

Merge branch 'branch-23.12' into processed_bytes_transpose_bench

dcd9ef1

Fix transpose benchmark type issue

60845b7

harrism reviewed Sep 27, 2023

Blonck added 2 commits September 28, 2023 14:48

Merge branch 'branch-23.12' into processed_bytes_transpose_bench

05e1afa

Change way column type is handled in transpose bench

e9280cc

harrism approved these changes Sep 28, 2023

PointKernel approved these changes Sep 28, 2023

Blonck added 2 commits September 30, 2023 14:00

Merge branch 'branch-23.12' into processed_bytes_transpose_bench

4187495

Fix code style in transpose benchmark

c50a5ef

bdice reviewed Sep 30, 2023

Avoid potential integer overflow in transpose benchmark.

8083756

wence- reviewed Oct 2, 2023

Update cpp/benchmarks/transpose/transpose.cpp

e6d9f24

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

Fix code style in transpose benchmark

82cbcf9

Merge branch 'branch-23.12' into processed_bytes_transpose_bench

ed8aa1e

Blonck and others added 2 commits October 4, 2023 16:55

Merge branch 'branch-23.12' into processed_bytes_transpose_bench

943b9f3

Merge branch 'branch-23.12' into processed_bytes_transpose_bench

09f10d8

rapids-bot bot merged commit c0c7ed8 into rapidsai:branch-23.12 Oct 10, 2023
66 checks passed

GregoryKimball mentioned this pull request Nov 3, 2023

[FEA] Add bytes_per_second to all libcudf benchmarks #13735

Open
