Optimize the performance by passing MessageID implementations by pointers #968

BewareMyPower · 2023-02-27T15:48:21Z

Motivation

Currently there are three implementations of the MessageID interface:

messageID: 24 bytes
trackingMessageID: 64 bytes
chunkMessageID: 80 bytes

However, for all methods of them, the receiver is a value rather than a pointer. It's inefficient because each time a method is called, the copy would happen.

Reference: https://go.dev/tour/methods/8

Modifications

Change the receiver from value to pointer for all MessageID implementations.
Use pointers as the returned values and function parameters for these implementations everywhere.

The trackingMessageID.Undefined method is removed because it's never used now. Though it's a public method, the struct and its factory function are not exposed, so I think it's reasonable.

Remove the benchmark added in
#324. The result is obvious and this test is meaningless. I tried passing the trackingMessageID by pointer and the result reduced from 8.548 ns/op to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a trackingMessageID is 64 bytes. The overhead of accessing by pointers is far less than copying the extra bytes.

BewareMyPower · 2023-02-28T14:13:24Z

Wait for #970 to be merged first.

### Motivation Currently there are three implementations of the `MessageID` interface: - `messageID`: 24 bytes - `trackingMessageID`: 64 bytes - `chunkMessageID`: 80 bytes However, for all methods of them, the receiver is a value rather than a pointer. It's inefficient because each time a method is called, the copy would happen. Reference: https://go.dev/tour/methods/8 ### Modifications - Change the receiver from value to pointer for all `MessageID` implementations. - Use pointers as the returned values and function parameters for these implementations everywhere. The `trackingMessageID.Undefined` method is removed because it's never used now. Though it's a public method, the struct and its factory function are not exposed, so I think it's reasonable. Remove the benchmark added in apache#324. The result is obvious and this test is meaningless. I tried passing the `trackingMessageID` by pointer and the result reduced from 8.548 ns/op to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a `trackingMessageID` is 64 bytes. The overhead of accessing by pointers is far less than copying the extra bytes.

tisonkun

Generally looks good.

Do you have a bench result on this change? Since you declaim that it's a performance improvement.

Also, why do you remove impl_message_bench_test.go?

shibd · 2023-03-01T09:14:20Z

Also, why do you remove impl_message_bench_test.go?

This test has no assertions and should not be in unit tests.

tisonkun · 2023-03-01T09:46:00Z

This test has no assertions and should not be in unit tests.

I think they're go benches which will be only triggered if you run go test -bench

BewareMyPower · 2023-03-01T12:59:32Z

Do you have a bench result on this change? Since you declaim that it's a performance improvement.

I will paste a result soon. Though it might not be significant.

Also, why do you remove impl_message_bench_test.go?

See my PR description.

Remove the benchmark added in
#324. The result is obvious and this test is meaningless. I tried passing the trackingMessageID by pointer and the result reduced from 8.548 ns/op to 1.628 ns/op. It's obvious because a pointer is only 8 bytes while a trackingMessageID is 64 bytes. The overhead of accessing by pointers is far less than copying the extra bytes.

BewareMyPower · 2023-03-01T14:48:36Z

Test process

I created a branch from master and apply the following patch to disable ACK grouping (so that each MessageID could be passed)

diff --git a/perf/perf-consumer.go b/perf/perf-consumer.go
index 6b6e411..c7648e3 100644
--- a/perf/perf-consumer.go
+++ b/perf/perf-consumer.go
@@ -79,6 +79,7 @@ func consume(consumeArgs *ConsumeArgs, stop <-chan struct{}) {
                Topic:                          consumeArgs.Topic,
                SubscriptionName:               consumeArgs.SubscriptionName,
                EnableBatchIndexAcknowledgment: consumeArgs.EnableBatchIndexAck,
+               AckGroupingOptions:             &pulsar.AckGroupingOptions{MaxSize: 0},
        })

        if err != nil {

Then, build the perf tool.

cd perf
go build

Since this patch mainly affects the MessageID related operations, let's only test the consumer with each message acknowledged.

Start Pulsar 2.11 standalone. Create two subscriptions sub1 and sub2 on the same topic:

./perf consume --enable-batch-index-ack my-topic -s sub1
./perf consume --enable-batch-index-ack my-topic -s sub2

Then, produce some messages and wait for a while and stop the producer.

./perf produce --batching-num-messages=1 --rate=200000 my-topic

Then we can test the catch-up read. First, test without this patch:

go build
./perf consume --enable-batch-index-ack my-topic -s sub1

Then, apply this patch and repeat the steps above:

git merge bewaremypower/msg-id-pointer
go build
# NOTE: sub1 and sub2 have the same initial position
./perf consume --enable-batch-index-ack my-topic -s sub2

Before running the perf tool each time, restart the standalone.

Test result

Without this patch:

INFO[22:39:09.321] Stats - Consume rate: 8350.1 msg/s -   65.2 Mbps
INFO[22:39:19.321] Stats - Consume rate: 9100.0 msg/s -   71.1 Mbps
INFO[22:39:29.321] Stats - Consume rate: 8750.0 msg/s -   68.4 Mbps
INFO[22:39:39.321] Stats - Consume rate: 8870.0 msg/s -   69.3 Mbps
INFO[22:39:49.321] Stats - Consume rate: 9200.0 msg/s -   71.9 Mbps
INFO[22:39:59.321] Stats - Consume rate: 9020.0 msg/s -   70.5 Mbps
INFO[22:40:09.320] Stats - Consume rate: 8659.9 msg/s -   67.7 Mbps
INFO[22:40:19.321] Stats - Consume rate: 8810.1 msg/s -   68.8 Mbps
INFO[22:40:29.321] Stats - Consume rate: 8960.0 msg/s -   70.0 Mbps
INFO[22:40:39.321] Stats - Consume rate: 8550.0 msg/s -   66.8 Mbps
INFO[22:40:49.321] Stats - Consume rate: 8300.0 msg/s -   64.8 Mbps
INFO[22:40:59.322] Stats - Consume rate: 8420.0 msg/s -   65.8 Mbps

With this patch:

INFO[22:42:04.326] Stats - Consume rate: 8340.1 msg/s -   65.2 Mbps
INFO[22:42:14.327] Stats - Consume rate: 9930.0 msg/s -   77.6 Mbps
INFO[22:42:24.327] Stats - Consume rate: 8970.0 msg/s -   70.1 Mbps
INFO[22:42:34.326] Stats - Consume rate: 9410.0 msg/s -   73.5 Mbps
INFO[22:42:44.326] Stats - Consume rate: 9400.0 msg/s -   73.4 Mbps
INFO[22:42:54.326] Stats - Consume rate: 9260.0 msg/s -   72.3 Mbps
INFO[22:43:04.327] Stats - Consume rate: 9362.3 msg/s -   73.1 Mbps
INFO[22:43:14.326] Stats - Consume rate: 9317.7 msg/s -   72.8 Mbps
INFO[22:43:24.326] Stats - Consume rate: 9100.0 msg/s -   71.1 Mbps
INFO[22:43:34.326] Stats - Consume rate: 8900.0 msg/s -   69.5 Mbps
INFO[22:43:44.326] Stats - Consume rate: 8830.0 msg/s -   69.0 Mbps
INFO[22:43:54.327] Stats - Consume rate: 9180.0 msg/s -   71.7 Mbps

After this patch, the throughput is more stable and is a little higher than before.

@tisonkun

tisonkun

Thanks for your report.

Although I think the performance is not quite optimized, checking mid, ok looks weird to me.

I'm +1 as it simplifies code without perf regression.

### Motivation `toTrackingMessageID()` is a function that is widely used in `consumer` implementation. It can convert an interface type `MessageID` into an struct type `trackingMessageID`. In addition, the second return value also plays a role in checking the `MessageID` type. In other words, it indicates that `MessageID` **cannot** be a user-defined type. From the perspective of code readability, `toTrackingMessageID()` should not do both. **Note**: After #968 , `toTrackingMessageID()` returns only a pointer now. The role of original `ok` is replaced by nil pointer now. However, the main content discussed in this PR has not changed. For example. https://github.com/apache/pulsar-client-go/blob/e2ea255052e8a527091791ef368851d885ee2d45/pulsar/consumer_regex.go#L176-L181 This example is the correct usage. The `ok` returned by `toTrackingMessageID()` is used to reject user-defined `MessageID`. https://github.com/apache/pulsar-client-go/blob/e2ea255052e8a527091791ef368851d885ee2d45/pulsar/consumer_partition.go#L470-L473 This example is a bit vague. The actual effect here is the same as the previous example. But it return an error `failed to convert trackingMessageID` which is confusing. https://github.com/apache/pulsar-client-go/blob/e2ea255052e8a527091791ef368851d885ee2d45/pulsar/consumer_partition.go#L1816-L1820 In this case. We just want to convert `MessageID` into `trackingMessageID`. We do not care what it really is because it's not possible an invalid `MessageID` implementation. So, original `toTrackingMessageID()` needs to require a careful look to use it correctly. I think it would be better to split it into two different method. `toTrackingMessageID()` just do the struct conversion, which it's more clearly. And when the new messageID type is added, we can just modify the `checkMessageIDType`. ### Modifications - Refactor the `toTrackingMessageID()` - Add the `checkMessageIDType()` to check whether `MessageID` is user-defined.

BewareMyPower self-assigned this Feb 27, 2023

BewareMyPower added the type/enhancement label Feb 27, 2023

BewareMyPower requested review from RobertIndie, nodece, wolfstudy, merlimat, shibd and tisonkun February 27, 2023 15:50

BewareMyPower marked this pull request as draft February 27, 2023 16:07

BewareMyPower force-pushed the bewaremypower/msg-id-pointer branch from 6cff537 to 6fc2fe9 Compare February 27, 2023 16:26

BewareMyPower marked this pull request as ready for review February 27, 2023 16:26

BewareMyPower changed the title ~~Optimize performance by passing MessageID implementations by pointers~~ [Performance Optimization] Pass MessageID implementations by pointers Feb 27, 2023

BewareMyPower changed the title ~~[Performance Optimization] Pass MessageID implementations by pointers~~ Optimize the performance by passing MessageID implementations by pointers Feb 27, 2023

BewareMyPower force-pushed the bewaremypower/msg-id-pointer branch from 6fc2fe9 to 75ca0ae Compare February 27, 2023 16:36

BewareMyPower marked this pull request as draft February 28, 2023 02:56

BewareMyPower force-pushed the bewaremypower/msg-id-pointer branch from 75ca0ae to 00bec29 Compare February 28, 2023 05:47

BewareMyPower marked this pull request as ready for review February 28, 2023 05:48

BewareMyPower force-pushed the bewaremypower/msg-id-pointer branch 2 times, most recently from a83e350 to f7d69df Compare February 28, 2023 08:02

BewareMyPower force-pushed the bewaremypower/msg-id-pointer branch from 1c32ee4 to e29533b Compare March 1, 2023 02:32

nodece approved these changes Mar 1, 2023

View reviewed changes

tisonkun reviewed Mar 1, 2023

View reviewed changes

shibd approved these changes Mar 1, 2023

View reviewed changes

RobertIndie approved these changes Mar 1, 2023

View reviewed changes

tisonkun approved these changes Mar 1, 2023

View reviewed changes

BewareMyPower merged commit 42ded0d into apache:master Mar 1, 2023

BewareMyPower deleted the bewaremypower/msg-id-pointer branch March 1, 2023 16:04

Gleiphir2769 mentioned this pull request Mar 2, 2023

[refactor]: Refactor the toTrackingMessageID() #972

Merged

1 task

BewareMyPower mentioned this pull request Mar 2, 2023

Support grouping ACK requests by time and size #957

Merged

1 task

RobertIndie added this to the v0.10.0 milestone Mar 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the performance by passing MessageID implementations by pointers #968

Optimize the performance by passing MessageID implementations by pointers #968

BewareMyPower commented Feb 27, 2023

BewareMyPower commented Feb 28, 2023

tisonkun left a comment

shibd commented Mar 1, 2023

tisonkun commented Mar 1, 2023 •

edited

Loading

BewareMyPower commented Mar 1, 2023

BewareMyPower commented Mar 1, 2023

tisonkun left a comment

Optimize the performance by passing MessageID implementations by pointers #968

Optimize the performance by passing MessageID implementations by pointers #968

Conversation

BewareMyPower commented Feb 27, 2023

Motivation

Modifications

BewareMyPower commented Feb 28, 2023

tisonkun left a comment

Choose a reason for hiding this comment

shibd commented Mar 1, 2023

tisonkun commented Mar 1, 2023 • edited Loading

BewareMyPower commented Mar 1, 2023

BewareMyPower commented Mar 1, 2023

Test process

Test result

tisonkun left a comment

Choose a reason for hiding this comment

tisonkun commented Mar 1, 2023 •

edited

Loading