Skip to content
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

[FEATURE] Support kafka LogValidator validate inner records and compression codec when handle producer request with entryFormat=kafka #791

Merged
merged 37 commits into from
Oct 20, 2021

Conversation

wenbingshen
Copy link
Contributor

@wenbingshen wenbingshen commented Oct 7, 2021

fixes #687

Motivation

Previously, in order to support low version kafka clients, such as 0.10 version clients, we verified the data when processing fetch requests. Because the lower message format would maintain the internal message set, according to the production situation I encountered, in a certain sometimes there is an error in the offset of the internal message set, such as the bug in the low version of the sarama client.

The problem is that for the low version message format, no matter whether there is a problem with the internal message set, we will regenerate the message records, which is unnecessary in some cases.

Modifications

Support message set verification in production like kafka, and only need to verify when entryFormat=kafka.
Because when entryFormat=pulsar, we still cannot avoid doing message conversion during consumption.

@wenbingshen wenbingshen marked this pull request as draft October 7, 2021 03:11
@wenbingshen wenbingshen marked this pull request as ready for review October 7, 2021 09:37
@wenbingshen
Copy link
Contributor Author

If we can use this pr. Then we can use direct memory to rebuild kafka records like #673.
I can work for this after this pr has been merged.

@wenbingshen
Copy link
Contributor Author

@BewareMyPower PTAL if you have free time. Thanks. :)

@BewareMyPower
Copy link
Collaborator

Please rebase to master since it has conflicts with your previous PR.

@wenbingshen
Copy link
Contributor Author

Please rebase to master since it has conflicts with your previous PR.

Thanks for the reminder, I happen to be doing this job and the conflict has been resolved. :)

@BewareMyPower BewareMyPower added the type/feature Indicates new functionality label Oct 9, 2021
@wenbingshen
Copy link
Contributor Author

The result of my local performance test.

image

@wenbingshen
Copy link
Contributor Author

The failed test does not seem to be related to this pr.

@BewareMyPower
Copy link
Collaborator

Here's my test result for encoding before and after this PR. Before (Kafka) represents KafkaEntryFormatter before this PR.

Before (Kafka) Before (Pulsar) After (Kafka) After (Pulsar)
Fixed records for 100 times 12 1246 48 1267
Random records for 100 times 10 854 23 961
Fixed records for 1000 times 78 7630 257 7230
Random records for 1000 times 88 7376 170 6682

We can see the encoding time of KafkaEntryFormatter has increased 2 to 3 times after this PR because the time of PulsarEntryFormatter didn't change significantly.

I think we cannot accept this performance lost just for compatibility with some special cases for Kafka 0.10 client.

If you insisted on this change, please add a new EntryFormatter implementation.

@wenbingshen
Copy link
Contributor Author

Here's my test result for encoding before and after this PR. Before (Kafka) represents KafkaEntryFormatter before this PR.

Before (Kafka) Before (Pulsar) After (Kafka) After (Pulsar)
Fixed records for 100 times 12 1246 48 1267
Random records for 100 times 10 854 23 961
Fixed records for 1000 times 78 7630 257 7230
Random records for 1000 times 88 7376 170 6682
We can see the encoding time of KafkaEntryFormatter has increased 2 to 3 times after this PR because the time of PulsarEntryFormatter didn't change significantly.

I think we cannot accept this performance lost just for compatibility with some special cases for Kafka 0.10 client.

If you insisted on this change, please add a new EntryFormatter implementation.

Okay, if we add a new entryFormatter, does it mean that the previous KafkaEntryFormatter no longer needs to support 0.10 clients and only supports magic=2?

@wenbingshen wenbingshen marked this pull request as ready for review October 14, 2021 16:52
@wenbingshen
Copy link
Contributor Author

@BewareMyPower
I think the current failed test does not seem to be related to this pr, because I also saw the same error on October 10, 2021.

https://github.com/wenbingshen/Fail-tests/blob/master/io.streamnative.pulsar.handlers.kop.KafkaRequestTypeTest.txt

@wenbingshen
Copy link
Contributor Author

@BewareMyPower
Copy link
Collaborator

@BewareMyPower And can you help add docs? Thanks very much.

There's a doc-required label for this PR. I'll add related docs after this PR is merged.

@wenbingshen
Copy link
Contributor Author

@BewareMyPower And can you help add docs? Thanks very much.

There's a doc-required label for this PR. I'll add related docs after this PR is merged.

Good. Thanks. :)

@BewareMyPower
Copy link
Collaborator

Left some comments. In addition, I cannot see any different tests between KafkaV1EntryFormatter and KafkaMixedEntryFormatter. I think the newly added tests for mixed_kafka are not necessary, except tests under io.streamnative.pulsar.handlers.kop.compatibility.

We already have Kafka 0.10 client for tests. Does it mean only some older version Golang sarama client can make a difference between kafka and mixed_kafka?

Or could you do some simple tests just for EntryFormatter#encode method? Like creating a V0 message set, and show the different behaviors between kafka and mixed_kafka?

@BewareMyPower
Copy link
Collaborator

If the newer sarama client has fixed the problem. Could you give the version?

@wenbingshen
Copy link
Contributor Author

If the newer sarama client has fixed the problem. Could you give the version?

For reproducing the problem I described, please take a look at this pr: sarama-1002 sarama go low version client jump After allocating the internal message offset for compressed data, this will be redistributed by the Kafka server in Kafka, otherwise the message with the wrong attribute will be a corrupted message.

here sarama-1002 v1.16 @BewareMyPower

@wenbingshen
Copy link
Contributor Author

Left some comments. In addition, I cannot see any different tests between KafkaV1EntryFormatter and KafkaMixedEntryFormatter. I think the newly added tests for mixed_kafka are not necessary, except tests under io.streamnative.pulsar.handlers.kop.compatibility.

We already have Kafka 0.10 client for tests. Does it mean only some older version Golang sarama client can make a difference between kafka and mixed_kafka?

Or could you do some simple tests just for EntryFormatter#encode method? Like creating a V0 message set, and show the different behaviors between kafka and mixed_kafka?

All comments done. PTAL io.streamnative.pulsar.handlers.kop.format.EntryFormatterTest for the test to show the different behaviors between kafka and mixed_kafka.

@BewareMyPower
Copy link
Collaborator

I will review it soon.

@BewareMyPower
Copy link
Collaborator

Overall LGTM. Please check the comments.

@BewareMyPower BewareMyPower merged commit fa94761 into streamnative:master Oct 20, 2021
BewareMyPower added a commit that referenced this pull request Oct 20, 2021
…ession codec when handle producer request with entryFormat=kafka (#791)

fixes #687 

### Motivation
Previously, in order to support low version kafka clients, such as 0.10 version clients, we verified the data when processing fetch requests. Because the lower message format would maintain the internal message set, according to the production situation I encountered, in a certain sometimes there is an error in the offset of the internal message set, such as the bug in the low version of the sarama client.

The problem is that for the low version message format, no matter whether there is a problem with the internal message set, we will regenerate the message records, which is unnecessary in some cases.

### Modifications
Support message set verification in production like kafka, and only need to verify when entryFormat=kafka.
Because when entryFormat=pulsar, we still cannot avoid doing message conversion during consumption.

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
Co-authored-by: Kai Wang <kwang@streamnative.io>
Co-authored-by: Huanli Meng <48120384+Huanli-Meng@users.noreply.github.com>
@BewareMyPower BewareMyPower added doc This pr contains a document and removed doc-required This pr needs a document labels Jan 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
doc This pr contains a document type/feature Indicates new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Need Increase the guarantee of the correctness of the internal message offset
4 participants