Skip to content

[SPARK-41197][BUILD] Upgrade Kafka to 3.3.1#38715

Closed
tedyu wants to merge 3 commits intoapache:masterfrom
tedyu:k-33
Closed

[SPARK-41197][BUILD] Upgrade Kafka to 3.3.1#38715
tedyu wants to merge 3 commits intoapache:masterfrom
tedyu:k-33

Conversation

@tedyu
Copy link
Contributor

@tedyu tedyu commented Nov 18, 2022

What changes were proposed in this pull request?

This PR upgrades Kafka to 3.3.1 release.

The new default partitioner keeps track of how many bytes are produced per-partition and once the amount exceeds batch.size, it switches to the next partition. For spark kafka tests, this will result in records being sent to only one partition in some tests.
KafkaTestUtils.producerConfiguration is modified to use DefaultPartitioner.

Why are the changes needed?

Kafka 3.3.1 release has new features along with bug fixes: https://www.confluent.io/blog/apache-kafka-3-3-0-new-features-and-updates/

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test suite

@github-actions github-actions bot added the BUILD label Nov 18, 2022
@tedyu
Copy link
Contributor Author

tedyu commented Nov 18, 2022

@HeartSaVioR
Can you take a look ?

@bjornjorgensen
Copy link
Contributor

bjornjorgensen commented Nov 19, 2022

Hi, 3.3.0 "A significant bug was found in the 3.3.0 release after artifacts were pushed to Apache and Maven central but prior to the release announcement. As a result, the decision was made to not announce 3.3.0 and instead release 3.3.1 with the fix. It is recommended that 3.3.0 not be used."

Can you try with 3.3.1

@tedyu
Copy link
Contributor Author

tedyu commented Nov 19, 2022

@bjornjorgensen
I have updated the PR.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @tedyu . Could you fix UT failures?

[info] *** 90 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.kafka010.KafkaMicroBatchV2SourceSuite
[error] 	org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceWithAdminSuite
[error] 	org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceSuite
[error] 	org.apache.spark.sql.kafka010.KafkaContinuousSourceSuite
[error] 	org.apache.spark.sql.kafka010.KafkaMicroBatchV2SourceWithAdminSuite

@tedyu
Copy link
Contributor Author

tedyu commented Nov 21, 2022

I noticed the test failures.
Let me analyze the test output.

@bjornjorgensen
Copy link
Contributor

Can it have something to do with apache/kafka#12794 ?

@dengziming
Copy link
Member

dengziming commented Nov 22, 2022

These failures comes from apache/kafka#12049 and is described here: https://kafka.apache.org/documentation/#upgrade_33_notable
The new default partitioner keeps track of how many bytes are produced per-partition and once the amount exceeds batch.size, switches to the next partition. In spark kafka tests, this will result in records being sent to only one partition in some tests.
One simplest solution is add props.put("partitioner.class",classOf[org.apache.kafka.clients.producer.internals.DefaultPartitioner].getName) in KafkaTestUtils.producerConfiguration, or we can implement our own partitioner, or set a smallbatch.size config.

@HyukjinKwon
Copy link
Member

cc @dongjoon-hyun and @HeartSaVioR FYI

@tedyu
Copy link
Contributor Author

tedyu commented Nov 22, 2022

Thanks @dengziming for the information.

@tedyu
Copy link
Contributor Author

tedyu commented Nov 22, 2022

@HyukjinKwon @dongjoon-hyun @HeartSaVioR
Please take a look - tests pass.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41197] Upgrade Kafka version to 3.3 release [SPARK-41197][BUILD] Upgrade Kafka to 3.3.1 Nov 28, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @tedyu , @bjornjorgensen , @dengziming, @HyukjinKwon .
Merged to master for Apache Spark 3.4.

dongjoon-hyun pushed a commit that referenced this pull request Nov 28, 2022
This PR upgrades Kafka to 3.3.1 release.

The new default partitioner keeps track of how many bytes are produced per-partition and once the amount exceeds `batch.size`, it switches to the next partition. For spark kafka tests, this will result in records being sent to only one partition in some tests.
`KafkaTestUtils.producerConfiguration` is modified to use `DefaultPartitioner`.

Kafka 3.3.1 release has new features along with bug fixes: https://www.confluent.io/blog/apache-kafka-3-3-0-new-features-and-updates/

No

Existing test suite

Closes #38715 from tedyu/k-33.

Authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

This is merged via 0ff201c .

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
### What changes were proposed in this pull request?
This PR upgrades Kafka to 3.3.1 release.

The new default partitioner keeps track of how many bytes are produced per-partition and once the amount exceeds `batch.size`, it switches to the next partition. For spark kafka tests, this will result in records being sent to only one partition in some tests.
`KafkaTestUtils.producerConfiguration` is modified to use `DefaultPartitioner`.

### Why are the changes needed?
Kafka 3.3.1 release has new features along with bug fixes: https://www.confluent.io/blog/apache-kafka-3-3-0-new-features-and-updates/

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suite

Closes apache#38715 from tedyu/k-33.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
This PR upgrades Kafka to 3.3.1 release.

The new default partitioner keeps track of how many bytes are produced per-partition and once the amount exceeds `batch.size`, it switches to the next partition. For spark kafka tests, this will result in records being sent to only one partition in some tests.
`KafkaTestUtils.producerConfiguration` is modified to use `DefaultPartitioner`.

Kafka 3.3.1 release has new features along with bug fixes: https://www.confluent.io/blog/apache-kafka-3-3-0-new-features-and-updates/

No

Existing test suite

Closes apache#38715 from tedyu/k-33.

Authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
### What changes were proposed in this pull request?
This PR upgrades Kafka to 3.3.1 release.

The new default partitioner keeps track of how many bytes are produced per-partition and once the amount exceeds `batch.size`, it switches to the next partition. For spark kafka tests, this will result in records being sent to only one partition in some tests.
`KafkaTestUtils.producerConfiguration` is modified to use `DefaultPartitioner`.

### Why are the changes needed?
Kafka 3.3.1 release has new features along with bug fixes: https://www.confluent.io/blog/apache-kafka-3-3-0-new-features-and-updates/

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suite

Closes apache#38715 from tedyu/k-33.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
This PR upgrades Kafka to 3.3.1 release.

The new default partitioner keeps track of how many bytes are produced per-partition and once the amount exceeds `batch.size`, it switches to the next partition. For spark kafka tests, this will result in records being sent to only one partition in some tests.
`KafkaTestUtils.producerConfiguration` is modified to use `DefaultPartitioner`.

Kafka 3.3.1 release has new features along with bug fixes: https://www.confluent.io/blog/apache-kafka-3-3-0-new-features-and-updates/

No

Existing test suite

Closes apache#38715 from tedyu/k-33.

Authored-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants