Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-15661: KIP-951: protocol changes #14627

Merged
merged 5 commits into from
Nov 1, 2023

Conversation

chb2ab
Copy link
Contributor

@chb2ab chb2ab commented Oct 24, 2023

Separating out the protocol changes from #14444 in an effort to more quickly unblock the client side PR.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

https://issues.apache.org/jira/browse/KAFKA-15661

Testing

./gradlew core:test --tests kafka.server.KafkaApisTest

@msn-tldr
Copy link
Contributor

@chb2ab lgtm, thanks for raising this!

@jolshan
Copy link
Contributor

jolshan commented Oct 25, 2023

Hey there. If this is KAFKA-15661, then the title of the PR should be
KAFKA-15661: KIP-951: Server side and protocol changes

@@ -45,7 +45,9 @@
// Version 14 is the same as version 13 but it also receives a new error called OffsetMovedToTieredStorageException (KIP-405)
//
// Version 15 is the same as version 14 (KIP-903).
"validVersions": "0-15",
//
// Version 16 adds the 'NodeEndpoints' field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please include the kip as was done for the other versions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -102,6 +104,15 @@
"about": "The preferred read replica for the consumer to use on its next fetch request"},
{ "name": "Records", "type": "records", "versions": "0+", "nullableVersions": "0+", "about": "The record data."}
]}
]},
{ "name": "NodeEndpoints", "type": "[]NodeEndpoint", "versions": "16+", "taggedVersions": "16+", "tag": 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not reuse the tags as I mentioned in the other PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied to this in the other PR #14444 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like tags are scoped to the list level so this isn't really the same tag. They also need to be contiguous within their scope so this gives an error if I try to tag NodeEndpoints to something other than 0

Sharing the comment here for clarity.

@@ -360,7 +360,9 @@ public short partitionRecordVersion() {
}

public short fetchRequestVersion() {
if (this.isAtLeast(IBP_3_5_IV1)) {
if (this.isAtLeast(IBP_3_7_IV0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work because all the new fields are tagged (so no real changes in handling)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to be the first approach mentioned in this comment #14444 (comment)

My understanding is clusters would first be upgraded to 3.7 and then the IBP would be bumped after all the brokers are upgraded, but please correct me if that's wrong. You're right though, all the fields are tagged and there's no change in handling on the broker side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to whether the code here that we want to merge will break the build if we haven't implemented the new fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah we shouldn't need any other changes, the fields are all tagged and not getting used anywhere so we can leave them as their default values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that if we set the version as unstable, we may not be able to use it here. 🤦‍♀️ Maybe we should remove the unstable version true if this causes issues in tests.

Sorry for confusion.

@msn-tldr
Copy link
Contributor

@chb2ab CMIIW, but @jolshan i think scope of the PR is just protocol changes, not server-side changes. Intention is to merge the protocol-changes soon, so client-side changes can also be reviewed utilising the change.
I see 2 options to make this happen

  1. Merge PR KAFKA-15661: KIP-951: Server side changes #14444, if its ready.
  2. If 1. isn't feasible, then only merge the protocol changes with the current PR.

Going through PR 14444, it seems close, @jolshan what do you think? Is it possible to merge in next 1-2 days?

@chb2ab chb2ab changed the title KIP-951: protocol changes KAFKA-15661: KIP-951: protocol changes Oct 26, 2023
@chb2ab
Copy link
Contributor Author

chb2ab commented Oct 26, 2023

I updated the title, this would be a partial implementation of KAFKA-15661.

I replied to the other comments as well and incorporated them into #14444, if there are no more comments there it would be easier to just merge that.

@jolshan
Copy link
Contributor

jolshan commented Oct 26, 2023

I still need to go over the changes in #14444. This would still probably be faster.

@jolshan
Copy link
Contributor

jolshan commented Oct 27, 2023

@chb2ab sorry for the confusion. I realized that marking the version unstable will probably cause issues if the ibp suggests using that version. Since the tagged fields are the only difference, let's remove the unstable version flag. Sorry for the back and forth. After that, I will check the tests again and hopefully this will be good to go.

#14627 (comment)

@chb2ab
Copy link
Contributor Author

chb2ab commented Oct 27, 2023

@jolshan np, I reverted that change.

@jolshan
Copy link
Contributor

jolshan commented Oct 27, 2023

I will just watch the build now. Thanks @chb2ab

@jolshan jolshan merged commit c8f687a into apache:trunk Nov 1, 2023
1 check failed
jolshan pushed a commit that referenced this pull request Nov 10, 2023
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

https://issues.apache.org/jira/browse/KAFKA-15661

Protocol changes: #14627

Testing
Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults
./gradlew core:test --tests kafka.server.KafkaApisTest

Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>,  Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
rreddy-22 pushed a commit to rreddy-22/kafka-rreddy that referenced this pull request Jan 2, 2024
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

https://issues.apache.org/jira/browse/KAFKA-15661

Protocol changes: apache#14627

Testing
Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults
./gradlew core:test --tests kafka.server.KafkaApisTest

Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>,  Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
yyu1993 pushed a commit to yyu1993/kafka that referenced this pull request Feb 15, 2024
Separating out the protocol changes from apache#14444 in an effort to more quickly unblock the client side PR.

This is the protocol changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are included in the response. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump is required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

Reviewers: Justine Olshan <jolshan@confluent.io>, Mayank Shekhar Narula <mayanks.narula@gmail.com>
yyu1993 pushed a commit to yyu1993/kafka that referenced this pull request Feb 15, 2024
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

https://issues.apache.org/jira/browse/KAFKA-15661

Protocol changes: apache#14627

Testing
Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults
./gradlew core:test --tests kafka.server.KafkaApisTest

Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>,  Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Feb 16, 2024
Separating out the protocol changes from apache#14444 in an effort to more quickly unblock the client side PR.

This is the protocol changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are included in the response. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump is required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

Reviewers: Justine Olshan <jolshan@confluent.io>, Mayank Shekhar Narula <mayanks.narula@gmail.com>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Feb 16, 2024
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

https://issues.apache.org/jira/browse/KAFKA-15661

Protocol changes: apache#14627

Testing
Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults
./gradlew core:test --tests kafka.server.KafkaApisTest

Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>,  Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
clolov pushed a commit to clolov/kafka that referenced this pull request Apr 5, 2024
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

https://issues.apache.org/jira/browse/KAFKA-15661

Protocol changes: apache#14627

Testing
Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults
./gradlew core:test --tests kafka.server.KafkaApisTest

Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>,  Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants