-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-15661: KIP-951: protocol changes #14627
Conversation
@chb2ab lgtm, thanks for raising this! |
Hey there. If this is KAFKA-15661, then the title of the PR should be |
@@ -45,7 +45,9 @@ | |||
// Version 14 is the same as version 13 but it also receives a new error called OffsetMovedToTieredStorageException (KIP-405) | |||
// | |||
// Version 15 is the same as version 14 (KIP-903). | |||
"validVersions": "0-15", | |||
// | |||
// Version 16 adds the 'NodeEndpoints' field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please include the kip as was done for the other versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -102,6 +104,15 @@ | |||
"about": "The preferred read replica for the consumer to use on its next fetch request"}, | |||
{ "name": "Records", "type": "records", "versions": "0+", "nullableVersions": "0+", "about": "The record data."} | |||
]} | |||
]}, | |||
{ "name": "NodeEndpoints", "type": "[]NodeEndpoint", "versions": "16+", "taggedVersions": "16+", "tag": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not reuse the tags as I mentioned in the other PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replied to this in the other PR #14444 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like tags are scoped to the list level so this isn't really the same tag. They also need to be contiguous within their scope so this gives an error if I try to tag NodeEndpoints to something other than 0
Sharing the comment here for clarity.
@@ -360,7 +360,9 @@ public short partitionRecordVersion() { | |||
} | |||
|
|||
public short fetchRequestVersion() { | |||
if (this.isAtLeast(IBP_3_5_IV1)) { | |||
if (this.isAtLeast(IBP_3_7_IV0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work because all the new fields are tagged (so no real changes in handling)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to be the first approach mentioned in this comment #14444 (comment)
My understanding is clusters would first be upgraded to 3.7 and then the IBP would be bumped after all the brokers are upgraded, but please correct me if that's wrong. You're right though, all the fields are tagged and there's no change in handling on the broker side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to whether the code here that we want to merge will break the build if we haven't implemented the new fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah we shouldn't need any other changes, the fields are all tagged and not getting used anywhere so we can leave them as their default values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that if we set the version as unstable, we may not be able to use it here. 🤦♀️ Maybe we should remove the unstable version true if this causes issues in tests.
Sorry for confusion.
@chb2ab CMIIW, but @jolshan i think scope of the PR is just protocol changes, not server-side changes. Intention is to merge the protocol-changes soon, so client-side changes can also be reviewed utilising the change.
Going through PR 14444, it seems close, @jolshan what do you think? Is it possible to merge in next 1-2 days? |
I updated the title, this would be a partial implementation of KAFKA-15661. I replied to the other comments as well and incorporated them into #14444, if there are no more comments there it would be easier to just merge that. |
I still need to go over the changes in #14444. This would still probably be faster. |
@chb2ab sorry for the confusion. I realized that marking the version unstable will probably cause issues if the ibp suggests using that version. Since the tagged fields are the only difference, let's remove the unstable version flag. Sorry for the back and forth. After that, I will check the tests again and hopefully this will be good to go. |
@jolshan np, I reverted that change. |
I will just watch the build now. Thanks @chb2ab |
This reverts commit d53c43f589bf66cebe4c1fbee92687d5f2e77a6a.
2d36fb6
to
23e3026
Compare
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client https://issues.apache.org/jira/browse/KAFKA-15661 Protocol changes: #14627 Testing Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults ./gradlew core:test --tests kafka.server.KafkaApisTest Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>, Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client https://issues.apache.org/jira/browse/KAFKA-15661 Protocol changes: apache#14627 Testing Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults ./gradlew core:test --tests kafka.server.KafkaApisTest Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>, Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
Separating out the protocol changes from apache#14444 in an effort to more quickly unblock the client side PR. This is the protocol changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are included in the response. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump is required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client Reviewers: Justine Olshan <jolshan@confluent.io>, Mayank Shekhar Narula <mayanks.narula@gmail.com>
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client https://issues.apache.org/jira/browse/KAFKA-15661 Protocol changes: apache#14627 Testing Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults ./gradlew core:test --tests kafka.server.KafkaApisTest Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>, Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
Separating out the protocol changes from apache#14444 in an effort to more quickly unblock the client side PR. This is the protocol changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are included in the response. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump is required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client Reviewers: Justine Olshan <jolshan@confluent.io>, Mayank Shekhar Narula <mayanks.narula@gmail.com>
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client https://issues.apache.org/jira/browse/KAFKA-15661 Protocol changes: apache#14627 Testing Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults ./gradlew core:test --tests kafka.server.KafkaApisTest Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>, Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
This is the server side changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are retrieved from the local cache through ReplicaManager and included in the response, falling back to the metadata cache if they are unavailable there. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump was required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client https://issues.apache.org/jira/browse/KAFKA-15661 Protocol changes: apache#14627 Testing Benchmarking described here https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client#KIP951:Leaderdiscoveryoptimisationsfortheclient-BenchmarkResults ./gradlew core:test --tests kafka.server.KafkaApisTest Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Fred Zheng <zhengyd2014@gmail.com>, Mayank Shekhar Narula <mayanks.narula@gmail.com>, Yang Yang <yayang@uber.com>, David Mao <dmao@confluent.io>, Kirk True <ktrue@confluent.io>
Separating out the protocol changes from #14444 in an effort to more quickly unblock the client side PR.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client
https://issues.apache.org/jira/browse/KAFKA-15661
Testing
./gradlew core:test --tests kafka.server.KafkaApisTest