KAFKA-12778: Fix QuorumController request timeouts and electLeaders#10688
KAFKA-12778: Fix QuorumController request timeouts and electLeaders#10688cmccabe merged 1 commit intoapache:trunkfrom
Conversation
mumrah
left a comment
There was a problem hiding this comment.
Thanks @cmccabe. Few minor things inline, but looks good overall.
Generally, I don't like that we need to explicitly extract the timeout from the request and pass it into the controller call. It would nice if this could be generalized or automated somehow. However, something like this is not needed for this minor fix.
There was a problem hiding this comment.
Is this related to timeouts, or is it just a different fix?
There was a problem hiding this comment.
This is a slightly different fix, although sort of related.
There was a problem hiding this comment.
nit: we can do without the curly braces here and above. However, these will soon be replaced by the actual impl
There was a problem hiding this comment.
Do we need this? Can't we just use the Time we pass into the constructor in tests? Not a big deal really, just wondering
There was a problem hiding this comment.
We have several helper classes for creating quorum controllers, so getting access to the time parameter that way would be difficult. Anyway, this is package-private, so it really only applies to unit tests specifically.
There was a problem hiding this comment.
Do we need all the records in a single batch, or could we create a batch for each topic (including its partitions)?
There was a problem hiding this comment.
Thinking about this more, it doesn't seem necessary to do this atomically. We can just do it non-atomically. It's not a logically indivisible operation like creating a topic.
Not all RPC requests to the quorum controller include a timeout, but we should honor the timeouts that do exist. For electLeaders, attempt to trigger a leader election for all partitions when the request specifies null for the topics argument. Add some unit tests for the above.
If we were starting all over again, we could give all RPC requests a timeout, located in the request header itself. But that isn't really the case today -- some requests don't have timeouts, some do, and we have to extract it from the specific request schema. |
…e-allocations-lz4 * apache-github/trunk: (155 commits) KAFKA-12728: Upgrade gradle to 7.0.2 and shadow to 7.0.0 (apache#10606) KAFKA-12778: Fix QuorumController request timeouts and electLeaders (apache#10688) KAFKA-12754: Improve endOffsets for TaskMetadata (apache#10634) Rework on KAFKA-3968: fsync the parent directory of a segment file when the file is created (apache#10680) MINOR: set replication.factor to 1 to make StreamsBrokerCompatibilityService work with old broker (apache#10673) MINOR: prevent cleanup() from being called while Streams is still shutting down (apache#10666) KAFKA-8326: Introduce List Serde (apache#6592) KAFKA-12697: Add Global Topic and Partition count metrics to the Quorum Controller (apache#10679) KAFKA-12648: MINOR - Add TopologyMetadata.Subtopology class for subtopology metadata (apache#10676) MINOR: Update jacoco to 0.8.7 for JDK 16 support (apache#10654) MINOR: exclude all `src/generated` and `src/generated-test` (apache#10671) KAFKA-12772: Move all transaction state transition rules into their states (apache#10667) KAFKA-12758 Added `server-common` module to have server side common classes. (apache#10638) MINOR Removed copying storage libraries specifically as they are already copied. (apache#10647) KAFKA-5876: KIP-216 Part 4, Apply InvalidStateStorePartitionException for Interactive Queries (apache#10657) KAFKA-12747: Fix flakiness in shouldReturnUUIDsWithStringPrefix (apache#10643) MINOR: remove unnecessary placeholder from WorkerSourceTask#recordSent (apache#10659) MINOR: Remove unused `scalatest` definition from `dependencies.gradle` (apache#10655) MINOR: checkstyle version upgrade: 8.20 -> 8.36.2 (apache#10656) KAFKA-12464: minor code cleanup and additional logging in constrained sticky assignment (apache#10645) ...
Not all RPC requests to the quorum controller include a timeout, but we
should honor the timeouts that do exist.
For electLeaders, attempt to trigger a leader election for all
partitions when the request specifies null for the topics argument.