Skip to content

KAFKA-12778: Fix QuorumController request timeouts and electLeaders#10688

Merged
cmccabe merged 1 commit intoapache:trunkfrom
cmccabe:KAFKA-12778
May 14, 2021
Merged

KAFKA-12778: Fix QuorumController request timeouts and electLeaders#10688
cmccabe merged 1 commit intoapache:trunkfrom
cmccabe:KAFKA-12778

Conversation

@cmccabe
Copy link
Contributor

@cmccabe cmccabe commented May 13, 2021

Not all RPC requests to the quorum controller include a timeout, but we
should honor the timeouts that do exist.

For electLeaders, attempt to trigger a leader election for all
partitions when the request specifies null for the topics argument.

@cmccabe cmccabe added the kraft label May 13, 2021
@cmccabe cmccabe requested review from junrao and mumrah May 13, 2021 19:47
Copy link
Member

@mumrah mumrah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cmccabe. Few minor things inline, but looks good overall.

Generally, I don't like that we need to explicitly extract the timeout from the request and pass it into the controller call. It would nice if this could be generalized or automated somehow. However, something like this is not needed for this minor fix.

Comment on lines +1016 to +1022
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to timeouts, or is it just a different fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slightly different fix, although sort of related.

Comment on lines +1068 to +1070
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can do without the curly braces here and above. However, these will soon be replaced by the actual impl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Comment on lines +1210 to +1212
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? Can't we just use the Time we pass into the constructor in tests? Not a big deal really, just wondering

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have several helper classes for creating quorum controllers, so getting access to the time parameter that way would be difficult. Anyway, this is package-private, so it really only applies to unit tests specifically.

Comment on lines 806 to 807
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all the records in a single batch, or could we create a batch for each topic (including its partitions)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, it doesn't seem necessary to do this atomically. We can just do it non-atomically. It's not a logically indivisible operation like creating a topic.

Not all RPC requests to the quorum controller include a timeout, but we
should honor the timeouts that do exist.

For electLeaders, attempt to trigger a leader election for all
partitions when the request specifies null for the topics argument.

Add some unit tests for the above.
@cmccabe
Copy link
Contributor Author

cmccabe commented May 14, 2021

Generally, I don't like that we need to explicitly extract the timeout from the request and pass it into the controller call. It would nice if this could be generalized or automated somehow. However, something like this is not needed for this minor fix.

If we were starting all over again, we could give all RPC requests a timeout, located in the request header itself. But that isn't really the case today -- some requests don't have timeouts, some do, and we have to extract it from the specific request schema.

@cmccabe cmccabe merged commit f20fdbd into apache:trunk May 14, 2021
@cmccabe cmccabe deleted the KAFKA-12778 branch May 14, 2021 19:44
ijuma added a commit to ijuma/kafka that referenced this pull request May 16, 2021
…e-allocations-lz4

* apache-github/trunk: (155 commits)
  KAFKA-12728: Upgrade gradle to 7.0.2 and shadow to 7.0.0 (apache#10606)
  KAFKA-12778: Fix QuorumController request timeouts and electLeaders (apache#10688)
  KAFKA-12754: Improve endOffsets for TaskMetadata (apache#10634)
  Rework on KAFKA-3968: fsync the parent directory of a segment file when the file is created (apache#10680)
  MINOR: set replication.factor to 1 to make StreamsBrokerCompatibilityService work with old broker (apache#10673)
  MINOR: prevent cleanup() from being called while Streams is still shutting down (apache#10666)
  KAFKA-8326: Introduce List Serde (apache#6592)
  KAFKA-12697: Add Global Topic and Partition count metrics to the Quorum Controller (apache#10679)
  KAFKA-12648: MINOR - Add TopologyMetadata.Subtopology class for subtopology metadata (apache#10676)
  MINOR: Update jacoco to 0.8.7 for JDK 16 support (apache#10654)
  MINOR: exclude all `src/generated` and `src/generated-test` (apache#10671)
  KAFKA-12772: Move all transaction state transition rules into their states (apache#10667)
  KAFKA-12758 Added `server-common` module to have server side common classes.  (apache#10638)
  MINOR Removed copying storage libraries specifically as they are already copied. (apache#10647)
  KAFKA-5876: KIP-216 Part 4, Apply InvalidStateStorePartitionException for Interactive Queries (apache#10657)
  KAFKA-12747: Fix flakiness in shouldReturnUUIDsWithStringPrefix (apache#10643)
  MINOR: remove unnecessary placeholder from WorkerSourceTask#recordSent (apache#10659)
  MINOR: Remove unused `scalatest` definition from `dependencies.gradle` (apache#10655)
  MINOR: checkstyle version upgrade: 8.20 -> 8.36.2 (apache#10656)
  KAFKA-12464: minor code cleanup and additional logging in constrained sticky assignment (apache#10645)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants