KAFKA-12697: Add OfflinePartitionCount and PreferredReplicaImbalanceCount metrics to Quorum Controller #10572
Conversation
Hi @dielhennr, thanks for the PR! I think it would be simpler to have …

For the partition count, we have no choice but to maintain our own count, since we can't efficiently get that number just by taking the size() of any existing data structure. However, we should do this in …

Keep in mind that anything accessed by a metrics callback function needs to be set in a thread-safe way. The metrics system uses a different set of threads than the controller, so you need to use a lock, a volatile, or an atomic variable when passing information from one to the other. A volatile is usually the best way to do it, since it has the lowest overhead (certainly it's the best in this particular case, I think).
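The thread-safety point above can be sketched as follows. This is a hypothetical illustration, not Kafka's actual metrics class: the controller thread writes a volatile field, and the metrics callback (on a different thread) reads it without locking.

```java
// Hypothetical sketch (names are illustrative, not the real Kafka API):
// a count maintained by the controller thread and published to the
// metrics thread through a volatile field.
class ControllerMetricsHolder {
    // Written only by the controller thread; read by metrics callbacks
    // running on other threads. volatile guarantees visibility.
    private volatile int globalPartitionCount = 0;

    // Called from the controller thread after replaying records.
    void setGlobalPartitionCount(int count) {
        this.globalPartitionCount = count;
    }

    // Called from a metrics callback on a different thread.
    int globalPartitionCount() {
        return globalPartitionCount;
    }
}
```

A volatile is enough here because there is a single writer and the value is a single int; no compound read-modify-write crosses threads.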
Can you create a PR that has just the global topics count and global partitions count? Then we can keep the remaining metrics in this PR. That will make it easier to review and get in.
    private final TimelineHashMap<Integer, TimelineHashMap<Uuid, int[]>> isrMembers;

    private final Map<Uuid, Integer> offlinePartitionCounts;
We cannot use regular maps here because they will not roll back to the desired state during a snapshot restore.
In any case, I don't see why we need this map. It's enough to know how many offline partitions there are, which we already have a count of below.
This was so that when a topic is removed, any offline partitions for that topic are decremented from the counter.
All the information that is needed is already here. If you delete X partitions that had a leader of -1, you decrement the counter by X.
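The reviewer's suggestion can be sketched like this. All names here are hypothetical, for illustration only: a single counter suffices because partition-change replay adjusts it incrementally, and topic removal just subtracts the number of that topic's partitions whose leader is -1.

```java
import java.util.Map;

// Hypothetical sketch of the review suggestion: no per-topic map of
// offline counts is needed. A single counter is maintained by (a)
// adjusting it on leader transitions, and (b) on topic removal,
// subtracting the count of removed partitions whose leader is NO_LEADER.
class OfflineCounter {
    static final int NO_LEADER = -1;
    private int offlinePartitionCount = 0;

    // Called when a partition's leader changes during record replay.
    void onPartitionChange(int oldLeader, int newLeader) {
        if (oldLeader != NO_LEADER && newLeader == NO_LEADER) offlinePartitionCount++;
        if (oldLeader == NO_LEADER && newLeader != NO_LEADER) offlinePartitionCount--;
    }

    // Called when a topic is removed; leadersByPartition maps each
    // removed partition id to its current leader.
    void onTopicRemoval(Map<Integer, Integer> leadersByPartition) {
        long offline = leadersByPartition.values().stream()
            .filter(l -> l == NO_LEADER)
            .count();
        offlinePartitionCount -= (int) offline;
    }

    int offlinePartitionCount() {
        return offlinePartitionCount;
    }
}
```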
    String topicPart = topicInfo.name + "-" + record.partitionId() + " with topic ID " +
        record.topicId();
    newPartitionInfo.maybeLogPartitionChange(log, topicPart, prevPartitionInfo);
    if ((newPartitionInfo.leader != newPartitionInfo.preferredReplica()) &&
Can we have a function like "hasPreferredLeader" on PartitionControlInfo, to make this simpler?
Can do this in a follow-on
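The suggested helper might look like the sketch below. This is an illustrative stand-in for PartitionControlInfo, not the real class; it assumes (as in Kafka) that the preferred leader is the first replica in the assignment.

```java
// Hypothetical sketch of the suggested hasPreferredLeader() helper.
// Illustrative stand-in for PartitionControlInfo; assumes the preferred
// leader is the first replica in the replica assignment.
class PartitionInfoSketch {
    final int[] replicas;
    final int leader;

    PartitionInfoSketch(int[] replicas, int leader) {
        this.replicas = replicas;
        this.leader = leader;
    }

    // First replica in the assignment, or -1 if there are none.
    int preferredReplica() {
        return replicas.length == 0 ? -1 : replicas[0];
    }

    // The condition in the diff above, wrapped in a named method.
    boolean hasPreferredLeader() {
        return leader == preferredReplica();
    }
}
```

With this, the replay code can test `!newPartitionInfo.hasPreferredLeader()` instead of comparing the leader and preferred replica inline.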
    unfenceBroker(i, ctx);
    }

    CreatableTopicResult foo = ctx.createTestTopic("foo",
This is definitely a nitpick, but can you put this on fewer lines? 2 lines should be enough for this. Same for the ones below.
    // nothing to do
    }

    @Override
Weird, did this @Override line get deleted by accident?
cmccabe left a comment:
LGTM
…e-allocations-lz4

* apache-github/trunk: (43 commits)
  KAFKA-12800: Configure generator to fail on trailing JSON tokens (apache#10717)
  MINOR: clarify message ordering with max in-flight requests and idempotent producer (apache#10690)
  MINOR: Add log identifier/prefix printing in Log layer static functions (apache#10742)
  MINOR: update java doc for deprecated methods (apache#10722)
  MINOR: Fix deprecation warnings in SlidingWindowedCogroupedKStreamImplTest (apache#10703)
  KAFKA-12499: add transaction timeout verification (apache#10482)
  KAFKA-12620 Allocate producer ids on the controller (apache#10504)
  MINOR: Kafka Streams code samples formating unification (apache#10651)
  KAFKA-12808: Remove Deprecated Methods under StreamsMetrics (apache#10724)
  KAFKA-12522: Cast SMT should allow null value records to pass through (apache#10375)
  KAFKA-12820: Upgrade maven-artifact dependency to resolve CVE-2021-26291
  HOTFIX: fix checkstyle issue in KAFKA-12697
  KAFKA-12697: Add OfflinePartitionCount and PreferredReplicaImbalanceCount metrics to Quorum Controller (apache#10572)
  KAFKA-12342: Remove MetaLogShim and use RaftClient directly (apache#10705)
  KAFKA-12779: KIP-740, Clean up public API in TaskId and fix TaskMetadata#taskId() (apache#10735)
  KAFKA-12814: Remove Deprecated Method StreamsConfig getConsumerConfigs (apache#10737)
  MINOR: Eliminate redundant functions in LogTest suite (apache#10732)
  MINOR: Remove unused maxProducerIdExpirationMs parameter in Log constructor (apache#10723)
  MINOR: Updating files with release 2.7.1 (apache#10660)
  KAFKA-12809: Remove deprecated methods of Stores factory (apache#10729)
  ...
The metrics are calculated by counting records as they are replayed, e.g. in replay(TopicRecord) and replay(RemoveTopicRecord).
This was unit tested using MockControllerMetrics.
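The replay-based approach described above can be sketched as follows. All class and method names here are illustrative assumptions, not the actual QuorumController code: each replayed record updates in-memory state, and the derived metric value is republished to a volatile field for the metrics thread.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of metrics maintained during record replay
// (names are illustrative, not the real Kafka classes). Each replayed
// record updates the state, and the derived count is republished
// through a volatile field so metrics callbacks see it safely.
class ReplayMetricsSketch {
    private final Set<String> topics = new HashSet<>();
    private volatile int globalTopicCount = 0;

    // Analogous to replay(TopicRecord): a topic was created.
    void replayTopicRecord(String topicName) {
        topics.add(topicName);
        globalTopicCount = topics.size();
    }

    // Analogous to replay(RemoveTopicRecord): a topic was deleted.
    void replayRemoveTopicRecord(String topicName) {
        topics.remove(topicName);
        globalTopicCount = topics.size();
    }

    // Read from the metrics thread.
    int globalTopicCount() {
        return globalTopicCount;
    }
}
```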
https://issues.apache.org/jira/browse/KAFKA-12697