Skip to content

Comments

KAFKA-5135: Controller Health Metrics (KIP-143)#2983

Closed
ijuma wants to merge 17 commits intoapache:trunkfrom
ijuma:kafka-5135-controller-health-metrics-kip-143
Closed

KAFKA-5135: Controller Health Metrics (KIP-143)#2983
ijuma wants to merge 17 commits intoapache:trunkfrom
ijuma:kafka-5135-controller-health-metrics-kip-143

Conversation

@ijuma
Copy link
Member

@ijuma ijuma commented May 5, 2017

No description provided.

@ijuma
Copy link
Member Author

ijuma commented May 5, 2017

This is an initial PR to get some feedback on whether I'm on the right track. Need to write tests and verify if any additional work needs to be done to clean-up metrics. cc @onurkaraman @junrao

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a concurrency bug. You should be synchronizing access to brokerStateInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ElectingLeader state seems somewhat out of place. Should this reflect that a broker change event is occurring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KIP says BrokerChange indeed. However, we have LeaderElectionRateAndTimeMs that seems to measure the same code path. So, I thought it would make sense to use a similar name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BrokerTopicStats refactoring causes the PR to be much larger than I think it needs to be. Can we move this change out into a separate cleanup PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are mechanical, the diffstat is +483 −315 (which is manageable) and there's a separate commit for the refactoring. So, I think we don't need a separate PR. Let's see what @junrao says.

@asfbot
Copy link

asfbot commented May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3554/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3548/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3545/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3571/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3565/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented May 5, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3561/
Test FAILed (JDK 8 and Scala 2.12).

@junrao
Copy link
Contributor

junrao commented May 6, 2017

@ijuma : Thanks for the patch. The general approach looks good.

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from bcc18df to 0217270 Compare May 6, 2017 15:43
@asfbot
Copy link

asfbot commented May 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3590/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3584/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented May 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3580/
Test FAILed (JDK 8 and Scala 2.12).

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from 0217270 to 59267fb Compare May 6, 2017 17:36
@asfbot
Copy link

asfbot commented May 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3595/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3589/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented May 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3585/
Test FAILed (JDK 8 and Scala 2.12).

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from 59267fb to c8cfd2f Compare May 7, 2017 23:04
@asfbot
Copy link

asfbot commented May 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/3626/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/3620/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented May 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3617/
Test FAILed (JDK 8 and Scala 2.12).

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from c8cfd2f to d467ca4 Compare May 15, 2017 00:01
@asfbot
Copy link

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3903/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3892/
Test PASSed (JDK 8 and Scala 2.12).

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from d467ca4 to 6227f86 Compare May 15, 2017 18:23
@asfbot
Copy link

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3930/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3941/
Test PASSed (JDK 7 and Scala 2.11).

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from 6227f86 to 35bc136 Compare May 15, 2017 21:37
@asfbot
Copy link

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3960/
Test PASSed (JDK 7 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3949/
Test PASSed (JDK 8 and Scala 2.12).

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from 7dc99cd to 0e2df6a Compare May 23, 2017 09:28
@asfbot
Copy link

asfbot commented May 23, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4286/
Test FAILed (JDK 8 and Scala 2.12).

@ijuma
Copy link
Member Author

ijuma commented May 23, 2017

Thanks for the review @junrao. I addressed your feedback. Also, while removing the controllerContext.stats.leaderElectionTimer.time block (which is now handled via the BrokerChange state metric), I noticed that some events had a redundant try/catch block (since ControllerEventManager.process also does the same). I've removed them as well.

@asfbot
Copy link

asfbot commented May 23, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4300/
Test PASSed (JDK 7 and Scala 2.11).

@asfbot
Copy link

asfbot commented May 23, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4287/
Test FAILed (JDK 8 and Scala 2.12).

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ijuma : Thanks for the updated patch. LGTM. Just one minor comment.

EasyMock.expect(eventMock.process()).andAnswer(new IAnswer[Unit]() {
def answer(): Unit = {
while (processing)
Thread.`yield`()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the behavior of Thread.yield() can be platform dependent, would it be better to change processing to sth like a CountDownLatch()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion. Both are correct (even if Thread.yield is a no-op, it won't matter here), but CountDownLatch is more concise and shows intent better. Updated the PR.

@ijuma ijuma force-pushed the kafka-5135-controller-health-metrics-kip-143 branch from e2a1dc7 to 99fb9a1 Compare May 23, 2017 21:54
@asfbot
Copy link

asfbot commented May 23, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4322/
Test FAILed (JDK 7 and Scala 2.11).

@junrao
Copy link
Contributor

junrao commented May 23, 2017

@ijuma : Thanks for the patch. LGTM. I will let you merge this when the tests complete.

@asfbot
Copy link

asfbot commented May 23, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4309/
Test PASSed (JDK 8 and Scala 2.12).

asfgit pushed a commit that referenced this pull request May 23, 2017
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jun Rao <junrao@gmail.com>, Onur Karaman <okaraman@linkedin.com>

Closes #2983 from ijuma/kafka-5135-controller-health-metrics-kip-143

(cherry picked from commit 516d845)
Signed-off-by: Ismael Juma <ismael@juma.me.uk>
@asfgit asfgit closed this in 516d845 May 23, 2017
@ijuma ijuma deleted the kafka-5135-controller-health-metrics-kip-143 branch September 5, 2017 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants