Add linkedin/kafka-monitor #97

Open
wants to merge 4 commits into
base: master

Conversation

solsson
Contributor

@solsson solsson commented Nov 10, 2017

On paper, this is the best answer I've found to #80 :) What remains is to learn how to use it.

Adding the monitoring label because the combination of readiness alerts for key health indicators such as under-replicated partitions (https://github.com/Yolean/kubernetes-kafka/pull/95/files#diff-f8da94a0c2daaa5e09e08330d1ed122a) and end-to-end testing with a tool like kafka-monitor may pay off better than internal metrics.

In actual troubleshooting scenarios you'll probably still want to connect some JMX tool (allowed since #96) to really dig into the state of things.

Contributor Author

solsson commented Nov 10, 2017

The UI works. The problem was that I used kubectl port-forward and only forwarded port 8000. Once the UI is loaded you can switch to forwarding 8778, and you'll get fancy graphs :)

Metrics also work:

curl localhost:8778/jolokia/read/kmf.services:type=produce-service,name=*/records-produced-rate | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   268  100   268    0     0   2876      0 --:--:-- --:--:-- --:--:--  2913
{
  "request": {
    "mbean": "kmf.services:name=*,type=produce-service",
    "attribute": "records-produced-rate",
    "type": "read"
  },
  "value": {
    "kmf.services:name=single-cluster-monitor,type=produce-service": {
      "records-produced-rate": 54.47272973797498
    }
  },
  "timestamp": 1510324868,
  "status": 200
}
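
To pull the number out of a response like this without reading the JSON by eye, a minimal Python sketch could look as follows. It assumes the response shape shown in the curl output (a wildcard read that returns one entry per matching MBean); the helper name `extract_attribute` is made up for illustration and is not part of Jolokia or kafka-monitor.

```python
import json

# Sample response copied from the curl output in this comment.
SAMPLE = """
{
  "request": {
    "mbean": "kmf.services:name=*,type=produce-service",
    "attribute": "records-produced-rate",
    "type": "read"
  },
  "value": {
    "kmf.services:name=single-cluster-monitor,type=produce-service": {
      "records-produced-rate": 54.47272973797498
    }
  },
  "timestamp": 1510324868,
  "status": 200
}
"""


def extract_attribute(response_text, attribute):
    """Return {mbean_name: value} for one attribute of a wildcard read."""
    response = json.loads(response_text)
    if response.get("status") != 200:
        raise ValueError("Jolokia returned status %r" % response.get("status"))
    # A wildcard read nests the attributes under each matching MBean name.
    return {
        mbean: attrs[attribute]
        for mbean, attrs in response["value"].items()
        if attribute in attrs
    }


if __name__ == "__main__":
    rates = extract_attribute(SAMPLE, "records-produced-rate")
    for mbean, rate in rates.items():
        print(mbean, rate)
```

The same function should work for other attributes of the produce service, provided they show up under the same `value` structure.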

Contributor Author

solsson commented Nov 10, 2017

Remaining issues:

  • Need a way to throttle load on test clusters like minikube. It's pretty significant by default.
  • Metrics are logged at INFO level, and there are lots and lots of them. With the GUI, curl and export to monitoring tools available, that shouldn't be necessary.

Contributor Author

solsson commented Nov 10, 2017

Some info on Prometheus (non-)compatibility: jolokia/jolokia#206. Mentions https://github.com/fabric8io/agent-bond, and also the importance of a whitelist as we discovered in #49.

Contributor Author

solsson commented Nov 10, 2017

Rate can probably be reduced using produce.record.delay.ms, see https://github.com/linkedin/kafka-monitor/wiki/Service-Configuration#produce-service-configuration-parameters.
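
As a rough illustration of what that change might look like, here is a Python sketch that sets `produce.record.delay.ms` on a config held as a dict. It assumes the monitor config can be loaded as JSON with a top-level key per app (the key `single-cluster-monitor` is borrowed from the MBean name in the metrics output); the helper `with_produce_delay` and the trimmed-down `base` config are hypothetical.

```python
import copy


def with_produce_delay(config, app_name, delay_ms):
    """Return a copy of the config with produce.record.delay.ms set."""
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated[app_name]["produce.record.delay.ms"] = delay_ms
    return updated


# Hypothetical, trimmed-down config; a real file carries many more keys.
base = {"single-cluster-monitor": {"topic": "kafka-monitor-topic"}}

# A longer delay between records should mean a lower produce rate.
throttled = with_produce_delay(base, "single-cluster-monitor", 1000)
```

The value 1000 is only an example; a sensible delay for minikube would have to be found by trying it.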

Here is probably the logging statement. Should be possible to exclude using custom log4j config.

There's also a GraphiteMetricsReporterService so maybe it's trivial to produce a PrometheusMetricsReporterService.
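
This is not the Java reporter service itself, but the translation such a service would need can be sketched in Python: take the `value` object of a Jolokia wildcard read and render it in the Prometheus text exposition format. The naming scheme (domain plus attribute as metric name, MBean key properties as labels) is my own assumption, and `to_prometheus` is a hypothetical helper.

```python
import re


def parse_mbean(mbean):
    """Split 'domain:k1=v1,k2=v2' into (domain, {k1: v1, k2: v2})."""
    domain, _, props = mbean.partition(":")
    labels = dict(pair.split("=", 1) for pair in props.split(",") if pair)
    return domain, labels


def sanitize(name):
    """Keep only [a-zA-Z0-9_], valid in both metric and label names.

    A leading digit is not handled in this sketch.
    """
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


def escape_label_value(value):
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(jolokia_value):
    """Render the 'value' object of a wildcard read as exposition text."""
    lines = []
    for mbean, attrs in sorted(jolokia_value.items()):
        domain, labels = parse_mbean(mbean)
        label_text = ",".join(
            '%s="%s"' % (sanitize(k), escape_label_value(v))
            for k, v in sorted(labels.items())
        )
        for attr, value in sorted(attrs.items()):
            # Skip anything that is not a plain number.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            metric = sanitize("%s_%s" % (domain, attr))
            lines.append("%s{%s} %s" % (metric, label_text, value))
    return "".join(line + "\n" for line in lines)
```

Hyphens and dots are replaced with underscores because Prometheus metric names do not allow them, so `records-produced-rate` in the `kmf.services` domain becomes `kmf_services_records_produced_rate`.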

Instead you need some kind of metrics export.

Currently I only get a lot of `records-produced-total` but no latencies etc.
@solsson solsson force-pushed the linkedin-kafka-monitor branch from 5f86aa9 to e5b1acf Compare September 29, 2018 12:49
@solsson solsson modified the milestones: 5.0 - Java 11, 5.1 Nov 28, 2018