Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JMX Scraper: Kafka server, producer and consumer YAMLs and integration tests added #1670

Merged
merged 8 commits into from
Feb 14, 2025

Conversation

robsunday
Copy link
Contributor

@robsunday robsunday commented Jan 24, 2025

Three target systems connected to Kafka have been added:

  1. Server
  2. Producer
  3. Consumer

To fully test Kafka metrics there is a need to spin multiple containers, so these three components can correctly interact and metrics can be correctly generated.
Base integration test class TargetSystemIntegrationTest was modified to support initialization of multiple containers.

There are still some TODOs left that will be fixed in subsequent PRs

Part of #1362

@@ -86,12 +88,23 @@ static void afterAll() {

@AfterEach
void afterEach() {
if (scraper != null && scraper.isRunning()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] I rearranged containers shutdown sequence to be in reverse order than starting to avoid unnecessary errors in the logs

}

protected void verifyMetrics() {
MetricsVerifier metricsVerifier = createMetricsVerifier();
await()
.atMost(Duration.ofSeconds(60))
.pollInterval(Duration.ofSeconds(1))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] Just to decrease CPU usage a bit if we do not have all metrics available immediately.

metric
.hasDescription(
"The average number of bytes consumed for all topics per second")
.hasUnit("By")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] Changed from 'by' to follow semconv

metric
.hasDescription(
"The average number of records consumed for all topics per second")
.hasUnit("{record}")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] Multiple units in this file have been changed from "1" to annotated form to follow semconv

"sh",
"-c",
"echo 'Sending messages to test-topic-1'; "
+ "i=1; while true; do echo \"Message $i\"; sleep .25; i=$((i+1)); done | /opt/bitnami/kafka/bin/kafka-console-producer.sh --bootstrap-server "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] Topic is automatically created when messages are sent to it - no need to manually create it

.withEnv("JMX_PORT", Integer.toString(jmxPort))
.withExposedPorts(jmxPort)
.waitingFor(Wait.forListeningPorts(jmxPort));
// .waitingFor(Wait.forLogMessage(".*Sending messages to test-topic-1.*", 1));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] Will be cleaned

metric: kafka.purgatory.size
type: updowncounter
desc: The number of requests waiting in purgatory
unit: "{request}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] Please verify all annotations of units. In the original files they were either in plular form, or were missing (unit: "1" was used instead)

unit: ms
request-rate:
desc: The average number of requests sent per second
unit: '{request}'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] Please verify all annotations of units. In the original files they were either in plular form, or were missing (unit: "1" was used instead)

@robsunday robsunday marked this pull request as ready for review January 24, 2025 12:58
@robsunday robsunday requested a review from a team as a code owner January 24, 2025 12:58
bytes-consumed-rate:
metric: total.bytes-consumed-rate
desc: The average number of bytes consumed for all topics per second
unit: 'By'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] we can remove quotes here on the unit for simplicity/consistency.

Suggested change
unit: 'By'
unit: By

Comment on lines 41 to 46
record-error-rate:
desc: The average per-second number of record sends that resulted in errors for a topic
record-retry-rate:
desc: The average per-second number of retried record sends for a topic
record-send-rate:
desc: The average number of records sent per second for a topic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] because the unit is different for a few metrics means it's probably simpler to always describe the unit at the same level instead of only overriding when it's different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this case I agree with your opinion - it is easier to grasp

@github-actions github-actions bot requested a review from SylvainJuge February 4, 2025 12:13
Code cleanups
Other code review related changes
Comment on lines 214 to 236
// TODO: Find out how to force Kafka to generate these metrics
// .add(
// "kafka.logs.flush.count",
// metric ->
// metric
// .hasDescription("Log flush count")
// .hasUnit("{flush}")
// .isCounter()
// .hasDataPointsWithoutAttributes())
// .add(
// "kafka.logs.flush.time.median",
// metric ->
// metric
// .hasDescription("Log flush time - 50th percentile")
// .hasUnit("ms")
// .isGauge()
// .hasDataPointsWithoutAttributes())
// .add(
// "kafka.logs.flush.time.99p",
// metric ->
// metric
// .hasDescription("Log flush time - 99th percentile")
// .hasUnit("ms")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tracking issue might be more visible than these comments. I think you should open an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: #1726

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Wowza that's a lot of metrics. :) Makes one wish that kafka had native otel support ootb... 🤔

@robsunday
Copy link
Contributor Author

Looks good to me. Wowza that's a lot of metrics. :) Makes one wish that kafka had native otel support ootb... 🤔

Yes, and newer Kafka versions have surely even more.

@trask
Copy link
Member

trask commented Feb 10, 2025

@robsunday can you merge main into your PR branch? that will fix the required gradle-wrapper-validation check, thanks!

JMX Metrics: metric descriptions updated to match common convention
@trask trask merged commit b92a465 into open-telemetry:main Feb 14, 2025
17 checks passed
breedx-splk pushed a commit to breedx-splk/opentelemetry-java-contrib that referenced this pull request Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants