-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OpenTelemetry support. #556
Conversation
3e066fe
to
0929c8c
Compare
pom.xml
Outdated
@@ -106,6 +106,9 @@ | |||
<jaeger.version>1.6.0</jaeger.version> | |||
<opentracing.version>0.33.0</opentracing.version> | |||
<opentracing-kafka-client.version>0.1.15</opentracing-kafka-client.version> | |||
<opentelemetry.version>1.7.0-alpha</opentelemetry.version> | |||
<opentelemetry-stable.version>1.7.0</opentelemetry-stable.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why there are two opentelemetry version properties (one with -stable
and the other seems to be an alpha) using both in different places to import dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"My" OpenTelemetry artifacts did yet made it to "stable" version, where Jaeger artifcat alfready did.
Hence the 2 diff versions.
But we agreed it's OK to use alpha for those -- as they are the initial ones, and haven't been changed.
I'll check if there is a stable version, but shouldn't make a diff.
(if I change it again now, I need to re-run the whole cycle, for the N-th time ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check if there is a stable version? This "alpha" here bothers me a little bit :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All alphas ... but that's OK, since, afaik / imho, the 1st didn't change at all ...
@@ -62,12 +61,13 @@ | |||
private static final int DEFAULT_EMBEDDED_HTTP_SERVER_PORT = 8080; | |||
|
|||
@SuppressWarnings({"checkstyle:NPathComplexity"}) | |||
@SuppressFBWarnings("REC_CATCH_EXCEPTION") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I ran mvn clean install -DskipTests spotbugs:check
it spit out this warning / error ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the reason, this is the detection of the problem ... I would like to know why there is now a problem that needs this suppress warning that wasn't there before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea, I was just following spotbugs error report.
Perhaps checkstyle plugin upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would try to understand what's the error and then at least opening an issue with the reason to track and doing a change later to avoid doing it in the current PR. If spotbugs is raising this maybe we don't see a potential bug ... it's worth investigating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the issue of "eating" the exception:
Which in our case that's not (completely) the case -- we log and exit the app ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link you pasted seems to suggest that we should not catch just Exception
as the main does but RuntimeException
or being more specific. Would it be the solution instead of suppress the warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imho, we can leave it as is ... as that's from main() and we log and exit.
@@ -24,6 +25,7 @@ | |||
* @param jmxCollectorRegistry JmxCollectorRegistry instance for scraping metrics from JMX endpoints | |||
* @param meterRegistry MeterRegistry instance for scraping metrics exposed through Vert.x | |||
*/ | |||
@SuppressFBWarnings("SIC_INNER_SHOULD_BE_STATIC_ANON") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, checkstyle / spotbugs complained ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same ... as you can see, I didn't change anything else in this class.
src/main/java/io/strimzi/kafka/bridge/tracing/OpenTelemetryHandle.java
Outdated
Show resolved
Hide resolved
6ca8160
to
cea0565
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test the backwards compatibility? I.e. that the old OpenTracing tracing works?
src/main/java/io/strimzi/kafka/bridge/converter/DefaultDeserializer.java
Outdated
Show resolved
Hide resolved
src/main/java/io/strimzi/kafka/bridge/tracing/OpenTracingHandle.java
Outdated
Show resolved
Hide resolved
src/test/java/io/strimzi/kafka/bridge/tracing/OpenTelemetryTest.java
Outdated
Show resolved
Hide resolved
src/main/java/io/strimzi/kafka/bridge/tracing/TracingConstants.java
Outdated
Show resolved
Hide resolved
src/main/java/io/strimzi/kafka/bridge/tracing/TracingHandle.java
Outdated
Show resolved
Hide resolved
452827f
to
494dc8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good news ... the OpenTracing still seems to work fine :-). But I didn't managed to get anything using OTel. When I try to follow the docs PR from @PaulRMellor, it always fails to connect to my Jaeger. Do you have some example configurations for Jaeger Operator and for the agent to play with?
bin/kafka_bridge_run.sh
Outdated
# enabling OpenTelemetry with Jaeger | ||
if [ -n "$OTEL_SERVICE_NAME" ]; then | ||
export OTEL_TRACES_EXPORTER="jaeger" | ||
fi | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should this be in this script and not in bin/docker/kafka_bridge_run.sh
? That script is supposed to be used by users running the bridge on their own. they would not configure OTEL_SERVICE_NAME
but they would edit the configuration file. Should this be configured in the config file instead?
config/application.properties
Outdated
#bridge.tracing=jaeger-otel | ||
#bridge.tracing.service-name=strimzi-kafka-bridge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the exporter configuration I commented abotu above => should this also configure the address of the Jaeger agent etc.? Or maybe not => we didn't do anything special for OpenTracing configuration. I guess the bridge.tracing.service-name
is confusing. Why do we have it here and the others not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno. @ppatierno ? ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alesj but the current bridge doesn't have such field in the properties file. It has just bridge.tracing=jaeger
that you changed into bridge.tracing=jaeger-otel
. You added bridge.tracing.service-name
, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you could customize your spans / traces names.
It can be fixed to strimzi-kafka-bridge
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the point is, how does it differ from the environment variable and from the system property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we follow the same order -- env, sys, config.
I'm happy to drop any or all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same order as what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As how OTel lookups up config -- env, sys, custom config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ... the bridge is using the Vert.x Config for getting the configuration.
As you can see here we provides two stores for configuration: file (the application.properties) and env vars. Any env var value overrides the same parameter specified in the properties file (because how stores are added matters and in our case is fileStore
first and then envStore
as explained here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll remove sys prop lookup.
Where I think the lookup order is then OK -- env first, if no env, then config prop.
config/application.properties
Outdated
#bridge.tracing=jaeger | ||
# OpenTelemetry support | ||
#bridge.tracing=jaeger-otel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be my misunderstanding. But do we still need the jaeger
part here? Shouldn't the type be just something like opentracing
or otel
? I thought OpenTelemetry should abstract from the implementations. Not necessarily saying it is wrong, just curious if it is still needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed of course.
It was jaeger
for some unknown reason -- as if there was ever any other option? -- so @ppatierno ? ^^
... and I just followed that unknown reason ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me rephrase the question ... what it the exact relationship between OTel and Jaeger and how do things like Zipkin fit in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't. :-)
Jaeger is a fixed choice, so imho not a good choice of config value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean that it is a fixed choice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ... at the end it's that exporter config + right deps on the classpath.
I can add Zipkin, etc ... if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not have to add Zipkin. But this clearly impacts the configuration and the initial question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason was this
if ("jaeger".equals(bridgeConfig.getTracing())) { |
bridge.tracing
is used to enable tracing and we used jaeger thinking about potential future value with other tracing systems. It's even reflected from the CO when configuring the tracing on the bridge through the STRIMZI_TRACING env var here https://github.com/strimzi/strimzi-kafka-bridge/blob/main/bin/docker/kafka_bridge_config_generator.sh#L4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK ... this needs to be fixed then as well.
So I guess `jaeger-otel' is not a good name?
Or, actually, we need a new differentiator? By tracing type not impl ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK ... this needs to be fixed then as well.
OK, this is fine, it takes the env var ...
<dependency> | ||
<groupId>io.grpc</groupId> | ||
<artifactId>grpc-netty-shaded</artifactId> | ||
<version>${grpc.version}</version> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting back to this ... are we actually using GRPC? The OpenTracing IMHO used Thrift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The [Jaeger](https://www.jaegertracing.io/docs/1.21/apis/#protobuf-via-grpc-stable) exporter. This exporter uses gRPC for its communications protocol.
public String serviceName(BridgeConfig config) { | ||
String serviceName = System.getenv(envName()); | ||
if (serviceName == null) { | ||
serviceName = System.getProperty(OPENTELEMETRY_SERVICE_NAME_PROPERTY_KEY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for getting it from the properties suddenly? Do we have 3 way s how to configure the service? In envvar., system propeerty and in the config file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how OTel uses this -- env, sys -- but it can be left out.
@ppatierno ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the question for me? The tracing with Jaeger used this way in the main application
https://github.com/strimzi/strimzi-kafka-bridge/blob/main/src/main/java/io/strimzi/kafka/bridge/Application.java#L139
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we go with all 3 checks or not? Or just exact config property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alesj I didn't get the question for me? Isn't it the code you added. If you want to know how bridge configuration works, take a look at the previous comment about using Vert.x Config so in our case (due to stores order) env vars first and application.properties file ... but no system properties at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, as "said" just before, I'll remove sys prop lookup.
This is the env vars I have (plus enabled OTEL_TRACES_EXPORTER=jaeger;JAEGER_SERVICE_NAME=strimzi-kafka-bridge and it works for me from IDE. |
Ok, so how do you do it on Kube with Jaeger operator? |
I think the only thing missing here is the OTEL_EXPORTER_JAEGER_ENDPOINT in k8s. |
src/main/java/io/strimzi/kafka/bridge/tracing/OpenTelemetryHandle.java
Outdated
Show resolved
Hide resolved
5229002
to
9e909ca
Compare
0689fa7
to
52d88be
Compare
bin/docker/kafka_bridge_run.sh
Outdated
# enabling OpenTelemetry with Jaeger by default | ||
if [ -n "$OTEL_SERVICE_NAME" ] && [ -z "$OTEL_TRACES_EXPORTER" ]; then | ||
export OTEL_TRACES_EXPORTER="jaeger" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be gated by STRIMZI_TRACING
as well? That is what enabled the tracing in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo no need for STRIMZI_TRACING
gates.
I'll remove this snippet, and just leave similar one in bin/kafka_bridge_run.sh
Signed-off-by: Ales Justin <ales.justin@gmail.com>
Closed in favor of a new PR: #633 |
Signed-off-by: Ales Justin ales.justin@gmail.com