Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenTelemetry support. #556

Closed
wants to merge 1 commit into from
Closed

Add OpenTelemetry support. #556

wants to merge 1 commit into from

Conversation

alesj
Copy link
Contributor

@alesj alesj commented Nov 7, 2021

Signed-off-by: Ales Justin ales.justin@gmail.com

config/application.properties Outdated Show resolved Hide resolved
pom.xml Outdated
@@ -106,6 +106,9 @@
<jaeger.version>1.6.0</jaeger.version>
<opentracing.version>0.33.0</opentracing.version>
<opentracing-kafka-client.version>0.1.15</opentracing-kafka-client.version>
<opentelemetry.version>1.7.0-alpha</opentelemetry.version>
<opentelemetry-stable.version>1.7.0</opentelemetry-stable.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why there are two opentelemetry version properties (one with -stable and the other seems to be an alpha) using both in different places to import dependencies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"My" OpenTelemetry artifacts did yet made it to "stable" version, where Jaeger artifcat alfready did.
Hence the 2 diff versions.
But we agreed it's OK to use alpha for those -- as they are the initial ones, and haven't been changed.
I'll check if there is a stable version, but shouldn't make a diff.
(if I change it again now, I need to re-run the whole cycle, for the N-th time ...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check if there is a stable version? This "alpha" here bothers me a little bit :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All alphas ... but that's OK, since, afaik / imho, the 1st didn't change at all ...

@@ -62,12 +61,13 @@
private static final int DEFAULT_EMBEDDED_HTTP_SERVER_PORT = 8080;

@SuppressWarnings({"checkstyle:NPathComplexity"})
@SuppressFBWarnings("REC_CATCH_EXCEPTION")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran mvn clean install -DskipTests spotbugs:check it spit out this warning / error ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the reason, this is the detection of the problem ... I would like to know why there is now a problem that needs this suppress warning that wasn't there before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, I was just following spotbugs error report.
Perhaps checkstyle plugin upgrade?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to understand what's the error and then at least opening an issue with the reason to track and doing a change later to avoid doing it in the current PR. If spotbugs is raising this maybe we don't see a potential bug ... it's worth investigating.

Copy link
Contributor Author

@alesj alesj Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the issue of "eating" the exception:

Which in our case that's not (completely) the case -- we log and exit the app ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link you pasted seems to suggest that we should not catch just Exception as the main does but RuntimeException or being more specific. Would it be the solution instead of suppress the warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho, we can leave it as is ... as that's from main() and we log and exit.

@@ -24,6 +25,7 @@
* @param jmxCollectorRegistry JmxCollectorRegistry instance for scraping metrics from JMX endpoints
* @param meterRegistry MeterRegistry instance for scraping metrics exposed through Vert.x
*/
@SuppressFBWarnings("SIC_INNER_SHOULD_BE_STATIC_ANON")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, checkstyle / spotbugs complained ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ... as you can see, I didn't change anything else in this class.

@alesj alesj force-pushed the otel1 branch 2 times, most recently from 6ca8160 to cea0565 Compare January 19, 2022 09:37
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test the backwards compatibility? I.e. that the old OpenTracing tracing works?

config/application.properties Outdated Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
src/main/java/io/strimzi/kafka/bridge/Application.java Outdated Show resolved Hide resolved
src/main/java/io/strimzi/kafka/bridge/MetricsReporter.java Outdated Show resolved Hide resolved
@alesj alesj force-pushed the otel1 branch 2 times, most recently from 452827f to 494dc8b Compare January 31, 2022 19:17
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good news ... the OpenTracing still seems to work fine :-). But I didn't managed to get anything using OTel. When I try to follow the docs PR from @PaulRMellor, it always fails to connect to my Jaeger. Do you have some example configurations for Jaeger Operator and for the agent to play with?

Comment on lines 16 to 20
# enabling OpenTelemetry with Jaeger
if [ -n "$OTEL_SERVICE_NAME" ]; then
export OTEL_TRACES_EXPORTER="jaeger"
fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this be in this script and not in bin/docker/kafka_bridge_run.sh? That script is supposed to be used by users running the bridge on their own. they would not configure OTEL_SERVICE_NAME but they would edit the configuration file. Should this be configured in the config file instead?

Comment on lines 7 to 8
#bridge.tracing=jaeger-otel
#bridge.tracing.service-name=strimzi-kafka-bridge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the exporter configuration I commented abotu above => should this also configure the address of the Jaeger agent etc.? Or maybe not => we didn't do anything special for OpenTracing configuration. I guess the bridge.tracing.service-name is confusing. Why do we have it here and the others not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno. @ppatierno ? ^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alesj but the current bridge doesn't have such field in the properties file. It has just bridge.tracing=jaeger that you changed into bridge.tracing=jaeger-otel. You added bridge.tracing.service-name, why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you could customize your spans / traces names.
It can be fixed to strimzi-kafka-bridge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the point is, how does it differ from the environment variable and from the system property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we follow the same order -- env, sys, config.
I'm happy to drop any or all of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same order as what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As how OTel lookups up config -- env, sys, custom config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ... the bridge is using the Vert.x Config for getting the configuration.
As you can see here we provides two stores for configuration: file (the application.properties) and env vars. Any env var value overrides the same parameter specified in the properties file (because how stores are added matters and in our case is fileStore first and then envStore as explained here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll remove sys prop lookup.
Where I think the lookup order is then OK -- env first, if no env, then config prop.

#bridge.tracing=jaeger
# OpenTelemetry support
#bridge.tracing=jaeger-otel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be my misunderstanding. But do we still need the jaeger part here? Shouldn't the type be just something like opentracing or otel? I thought OpenTelemetry should abstract from the implementations. Not necessarily saying it is wrong, just curious if it is still needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed of course.
It was jaeger for some unknown reason -- as if there was ever any other option? -- so @ppatierno ? ^^
... and I just followed that unknown reason ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rephrase the question ... what it the exact relationship between OTel and Jaeger and how do things like Zipkin fit in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't. :-)
Jaeger is a fixed choice, so imho not a good choice of config value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean that it is a fixed choice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah ... at the end it's that exporter config + right deps on the classpath.
I can add Zipkin, etc ... if you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not have to add Zipkin. But this clearly impacts the configuration and the initial question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason was this

if ("jaeger".equals(bridgeConfig.getTracing())) {
... this bridge.tracing is used to enable tracing and we used jaeger thinking about potential future value with other tracing systems. It's even reflected from the CO when configuring the tracing on the bridge through the STRIMZI_TRACING env var here https://github.com/strimzi/strimzi-kafka-bridge/blob/main/bin/docker/kafka_bridge_config_generator.sh#L4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK ... this needs to be fixed then as well.

So I guess `jaeger-otel' is not a good name?
Or, actually, we need a new differentiator? By tracing type not impl ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK ... this needs to be fixed then as well.

OK, this is fine, it takes the env var ...

Comment on lines +240 to +304
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>${grpc.version}</version>
</dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting back to this ... are we actually using GRPC? The OpenTracing IMHO used Thrift.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [Jaeger](https://www.jaegertracing.io/docs/1.21/apis/#protobuf-via-grpc-stable) exporter. This exporter uses gRPC for its communications protocol.

public String serviceName(BridgeConfig config) {
String serviceName = System.getenv(envName());
if (serviceName == null) {
serviceName = System.getProperty(OPENTELEMETRY_SERVICE_NAME_PROPERTY_KEY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for getting it from the properties suddenly? Do we have 3 way s how to configure the service? In envvar., system propeerty and in the config file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how OTel uses this -- env, sys -- but it can be left out.
@ppatierno ^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the question for me? The tracing with Jaeger used this way in the main application
https://github.com/strimzi/strimzi-kafka-bridge/blob/main/src/main/java/io/strimzi/kafka/bridge/Application.java#L139

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we go with all 3 checks or not? Or just exact config property?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alesj I didn't get the question for me? Isn't it the code you added. If you want to know how bridge configuration works, take a look at the previous comment about using Vert.x Config so in our case (due to stores order) env vars first and application.properties file ... but no system properties at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, as "said" just before, I'll remove sys prop lookup.

@alesj
Copy link
Contributor Author

alesj commented Feb 3, 2022

The good news ... the OpenTracing still seems to work fine :-). But I didn't managed to get anything using OTel. When I try to follow the docs PR from @PaulRMellor, it always fails to connect to my Jaeger. Do you have some example configurations for Jaeger Operator and for the agent to play with?

This is the env vars I have (plus enabled jaeger-otel tracing in config)

OTEL_TRACES_EXPORTER=jaeger;JAEGER_SERVICE_NAME=strimzi-kafka-bridge

and it works for me from IDE.

@scholzj
Copy link
Member

scholzj commented Feb 3, 2022

This is the env vars I have (plus enabled jaeger-otel tracing in config)

OTEL_TRACES_EXPORTER=jaeger;JAEGER_SERVICE_NAME=strimzi-kafka-bridge

and it works for me from IDE.

Ok, so how do you do it on Kube with Jaeger operator?

@alesj
Copy link
Contributor Author

alesj commented Feb 3, 2022

This is the env vars I have (plus enabled jaeger-otel tracing in config)
OTEL_TRACES_EXPORTER=jaeger;JAEGER_SERVICE_NAME=strimzi-kafka-bridge
and it works for me from IDE.

Ok, so how do you do it on Kube with Jaeger operator?

I think the only thing missing here is the OTEL_EXPORTER_JAEGER_ENDPOINT in k8s.

@alesj alesj force-pushed the otel1 branch 3 times, most recently from 5229002 to 9e909ca Compare February 7, 2022 11:16
config/application.properties Outdated Show resolved Hide resolved
Comment on lines 71 to 74
# enabling OpenTelemetry with Jaeger by default
if [ -n "$OTEL_SERVICE_NAME" ] && [ -z "$OTEL_TRACES_EXPORTER" ]; then
export OTEL_TRACES_EXPORTER="jaeger"
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be gated by STRIMZI_TRACING as well? That is what enabled the tracing in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo no need for STRIMZI_TRACING gates.
I'll remove this snippet, and just leave similar one in bin/kafka_bridge_run.sh

bin/kafka_bridge_run.sh Show resolved Hide resolved
pom.xml Outdated Show resolved Hide resolved
Signed-off-by: Ales Justin <ales.justin@gmail.com>
@alesj
Copy link
Contributor Author

alesj commented Jun 29, 2022

Closed in favor of a new PR: #633

@alesj alesj closed this Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants