-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preventing trace breaking in Otel java 2.0 upgrade #728
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed PR summary, great testing and explanation.
.../java/software/amazon/opentelemetry/javaagent/providers/AwsAppSignalsCustomizerProvider.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain this is the correct approach.
I see the following rationale offered:
But today AppSignals has to stick on grpc because:
SMP exporter is grpc metrics exporter;
In AppSignals enablement public document, we ask EC2 and ECS users set the endpoints to be a grpc endpoint, which is port 4315 hardcoded in CloudWatch Agent.
but none of these properties are immutable. Documentation can change, agent configuration can change, defaults can change, particularly at a major version boundary.
I also don't see any discussion of why this change was made upstream and whether any of that rationale does or does not apply to this situation and why.
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #728 +/- ##
=============================================
- Coverage 85.71% 50.53% -35.19%
- Complexity 19 265 +246
=============================================
Files 3 39 +36
Lines 49 1312 +1263
Branches 5 144 +139
=============================================
+ Hits 42 663 +621
- Misses 3 616 +613
- Partials 4 33 +29 ☔ View full report in Codecov by Sentry. |
Addressed. |
IIUC, its to avoid the breaking changes for the next release and to help customers with some time to move to http/protobuf, correct?. should this change be eventually removed and default to |
Our next action is to update the EC2 and ECS user instruction, using http/protobuf(4316) for all exporters. This PR would stay here for a long time for backward compatibility, as long as CWAgent does not break the convention that port 4315 is grpc, 4316 is http. |
if (traceEndpoint != null && traceEndpoint.contains(":4315")) { | ||
overrides.put("otel.exporter.otlp.traces.protocol", "grpc"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this assumption is appropriate. The specification provides a mechanism for specifying the protocol to use independent of the endpoint. A customer may have reason to utilize HTTP/JSON over port 4315 and this weird opaque rejection of their intent is not good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If SMP flag is enabled, it means customer uses CWAgent for receiving AppSignals metrics. The AppSignals and CWAgent convention is using 4315 for grpc, 4316 for http. These 2 ports are already used and hardcoded by CWAgent.
So, if both SMP is enabled and SDK uses endpoint 4315, it must be grpc protocol.
If CWAgent breaks the contract, does not use 4315 grpc 4316 http in the future, it is a breaking change for all of exisitng AppSignals SDKs running on EKS/ECS/EC2. If it happens, the only way is bump both SDKs and CWAgent major version and call out in user instruction.
// disable logs exporter by default | ||
if (System.getProperty("otel.logs.exporter", System.getenv("OTEL_LOGS_EXPORTER")) == null) { | ||
overrides.put("otel.logs.exporter", "none"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add details about why to the overview. In general, the overview/commit message is out of date with the new approach. Lets bring it up to date to clarify what we are doing and (more importantly) why.
"otel.exporter.otlp.traces.endpoint", | ||
System.getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")); | ||
// using grpc if trace endpoint is :4315 | ||
if (traceEndpoint != null && traceEndpoint.contains(":4315")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a logline similar to how we callout changes to exportInterval
.
Issue #, if available:
OpenTelemetry java 2.0 introduces a breaking change that the default OTLP protocol has been changed from
grpc
tohttp/protobuf
in order to align with specification. It does not impact ADOT users since ADOT Collector configurations turn on both grpc and http ports for otlp receiver. But for existing EC2 and ECS users, AppSignals has to stick on grpc for traces because: In AppSignals enablement public document, we ask EC2 and ECS users set the endpoints to be a grpc endpoint, which is port 4315 hardcoded in CloudWatch Agent.we want to move the default transport protocol to be
http/protobuf
.So, in order to avoid any impact on users who are already onboarded AppSignals, who use
:4315
endpoint for their traces, we will override the protocol from the SDK side to be gRPC.Description of changes:
In order to avoid any impact on users who are already onboarded AppSignals, who use
:4315
endpoint for their traces, we will override the protocol from the SDK side to be gRPC.Testing:
This change is manually verified in local by:
Command example,
Collector config
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.