Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preventing trace breaking in Otel java 2.0 upgrade #728

Closed
wants to merge 2 commits into from
Closed

Preventing trace breaking in Otel java 2.0 upgrade #728

wants to merge 2 commits into from

Conversation

wangzlei
Copy link
Contributor

@wangzlei wangzlei commented Jan 25, 2024

Issue #, if available:
OpenTelemetry java 2.0 introduces a breaking change that the default OTLP protocol has been changed from grpc to http/protobuf in order to align with specification. It does not impact ADOT users since ADOT Collector configurations turn on both grpc and http ports for otlp receiver. But for existing EC2 and ECS users, AppSignals has to stick on grpc for traces because: In AppSignals enablement public document, we ask EC2 and ECS users set the endpoints to be a grpc endpoint, which is port 4315 hardcoded in CloudWatch Agent.

we want to move the default transport protocol to be http/protobuf.
So, in order to avoid any impact on users who are already onboarded AppSignals, who use :4315 endpoint for their traces, we will override the protocol from the SDK side to be gRPC.

Description of changes:

In order to avoid any impact on users who are already onboarded AppSignals, who use :4315 endpoint for their traces, we will override the protocol from the SDK side to be gRPC.

Testing:

This change is manually verified in local by:

  • Build ADOT javaagnet after updating the otel dependency version to 2.0.0.
  • Launch an OpenTelemetry Collector, respectively disable grpc and http protocol in receiver.
  • Launch a SpringBoot application with ADOT javaagent, verify 3 different cases:
    • Set environment variable OTEL_EXPORTER_OTLP_PROTOCOL to http/protobuf
    • Set environment variable OTEL_EXPORTER_OTLP_PROTOCOL to grpc
    • Not set environment variable OTEL_EXPORTER_OTLP_PROTOCOL

Command example,

OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4315 OTEL_METRICS_EXPORTER=none OTEL_SMP_ENABLED=true OTEL_AWS_SMP_EXPORTER_ENDPOINT=http://localhost:4315 java -javaagent:/Users/wangzl/workspace/appsig/aws-otel-java-instrumentation/otelagent/build/libs/aws-opentelemetry-agent-1.32.0-SNAPSHOT.jar -jar ./build/libs/spring-boot-0.0.1-SNAPSHOT.jar Application

Collector config

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4315
      http:
        endpoint: 0.0.0.0:4315

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@wangzlei wangzlei requested a review from a team as a code owner January 25, 2024 00:13
Copy link
Contributor

@thpierce thpierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed PR summary, great testing and explanation.

@vasireddy99 vasireddy99 requested a review from Aneurysm9 January 25, 2024 16:21
Copy link
Member

@Aneurysm9 Aneurysm9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain this is the correct approach.

I see the following rationale offered:

But today AppSignals has to stick on grpc because:

SMP exporter is grpc metrics exporter;
In AppSignals enablement public document, we ask EC2 and ECS users set the endpoints to be a grpc endpoint, which is port 4315 hardcoded in CloudWatch Agent.

but none of these properties are immutable. Documentation can change, agent configuration can change, defaults can change, particularly at a major version boundary.

I also don't see any discussion of why this change was made upstream and whether any of that rationale does or does not apply to this situation and why.

@codecov-commenter
Copy link

Codecov Report

Attention: 110 lines in your changes are missing coverage. Please review.

Comparison is base (09e6487) 85.71% compared to head (1c1243f) 50.53%.
Report is 234 commits behind head on main.

Files Patch % Lines
...ent/providers/AwsAppSignalsCustomizerProvider.java 24.59% 42 Missing and 4 partials ⚠️
...gent/providers/AwsSpanMetricsProcessorBuilder.java 0.00% 20 Missing ⚠️
...ders/AttributePropagatingSpanProcessorBuilder.java 0.00% 16 Missing ⚠️
...viders/AwsMetricAttributesSpanExporterBuilder.java 0.00% 11 Missing ⚠️
...try/javaagent/providers/AwsSpanProcessingUtil.java 90.16% 1 Missing and 5 partials ⚠️
...vaagent/providers/AwsMetricAttributeGenerator.java 96.89% 2 Missing and 3 partials ⚠️
...y/javaagent/providers/AwsSpanMetricsProcessor.java 91.48% 0 Missing and 4 partials ⚠️
...t/providers/AttributePropagatingSpanProcessor.java 94.59% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@              Coverage Diff              @@
##               main     #728       +/-   ##
=============================================
- Coverage     85.71%   50.53%   -35.19%     
- Complexity       19      265      +246     
=============================================
  Files             3       39       +36     
  Lines            49     1312     +1263     
  Branches          5      144      +139     
=============================================
+ Hits             42      663      +621     
- Misses            3      616      +613     
- Partials          4       33       +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wangzlei wangzlei requested a review from Aneurysm9 January 30, 2024 18:32
@wangzlei
Copy link
Contributor Author

I'm not certain this is the correct approach.

I see the following rationale offered:

But today AppSignals has to stick on grpc because:
SMP exporter is grpc metrics exporter;
In AppSignals enablement public document, we ask EC2 and ECS users set the endpoints to be a grpc endpoint, which is port 4315 hardcoded in CloudWatch Agent.

but none of these properties are immutable. Documentation can change, agent configuration can change, defaults can change, particularly at a major version boundary.

I also don't see any discussion of why this change was made upstream and whether any of that rationale does or does not apply to this situation and why.

Addressed.

@vasireddy99
Copy link
Contributor

vasireddy99 commented Jan 30, 2024

IIUC, its to avoid the breaking changes for the next release and to help customers with some time to move to http/protobuf, correct?. should this change be eventually removed and default to http/protobuf , move the appsignals to 4316 or is the plan to continue to suggest users using 4315/grpc, keep this custom change as is?

@vasireddy99 vasireddy99 self-requested a review January 30, 2024 19:28
@wangzlei
Copy link
Contributor Author

wangzlei commented Jan 30, 2024

id the breaking changes for the next release and to help customers with some time to move to http/protobuf, correct?. should this change be eventually removed and default to http/protobuf , move the appsignals to 4316 or is the plan to continue to suggest users using 4315/grpc, keep this custom change as is?

Our next action is to update the EC2 and ECS user instruction, using http/protobuf(4316) for all exporters. This PR would stay here for a long time for backward compatibility, as long as CWAgent does not break the convention that port 4315 is grpc, 4316 is http.

@wangzlei wangzlei changed the title make grpc protocal to be default Preventing trace breaking in Otel java 2.0 upgrade Jan 31, 2024
Comment on lines +81 to +82
if (traceEndpoint != null && traceEndpoint.contains(":4315")) {
overrides.put("otel.exporter.otlp.traces.protocol", "grpc");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this assumption is appropriate. The specification provides a mechanism for specifying the protocol to use independent of the endpoint. A customer may have reason to utilize HTTP/JSON over port 4315 and this weird opaque rejection of their intent is not good.

Copy link
Contributor Author

@wangzlei wangzlei Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SMP flag is enabled, it means customer uses CWAgent for receiving AppSignals metrics. The AppSignals and CWAgent convention is using 4315 for grpc, 4316 for http. These 2 ports are already used and hardcoded by CWAgent.
So, if both SMP is enabled and SDK uses endpoint 4315, it must be grpc protocol.

If CWAgent breaks the contract, does not use 4315 grpc 4316 http in the future, it is a breaking change for all of exisitng AppSignals SDKs running on EKS/ECS/EC2. If it happens, the only way is bump both SDKs and CWAgent major version and call out in user instruction.

@wangzlei wangzlei requested a review from Aneurysm9 January 31, 2024 17:57
Comment on lines +71 to +74
// disable logs exporter by default
if (System.getProperty("otel.logs.exporter", System.getenv("OTEL_LOGS_EXPORTER")) == null) {
overrides.put("otel.logs.exporter", "none");
}
Copy link
Contributor

@thpierce thpierce Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add details about why to the overview. In general, the overview/commit message is out of date with the new approach. Lets bring it up to date to clarify what we are doing and (more importantly) why.

"otel.exporter.otlp.traces.endpoint",
System.getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"));
// using grpc if trace endpoint is :4315
if (traceEndpoint != null && traceEndpoint.contains(":4315")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a logline similar to how we callout changes to exportInterval.

@wangzlei wangzlei closed this by deleting the head repository Feb 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants