[aws-xray] Fault/Error metrics were not being generated for spans produced by AWS SDK instrumentation #919

Closed
scaugrated opened this issue Jun 14, 2023 · 1 comment

Comments

@scaugrated
Contributor

Component(s)

No response

Is your feature request related to a problem? Please describe.

We observed that Fault/Error metrics were not being generated for spans produced by AWS SDK instrumentation. Here are the details of the investigation by @thpierce:

  • Fault/Error metrics are generated based on the http.status_code attribute found in spans.
  • This attribute is being populated in the following call chain:
  • The described workflow works exactly as expected when the AWS SDK calls an AWS API and gets back a response status 200 - it will construct a response object with that status code and return it, triggering the workflow.
  • However, if the API returns a non-200 status code (e.g. an error or fault code), the AWS SDK simply throws an exception. This means two things:
    • TracingExecutionInterceptor.afterExecution is not called at all. Instead, TracingExecutionInterceptor.onExecutionFailure is called, and it never calls into HttpCommonAttributesExtractor.onEnd.
    • Even if onExecutionFailure called onEnd, the response would be null and getStatusCode would not be called.
  • The net result is that http.status_code is not set, so no metrics are produced. This is clearly by design, as AwsSdkHttpAttributesGetter implements the generic HttpCommonAttributesGetter, whose JavaDoc for getStatusCode states: "This is called from Instrumenter.end(Context, Object, Object, Throwable) only when response is non-null."
    • Looking at other implementations of HttpCommonAttributesGetter, such as AkkaHttpClientAttributesGetter, we can see that getStatusCode would fail with an NPE if it were called with a null response, so other implementations rely on this contract as well.
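
To make the gap concrete, here is a minimal sketch of an extractor that follows the "only when response is non-null" contract. The classes `SketchResponse` and `ResponseOnlyExtractor` are hypothetical stand-ins invented for illustration; they are not the real OpenTelemetry or AWS SDK types, and the real extractor is more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an SDK response object; not a real AWS SDK type.
final class SketchResponse {
    final int statusCode;

    SketchResponse(int statusCode) {
        this.statusCode = statusCode;
    }
}

// Hypothetical extractor that mirrors the quoted contract:
// the status code is read only when a response object exists.
final class ResponseOnlyExtractor {

    static Map<String, Object> onEnd(SketchResponse response, Throwable error) {
        Map<String, Object> attributes = new HashMap<>();
        if (response != null) {
            attributes.put("http.status_code", response.statusCode);
        }
        // The error is never inspected, so a status code that travels
        // inside an exception does not reach the span attributes.
        return attributes;
    }

    public static void main(String[] args) {
        // Success path: a response object exists, so the attribute is set.
        System.out.println(onEnd(new SketchResponse(200), null));
        // Failure path: only an exception exists, so the attribute is missing.
        System.out.println(onEnd(null, new RuntimeException("service returned 500")));
    }
}
```

In the failure case the returned map has no `http.status_code` entry, which is the condition that leaves the metrics without input.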

Describe the solution you'd like

Fundamentally, the problem is that the common HTTP instrumentation code assumes that status codes can only be delivered via response objects, but the AWS SDK delivers status codes via exceptions.

We look forward to working with the community on a comprehensive solution to this problem.

In the short term, we have come up with a solution relying on the fact that the exception thrown by the AWS SDK is stored within the produced spans and is accessible in the AwsSpanMetricsProcessor, where we generate Fault/Error metrics.
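
A rough sketch of that short-term idea follows, under stated assumptions. `SketchServiceException` and `FaultErrorClassifier` are hypothetical names, not the AWS SDK exception class or the actual AwsSpanMetricsProcessor code. The mapping of 4xx to Error and 5xx to Fault is also an assumption made for this example; the issue itself does not spell it out.

```java
// Hypothetical exception carrying an HTTP status code; not the AWS SDK's class.
final class SketchServiceException extends RuntimeException {
    final int statusCode;

    SketchServiceException(String message, int statusCode) {
        super(message);
        this.statusCode = statusCode;
    }
}

// Hypothetical classifier: prefers the span attribute and falls back to
// the status code carried by the exception recorded on the span.
final class FaultErrorClassifier {

    enum Outcome { NONE, ERROR, FAULT }

    static Outcome classify(Integer statusCodeAttribute, Throwable recorded) {
        Integer status = statusCodeAttribute;
        if (status == null && recorded instanceof SketchServiceException) {
            status = ((SketchServiceException) recorded).statusCode;
        }
        if (status == null) {
            return Outcome.NONE; // nothing to classify
        }
        // Assumption for this sketch: 4xx counts as Error, 5xx as Fault.
        if (status >= 400 && status <= 499) {
            return Outcome.ERROR;
        }
        if (status >= 500 && status <= 599) {
            return Outcome.FAULT;
        }
        return Outcome.NONE;
    }

    public static void main(String[] args) {
        // No attribute on the span, but the exception still carries the code.
        System.out.println(classify(null, new SketchServiceException("throttled", 429)));
        System.out.println(classify(null, new SketchServiceException("internal failure", 500)));
    }
}
```

The fallback only reads the exception when the attribute is absent, so spans that already carry `http.status_code` are classified exactly as before.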

Describe alternatives you've considered

No response

Additional context

No response

@scaugrated
Contributor Author

Closing this issue and reopening it as open-telemetry/opentelemetry-java-instrumentation#8795.
