-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent error status for gRPC and HTTP #3110
Comments
@thoslin would something like this work?
|
@MrAlias this is exactly what i'm looking for. thanks. can we consider submitting a PR for this? |
RESOURCE_EXHAUSTED, for example, seems like an ERROR I would definitely like to notice in my tracing system by default (might correspond to HTTP 503 or 429 depending on context). All in all, since gRPC has no client/server distinction baked into the error codes (like HTTP 4xx, 5xx), I would suggest to set everything as Error, except OK (unset), i.e. I think the current status quo is fine. |
@Oberon00 we can consider both client and server side error by extending that table. I think that's how convention for HTTP works. for example,
So we may have similar definition for INVALID_ARGUMENT, if it is SpanKind.SERVER, status is unset, but if it is SpanKind.CLIENT, then status is Error. |
Does this work?
|
@thoslin Nice table 👍 I think that if SpanKind.SERVER Status is Error then SpanKind.SERVER Status should be Error as well. This would be alligned with the HTTP convention.
Here is my proposal:
|
The determination here will be a forcing function for how these codes are used in the wild. The majority of our code lines up with these but there are certainly differences in how my team has interpreted and used some of them. The fact that there are already 3 interpretations listed here in this ticket makes me think its not that cut and dry. I don't think it would be a huge lift to migrate to whatever is decided here but Is it the job of the open telemetry to be this forcing function? One other option would be to allow it to be customizable. Either way would be a step forward. |
I was thinking about the same.
|
FWIW, here are some places were maps to grpc codes exist: https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto |
@fantapop Thanks this would make
|
Let's discuss this in tomorrow's Spec meeting. We need to clarify at least the basic codes. |
@carlosalberto @tigrannajaryan Is there any update on how we want to address this issue? |
Is there any update on this? |
If there would be no objections, next week I am going to create a PR that aligns the gRPC sem. conv. to the latest table proposed here |
@carlosalberto Can you assign me? |
PR created #3333 |
Triaging this as "accepted" since I think this is an omission compared to http conventions. |
Fixes #3110 ## Changes The idea behind the PR is to make the spam statuses gRPC convention similar to HTTP semantic conventions. The gRPC statuses -> HTTP status codes mapping is not anywhere strictly defined. However, there is are some approximations which can be found: - grpc/grpc@bb04e07#diff-c94ff143c8f378e6925a985fa18528a8254a6d7fc34bc855e1de13f1e7f3e464 - https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto I got confused if we should treat `INTERNAL` as `Error` for `SpanKind.SERVER` because of: - https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/handler_server.go#L90C8-L92 - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/http_util.go#L73 On the other hand, [the description](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md) of `INTERNAL` says: > Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. Therefore, I decided to leave it as `Error`. Also because this is backwards compatible (at least for this gRPC status). --------- Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Fixes open-telemetry/opentelemetry-specification#3110 ## Changes The idea behind the PR is to make the spam statuses gRPC convention similar to HTTP semantic conventions. The gRPC statuses -> HTTP status codes mapping is not anywhere strictly defined. However, there is are some approximations which can be found: - grpc/grpc@bb04e07#diff-c94ff143c8f378e6925a985fa18528a8254a6d7fc34bc855e1de13f1e7f3e464 - https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto I got confused if we should treat `INTERNAL` as `Error` for `SpanKind.SERVER` because of: - https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/handler_server.go#L90C8-L92 - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/http_util.go#L73 On the other hand, [the description](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md) of `INTERNAL` says: > Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. Therefore, I decided to leave it as `Error`. Also because this is backwards compatible (at least for this gRPC status). --------- Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Fixes open-telemetry/opentelemetry-specification#3110 ## Changes The idea behind the PR is to make the spam statuses gRPC convention similar to HTTP semantic conventions. The gRPC statuses -> HTTP status codes mapping is not anywhere strictly defined. However, there is are some approximations which can be found: - grpc/grpc@bb04e07#diff-c94ff143c8f378e6925a985fa18528a8254a6d7fc34bc855e1de13f1e7f3e464 - https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto I got confused if we should treat `INTERNAL` as `Error` for `SpanKind.SERVER` because of: - https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/handler_server.go#L90C8-L92 - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/http_util.go#L73 On the other hand, [the description](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md) of `INTERNAL` says: > Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. Therefore, I decided to leave it as `Error`. Also because this is backwards compatible (at least for this gRPC status). --------- Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
…3333) Fixes open-telemetry#3110 ## Changes The idea behind the PR is to make the spam statuses gRPC convention similar to HTTP semantic conventions. The gRPC statuses -> HTTP status codes mapping is not anywhere strictly defined. However, there is are some approximations which can be found: - grpc/grpc@bb04e07#diff-c94ff143c8f378e6925a985fa18528a8254a6d7fc34bc855e1de13f1e7f3e464 - https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto I got confused if we should treat `INTERNAL` as `Error` for `SpanKind.SERVER` because of: - https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/handler_server.go#L90C8-L92 - https://github.com/grpc/grpc-go/blob/a02aae6168aa683a1f106e285ec10eb6bc1f6ded/internal/transport/http_util.go#L73 On the other hand, [the description](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md) of `INTERNAL` says: > Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors. Therefore, I decided to leave it as `Error`. Also because this is backwards compatible (at least for this gRPC status). --------- Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Hi, we found that the error status convention is different between gRPC and HTTP, for example, not found error in gRPC will set status to error, while in HTTP it is unset.
gPRC: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/rpc/#grpc-status
HTTP: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/http/#status
This causes trouble for us. we have some gRPC services instrumented using the offical OpenTelemetry SDK following the specification, and it will set error status for not found errors during instrumentation. And as we're using Splunk APM service, which treats error status as server side errors https://docs.splunk.com/Observability/apm/apm-spans-traces/apm-errors.html#how-are-error-spans-detected, so we run into a situation where alerts will be triggered by not found errors.
We could work around this by tuning our alert rules to distinguish between gRPC and HTTP, but I'd like to check if it is possible to update the convention for gRPC to have more fine-grained handling as HTTP and not set not found error as error status?
The text was updated successfully, but these errors were encountered: