
opentelemetry-operations-go/exporter/trace rpc error: code = Unavailable #208

Closed
eqinox76 opened this issue Sep 23, 2021 · 10 comments · Fixed by #550

Labels
enhancement New feature or request priority: p3

Comments

@eqinox76

Hey all,

our monitoring sometimes shows the following error message from the trace exporter:

rpc error: code = Unavailable desc = The service is currently unavailable.

stack:
/home/go/pkg/mod/go.opentelemetry.io/otel@v1.0.0/handler.go:106                                   Handle
/home/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.0.0/trace/batch_span_processor.go:248            (*batchSpanProcessor).processQueue
/home/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.0.0/trace/batch_span_processor.go:111            NewBatchSpanProcessor.func1
/opt/go/1.16.3/x64/src/runtime/asm_amd64.s:1371                                                   goexit

We are using the following library versions:

	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.24.0
	go.opentelemetry.io/otel v1.0.0
	go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.0.0
	go.opentelemetry.io/otel/sdk v1.0.0
	go.opentelemetry.io/otel/trace v1.0.0
	github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.0.0-RC2

I think this error is common in gRPC systems when something is restarting or connections are lost. Maybe providing a retry option like grpc_retry would help reduce data loss and the occurrence of such low-level alerts.

Regards,
Carsten
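
A minimal sketch of the workaround being suggested here, assuming the exporter's WithTraceClientOptions option forwards google.golang.org/api/option client options to the underlying Cloud Trace client (the retry codes and backoff values are illustrative):

```go
package main

import (
	"time"

	texporter "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace"
	grpc_retry "github.com/grpc-ecosystem/go-grpc-middleware/retry"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"google.golang.org/api/option"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
)

// newExporterWithClientRetry wires a client-side retry interceptor into the
// gRPC connection used by the Cloud Trace exporter. WithTraceClientOptions is
// assumed to pass these options through to the trace client.
func newExporterWithClientRetry() (sdktrace.SpanExporter, error) {
	// Retry Unavailable responses up to 3 times with exponential backoff.
	retryInterceptor := grpc_retry.UnaryClientInterceptor(
		grpc_retry.WithCodes(codes.Unavailable),
		grpc_retry.WithMax(3),
		grpc_retry.WithBackoff(grpc_retry.BackoffExponential(100*time.Millisecond)),
	)
	return texporter.New(
		texporter.WithTraceClientOptions([]option.ClientOption{
			option.WithGRPCDialOption(grpc.WithUnaryInterceptor(retryInterceptor)),
		}),
	)
}
```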

@dashpole
Contributor

Would any of the options for the OTLP exporter solve your use case? If possible, it would be nice to support gRPC-related options similar to the otlptracegrpc options.
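
For reference, a small sketch of the retry configuration the otlptracegrpc exporter already exposes, which is roughly the shape of option this issue asks for on the Cloud Trace side (endpoint and intervals are illustrative):

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

// newOTLPExporter configures the OTLP gRPC trace exporter with explicit
// retry behavior for transient failures such as Unavailable.
func newOTLPExporter(ctx context.Context) (*otlptrace.Exporter, error) {
	return otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("collector:4317"), // illustrative endpoint
		otlptracegrpc.WithRetry(otlptracegrpc.RetryConfig{
			Enabled:         true,
			InitialInterval: 1 * time.Second,
			MaxInterval:     30 * time.Second,
			MaxElapsedTime:  2 * time.Minute,
		}),
	)
}
```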

@dashpole dashpole added the enhancement New feature or request label Oct 13, 2021
@damemi
Contributor

damemi commented Nov 22, 2022

It looks like the cloud trace client has retry logic built in by default. I'm wondering if I followed that correctly, or if maybe something in the batch processor is dropping that retry handling?

@dashpole
Contributor

Can we test it with our mock cloud trace implementation?
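
Not the repository's actual mock, but a sketch of the testing idea: a fake Cloud Trace service that fails the first BatchWriteSpans call with Unavailable and succeeds afterwards, which is enough to observe whether client-side retry kicks in.

```go
package mocktrace

import (
	"context"
	"net"
	"sync/atomic"

	tracepb "google.golang.org/genproto/googleapis/devtools/cloudtrace/v2"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/types/known/emptypb"
)

// flakyTraceServer rejects the first BatchWriteSpans call with Unavailable
// and accepts every call after that.
type flakyTraceServer struct {
	tracepb.UnimplementedTraceServiceServer
	calls int64
}

func (s *flakyTraceServer) BatchWriteSpans(ctx context.Context, req *tracepb.BatchWriteSpansRequest) (*emptypb.Empty, error) {
	if atomic.AddInt64(&s.calls, 1) == 1 {
		return nil, status.Error(codes.Unavailable, "The service is currently unavailable.")
	}
	return &emptypb.Empty{}, nil
}

// Serve starts the fake service on a local listener; a test would point the
// exporter at lis.Addr() over an insecure connection.
func Serve() (*grpc.Server, net.Listener, error) {
	lis, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		return nil, nil, err
	}
	srv := grpc.NewServer()
	tracepb.RegisterTraceServiceServer(srv, &flakyTraceServer{})
	go srv.Serve(lis)
	return srv, lis, nil
}
```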

@damemi damemi self-assigned this Nov 22, 2022
@damemi
Contributor

damemi commented Dec 7, 2022

Re-reading it, I think that retry logic is only on by default for CreateSpan, but our exporter uses BatchWriteSpans. I still need to test this, but adding a note for when I can get to it.
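
To make the difference concrete: at the GAPIC level, retry is expressed as a gax Retryer in the client's per-RPC CallOptions. Below is a sketch of adding one for BatchWriteSpans on a cloud.google.com/go/trace/apiv2 client directly; it is not necessarily what #550 does, and the backoff values are illustrative.

```go
package main

import (
	"context"
	"time"

	trace "cloud.google.com/go/trace/apiv2"
	gax "github.com/googleapis/gax-go/v2"
	"google.golang.org/grpc/codes"
)

// newClientWithBatchRetry overrides the generated client's call options so
// BatchWriteSpans retries Unavailable errors, mirroring CreateSpan's default.
func newClientWithBatchRetry(ctx context.Context) (*trace.Client, error) {
	client, err := trace.NewClient(ctx)
	if err != nil {
		return nil, err
	}
	client.CallOptions.BatchWriteSpans = []gax.CallOption{
		gax.WithRetry(func() gax.Retryer {
			return gax.OnCodes([]codes.Code{codes.Unavailable}, gax.Backoff{
				Initial:    100 * time.Millisecond,
				Max:        time.Minute,
				Multiplier: 1.3,
			})
		}),
	}
	return client, nil
}
```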

@damemi
Contributor

damemi commented Dec 21, 2022

Opened a PR to add this in #550, but I'm going to double-check with the Cloud Trace folks first to see if there's a specific reason that BatchWriteSpans doesn't retry by default like CreateSpan before merging it.

@damemi
Contributor

damemi commented Jan 6, 2023

We ended up changing the underlying client defaults after opening googleapis/google-cloud-go#7184. Those changes should get copied from Google internal to GitHub soon, and the clients will be regenerated.

At that point, we'll need to bump our dependencies to pull in the new defaults but otherwise shouldn't require any code changes for this feature. We might be able to use the tests I started to add in #550 though.

However, with this we will want to remove our default-enabled retry_on_failure collector exporterhelper setting, since it's redundant.

@damemi
Contributor

damemi commented Jan 24, 2023

To update this: batch retry is now enabled by default at head in the Cloud Trace libraries: https://github.com/googleapis/google-cloud-go/blob/19e9d033c263e889d32b74c4c853c440ce136d68/trace/apiv2/trace_client.go#L64-L75

Once a new trace client release is tagged, I'll update #550 to try bumping that dependency here and add tests to verify

@cpheps

cpheps commented Sep 11, 2023

@damemi @dashpole I see this retry is now removed as of OTel v0.84.0. I had a question about using the OTel persistent queue. I believe that when using the OTel retry, an item doesn't leave the queue until it successfully sends. So if it fails and the system shuts down, the persistent queue will keep it on disk and retry.

Using the google-cloud retry, does this behavior change? If the retry fails, does the item stay in the OTel sending queue, or is it removed and then discarded by the google-cloud library if it fails?

This is probably an edge case, but if a collector has a persistent queue and is shut down while things are retrying, are they now lost, whereas before with the OTel retry they would have been persisted on disk?

@dashpole
Contributor

I believe when using the OTel retry an item doesn't leave the queue until it successfully sends.

That doesn't seem like the right behavior if retry is disabled. If you are able to reproduce it, can you open a separate issue?

@cpheps

cpheps commented Sep 13, 2023

That doesn't seem like the right behavior if retry is disabled. If you are able to reproduce it, can you open a separate issue?

Yeah, let me double-check. I may be missing part of how the exporterhelper works with retry and the sending queue.
