Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explicit option for SDK otlp exporter to disable retry #4148

Closed

Conversation

jack-berg
Copy link
Member

That have been several attempts at further refining the spec around OTLP retry configuration options. What is clear is that SDK OTLP exporters MUST handle transient errors with an exponential retry strategy. But we failed to align on the details of that strategy and what options should be configurable.

I don't have the ability to drive the answer to all the questions, but believe we can agree on a narrower scope.

We're trying to add retry configuration to file configuration in open-telemetry/opentelemetry-configuration#97. In opentelemetry-configuration, we have a policy stating we only add configuration properties for options described in the spec. This is important for a number of reasons, including that spec'd options have clearly defined default. We use JSON schema to model the schema in opentelemetry-configuration and are unable to encode the default value for properties. Because of this, we defer to the default values as defined in the spec.

Coming back to retry. This PR makes it explicit that SDK OTLP exporters SHOULD have an option to disable the retry strategy, and defines that by default retry is enabled (which is most aligned the current wording "Transient errors MUST be handled with a retry strategy.").

Hopefully despite the different implementations of retry, we can all agree that 1. It should be able to be enabled / disabled. 2. It should be enabled by default.

@brettmc
Copy link
Contributor

brettmc commented Jul 20, 2024

Can we block this pending a decision on #4138 (comment) (ie, is disabled required if we opt to implement something like a none retry policy) ?

edit: withdrawn, no objections from me :)

destination until the network is restored or the destination has recovered.

SDKs SHOULD provide a configuration option to disable the retry strategy. This
option MAY be named `disabled`, AND MUST be `false` by default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the conditional interpretation of the required enablement.

Suggested change
option MAY be named `disabled`, AND MUST be `false` by default.
option MAY be named `disabled`. The retry strategy MUST be enabled by default.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 31, 2024
@zeitlinger
Copy link
Member

Can we block this pending a decision on #4138 (comment) (ie, is disabled required if we opt to implement something like a none retry policy) ?

I find disabled clear and easy, regardless of whether a retry strategy is added

@brettmc
Copy link
Contributor

brettmc commented Aug 5, 2024

I find disabled clear and easy, regardless of whether a retry strategy is added

That's a good point, and it also looks to be the preferred way to disable things.

@github-actions github-actions bot removed the Stale label Aug 6, 2024
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 13, 2024
@brettmc
Copy link
Contributor

brettmc commented Aug 13, 2024

not stale, don't close.

@tigrannajaryan
Copy link
Member

Hopefully despite the different implementations of retry, we can all agree that 1. It should be able to be enabled / disabled.

@jack-berg can you clarify why we want to be able to disable it?

@jack-berg
Copy link
Member Author

@jack-berg can you clarify why we want to be able to disable it?

Couple of things come to mind:

  • To reflect the reality of implementations. All implementations I'm aware of have this option.
  • To opt-out of the built in retry mechanism in favor of your own.

@tigrannajaryan
Copy link
Member

  • To opt-out of the built in retry mechanism in favor of your own.

Do we expect the SDK exporter to return an error in the form that is indicates to the caller a transient, retryable error? I don't think the exporter spec allows that. The Export() function can return a failure and that failure indicates that the data must be dropped (e.g. it was bad data that cannot be encoded). Quote from the trace spec:

Failure - exporting failed. The batch must be dropped. For example, this can happen when the batch contains bad data and cannot be serialized.

Based on this returned value the caller cannot possibly make a decision to retry on their own. It would be wrong for the caller to retry.

Did you mean some other way to provide your own retry mechanism?

@jack-berg
Copy link
Member Author

I don't think the exporter spec allows that.

I would reframe this as the spec doesn't require implementations to provide that. It seems reasonable for language implementation to make the to allow exporters (but not require) to return information about what went wrong in some sort of structured way. Normally, the caller might just log debugging information with this, but there might be enough information to do something else. In java, we recently evolved our CompletableResultCode to optionally include exception information, and we're working on updating OTLP exporters to include exceptions with a set of structured fields describing what happened.

But without updating the spec to recommend or require transmitting this kind of info, we can't say that users can provide their own retry mechanisms. So while I think this is a good thing to aspire to, I concede this argument for the purposes of this PR given the current state of things.

But my other argument still is relevant:

To reflect the reality of implementations. All implementations I'm aware of have this option.

I don't feel especially strongly about this myself. Without this, users of declarative configuration (formerly file configuration) just have to accept that for the time being, retry is always enabled and there are no configuration options. This isn't the end of the world.

destination until the network is restored or the destination has recovered.

SDKs SHOULD provide a configuration option to disable the retry strategy. This
option MAY be named `disabled`, AND MUST be `false` by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only a MAY because of existing solutions? I felt it should be stronger, a SHOULD or even MUST since I'm pretty sure elsewhere in the spec defines that toggles like this must be something like disabled and not enabled.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 29, 2024
Copy link

github-actions bot commented Sep 6, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants