Skip to content

Commit

Permalink
Remove retry_on_failure from the googlecloud exporters (#25900)
Browse files Browse the repository at this point in the history
**Description:**

In
#19203,
we disabled retry_on_failure by default because retry is now handled by
the google cloud client libraries.

However, many users still enable it because the error message encourages
users to enable retry_on_failure. This is causing problems, especially
with metrics because retrying a batch of metrics produces confusing
errors (like out-of-order or duplicate timeseries errors) and spams
users logs more than necessary.

This PR removes the retry_on_failure config from the GMP and googlecloud
exporters.
  • Loading branch information
dashpole authored Aug 22, 2023
1 parent c0040c0 commit 5ab2db6
Show file tree
Hide file tree
Showing 14 changed files with 35 additions and 103 deletions.
27 changes: 27 additions & 0 deletions .chloggen/remove-gcp-retryonfailure.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: googlecloudexporter

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: remove retry_on_failure from the googlecloud exporter. The exporter itself handles retries, and retrying can cause issues.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [57233]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user, api]
7 changes: 1 addition & 6 deletions exporter/googlecloudexporter/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -216,11 +216,6 @@ The following configuration options are supported:
- `prefix`: Match resource keys by prefix.
- `regex`: Match resource keys by regex.
- `compression` (optional): Enable gzip compression for gRPC requests (valid vlaues: `gzip`).
- `retry_on_failure` (optional): Configuration for how to handle retries when sending data to Google Cloud fails.
- `enabled` (default = false)
- `initial_interval` (default = 5s): Time to wait after the first failure before retrying; ignored if `enabled` is `false`
- `max_interval` (default = 30s): Is the upper bound on backoff; ignored if `enabled` is `false`
- `max_elapsed_time` (default = 120s): Is the maximum amount of time spent trying to send a batch; ignored if `enabled` is `false`
- `sending_queue` (optional): Configuration for how to buffer traces before sending.
- `enabled` (default = true)
- `num_consumers` (default = 10): Number of consumers that dequeue batches; ignored if `enabled` is `false`
Expand All @@ -229,7 +224,7 @@ The following configuration options are supported:
- `num_seconds` is the number of seconds to buffer in case of a backend outage
- `requests_per_second` is the average number of requests per seconds.

Note: These `retry_on_failure` and `sending_queue` are provided (and documented) by the [Exporter Helper](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration)
Note: The `sending_queue` is provided (and documented) by the [Exporter Helper](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration)

Beyond standard YAML configuration as outlined in the sections that follow,
exporters that leverage the net/http package (all do today) also respect the
Expand Down
1 change: 0 additions & 1 deletion exporter/googlecloudexporter/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@ type Config struct {
// Timeout for all API calls. If not set, defaults to 12 seconds.
exporterhelper.TimeoutSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
exporterhelper.QueueSettings `mapstructure:"sending_queue"`
exporterhelper.RetrySettings `mapstructure:"retry_on_failure"`
}

func (cfg *Config) Validate() error {
Expand Down
9 changes: 0 additions & 9 deletions exporter/googlecloudexporter/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@ import (
"time"

"github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector"
"github.com/cenkalti/backoff/v4"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"go.opentelemetry.io/collector/component"
Expand Down Expand Up @@ -65,14 +64,6 @@ func TestLoadConfig(t *testing.T) {
},
},
},
RetrySettings: exporterhelper.RetrySettings{
Enabled: true,
InitialInterval: 10 * time.Second,
MaxInterval: 1 * time.Minute,
MaxElapsedTime: 10 * time.Minute,
RandomizationFactor: backoff.DefaultRandomizationFactor,
Multiplier: backoff.DefaultMultiplier,
},
QueueSettings: exporterhelper.QueueSettings{
Enabled: true,
NumConsumers: 2,
Expand Down
12 changes: 3 additions & 9 deletions exporter/googlecloudexporter/factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,11 +43,8 @@ func NewFactory() exporter.Factory {

// createDefaultConfig creates the default configuration for exporter.
func createDefaultConfig() component.Config {
retrySettings := exporterhelper.NewDefaultRetrySettings()
retrySettings.Enabled = false
return &Config{
TimeoutSettings: exporterhelper.TimeoutSettings{Timeout: defaultTimeout},
RetrySettings: retrySettings,
QueueSettings: exporterhelper.NewDefaultQueueSettings(),
Config: collector.DefaultConfig(),
}
Expand All @@ -71,8 +68,7 @@ func createLogsExporter(
// Disable exporterhelper Timeout, since we are using a custom mechanism
// within exporter itself
exporterhelper.WithTimeout(exporterhelper.TimeoutSettings{Timeout: 0}),
exporterhelper.WithQueue(eCfg.QueueSettings),
exporterhelper.WithRetry(eCfg.RetrySettings))
exporterhelper.WithQueue(eCfg.QueueSettings))
}

// createTracesExporter creates a trace exporter based on this config.
Expand All @@ -94,8 +90,7 @@ func createTracesExporter(
// Disable exporterhelper Timeout, since we are using a custom mechanism
// within exporter itself
exporterhelper.WithTimeout(exporterhelper.TimeoutSettings{Timeout: 0}),
exporterhelper.WithQueue(eCfg.QueueSettings),
exporterhelper.WithRetry(eCfg.RetrySettings))
exporterhelper.WithQueue(eCfg.QueueSettings))
}

// createMetricsExporter creates a metrics exporter based on this config.
Expand All @@ -117,6 +112,5 @@ func createMetricsExporter(
// Disable exporterhelper Timeout, since we are using a custom mechanism
// within exporter itself
exporterhelper.WithTimeout(exporterhelper.TimeoutSettings{Timeout: 0}),
exporterhelper.WithQueue(eCfg.QueueSettings),
exporterhelper.WithRetry(eCfg.RetrySettings))
exporterhelper.WithQueue(eCfg.QueueSettings))
}
2 changes: 1 addition & 1 deletion exporter/googlecloudexporter/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@ go 1.20

require (
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector v0.42.0
github.com/cenkalti/backoff/v4 v4.2.1
github.com/stretchr/testify v1.8.4
go.opentelemetry.io/collector/component v0.83.0
go.opentelemetry.io/collector/confmap v0.83.0
Expand All @@ -22,6 +21,7 @@ require (
cloud.google.com/go/trace v1.10.1 // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.18.0 // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.42.0 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/fsnotify/fsnotify v1.6.0 // indirect
github.com/go-logr/logr v1.2.4 // indirect
Expand Down
5 changes: 0 additions & 5 deletions exporter/googlecloudexporter/testdata/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,6 @@ googlecloud/customname:
enabled: true
num_consumers: 2
queue_size: 10
retry_on_failure:
enabled: true
initial_interval: 10s
max_interval: 60s
max_elapsed_time: 10m
metric:
prefix: prefix
skip_create_descriptor: true
Expand Down
45 changes: 0 additions & 45 deletions exporter/googlecloudexporter/testdata/legacyconfig.yaml

This file was deleted.

7 changes: 1 addition & 6 deletions exporter/googlemanagedprometheusexporter/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,11 +34,6 @@ The following configuration options are supported:
- `resource_filters` (optional): Provides a list of filters to match resource attributes which will be included in metric labels.
- `prefix` (optional): Match resource attribute keys by prefix.
- `regex` (optional): Match resource attribute keys by regex.
- `retry_on_failure` (optional): Configuration for how to handle retries when sending data to Google Cloud fails.
- `enabled` (default = false)
- `initial_interval` (default = 5s): Time to wait after the first failure before retrying; ignored if `enabled` is `false`
- `max_interval` (default = 30s): Is the upper bound on backoff; ignored if `enabled` is `false`
- `max_elapsed_time` (default = 120s): Is the maximum amount of time spent trying to send a batch; ignored if `enabled` is `false`
- `sending_queue` (optional): Configuration for how to buffer traces before sending.
- `enabled` (default = true)
- `num_consumers` (default = 10): Number of consumers that dequeue batches; ignored if `enabled` is `false`
Expand All @@ -47,7 +42,7 @@ The following configuration options are supported:
- `num_seconds` is the number of seconds to buffer in case of a backend outage
- `requests_per_second` is the average number of requests per seconds.

Note: These `retry_on_failure` and `sending_queue` are provided (and documented) by the [Exporter Helper](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration)
Note: The `sending_queue` is provided (and documented) by the [Exporter Helper](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#configuration)

## Example Configuration

Expand Down
1 change: 0 additions & 1 deletion exporter/googlemanagedprometheusexporter/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ type Config struct {
// Timeout for all API calls. If not set, defaults to 12 seconds.
exporterhelper.TimeoutSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
exporterhelper.QueueSettings `mapstructure:"sending_queue"`
exporterhelper.RetrySettings `mapstructure:"retry_on_failure"`
}

// GMPConfig is a subset of the collector config applicable to the GMP exporter.
Expand Down
9 changes: 0 additions & 9 deletions exporter/googlemanagedprometheusexporter/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ import (
"testing"
"time"

"github.com/cenkalti/backoff/v4"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"go.opentelemetry.io/collector/component"
Expand Down Expand Up @@ -50,14 +49,6 @@ func TestLoadConfig(t *testing.T) {
},
},
},
RetrySettings: exporterhelper.RetrySettings{
Enabled: true,
InitialInterval: 10 * time.Second,
MaxInterval: 1 * time.Minute,
MaxElapsedTime: 10 * time.Minute,
RandomizationFactor: backoff.DefaultRandomizationFactor,
Multiplier: backoff.DefaultMultiplier,
},
QueueSettings: exporterhelper.QueueSettings{
Enabled: true,
NumConsumers: 2,
Expand Down
6 changes: 1 addition & 5 deletions exporter/googlemanagedprometheusexporter/factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,11 +32,8 @@ func NewFactory() exporter.Factory {

// createDefaultConfig creates the default configuration for exporter.
func createDefaultConfig() component.Config {
retrySettings := exporterhelper.NewDefaultRetrySettings()
retrySettings.Enabled = false
return &Config{
TimeoutSettings: exporterhelper.TimeoutSettings{Timeout: defaultTimeout},
RetrySettings: retrySettings,
QueueSettings: exporterhelper.NewDefaultQueueSettings(),
GMPConfig: GMPConfig{
MetricConfig: MetricConfig{
Expand Down Expand Up @@ -68,6 +65,5 @@ func createMetricsExporter(
// Disable exporterhelper Timeout, since we are using a custom mechanism
// within exporter itself
exporterhelper.WithTimeout(exporterhelper.TimeoutSettings{Timeout: 0}),
exporterhelper.WithQueue(eCfg.QueueSettings),
exporterhelper.WithRetry(eCfg.RetrySettings))
exporterhelper.WithQueue(eCfg.QueueSettings))
}
2 changes: 1 addition & 1 deletion exporter/googlemanagedprometheusexporter/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ go 1.20
require (
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector v0.42.0
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector/googlemanagedprometheus v0.42.0
github.com/cenkalti/backoff/v4 v4.2.1
github.com/stretchr/testify v1.8.4
go.opentelemetry.io/collector v0.83.0
go.opentelemetry.io/collector/component v0.83.0
Expand All @@ -25,6 +24,7 @@ require (
github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.18.0 // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.42.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/fsnotify/fsnotify v1.6.0 // indirect
Expand Down
5 changes: 0 additions & 5 deletions exporter/googlemanagedprometheusexporter/testdata/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,6 @@ exporters:
enabled: true
num_consumers: 2
queue_size: 10
retry_on_failure:
enabled: true
initial_interval: 10s
max_interval: 60s
max_elapsed_time: 10m
googlemanagedprometheus/customprefix:
metric:
prefix: my-metric-domain.com
Expand Down

0 comments on commit 5ab2db6

Please sign in to comment.