Offline Queuing of Logs and Metrics #812

skrawn · 2024-03-06T16:14:57Z

I'm trying to deploy the OpenTelemetry Collector on some embedded Linux devices. Because these devices are on cellular connections, they can't always reach the internet, but I'd still like to queue metrics and logs and upload them once the device re-establishes a connection. Am I correct that this will work with the sending_queue and retry_on_failure exporter settings together with the file_storage extension, like this:

extensions:
  file_storage:
    directory: /etc/otelcol/offline
    compaction:
      on_start: true
      directory: /etc/otelcol/offline
      max_transaction_size: 65_536
    fsync: true

exporters:
  googlemanagedprometheus:
    metric:
      compression: gzip
    retry_on_failure:
      enabled: true
      max_elapsed_time: 86400s
    sending_queue:
      enabled: true
      storage: file_storage
      num_consumers: 1
      queue_size: 1000
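One detail the snippet above leaves out: an extension only runs if it is also listed under service.extensions, so storage: file_storage has nothing to attach to unless the extension is enabled there. A minimal wiring sketch, assuming an otlp receiver purely for illustration (substitute whatever receiver the device actually uses):

```yaml
receivers:
  otlp:              # assumption: the device pushes telemetry to the collector over OTLP
    protocols:
      grpc:

service:
  # file_storage must be enabled here, or sending_queue.storage cannot find it
  extensions: [file_storage]
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [googlemanagedprometheus]
```

The googlemanagedprometheus exporter only handles metrics, so queuing logs would need a separate logs pipeline with its own exporter.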

The reason I ask is that retry_on_failure was not implemented in the Google Managed Prometheus exporter, so requests that time out due to network failures result in metrics being discarded. If the retry_on_failure setting works as expected, I'll probably try to implement it for the GMP exporter. I also see that there are some problems with retry_on_failure depending on the exporter, like this one for the Google Cloud exporter, so maybe there is some limitation within the collector that I'm not aware of?


damemi · 2024-03-06T16:20:09Z

The issues with using retry_on_failure in the Google Cloud exporter would be the same with using it in the GMP exporter. For handling network outages, we would probably want to enable the write-ahead-log option that the GCP metrics exporter has (but this requires local storage for the WAL file).
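For reference, a rough sketch of what enabling that WAL on the googlecloud exporter could look like. The key names (metric.experimental_wal_config with directory and max_backoff) are given as an assumption based on the exporter's README, where the feature is marked experimental, so check them against the collector version you deploy; the directory path is a hypothetical choice reusing the location from the question.

```yaml
exporters:
  googlecloud:
    metric:
      # Experimental: buffer metric writes in an on-disk write-ahead log
      # so they are not lost while the network is unavailable
      experimental_wal_config:
        directory: /etc/otelcol/offline/wal   # hypothetical path; must be writable by the collector
        max_backoff: 86400s                   # assumed meaning: how long to keep retrying on network errors
```

As the comment notes, this trades network resilience for local disk usage, so the WAL directory needs enough space to cover the longest expected outage.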

skrawn · 2024-03-06T16:48:58Z

> The issues with using retry_on_failure in the Google Cloud exporter would be the same with using it in the GMP exporter. For handling network outages, we would probably want to enable the write-ahead-log option that the GCP metrics exporter has (but this requires local storage for the WAL file).

Oh I see, I didn't notice the Google Cloud exporter had this. I may just be able to use that...

dashpole · 2024-03-06T16:52:32Z

The googlemanagedprometheus and googlecloud exporters have their own intelligent retry mechanisms built-in. retry_on_failure would add a second layer of retries, and will also retry requests which are guaranteed to fail (it isn't as smart as the built-in retry). This can cause additional problems, which is why we've removed the retry_on_failure helper from the exporter.
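Following that advice, the exporter block from the question would simply drop retry_on_failure and rely on the built-in retry. This is a sketch that assumes the exporter version in use still accepts sending_queue with a storage extension; the persistent queue keeps pending batches on disk across collector restarts, while retries are left to the exporter itself.

```yaml
exporters:
  googlemanagedprometheus:
    metric:
      compression: gzip
    # retry_on_failure intentionally omitted: the exporter retries internally,
    # and a second retry layer can re-send requests that are guaranteed to fail
    sending_queue:
      enabled: true
      storage: file_storage   # persistent queue backed by the file_storage extension
      num_consumers: 1
      queue_size: 1000
```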

skrawn · 2024-03-06T18:47:04Z

I see, I appreciate the context. I'll close this issue and work with the WAL options.

skrawn closed this as completed Mar 6, 2024
