Offline Queuing of Logs and Metrics #812

Closed
skrawn opened this issue Mar 6, 2024 · 4 comments

Comments

skrawn commented Mar 6, 2024

I am trying to deploy the OpenTelemetry Collector on some embedded Linux devices. Since these devices are on cellular connections, they cannot always reach the internet, but I'd still like to queue metrics and logs for upload once a device re-establishes its connection. Am I correct that this will work with the sending_queue and retry_on_failure exporter settings together with the file_storage extension, like this:

extensions:
  file_storage:
    directory: /etc/otelcol/offline
    compaction:
      on_start: true
      directory: /etc/otelcol/offline
      max_transaction_size: 65_536
    fsync: true

exporters:
  googlemanagedprometheus:
    metric:
      compression: gzip
    retry_on_failure:
      enabled: true
      max_elapsed_time: 86400s
    sending_queue:
      enabled: true
      storage: file_storage
      num_consumers: 1
      queue_size: 1000

The reason I ask is that retry_on_failure is not implemented in the Google Managed Prometheus (GMP) exporter, so requests that time out due to network failures result in metrics being discarded. If retry_on_failure works as expected, I'll probably try to implement it for the GMP exporter. I also see that there are some problems with retry_on_failure depending on the exporter, like this one for the Google Cloud exporter, so maybe there is some limitation within the collector that I am not aware of?
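
For exporters that do use the collector's standard exporter helper, the persistent-queue pattern asked about above can be sketched roughly as follows. This is only an illustration, not a configuration taken from this thread: the otlp receiver and exporter stand in for whichever components are actually used, and the endpoint and directory are placeholders. Note that the extension has to be listed under service.extensions as well as defined, which the snippet in the question does not show.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

extensions:
  file_storage:
    # Placeholder path; the directory must exist and be writable by the collector.
    directory: /var/lib/otelcol/queue

exporters:
  otlp:
    # Placeholder endpoint, used for illustration only.
    endpoint: collector.example.com:4317
    retry_on_failure:
      enabled: true
      # 0 means retries are never abandoned; the default is 300s.
      max_elapsed_time: 0
    sending_queue:
      enabled: true
      # Back the queue with the file_storage extension so it survives restarts.
      storage: file_storage
      queue_size: 1000

service:
  # An extension that is defined but not listed here is not started.
  extensions: [file_storage]
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]
```

Whether a given exporter honors retry_on_failure and sending_queue depends on that exporter, which is the question the replies below address for the Google exporters.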

damemi (Contributor) commented Mar 6, 2024

The issues with using retry_on_failure in the Google Cloud exporter would be the same with using it in the GMP exporter. For handling network outages, we would probably want to enable the write-ahead-log option that the GCP metrics exporter has (but this requires local storage for the WAL file).
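
As a rough sketch of the write-ahead-log option mentioned here, assuming the experimental_wal_config setting documented for the googlecloud exporter's metric section (the directory and backoff values below are placeholders, and the key names should be checked against the README of the exporter version in use):

```yaml
exporters:
  googlecloud:
    metric:
      # Assumed setting name from the googlecloud exporter README;
      # marked experimental there, so it may change between versions.
      experimental_wal_config:
        # Placeholder path; local storage for the WAL file is required.
        directory: /var/lib/otelcol/wal
        # Placeholder value: upper bound on how long requests are retried
        # after network errors.
        max_backoff: 24h
```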

skrawn (Author) commented Mar 6, 2024

> The issues with using retry_on_failure in the Google Cloud exporter would be the same with using it in the GMP exporter. For handling network outages, we would probably want to enable the write-ahead-log option that the GCP metrics exporter has (but this requires local storage for the WAL file).

Oh I see, I didn't notice the Google Cloud exporter had this. I may just be able to use that...

dashpole (Contributor) commented Mar 6, 2024

The googlemanagedprometheus and googlecloud exporters have their own intelligent retry mechanisms built in. retry_on_failure would add a second layer of retries, and would also retry requests that are guaranteed to fail (it isn't as smart as the built-in retry). This can cause additional problems, which is why we've removed the retry_on_failure helper from the exporter.

skrawn (Author) commented Mar 6, 2024

I see, I appreciate the context. I'll close this issue and work with the WAL options.

skrawn closed this as completed Mar 6, 2024
3 participants