Description
Is your feature request related to a problem? Please describe.
Discussions link: #10360
Referring to https://docs.fluentbit.io/manual/administration/scheduling-and-retries#configure-retries , the scheduler's Retry_Limit accepts three kinds of values: N, no_limits, and no_retries.
In our case the output is Loki and we set Retry_Limit to no_limits so that no logs are lost when Loki is down.
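For reference, our Loki output is configured roughly like this (host, port, and labels are illustrative; only Retry_Limit matters for this request):

```
[OUTPUT]
    Name        loki
    Match       *
    Host        loki.example.internal
    Port        3100
    Labels      job=fluent-bit
    # never give up on a chunk; keep retrying until it is delivered
    Retry_Limit no_limits
```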
On the Loki side, we allow old logs to be accepted for up to 2 days by setting a suitable value for ingester.max_chunk_age.
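The corresponding Loki setting is a sketch like the following (the exact file layout depends on the deployment):

```yaml
ingester:
  # keep chunks open long enough to accept entries up to 2 days old
  max_chunk_age: 48h
```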
Our scenario is a failure case: we manually take Loki offline, let Fluent Bit buffer/cache local chunks/logs for more than 2 days, and then bring Loki back online.
With the retry scheduler, the local chunks are flushed to Loki one by one, each with its own dedicated task_id.
However, for records whose timestamps are older than 48h, Loki refuses to accept them and reports a "write operation failed, older acceptable timestamp is xxxx" error. For Fluent Bit the flush never succeeds, so those tasks loop in endless retries.
Describe the solution you'd like
The above is the actual issue for us: when Loki comes back within 48h, all locally buffered chunks are flushed to Loki successfully under no_limits, but when the outage exceeds 48h, some logs are sent to Loki endlessly and Loki always rejects them.
Therefore, I would like to propose an additional kind of value for Retry_Limit: a timeout-based one.
For example, with Retry_Limit set to 24h, a given retry task (perhaps identified by its task_id) would behave the same as no_limits at first, but once its retries have been running for 24h, the retrying would stop.
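A hypothetical configuration with the proposed value could look like this (Retry_Limit 24h does not exist today; it is exactly what is being proposed here):

```
[OUTPUT]
    Name        loki
    Match       *
    Host        loki.example.internal
    Port        3100
    # proposed: retry like no_limits, but give up on a task once its
    # retries have been running for 24 hours
    Retry_Limit 24h
```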
Describe alternatives you've considered
Additional context
Any comments are appreciated.