buffer: add feature to evacuate chunk files when retry limit #4986

Draft
wants to merge 1 commit into
base: master

daipom (Contributor) commented May 30, 2025

Which issue(s) this PR fixes:
None.

What this PR does / why we need it:
Add a feature to evacuate chunk files when the retry limit is reached.
When the retry limit is reached, buf_file and buf_file_single evacuate all the chunk files (and their meta files) in the queue to the following directory before purging:

  • (root_dir)/buffer/(plugin-id)/

root_dir is system_config.root_dir if it is configured.
Otherwise, DEFAULT_BACKUP_DIR is used.
(The default is /tmp/fluent. It can be changed with the FLUENT_BACKUP_DIR environment variable.)
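
The directory resolution described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `evacuation_dir` and its `default_dir:` keyword are hypothetical names, and the constant is redefined locally so that the snippet is self-contained.

```ruby
# Illustrative sketch only; not the implementation in this PR.
# Redefined here so the snippet is self-contained: /tmp/fluent unless
# the FLUENT_BACKUP_DIR environment variable is set.
DEFAULT_BACKUP_DIR = ENV["FLUENT_BACKUP_DIR"] || "/tmp/fluent"

# Hypothetical helper: returns (root_dir)/buffer/(plugin-id)/.
# root_dir stands in for system_config.root_dir and may be nil or empty
# when it is not configured.
def evacuation_dir(root_dir, plugin_id, default_dir: DEFAULT_BACKUP_DIR)
  base = (root_dir.nil? || root_dir.empty?) ? default_dir : root_dir
  File.join(base, "buffer", plugin_id)
end
```

For example, `evacuation_dir("/var/lib/fluentd", "out_forward")` yields a path under the configured root_dir, while `evacuation_dir(nil, "out_forward")` falls back to the backup directory.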

There is no separate directory for each worker because every chunk ID is unique.
Keeping all evacuated files in one directory makes recovery easier.

After the problem with the flush (such as a network issue) is resolved, we can put back the files and restart Fluentd to flush them again.
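
As a rough illustration of the recovery step, the sketch below moves evacuated files back into a buffer directory while Fluentd is stopped. `restore_chunks` and both directory arguments are hypothetical; the actual buffer path depends on the plugin's configuration, and this is not part of the PR.

```ruby
require "fileutils"

# Hypothetical helper; not part of this PR.
# Moves every regular file (chunk files and any meta files) from
# evacuated_dir into buffer_dir and returns the destination paths.
# Run it only while Fluentd is stopped, then restart Fluentd.
def restore_chunks(evacuated_dir, buffer_dir)
  FileUtils.mkdir_p(buffer_dir)
  Dir.children(evacuated_dir).sort.map do |name|
    src = File.join(evacuated_dir, name)
    next unless File.file?(src)

    dest = File.join(buffer_dir, name)
    # Chunk IDs are unique, so a name collision suggests a mistake.
    raise "refusing to overwrite #{dest}" if File.exist?(dest)

    FileUtils.mv(src, dest)
    dest
  end.compact
end
```

Moving the files as they are keeps the chunk and its metadata together.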

Difference from the backup feature:

The backup feature is for unrecoverable errors, mainly bad chunks.
This feature, in contrast, is for normal chunks.
Its main motivation is to enable recovery by evacuating buffer files
when the retry limit is reached due to external factors such as network issues.

Difference from the secondary feature:

The secondary feature is not suitable for recovery.
It can be difficult to recover files made by out_secondary_file because the metadata is lost.
For file buffers, the easiest way to recover is to evacuate the chunk files as they are.
Once the issue is resolved, we can put back the chunk files and restart Fluentd to load them.
This feature makes that possible.

Docs Changes:
TODO

Release Note:
TODO

TODO:
Organize tests.

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>