local.file_match ignore older files #2214

Closed
rveachkc opened this issue Dec 3, 2024 · 2 comments · Fixed by #2245
Labels
enhancement New feature or request frozen-due-to-age proposal A proposal for new functionality.

Comments

rveachkc commented Dec 3, 2024

Request

Could the Alloy local.file_match component be updated with an option to ignore files that have not been modified within a certain period of time?

Something like:

local.file_match "tmp" {
  path_targets = [{"__path__" = "/tmp/logs/**/*.log"}]
  ignore_older = 1d
}

Use case

Alloy has run out of memory attempting to scrape log files from a batch job that runs frequently and drops a unique log file per execution.
Alloy has also maintained many more open files than is necessary to run, increasing system resource usage.

While one could argue that better log rotation practices should be followed, I think this would be a nice addition, matching the ignore_older option of the Filebeat filestream input.

@rveachkc rveachkc added the enhancement New feature or request label Dec 3, 2024
@ravishankar15
Contributor

Hi team, I can work on this issue. One point to note: if this option is applied based on the PollFrequency, the export can be inconsistent.

For example:

  1. sync_period is 10min and the file is updated every 11min -> the file will never be included.
  2. sync_period is 10min; the file was initially updated every 11min and later every 5min -> the file is excluded at first and included later.
  3. sync_period is 10min; the file was initially updated every 5min and later every 10min -> the file is included at first and excluded later.

Note for maintainers:
I am thinking of passing a time parameter to getPaths so that we can ignore files/targets based on ModTime() from fs.FileInfo. Thoughts?

@rveachkc
Author

rveachkc commented Dec 5, 2024

From my perspective as an end user of Alloy, I believe this behavior is reasonable and should be expected.

The update frequency of a file is just something that needs to be considered when setting this parameter. Perhaps any documentation around it should recommend setting the duration longer than the update frequency of the file.

@wildum wildum added the proposal A proposal for new functionality. label Dec 9, 2024
@github-project-automation github-project-automation bot moved this to Incoming in Alloy proposals Dec 9, 2024
@wildum wildum moved this from Incoming to Active in Alloy proposals Dec 9, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 13, 2025