aws-s3 input's bucket polling accumulates state in the registry #39116
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
@faec did this get solved as part of the larger redesign of the input, or is this still outstanding?
I will look at this from a technical point of view (current implementation, potential improvements similar to those already suggested, quick wins, and follow-ups) and will update the issue with my findings.
Update: We are discussing the possibility of,
PR #41817 was merged on
When scanning an S3 bucket, the input saves metadata for each object to the registry, including whether the object has been successfully downloaded. Each object's metadata consumes approximately 1 KB of registry space.
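To make the stored state concrete, here is a minimal Go sketch of per-object tracking. It assumes a hypothetical `objectState` record and a map-based `registry` keyed by `bucket/key`; these names and fields are illustrative only and are not the input's actual types or the Beats registry API. It shows just the behavior described above: every listed object gets an entry, and that entry records whether the object was downloaded.

```go
package main

import "fmt"

// objectState is a hypothetical, simplified stand-in for the per-object
// metadata the aws-s3 input keeps; the field names are illustrative only.
type objectState struct {
	Bucket string
	Key    string
	Stored bool // true once the object has been downloaded successfully
}

// registry is a hypothetical in-memory view of the persisted state,
// keyed by "bucket/key".
type registry map[string]*objectState

// stateKey builds the map key for an object.
func stateKey(bucket, key string) string {
	return bucket + "/" + key
}

// record adds an entry for an object listed during a scan, if it is not
// already tracked. Nothing in this sketch ever removes entries.
func (r registry) record(bucket, key string) {
	k := stateKey(bucket, key)
	if _, ok := r[k]; !ok {
		r[k] = &objectState{Bucket: bucket, Key: key}
	}
}

// markStored flags an already-recorded object as successfully downloaded.
func (r registry) markStored(bucket, key string) {
	if st, ok := r[stateKey(bucket, key)]; ok {
		st.Stored = true
	}
}

func main() {
	reg := registry{}
	reg.record("logs", "2024/01/a.gz")
	reg.record("logs", "2024/01/b.gz")
	reg.markStored("logs", "2024/01/a.gz")

	// Map iteration order is random, so these lines may print in any order.
	for k, st := range reg {
		fmt.Println(k, "stored:", st.Stored)
	}
}
```

Because no code path deletes from the map, every object listed once stays tracked, even after a different bucket is scanned. That is the accumulation pattern this issue describes.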
The code was intended to delete this metadata after each bucket scan, but the deletion was implemented incorrectly (see also #39065), so most S3 object metadata is persisted indefinitely and never cleaned up. It accumulates even after objects have been removed from the original bucket or the target bucket has been changed, so the input adds roughly 1 GB to the registry for every million objects it has ever seen, across all time and all buckets. These entries are also held in memory while Filebeat runs, which can significantly increase memory requirements for large buckets.
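The intended cleanup can be sketched as a prune step that runs only after a complete listing of the bucket: any entry for that bucket whose key was not seen in the scan is deleted. This is a minimal sketch assuming a hypothetical map from a `bucket/key` string to a downloaded flag; `prune` is an illustrative name, not the input's actual implementation or the upstream fix. For scale, at ~1 KB per entry, 1,000,000 retained entries × 1 KB ≈ 1 GB, which matches the growth rate described above.

```go
package main

import (
	"fmt"
	"strings"
)

// prune removes every registry entry for bucket whose key was not listed in
// the scan that just completed. reg maps the hypothetical "bucket/key" state
// key to a downloaded flag; seen holds the object keys from the full listing.
// It returns the number of entries removed. Entries for other buckets are kept.
func prune(reg map[string]bool, bucket string, seen map[string]bool) int {
	prefix := bucket + "/"
	removed := 0
	for k := range reg {
		if !strings.HasPrefix(k, prefix) {
			continue // belongs to a different bucket
		}
		if !seen[strings.TrimPrefix(k, prefix)] {
			delete(reg, k) // deleting the current key during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	reg := map[string]bool{
		"logs/a.gz":    true,
		"logs/b.gz":    true, // no longer in the bucket
		"logs/c.gz":    false,
		"archive/x.gz": true, // different bucket, untouched
	}
	seen := map[string]bool{"a.gz": true, "c.gz": true}

	n := prune(reg, "logs", seen)
	fmt.Println("removed", n, "remaining", len(reg)) // prints "removed 1 remaining 3"
}
```

Pruning should be skipped when a listing fails partway; otherwise entries for objects that still exist would be dropped and those objects re-ingested. This sketch also only prunes the bucket being scanned, so entries left behind by a bucket that is no longer configured would need a separate pass.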