-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix leaks with metadata processors #16349
Fix leaks with metadata processors #16349
Conversation
315a8d6
to
faaa97f
Compare
b2ca802
to
a4860e2
Compare
@andrewkroh @exekias I would like to have your opinions on this. Summarizing, I think this issue would only affect The issue can be reproduced by using one of these processors in an autodiscover template, or in a script processor. But actually I cannot think on a use case for this. Metadata is added in any case in autodiscover, and even if needed these processors could be placed in the main static configuration. Regarding scripts, I don't think it is useful at all to call these processors from scripts (but we are exposing them at the moment, and they can be used as in the test case added here). I started this PR to add closers to the processors that need them, and call these closers from all the places where they are instantiated. The problem I see with this approach:
I am thinking on continuing adding the closers support in common cases, as it may be also useful for other processors in the future, but adding a check in the script processor so processors that define a closer cannot be registered (this would imply to stop registering these processors for the script processor, what would be a breaking change, but I guess nobody is using them). An alternative would be to do nothing, as this is something that can be solved with a proper configuration. Though it'd be nice to don't allow the problem to happen. Opinions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its a good idea build in support for Close()
ing processors. We have processors holding resources/state that need to be released so we need to release them. The script processor should disallow constructing stateful processors (more on this below).
For the script processor, I would consider it an error to instantiate any processor during the process()
invocation. Any processors that are needed should be constructed once at initialization. So I wouldn't try to handle releasing resources that are allocated for every invocation of the script processor in the process()
function. But we could release the resources associated with the processors constructed at initialization. We control their constructors so we can track what needs to be Close()
ed in a list.
BUT I made an assumption that all processors were stateless (hey, that's how their interface is designed). So the script processor uses a sync.Pool
containing javascript VMs to enable concurrent usage of the processor (a VM instance supports only one invocation at a time). Because it uses a sync.Pool, instances can be garbage collected any time they are not being used. This means we don't have a way to invoke Close()
on the processors when an instance in GC'ed. You'd have to use something other than sync.Pool to make this all work. So it's easier to just disallow stateful processors from scripts.
Oh, I don't know why I was thinking that the whole script was executed every time the processor is run. Then I think it would be ok to implement a closer for the script processor that closes the processors that need it.
Do you mean that the instance can be garbage collected even when the processor is still in use? Would that mean that if the processor is run again, it will run the initialization again? |
Oh, ok, I understand it now looking at the code. Then I think that I will remove by now stateful processors from the registry and we can try in the future to look for an alternative to Thanks! |
I was still thinking about this issue for some reason 😆 and have another consideration that I haven't vetted. This comes into play if you plan on closing the processors associated with publisher clients. It's about this sharing of global processors in publisher pipelines. Anecdotally, I think global processors get shared by each publisher client's processor pipeline. If that is the case then |
Thanks! 😂
Yes, I am afraid that you are right and in some places processors are reused, but I am not sure if only in static configurations, I don't see it happening in autodiscover for example. I was also taking a look to the code and there are several places where we create processors, I think that supporting stateful processors properly is going to be a bigger refactor than I originally expected. As you mentioned, their interface is designed to be stateless (only Another option for refactor would be to re-think the processors with state, maybe they should be a different kind of enrichers. But this would open other different rabbit holes for sure. I have been talking with @exekias too about this, and we think that we might by now try to address this issue by avoiding the cases where this can be more problematic, and leave further refactors for the future. The solutions to implement by now would be:
|
Pinging @elastic/integrations-platforms (Team:Platforms) |
80fbda4
to
b55a656
Compare
|
Nice work here! LGTM |
Thanks for the feedback and the reviews! |
Add a closer interface for processors so their resources can be released when the processor is not needed anymore. Explicitly close publisher pipelines so their processors are closed. Add closers for add_docker_metadata, add_kubernetes_metadata and add_process_metadata. Script processor will fail if a processor that needs to be closed is used. (cherry picked from commit a3fe796)
Add a closer interface for processors so their resources can be released when the processor is not needed anymore. Explicitly close publisher pipelines so their processors are closed. Add closers for add_docker_metadata, add_kubernetes_metadata and add_process_metadata. Script processor will fail if a processor that needs to be closed is used. (cherry picked from commit a3fe796)
* upstream/master: (127 commits) Update obs app links (elastic#21682) fix: update fleet test suite name (elastic#21738) Remove dot from file.extension value in Auditbeat FIM (elastic#21644) Fix leaks with metadata processors (elastic#16349) Add istiod metricset (elastic#21519) [Ingest Manager] Change Sync/Close call order (elastic#21735) [Ingest Manager] Syncing unpacked files (elastic#21706) Fix concurrent map read and write in socket dataset (elastic#21690) Fix conditional coding to remove seccomp info from Winlogbeat (elastic#21652) [Elastic Agent] Fix issue where inputs without processors defined would panic (elastic#21628) Add configuration of filestream input (elastic#21565) libbeat/logp: introduce Logger.WithOptions (elastic#21671) Make o365audit input cancellable (elastic#21647) fix: remove extra curly brace in script (elastic#21692) [Winlogbeat] Remove brittle configuration validation from wineventlog (elastic#21593) Fix function that parses from/to/contact headers (elastic#21672) [CI] Support Windows-2016 in pipeline 2.0 (elastic#21337) Skip publisher flaky tests (elastic#21657) backport: add 7.10 branch (elastic#21635) [CI: Packaging] fix: push ubi8 images too (elastic#21621) ...
Add a closer interface for processors so their resources can be released when the processor is not needed anymore. Explicitly close publisher pipelines so their processors are closed. Add closers for add_docker_metadata, add_kubernetes_metadata and add_process_metadata. Script processor will fail if a processor that needs to be closed is used. (cherry picked from commit a3fe796)
Add a closer interface for processors so their resources can be released when the processor is not needed anymore. Explicitly close publisher pipelines so their processors are closed. Add closers for add_docker_metadata, add_kubernetes_metadata and add_process_metadata. Script processor will fail if a processor that needs to be closed is used. (cherry picked from commit a3fe796)
…dependencies-goals * upstream/master: (46 commits) Use badger code from upstream repository (elastic#21705) Disable writes sync in persistent cache (elastic#21754) Make API address and Shard ID required in Cloud Foundry settings (elastic#21759) [CI] Support skip-ci label (elastic#21377) Increase recommended memory when deploying in Cloud foundry (elastic#21755) typofix for dns timeout configuration option (elastic#21069) chore: create CI artifacts for DEV usage (elastic#21645) [Ingest Manager] Atomic installed forgotten changelog elastic#21780 [Ingest Manager] Agent atomic installer (elastic#21745) Add missing configuration annotations (elastic#21736) [Filebeat] Add check for context.DeadlineExceeded error (elastic#21732) Remove kafka partition ISR from Metricbeat docs (elastic#21709) Skip flaky test with oauth2 config in httpjson input (elastic#21749) Fix for azure retrieve resource by ids (elastic#21711) Update obs app links (elastic#21682) fix: update fleet test suite name (elastic#21738) Remove dot from file.extension value in Auditbeat FIM (elastic#21644) Fix leaks with metadata processors (elastic#16349) Add istiod metricset (elastic#21519) [Ingest Manager] Change Sync/Close call order (elastic#21735) ...
…ne-2.0-arm * upstream/master: (93 commits) Fix non-windows fields on system/filesystem (elastic#21758) disable TestReceiveEventsAndMetadata/TestSocketCleanup/TestReceiveNewEventsConcurrently in Windows (elastic#21750) Use badger code from upstream repository (elastic#21705) Disable writes sync in persistent cache (elastic#21754) Make API address and Shard ID required in Cloud Foundry settings (elastic#21759) [CI] Support skip-ci label (elastic#21377) Increase recommended memory when deploying in Cloud foundry (elastic#21755) typofix for dns timeout configuration option (elastic#21069) chore: create CI artifacts for DEV usage (elastic#21645) [Ingest Manager] Atomic installed forgotten changelog elastic#21780 [Ingest Manager] Agent atomic installer (elastic#21745) Add missing configuration annotations (elastic#21736) [Filebeat] Add check for context.DeadlineExceeded error (elastic#21732) Remove kafka partition ISR from Metricbeat docs (elastic#21709) Skip flaky test with oauth2 config in httpjson input (elastic#21749) Fix for azure retrieve resource by ids (elastic#21711) Update obs app links (elastic#21682) fix: update fleet test suite name (elastic#21738) Remove dot from file.extension value in Auditbeat FIM (elastic#21644) Fix leaks with metadata processors (elastic#16349) ...
What does this PR do?
Add a closer interface for processors so their resources can be released when the processor is not needed anymore.
Explicitly close publisher pipelines so their processors are closed.
Add closers for
add_docker_metadata
,add_kubernetes_metadata
andadd_process_metadata
.Why is it important?
Processors can be defined in dynamic configurations, as in autodiscover templates. Some processors open connections, files, or start goroutines that need to be closed or stopped.
Processors can be dynamically created at least when using autodiscover and when used in the
script
processor. When this issue appears it can be usually solved by configuration (as described in #14389). So this case is probably not so serious.Processors can include an on-disk cache since #20775, files open by this cache should be properly closed on shutdown.
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesAuthor's Checklist
Check that resources are released in:
Processors at the module level are open multiple times?(One per metricset, to be reviewed as a different issue, only a real problem when a processor cannnot be instantiated twice with the same config, what only happens withadd_cloudfoundry_metadata
)Somehow call close on all the instantiated processors once the script is finished.Add tests:
Related issues
Use cases
add_cloudfoundry_metadata
).