Registry state persistence and cleanup #1600
Comments
@guyboertje Please have a look |
considering ignore_older virtually removes file from prospector (scanner), isn't |
Discussed briefly with @ruflin. force_close_files closes the file handle as soon as the file name changes (or the file is rotated). It can be helpful to have something like "force_close_files (for rotated files) after N hours" to give filebeat some time to catch up on processing bursts in traffic after a file name change. |
Based on internal discussion and community feedback, it has become clear that it is not obvious what force_close_files does and how it behaves. Contributing to this, our docs in config.full.yml are outdated and the documentation itself is wrong. For a better understanding, I will first elaborate on how filebeat works, which should make it easier to discuss the different options.
On startup, filebeat creates one prospector for each prospector configuration. These prospectors exist for the full runtime of filebeat and their task is to make sure harvesters are started for each file in
force_close_files
The force_close_files option is completely implemented on the harvester side. Normally a harvester just keeps reading the file until the end of the file is reached. Then the harvester keeps backing off until
force_close_files could potentially be split up in two parts:
Be aware that enabling force_close_files can also have a performance hit, as the file stats are read and compared for every line. |
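To make the cost of that per-line check concrete, here is a minimal Go sketch of how a reader could compare the stats of its open handle with the stats of the path it was opened from. The helper names `fileChanged` and `demo` are hypothetical and this is not Filebeat's actual implementation; it only illustrates the idea with the standard library's `os.SameFile`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileChanged reports whether the file behind the open handle is no longer
// the file found at path (it was renamed, rotated or removed).
// Hypothetical helper for illustration, not Filebeat's code.
func fileChanged(f *os.File, path string) (bool, error) {
	openInfo, err := f.Stat()
	if err != nil {
		return false, err
	}
	pathInfo, err := os.Stat(path)
	if os.IsNotExist(err) {
		return true, nil // nothing at path any more
	}
	if err != nil {
		return false, err
	}
	// SameFile compares the underlying file identity, not the name.
	return !os.SameFile(openInfo, pathInfo), nil
}

// demo opens a temporary log file, renames it, and runs the check
// before and after the rename.
func demo() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "harvester")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "app.log")
	if err = os.WriteFile(path, []byte("line 1\n"), 0o644); err != nil {
		return
	}
	f, err := os.Open(path)
	if err != nil {
		return
	}
	defer f.Close()

	if before, err = fileChanged(f, path); err != nil {
		return
	}
	if err = os.Rename(path, path+".1"); err != nil {
		return
	}
	after, err = fileChanged(f, path)
	return
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("changed before rename:", before)
	fmt.Println("changed after rename:", after)
}
```

Each call costs two stat operations, which is why running such a check for every line read can become noticeable on busy files.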
Taking a second look at all the configuration options, they seem to apply on 3 layers with the following purposes:
The options below do not necessarily represent the current behaviour but what I would suggest it to be.
Harvester
That means for the harvester we have the following config options.
By default, backoff applies if EOF is reached, meaning the file handler is kept open but the harvester sleeps for the backoff time. If a file is closed, the harvester is stopped and a new harvester for the same file will be started again by the prospector after
Currently
Closing files very often can have a performance impact. One reason is that additional file metadata has to be checked all the time. Another is that it diverges from "near real time": if a file is closed, it will only be picked up again after
Prospector
As long as a harvester is open for a file,
State / Registrar
The registry persists the state of the files. The registry is only used when filebeat is (re)started.
The above options are especially important in 2 cases:
Data Loss
Filebeat follows the principle of sending a log line at least once. Some of the configuration changes, especially in combination, can lead to data loss. Here are some more details on when data could be lost.
Harvester options: In general, closing a file handler normally means no data loss, as the file is picked up and scanned again after
Combinations: There are several combinations which can lead to data loss. Some of these combinations can be intended, but it is important to understand the consequences:
Recommended default settings:
Questions
Notes
|
I will now start implementing the options mentioned above. This comment is to track the implementation. Renaming of options is still possible at a later stage. Config options:
Add Docs:
|
During the implementation of the |
After discussion
This makes an additional configuration option obsolete. It has one side effect: as long as no new events are harvested by filebeat, the registry will not be cleaned up, because the registry is never written. |
force_close_files is replaced by the two options close_removed and close_renamed. force_close_files is deprecated. If it is enabled, it sets close_removed and close_renamed to true. This is part of elastic#1600
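The deprecation mapping described in this commit message could look roughly like the following Go sketch. The struct `harvesterConfig` and the function `applyDeprecated` are hypothetical names chosen for illustration; the real Filebeat config types are not shown in this thread.

```go
package main

import "fmt"

// harvesterConfig is a hypothetical stand-in for the relevant config fields.
type harvesterConfig struct {
	ForceCloseFiles bool // deprecated option
	CloseRemoved    bool
	CloseRenamed    bool
}

// applyDeprecated maps the deprecated force_close_files option onto the two
// new options: if it is enabled, both close_removed and close_renamed are
// set to true. Values set explicitly by the user are otherwise left alone.
func applyDeprecated(c harvesterConfig) harvesterConfig {
	if c.ForceCloseFiles {
		c.CloseRemoved = true
		c.CloseRenamed = true
	}
	return c
}

func main() {
	c := applyDeprecated(harvesterConfig{ForceCloseFiles: true})
	fmt.Printf("close_removed=%v close_renamed=%v\n", c.CloseRemoved, c.CloseRenamed)
}
```

Keeping the mapping in one place means the rest of the harvester only has to look at the two new flags.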
Each state has a timestamp and a ttl. -1 means the ttl is disabled, 0 means the state should be removed directly. This moves the logic of what should happen with a state completely into the state itself and makes it possible to use states in different use cases.
The advantage of the ttl is that filebeat does not depend on the modification time, which can be incorrect. This means that when a file is rotated, the timestamp of the file is updated and it also counts as a new state. This makes sense, as otherwise the state of a rotated file could be removed and the file then picked up again after rotation as a completely new file. The downside is that people using filebeat must understand the difference between the state timestamp and modtime. In general the timestamp is newer than the modtime, as filebeat finishes reading later.
On the registrar side, the cleanup happens every time before the registry is written. On the prospector side, the state is cleaned up after each scan. It can happen that the prospector state list and the registrar state list are not 100% in sync, as they don't clean up the states at the same time. The prospector state is the one that will always overwrite the registrar state. No cleanup is done before shutdown. It is important that on startup a full scan is done first to update the states before the state is cleaned up, otherwise states that are still needed could be removed.
This is part of elastic#1600
Additional:
* Fixed offset change for prospector to start harvester for old files.
Note:
* Nice part of this is that the registrar does not have to know about expiry and removal of files; the state is the communication channel.
Starting to implement |
The clean_frequency makes sense to me. I want to suggest a small tweak to the configuration options and see if the logic makes sense as a result. Instead of having two different settings with relative times in them (close_older, drop_older), what if you had just one time-based setting and then a boolean option which can determine whether the harvester comes to a hard stop even if the file isn't finished...? I also want to offer a suggestion to rename these as promised. After rereading this all a few times I realized that we should probably structure the documentation with these three big concepts as headers and pieces of a large diagram (prospector, harvester, registrar). And further to that end, the options which apply to those logical pieces should be named as such. So, here's a set of names which should be easily mapped to yours. Note that I have taken the liberty of incorporating the above suggestion in this list of names. Yes, these names are long but I believe that they assist in self documenting the behavior which users are unlikely to be reading about on a routine basis. They will go to the docs, understand it once and come back later - hopefully these names in their config files will be clear to them without reading the docs every time. harvester_close_on_renamed [edit - I dropped 'force' from the harvester_close_on_timeout based on your last comment, which I missed] |
WDYT about putting the configs under a namespace?
About your boolean option for
|
Hmm, after seeing all these options, I wonder if it's a good idea to expose them all to the user. I mean, the number of possible combinations is huge, and it sounds like many possible combinations can result in subtle data loss. I'm not quite sure I get the motivation to make all of these configurable, is it for completeness or for handling corner cases? |
It is for handling corner cases and to avoid config options with multiple meanings. I would keep the same defaults as we have now, so there is no data loss by default. For the harvester, the corner cases are that people want the file handler to be closed faster, but for different reasons. For the registrar, it solves the problem of the registry growing too big and of inode reuse. |
Ok, just worried that we might be going to the other extreme now :-). Perhaps we can go through the options in today's meeting, I'm not sure I get the difference between |
Based on the inputs from @brandonmensing, a suggestion for the config file: Before:
After
I removed the options that would stay the same. For harvester it could also be:
|
@brandonmensing I left out the I left out |
I just realised, the above config examples have a mistake inside. The state @brandonmensing I was wondering if people understand |
One more: More accurate for |
As a note: The general discussion came up to change |
Quick summary of the conversations yesterday: Close configurations:
Close_older will be renamed to
Same argument for
Instead of nesting the variable options under state and harvester, we will use comments that visualise the nesting. This reduces the length of the variable names, adds no additional indentation and keeps a new name "harvester" out of the config file.
|
Closing as all changes were implemented. Small follow up changes are tracked in #2012 |
I would like to have a word about the change of the terminology from "prospectors" and "harvesters" to "inputs" and "readers". For some time I have been under the impression that there is a general trend in software to impoverish the vocabulary lusers are exposed to, down to the most "Simple English" possible. The words "prospectors" and "harvesters" seemed to me to describe perfectly, in an imaginative way, the role of each actor in the system, and as a nonnative English speaker, I don't mind having to fetch a word definition if needed -- it takes 10 seconds from within a web browser. Moreover, reducing the overall vocabulary in use in a system has a nasty side effect in the long term: you end up with a lot of confusion around every word because they've become ambiguous, and nobody can guess what it's about without a lot of context. Here, the word "input" is the most problematic; it has a lousy definition and could very well mean to me "one particular file", and it does not convey to me any activity of discovering new data: input is a passive thing. The word "reader" is okayish, albeit generic and blunt compared to harvester. -- I have a mouse on my desktop, and I have a mouse in the attic. |
Filebeat currently lacks an option to clean up the registry file. The goal of this issue is to answer three questions:
I suggest introducing two new config options, clean_older and clean_removed, with the following behaviour and defaults. This is option 4 below.
Better name suggestions for the new variables are welcome.
Option Details
Below is the current behaviour and the different options described in more detail.
Current Behaviour
The current behaviour is as follows:
ignore_older
are set to the end of the file and persisted.
This brings the problem that the registry is never cleaned up.
Option 1 - ignore_older double meaning
One option is to use ignore_older to also cleanup the state. This would have the following consequences:
Option 2 - clean_older >= ignore_older
The introduction of a clean_older variable would allow setting a time after which the registry should be cleaned up. If ignore_older keeps its current behaviour, clean_older must be >= ignore_older, as otherwise the two would get into a race condition. As ignore_older is set to infinity by default, clean_older would also be disabled by default. If clean_older is enabled with a time, ignore_older also has to be set.
Option 3 - ignore_older, clean_removed
Introducing a variable clean_removed would allow enabling an option so that only files which disappear from disk are cleaned up. clean_removed could either be a bool, in which case files that disappear would be removed directly, or a duration after which the files are removed. This means all files which still exist will be kept in the registry.
In case clean_removed is a duration, it would require clean_removed >= close_older or force_close_files to be enabled to make sure the files are closed.
Option 4 - clean_older >= ignore_older, clean_older >= clean_removed
Option 4 would be combining all these options. This is my preferred option as it gives the full flexibility and keeps the behaviour of the existing variables.