logstash stuck in read mode on big number of files #219
Comments
In your case (an extreme one) sorting is the problem. I spiked a test of a discovered-files "feeder" system that, when more than 4095 files are discovered, feeds a smaller number of files into the files-to-process collection. I would keep a keen eye on the sorting performance. This is on the unofficial road map for this plugin.
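For reference, here is a minimal sketch of that feeder idea — hypothetical code, not the plugin's actual implementation; FEED_CAP, feed, and the collection names are all illustrative:

# Hypothetical sketch of the "feeder" idea (not actual plugin code):
# when a discovery pass finds more than FEED_CAP files, feed only a
# capped batch into the files-to-process collection per cycle.
FEED_CAP = 4095

def feed(discovered, to_process)
  batch = discovered.shift(FEED_CAP) # take at most FEED_CAP files this cycle
  to_process.concat(batch)           # the rest wait for the next discovery cycle
end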
@guyboertje I am using logstash-input-file 4.1.7. We have many files (60,000) and I use tail mode, and I found that our logs are delayed. It seems that logstash-input-file does not run its discover and stat operations on the input files. I suspect that my delay issue is also caused by the "sorting" operation.
You suspect correctly. |
Version: 6.8.0. Any updates on this? Having the same issues: if a folder has too many files, logstash will hang. The only way to get this working is the workaround described in the issue body, commenting out @sort_method.call.
Any update on this issue? What is the impact of commenting out @sort_method.call?
Sorting performance for watched files has been addressed in 4.2.0 #219
I'm trying to do a one-time processing of a big number of files organized as folder/subfolder1/subfolder2/files.
There are 298,427 files and 138 "subfolder1" levels; each "subfolder1" level contains between 10 and 20 "subfolder2" levels, and each "subfolder2" level contains between 20 and 240 files.
I use the file input plugin with the new read mode; there is a filter, and the output is stdout for testing (in production the target is Elasticsearch).
Here is my config :
input {
file {
sincedb_path => "/dev/null"
path => "/home/me/WORK/logstash/20180917///*.json"
close_older => 5
codec => "json"
mode => "read"
file_completed_action => "log"
file_completed_log_path => "/dev/null"
}
}
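As a quick sanity check of a pattern like this (a sketch, assuming the same directory layout; the path is the one from the config above), Ruby's Dir.glob can confirm how many files the glob actually matches:

# Count how many files the glob pattern would match.
pattern = "/home/me/WORK/logstash/20180917/*/*/*.json"
puts "#{Dir.glob(pattern).size} files match #{pattern}"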
I tested my configuration on a smaller number of files and everything went as intended.
When I launch logstash on the big folder, I see the initialization, and then nothing happens.
The following sequence keeps repeating in the logs:
And nothing happens.
I waited an hour with no luck.
So I decided to investigate a bit.
I did
kill -3 logstashPID
to get a thread dump. In this thread dump, with the exception of threads busy with java.nio, the only RUNNABLE thread is busy in Ruby code; all others are WAITING.
Here are two samples below, from two thread dumps taken five minutes apart (lots of output; explanation continues below):
The other one:
The code seems stuck in watched_files_collection.rb.
Looking at the Ruby code in
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.6/lib/filewatch/watched_files_collection.rb
it seems that the sort method is called each time a file is discovered and added to the list of files to process. So, in my case, the growing array of files to process is sorted more than 290,000 times.
The read mode is quite new for this plugin, so maybe the sorting that was appropriate in tail mode now leads to bad performance (in this case, no performance at all, since processing doesn't even start).
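To illustrate the cost, here is a small benchmark (an illustrative sketch, not plugin code): re-sorting the whole collection after every add is roughly O(n² log n) over n adds, whereas inserting each file at its sorted position found by binary search keeps the collection sorted far more cheaply:

require 'benchmark'

n    = 10_000
keys = (0...n).to_a.shuffle

resorted = []
t_resort = Benchmark.realtime do
  keys.each { |k| resorted << k; resorted.sort! } # re-sort after every add: ~O(n^2 log n) overall
end

inserted = []
t_insert = Benchmark.realtime do
  keys.each do |k|
    idx = inserted.bsearch_index { |x| x >= k } || inserted.size # binary-search the position
    inserted.insert(idx, k)                                      # keep the array sorted incrementally
  end
end

puts format("re-sort every add: %.2fs, sorted insert: %.2fs", t_resort, t_insert)

Even at 10,000 files the gap is large; at ~300,000 files the re-sort-on-every-add approach becomes crippling.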
As a test (a very very very dirty test) I commented out the line
@sort_method.call
in file
/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.6/lib/filewatch/watched_files_collection.rb
and launched logstash on my big folder. After a few seconds, I saw processed files being output to stdout...
Which makes me think that sorting is one cause of my problem.
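For context, here is a hypothetical reconstruction of what that change amounts to (method and variable names are from memory, not verbatim 4.1.6 source); a cleaner variant would keep the collection ordered with a binary-search insert, as sketched above, rather than dropping ordering entirely:

# Hypothetical reconstruction, not verbatim plugin source:
def add(watched_file)
  @files << watched_file
  # @sort_method.call # the dirty workaround: skip re-sorting on every add
end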
So maybe the file_sort_by parameter of the plugin should provide a third possibility: no sorting at all?
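Something like the following could wire that option up inside the plugin — a hypothetical sketch; the accessor names and the existing option values ("last_modified", "path") are assumptions about the plugin's internals:

# Hypothetical sketch of a "none" option for file_sort_by
# (accessor names are illustrative, not actual plugin internals):
case @settings.file_sort_by
when "last_modified" then @sort_method = -> { @files.sort_by!(&:modified_at) }
when "path"          then @sort_method = -> { @files.sort_by!(&:path) }
when "none"          then @sort_method = -> {} # no-op: keep discovery order
end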