Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Td-agent randomly stops sending logs to Elasticsearch #2981

Closed
himanshigpta opened this issue May 7, 2020 · 8 comments
Closed

Td-agent randomly stops sending logs to Elasticsearch #2981

himanshigpta opened this issue May 7, 2020 · 8 comments
Labels

Comments

@himanshigpta
Copy link

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug
I'm using 'td-agent 1.6.3' on a 'Red Hat Enterprise Linux Server release 7.5 (Maipo)' server to send logs to Elasticsearch 7.5.1.
The issue that I'm facing is that td-agent stops sending logs after some time, also the position file's last modified time is that of the last sent logs(i.e when it stopped sending). On restarting td-agent, sometimes it would straightaway start sending logs from where it left, on most occasions it wouldn't. I've enabled debug logs, but couldn't find any relevant explanation for this behavior.
I tried adding the parameter suggested here: #1630 but was of no help, also referred to this one : #2969 , cluster has enough capacity. I'm new to fluentd, might be missing something here..

Your Environment

td-agent --version

td-agent 1.6.3

cat /etc/redhat-release

Red Hat Enterprise Linux Server release 7.5 (Maipo)

uname -r

3.10.0-229.49.1.el7.x86_64

Your Configuration

<source>
  @type tail
  path /home/abinitio/hdfs_mux.txt
  pos_file /etc/td-agent/hdfs_mux.pos
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  time_format %b %dT%H:%M:%SZ
  tag hdfs.mux
#  enable_stat_watcher false
  #read_from_head true
  <parse>
    @type grok
#    #multiline_start_regexp /[\w]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}/
    <grok>
     pattern %{HOSTNAME:mux_host}%{SPACE}%{USERNAME:mux_username}%{SPACE}%{NUMBER:mux_pid}%{SPACE}%{DATA:stime}%{SPACE}%{UNIXPATH:java}%{SPACE}%{UNIXPATH:abinitio_coop}
    </grok>
    <grok>
     pattern %{GREEDYDATA:mux_msg}
    </grok>
  </parse>
</source>

<filter hdfs.mux>
    @type record_modifier
     <record>
     hostname "#{Socket.gethostname}"
     formatted_time ${Time.at(time).iso8601(3)}
     </record>
     char_encoding utf-8
     char_encoding utf-8:euc-jp
</filter>

<match hdfs.mux>
  @type elasticsearch
  log_es_400_reason true
  user YYY
  password XXXXXX
  type_name "_doc"
  ssl_version TLSv1_2
  ca_file "/etc/td-agent/cert.crt"
  hosts XXXXX
  scheme "https"
  logstash_format true
  logstash_dateformat %V
  logstash_prefix count_mux
  include_timestamp true
  reconnect_on_error true
  reload_on_failure true
  <buffer>
    @type file
    path /etc/td-agent/hdfs-mux/buffers
    chunk_limit_size 1M
    flush_interval 5s
    retry_forever false
    retry_max_times 3
    retry_wait 10
    retry_max_interval 300
    flush_thread_count 8
  </buffer>
</match>

Your Error Log

2020-05-07 12:39:58 +0530 [info]: fluent/log.rb:322:info: adding source type="tail"
2020-05-07 12:39:58 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
2020-05-07 12:39:58 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_msg>.*)
2020-05-07 12:39:58 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
2020-05-07 12:39:58 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_msg>.*)
2020-05-07 12:39:58 +0530 [warn]: fluent/log.rb:342:warn: parameter 'time_format' in <source>
  @type tail
  path "/home/abinitio/hdfs_mux.txt"
  pos_file "/etc/td-agent/hdfs_mux.pos"
  time_format %b %dT%H:%M:%SZ
  tag "hdfs.mux"
  <parse>
    @type "grok"
    <grok>
      pattern (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
      types mux_host:string,mux_username:string,mux_pid:string,stime:string,java:string,abinitio_coop:string
    </grok>
    <grok>
      pattern (?<mux_msg>.*)
      types mux_msg:string
    </grok>
  </parse>
</source> is not used.
2020-05-07 12:39:58 +0530 [info]: #0 fluent/log.rb:322:info: starting fluentd worker pid=23313 ppid=23308 worker=0
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: buffer started instance=69941383561980 stage_size=0 queue_size=0
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: enqueue_thread actually running
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 12:39:58 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing =
2020-05-07 12:39:58 +0530 [info]: #0 fluent/log.rb:322:info: following tail of /home/abinitio/hdfs_mux.txt
2020-05-07 12:39:59 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 12:39:59 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 12:39:59 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running

It stops logging after this.
Before this it stopped with these logs:

2020-05-07 02:56:32 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 02:57:32 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 02:58:32 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 02:59:32 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:09:03 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a50197c3dd91304c22fbea61238c7ef" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:09:03 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:09:03 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:10:03 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:27:49 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501dae6660fa81f676c8c1b625b2fd" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:27:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:27:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:28:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:29:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:30:05 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501e30622b9d1d8e452b25056206ac" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:30:13 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501e3791136a1457ee77e4b446cb73" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:30:20 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501e3e69b70f4a7c244121d17ad6eb" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:30:27 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501e454dc75597bf78bdf442512aff" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:30:40 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501e514ee52d9aef001ac1c98c1213" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:30:47 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501e57fca205ce46413a12ddbab7d2" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:30:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:31:06 +0530 [debug]: #0 fluent/log.rb:302:debug: Created new chunk chunk_id="5a501e69ea4d165ab6523b9d3254aecb" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="hdfs.mux", variables=nil> 2020-05-07 03:31:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:32:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:33:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:34:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:35:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:36:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:37:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:38:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 03:39:49 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing = /home/abinitio/hdfs_mux.txt 2020-05-07 04:36:45 +0530 [debug]: fluent/log.rb:302:debug: fluentd supervisor process get SIGUSR1 2020-05-07 04:36:45 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGUSR1 2020-05-07 04:36:45 +0530 [info]: #0 fluent/log.rb:322:info: force flushing buffered events

Additional context

The logs to be sent are of the following format :
npl-ph-hd.india.itm irun 46908 09:00 /usr/java/jdk1.8.0_144/bin/java /tmp/hdfs_mux-3.3.4.6-airun/mux_46865.properties

@Xyrion
Copy link

Xyrion commented May 7, 2020

I see you referenced my issue above. I suggest adding "with_transporter_log true" in your es output match to see the transport logs.

@himanshigpta
Copy link
Author

himanshigpta commented May 7, 2020

@Xyrion I tried the parameter you suggested, this is what I got when I tried restarting td-agent, this occurs atleast once whenever I restart:

2020-05-07 15:06:04 +0530 [info]: fluent/log.rb:322:info: Received graceful stop 2020-05-07 15:06:05 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:06:05 +0530 [debug]: #0 fluent/log.rb:302:debug: getting start to shutdown main process 2020-05-07 15:06:20 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:06:45 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:07:20 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:08:05 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:09:00 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:10:05 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:11:20 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:12:45 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM 2020-05-07 15:13:06 +0530 [info]: fluent/log.rb:322:info: Received graceful stop 2020-05-07 15:14:21 +0530 [debug]: #0 fluent/log.rb:302:debug: fluentd main process get SIGTERM

So I stopped n started the agent :

`2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::TailInput title=:event_loop thread=#<T
hread:0x007f3900d36310@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_5 thread=#<Thread:0x007f3900d36db0@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:enqueue_thr
ead thread=#<Thread:0x007f3900d36518@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_3 thread=#<Thread:0x007f3900d37530@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_4 thread=#<Thread:0x007f3900d370a8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_7 thread=#<Thread:0x007f3900d36748@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_6 thread=#<Thread:0x007f3900d36a40@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_2 thread=#<Thread:0x007f3900d37828@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_1 thread=#<Thread:0x007f3900d37da0@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [warn]: #0 fluent/log.rb:342:warn: thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::ElasticsearchOutput title=:flush_threa
d_0 thread=#<Thread:0x007f3900d3c0f8@/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
2020-05-07 15:16:05 +0530 [info]: fluent/log.rb:322:info: Worker 0 finished with signal SIGQUIT
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2020-05-07 15:16:16 +0530 [warn]: fluent/log.rb:342:warn: Consider to specify log_level with @log_level.
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: Expanded the pattern  into (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|
\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s
*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: Expanded the pattern  into (?<mux_msg>.*)
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: Expanded the pattern  into (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|
\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s
*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: Expanded the pattern  into (?<mux_msg>.*)
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: using configuration file: <ROOT>


2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: starting fluentd-1.6.3 pid=13442 ruby="2.4.1"
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: spawn command to main:  cmdline=["/opt/td-agent/embedded/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/td-agent", "-
-log", "/var/log/td-agent/td-agent.log", "--use-v1-config", "-v", "--group", "td-agent", "--daemon", "/var/run/td-agent/td-agent.pid", "--under-supervisor"]
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-elasticsearch' version '3.5.2'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-elasticsearch' version '1.9.5'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-elasticsearch' version '1.4.0'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-grok-parser' version '2.1.4'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-kafka' version '0.5.5'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-record-modifier' version '0.6.0'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-s3' version '1.0.0.rc3'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-td' version '1.0.0.rc1'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-td-monitoring' version '0.2.2'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluent-plugin-webhdfs' version '1.1.1'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluentd' version '1.6.3'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: gem 'fluentd' version '0.14.16'
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: adding filter pattern="hdfs.mux" type="record_modifier"
2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: adding match pattern="hdfs.mux" type="elasticsearch"
2020-05-07 15:16:16 +0530 [warn]: #0 fluent/log.rb:342:warn: Consider to specify log_level with @log_level.
2020-05-07 15:16:16 +0530 [info]: #0 fluent/log.rb:322:info: GET https://elastic:**************@n2pl-pac-abxxxxxxxxxxm:9200/ [status:200, request:0.012s, query:n/a]
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: < {
  "name" : "node-1-mn",
  "cluster_name" : "airelastic",
  "cluster_uuid" : "-0XB8BgExxxxxxxxNw",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "3ae9acxxxxxxxxxcd88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

2020-05-07 15:16:16 +0530 [info]: fluent/log.rb:322:info: adding source type="tail"
2020-05-07 15:16:16 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
2020-05-07 15:16:16 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_msg>.*)
2020-05-07 15:16:16 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
2020-05-07 15:16:16 +0530 [info]: #0 fluent/log.rb:322:info: Expanded the pattern  into (?<mux_msg>.*)
2020-05-07 15:16:16 +0530 [warn]: fluent/log.rb:342:warn: parameter 'time_format' in <source>
  @type tail
  path "/home/abinitio/hdfs_mux.txt"
  pos_file "/etc/td-agent/hdfs_mux.pos"
  time_format %b %dT%H:%M:%SZ
  tag "hdfs.mux"
  <parse>
    @type "grok"
    <grok>
      pattern (?<mux_host>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))\s*(?<mux_username>[a-zA-Z0-9._-]+)\s*(?<mux_pid>(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))\s*(?<stime>.*?)\s*(?<java>(/([\w_%!$@:.,+~-]+|\\.)*)+)\s*(?<abinitio_coop>(/([\w_%!$@:.,+~-]+|\\.)*)+)
      types mux_host:string,mux_username:string,mux_pid:string,stime:string,java:string,abinitio_coop:string
    </grok>
    <grok>
      pattern (?<mux_msg>.*)
      types mux_msg:string
    </grok>
  </parse>
</source> is not used.
2020-05-07 15:16:16 +0530 [info]: #0 fluent/log.rb:322:info: starting fluentd worker pid=13452 ppid=13447 worker=0
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: buffer started instance=70045320787540 stage_size=0 queue_size=0
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: enqueue_thread actually running
2020-05-07 15:16:16 +0530 [debug]: #0 fluent/log.rb:302:debug: tailing paths: target = /home/abinitio/hdfs_mux.txt | existing =
2020-05-07 15:16:16 +0530 [info]: #0 fluent/log.rb:322:info: following tail of /home/abinitio/hdfs_mux.txt
2020-05-07 15:16:17 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running
2020-05-07 15:16:17 +0530 [debug]: #0 fluent/log.rb:302:debug: flush_thread actually running

`

The position file's last update is of that time when I started the td-agent, but I dont see any logs being sent to es.

@himanshigpta
Copy link
Author

There's an error that I observed here including other servers where I'm using td-agent, but this never stopped any logs or anything for that matter, mentioning just in case its relevant:

`2020-05-07 15:16:05 +0530 [error]: #0 fluent/log.rb:362:error: unexpected error error_class=SignalException error="SIGQUIT"
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/parser_rege
xp.rb:50:in `match'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/parser_rege
xp.rb:50:in `parse'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-grok-parser-2.1.4/lib/fluent/
plugin/parser_grok.rb:48:in `block in parse'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-grok-parser-2.1.4/lib/fluent/
plugin/parser_grok.rb:47:in `each'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-grok-parser-2.1.4/lib/fluent/
plugin/parser_grok.rb:47:in `parse'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:
420:in `convert_line_to_event'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:
442:in `block in parse_singleline'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:
441:in `each'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:
441:in `parse_singleline'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:
397:in `call'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:
397:in `receive_lines'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:
523:in `wrap_receive_lines'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:757:in `block in handle_notify'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:801:in `with_io'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:737:in `handle_notify'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:733:in `block in on_notify'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:733:in `synchronize'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/in_tail.rb:733:in `on_notify'
  2020-05-07 15:16:05 +0530 [error]: #0 fluent/supervisor.rb:551:block in run_worker: /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.6.3/lib/fluent/plugin/i`

@himanshigpta
Copy link
Author

@repeatedly Could you please provide some insight here...?

@repeatedly
Copy link
Member

The position file's last update is of that time when I started the td-agent, but I dont see any logs being sent to es.

Does this mean
ES output plugin doesn't receive data from in_tail input plugin?
or
in_tail's data is stored into ES output plugin but ES output plugin doesn't flush it to your ES cluster?

@himanshigpta
Copy link
Author

@repeatedly Sorry I couldn't reply earlier! I don't see any data in the specified buffer location if that's what you're asking.
Also there's one more thing to note, I have 4 other configs that work just fine with the same configuration & different log specific details.
One of the working sample configs:

`<source>
  @type tail
  path /data/hdfs_mount.txt
  pos_file /etc/td-agent/hdfs_mount.pos
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  time_format %b %dT%H:%M:%SZ
  tag hdfs.mount
  #read_from_head true
  <parse>
    @type multiline_grok
#    #multiline_start_regexp /[\w]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}/
    <grok>
     pattern %{HOSTNAME:host}%{SPACE}%{UNIXPATH:filesystem}%{SPACE}%{USERNAME:size}%{SPACE}%{USERNAME:used}%{SPACE}%{USERNAME:available}%{SPACE}%{NUMBER:use_percent}%%{SPACE}%{UNIXPATH:mounted_on}
    </grok>
    <grok>
     pattern %{GREEDYDATA:hdfs_msg}
    </grok>
  </parse>
</source>

<filter hdfs.mount>
    @type record_modifier
     <record>
     hostname "#{Socket.gethostname}"
     formatted_time ${Time.at(time).iso8601(3)}
     </record>
     char_encoding utf-8
     char_encoding utf-8:euc-jp
</filter>

<match hdfs.mount>
  @type elasticsearch
  log_es_400_reason true
  user elastic
  password XXXXXXX
  type_name "_doc"
  ssl_version TLSv1_2
  ca_file "/etc/td-agent/BhartiCA.crt"
  hosts XXXXXX
  scheme "https"
  logstash_format true
  logstash_dateformat %V
  logstash_prefix hdfs.mount
  include_timestamp true
  reconnect_on_error true
  reload_on_failure true
  <buffer>
    @type file
    path /etc/td-agent/hdfs-mount/buffers
    chunk_limit_size 1M
    flush_interval 5s
    retry_forever false
    retry_max_times 3
    retry_wait 10
    retry_max_interval 300
    flush_thread_count 8
  </buffer>
</match>` 

@github-actions
Copy link

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

@github-actions github-actions bot added the stale label Dec 18, 2020
@github-actions
Copy link

This issue was automatically closed because of stale in 30 days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants