
Segfault on startup (0.12.6) #406

Closed
jsravn opened this issue Oct 27, 2017 · 12 comments
jsravn (Contributor) commented Oct 27, 2017

Trying fluent-bit for the first time, inside the docker image.

# gdb /fluent-bit/bin/fluent-bit
<snip>
(gdb) run -c /config/fluent-bit.conf
Starting program: /fluent-bit/bin/fluent-bit -c /config/fluent-bit.conf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Fluent-Bit v0.12.6
Copyright (C) Treasure Data

[New Thread 0x7ffff5fff700 (LWP 240)]
[2017/10/27 13:43:38] [ info] [engine] started

Program received signal SIGSEGV, Segmentation fault.
0x00000000004be4e3 in flb_kube_conf_destroy (ctx=0x0) at /tmp/fluent-bit-0.12.6/plugins/filter_kubernetes/kube_conf.c:206
206     /tmp/fluent-bit-0.12.6/plugins/filter_kubernetes/kube_conf.c: No such file or directory.
(gdb) quit

My config is:

[SERVICE]
  Daemon        Off
  Log_level     debug
  Parsers_File  fluent-parsers.conf

[INPUT]
  Name            tail
  Path            /var/log/containers/*.log
  Parser          docker
  Tag             kube.*
  Mem_Buf_Limit   5mb
  Skip_Long_Lines On

[FILTER]
  Name  kubernetes
  Match kube.*

[OUTPUT]
  Name            es
  Match           *
  Host            ${ELASTIC_SEARCH_HOST}
  Port            443
  tls             On
  Logstash_Format On
  Logstash_Prefix ${ENVIRONMENT}
  Include_Tag_Key On
  Time_Key        time
  Time_Format     %Y-%m-%dT%H:%M:%S.%L

Not sure why the context is null; I'll try to dig into it some more.

edsiper (Member) commented Oct 27, 2017

Which Docker image are you using?

jsravn (Contributor, Author) commented Oct 27, 2017

fluent/fluent-bit:0.12 @edsiper

jsravn (Contributor, Author) commented Oct 27, 2017

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004be4e3 in flb_kube_conf_destroy (ctx=0x0) at /tmp/fluent-bit-0.12.6/plugins/filter_kubernetes/kube_conf.c:206
206         if (ctx->hash_table) {
(gdb) bt
#0  0x00000000004be4e3 in flb_kube_conf_destroy (ctx=0x0) at /tmp/fluent-bit-0.12.6/plugins/filter_kubernetes/kube_conf.c:206
#1  0x00000000004bdcdb in cb_kube_exit (data=0x0, config=0x7ffff601c280) at /tmp/fluent-bit-0.12.6/plugins/filter_kubernetes/kubernetes.c:373
#2  0x000000000046d9b4 in flb_filter_exit (config=0x7ffff601c280) at /tmp/fluent-bit-0.12.6/src/flb_filter.c:164
#3  0x0000000000473cbd in flb_engine_shutdown (config=0x7ffff601c280) at /tmp/fluent-bit-0.12.6/src/flb_engine.c:552
#4  0x000000000047370c in flb_engine_start (config=0x7ffff601c280) at /tmp/fluent-bit-0.12.6/src/flb_engine.c:402
#5  0x000000000041c5b7 in main (argc=3, argv=0x7fffffffe5d8) at /tmp/fluent-bit-0.12.6/src/fluent-bit.c:729
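For anyone skimming the trace: the crash is a plain NULL dereference. cb_kube_exit() is handed data=0x0 because the filter never finished initialising, and flb_kube_conf_destroy() immediately dereferences it at ctx->hash_table. A minimal sketch of the kind of guard that would avoid it (illustrative only, reduced to what the backtrace shows, not the actual patch):

struct flb_kube {
    void *hash_table;   /* placeholder for the real hash table type */
    /* ... remaining members elided ... */
};

void flb_kube_conf_destroy(struct flb_kube *ctx)
{
    /* ctx is NULL when the engine shuts down before the filter
     * finished initialising, so return instead of dereferencing it */
    if (!ctx) {
        return;
    }

    if (ctx->hash_table) {
        /* release the hash table here */
    }

    /* release the remaining members and ctx itself here */
}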

jsravn (Contributor, Author) commented Oct 27, 2017

It doesn't like something about my output plugin config, so it shuts down, but on shutdown it doesn't check whether the filter was initialised yet and segfaults. I'm not getting any error or log output, though.

edsiper (Member) commented Oct 27, 2017

That's very weird. Would you please provide the output of:

fluent-bit -c yourconffile.conf --sosreport

jsravn (Contributor, Author) commented Oct 27, 2017

[2017/10/27 14:27:06] [ info] [engine] started                                                                                                                                                            
[2017/10/27 14:27:06] [debug] [in_tail] inotify watch fd=19                                                                                                                                               
[2017/10/27 14:27:06] [debug] [in_tail] scanning path /var/log/containers/*.log                                                                                                                                    
<snip>                                                            
[2017/10/27 14:27:06] [error] [io_tls] flb_io_tls.c:120 X509 - Read/write of file failed                                                                                                                  
[2017/10/27 14:27:06] [error] [output es.0] error initializing TLS context                                                                                                                                
[2017/10/27 14:27:06] [ info] [filter_kube] https=1 host=kubernetes.default.svc port=443                                                                                                                  
[2017/10/27 14:27:06] [ info] [filter_kube] local POD info OK                                                                                                                                             
[2017/10/27 14:27:06] [ info] [filter_kube] testing connectivity with API server...                                                                                                                       
[2017/10/27 14:27:06] [ warn] net_tcp_fd_connect: getaddrinfo(host='kubernetes.default.svc'): Name or service not known                                                                                            
[2017/10/27 14:27:06] [error] [filter_kube] upstream connection error                                                                                                                                     
[2017/10/27 14:27:06] [error] [filter_kube] could not get meta for POD ip-<snip>                                                                                                                  
[2017/10/27 14:27:06] [debug] [router] input=tail.0 'DYNAMIC TAG'                   
Fluent-Bit v0.12.6
Copyright (C) Treasure Data


Fluent Bit Enterprise - SOS Report
==================================
The following report aims to be used by Fluent Bit and Fluentd Enterprise
Customers of Treasure Data. For more details visit:

    https://fluentd.treasuredata.com


[Fluent Bit]
    Edition             Community Edition
    Version             0.12.6
    Built Flags         JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_SQLDB FLB_HAVE_BUFFERING FLB_HAVE_TRACE FLB_HAVE_FLUSH_LIBCO FLB_HAVE_SYSTEMD FLB_HAVE_FORK FLB_HAVE_PROXY_GO FLB_HAVE_JEMALLOC JEMALLOC_MANGLE FLB_HAVE_REGEX FLB_HAVE_C_TLS FLB_HAVE_SETJMP FLB_HAVE_ACCEPT4 FLB_HAVE_INOTIFY

[Operating System]
    Name                Linux
    Release             4.4.0-97-generic
    Version             #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017

[Hardware]
    Architecture        x86_64
    Processors          2

[Built Plugins]
    Inputs              cpu mem kmsg tail proc disk systemd netif dummy head health serial stdin tcp mqtt lib forward random syslog
    Filters             grep stdout kubernetes parser record_modifier
    Outputs             counter es exit file forward http influxdb kafka-rest nats null plot stdout td lib flowcounter

[SERVER] Runtime configuration
    Flush               5
    Daemon              Off
    Log_Level           Debug

[INPUT] Instance
    Name                tail.0 (tail, id=0)
    Flags               DYN_TAG
    Threaded            No
    Tag                 kube.*
    Mem_Buf_Limit       4.8M
    Path                /var/log/containers/*.log
    Parser              docker
    Skip_Long_Lines     On

[FILTER] Instance
    Name                kubernetes.0 (kubernetes, id=0)
    Match               kube.*
    Merge_JSON_Log      On

command terminated with exit code 1

That helps! How come --sosreport is needed to get the logs to flush? I suppose the assumption is that it doesn't segfault first.

jsravn (Contributor, Author) commented Oct 27, 2017

I got a bit farther by fixing the Kubernetes API server URL. My log shippers are using host networking for "reasons". It's still segfaulting, though. This time it can't find the pod, because it looks like it assumes the hostname is the pod name (which it isn't with host networking). I think we got around this in the fluentd plugin by setting a POD_NAME environment variable, which it would pick up if set. It also looks like TLS is failing to initialise for the ES plugin, though I'm not sure why yet.
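For reference, the POD_NAME workaround described above could look roughly like this (a hedged sketch; get_pod_name() and the gethostname() fallback are made up for illustration and are not fluent-bit code):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: prefer an explicit POD_NAME environment variable
 * (needed with host networking, where the hostname is the node name,
 * not the pod name) and fall back to the hostname otherwise. */
static int get_pod_name(char *buf, size_t size)
{
    const char *env = getenv("POD_NAME");

    if (env && *env) {
        snprintf(buf, size, "%s", env);
        return 0;
    }
    return gethostname(buf, size);
}

int main(void)
{
    char pod[256];

    if (get_pod_name(pod, sizeof(pod)) == 0) {
        printf("pod name: %s\n", pod);
    }
    return 0;
}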

jsravn (Contributor, Author) commented Oct 27, 2017

Updated sosreport:

[2017/10/27 14:48:11] [error] [io_tls] flb_io_tls.c:120 X509 - Read/write of file failed 
[2017/10/27 14:48:11] [error] [output es.0] error initializing TLS context
[2017/10/27 14:48:11] [ info] [filter_kube] https=1 host=<snip> port=443                        
[2017/10/27 14:48:11] [ info] [filter_kube] local POD info OK
[2017/10/27 14:48:11] [ info] [filter_kube] testing connectivity with API server...
[2017/10/27 14:48:11] [debug] [filter_kube] API Server (ns=kube-system, pod=ip-10-150-50-50) http_do=0, HTTP Status: 404
[2017/10/27 14:48:11] [debug] [filter_kube] API Server response                                                                                                                                           
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"ip-10-150-50-50\" not found","reason":"NotFound","details":{"name":"ip-10-150-50-50","kind":"pods"},"code":404}       
                                                                                                                                                                                                          
[2017/10/27 14:48:11] [error] [filter_kube] could not get meta for POD ip-10-150-50-50                                                                                                                     
[2017/10/27 14:48:11] [debug] [router] input=tail.0 'DYNAMIC TAG'            

jsravn (Contributor, Author) commented Oct 27, 2017

Sorry, getting off topic. I'll work on sorting out the other errors. The original bug here is the segfault. Should it be checking whether the filter context is null before deleting it? It looks like all the filters have this issue: if the output plugin fails to init, shutdown tries to destroy the filters before they get initialised.
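In case it helps, the check could also live in the exit callback itself, so a filter that never got a context is simply skipped on shutdown. A rough sketch (the types come from the filter_kubernetes plugin headers; this just shows the shape of the idea, not the actual fix in #407):

static int cb_kube_exit(void *data, struct flb_config *config)
{
    struct flb_kube *ctx = data;

    (void) config;

    /* nothing was initialised, so there is nothing to destroy */
    if (!ctx) {
        return 0;
    }

    flb_kube_conf_destroy(ctx);
    return 0;
}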

jsravn (Contributor, Author) commented Oct 27, 2017

@edsiper I made a PR to fix the TLS output plugin error that was causing the segfault: fluent/fluent-bit-docker-image#7

jsravn (Contributor, Author) commented Oct 27, 2017

I took a stab at fixing the segfault in #407.

edsiper (Member) commented Oct 27, 2017

Thanks for the contribution, closing this issue as fixed.

Note: I will make a maintenance v0.12.7 release with these changes next week.

edsiper closed this as completed Oct 27, 2017