Metricbeat - panic while fetching 'system/process' #5337

Closed
jamiesmith opened this issue Oct 5, 2017 · 13 comments
@jamiesmith

jamiesmith commented Oct 5, 2017

For confirmed bugs, please report:

  • Version: metricbeat 6.0-rc1
  • Operating System: mac 10.12.6
  • Steps to Reproduce: happens occasionally

Saw this while monitoring on a Mac; the docker and mongodb modules are enabled.

2017-10-05T11:49:14-04:00 ERR  recovered from panic while fetching 'system/process' for host ''. Recovering, but please report this: runtime error: index out of range.
2017-10-05T11:49:14-04:00 ERR  Stacktrace: goroutine 128 [running]:
runtime/debug.Stack(0x55d789b, 0x2b, 0xc4201db0d0)
    /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/elastic/beats/libbeat/logp.Recover(0xc420202080, 0x40)
    /go/src/github.com/elastic/beats/libbeat/logp/log.go:176 +0x12f
panic(0x5269360, 0x6276720)
    /usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/elastic/beats/vendor/github.com/elastic/gosigar.kern_procargs(0x4823, 0x0, 0x0, 0xc4201db388, 0xc42030c1e0, 0x30)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_darwin.go:423 +0x763
github.com/elastic/beats/vendor/github.com/elastic/gosigar.(*ProcEnv).Get(0xc4201db400, 0x4823, 0x0, 0x40130bf)
    /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_darwin.go:338 +0x72
github.com/elastic/beats/metricbeat/module/system/process.getProcEnv(0x4823, 0xc42030c1b0, 0xc4201db708, 0x0, 0xc42030c1b0)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:165 +0x53
github.com/elastic/beats/metricbeat/module/system/process.(*Process).getDetails(0xc420342200, 0xc4201db708, 0x4, 0x1)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:120 +0x520
github.com/elastic/beats/metricbeat/module/system/process.(*ProcStats).GetProcStats(0xc42014f440, 0xc4201dbda4, 0xc4204cc4a0, 0xc4201dbda0, 0xc4204cc340, 0xc4201dbd70)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:376 +0x397
github.com/elastic/beats/metricbeat/module/system/process.(*MetricSet).Fetch(0xc4201a54a0, 0x404ae06, 0x18, 0xc42044f1c0, 0xc4204f8000, 0xc4201dbbd0)
    /go/src/github.com/elastic/beats/metricbeat/module/system/process/process.go:82 +0x47
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).multiEventFetch(0xc4202e9b40, 0x7346990, 0xc4201a54a0, 0x623af00, 0xc4200765a0)
    /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:238 +0x4f
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).fetch(0xc4202e9b40, 0x623af00, 0xc4200765a0)
    /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:222 +0x118
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).startPeriodicFetching(0xc4202e9b40, 0x623af00, 0xc4200765a0)
    /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:209 +0x1ea
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).run(0xc4202e9b40, 0xc42045a3c0, 0xc42045ab40)
    /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:186 +0x5b8
github.com/elastic/beats/metricbeat/mb/module.(*Wrapper).Start.func1(0xc42045df00, 0xc42045a3c0, 0xc42045ab40, 0xc4202e9b40)
    /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:110 +0xbd
created by github.com/elastic/beats/metricbeat/mb/module.(*Wrapper).Start
    /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:111 +0x14e
@jamiesmith
Author

I disabled mongodb & docker and saw it again, so it looks like it is coming from the system module.

@andrewkroh andrewkroh changed the title metricbeat-mongodb - panic while fetching 'system/process' for host '' Metricbeat - panic while fetching 'system/process' Oct 6, 2017
@andrewkroh
Member

I haven't tried to reproduce it, but I wonder if it's something with 10.12.6. In any case the gosigar code should be more defensive.

@tsg
Contributor

tsg commented Oct 6, 2017

Doesn't seem to happen to me, and I'm on 10.12.6. @jamiesmith Could you paste your configuration for the system module here, please?

@tsg
Contributor

tsg commented Oct 9, 2017

I let it run over the weekend on my laptop (10.12.6), and this didn't happen. I suspect it has something to do with the name of one of your processes. I'd say we should just make the code in gosigar more robust, even if we can't reproduce it.

ph added a commit to elastic/gosigar that referenced this issue Oct 10, 2017
Under some circumstances the code can fail; the root cause is not clear
at the moment, but we should still make the code more robust when
retrieving information for a specific process.

The problem is that `bytes.SplitN` can return a slice of one element,
making the process panic when we access the second element.

This commit makes the code a bit more robust: it checks that we
successfully retrieved two elements instead of one, and returns an error
if that failed.

Reference: elastic/beats#5337
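For context, a minimal standalone Go snippet (not the gosigar source) showing why indexing the result of `bytes.SplitN` panics when the separator is missing; the entry strings are made up for illustration:

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	// A well-formed environment entry splits into exactly two parts.
	pair := bytes.SplitN([]byte("PATH=/usr/bin"), []byte("="), 2)
	fmt.Println(len(pair), string(pair[1])) // 2 /usr/bin

	// An entry without the separator yields a one-element slice;
	// accessing pair[1] here would panic with "index out of range".
	pair = bytes.SplitN([]byte("MALFORMED_ENTRY"), []byte("="), 2)
	fmt.Println(len(pair)) // 1
}
```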
@ph
Contributor

ph commented Oct 10, 2017

I've tried to reproduce this issue on my end without luck, but I went ahead and created a small PR on gosigar to add a safety check where the panic happened. Currently I only return the PID in the error; I was undecided whether we should return the contents of the split slice or just the PID.

elastic/gosigar#78
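For illustration only, a rough sketch of the kind of guard that PR describes, with a hypothetical helper name and assuming the error carries only the PID (see elastic/gosigar#78 for the actual change):

```go
package main

import (
	"bytes"
	"fmt"
)

// parseEnvEntry is a hypothetical helper: it checks that SplitN produced
// both a key and a value before indexing, and returns an error that
// mentions only the PID when the entry cannot be split.
func parseEnvEntry(pid int, entry []byte) (key, value string, err error) {
	pair := bytes.SplitN(entry, []byte("="), 2)
	if len(pair) != 2 {
		return "", "", fmt.Errorf("could not parse environment entry for pid %d", pid)
	}
	return string(pair[0]), string(pair[1]), nil
}

func main() {
	if _, _, err := parseEnvEntry(18467, []byte("MALFORMED_ENTRY")); err != nil {
		fmt.Println(err) // reports an error instead of panicking
	}
}
```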

@ph ph self-assigned this Oct 10, 2017
@jamiesmith
Author

I only saw it a couple of times; I only reported it because the error message said to :)
So checking first, as @ph pushed, is probably a decent approach.

Note that it also had:
recovered from panic while fetching 'system/process' for host ''. Recovering, but please report this: runtime error: index out of range.

So I'm not sure whether the host is actually blank (host '').

thanks!

@andrewkroh
Member

The empty value for host '' is because the system module doesn't use the hosts list from the module config. For example, if the panic had come from the apache module, it would tell you which remote host it was monitoring when the exception occurred.

andrewkroh pushed a commit to elastic/gosigar that referenced this issue Oct 11, 2017
ph added a commit to ph/beats that referenced this issue Oct 12, 2017
Update gosigar to make the code more robust when we cannot fetch all information for a macOS process.

Closes: elastic#5337
andrewkroh pushed a commit that referenced this issue Oct 12, 2017
Update gosigar to make the code more robust when we cannot fetch all information for a macOS process.

Closes: #5337
@arkadicolson

arkadicolson commented Jan 24, 2018

We are having the same problem, for the second time in one day. I started the daemon again, so it will probably crash again soon. Using version 6.0.1.

2018-01-23T16:30:49+01:00 INFO metricbeat start running.
2018-01-23T16:59:23+01:00 ERR  recovered from panic while fetching 'system/process' for host ''. Recovering, but please report this: runtime error: slice bounds out of range.
2018-01-23T16:59:23+01:00 ERR  Stacktrace: goroutine 88 [running]:
runtime/debug.Stack(0x2a67e9d, 0x2b, 0xc42026eed8)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/elastic/beats/libbeat/logp.Recover(0xc4200520c0, 0x40)
        /go/src/github.com/elastic/beats/libbeat/logp/log.go:176 +0x12f
panic(0x26f4540, 0x38c9ca0)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
io/ioutil.readAll.func1(0xc42026f260)
        /usr/local/go/src/io/ioutil/ioutil.go:30 +0x119
panic(0x26f4540, 0x38c9ca0)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
bytes.(*Buffer).ReadFrom(0xc42026f1b8, 0x38e2520, 0xc4204d25b8, 0xc420578000, 0x0, 0x200)
        /usr/local/go/src/bytes/buffer.go:180 +0x25c
io/ioutil.readAll(0x38e2520, 0xc4204d25b8, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/io/ioutil/ioutil.go:33 +0x150
io/ioutil.ReadFile(0xc4202763a0, 0x13, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/io/ioutil/ioutil.go:70 +0xf7
github.com/elastic/beats/vendor/github.com/elastic/gosigar.readProcFile(0x4a2f, 0x2a18bfd, 0x7, 0xc420448090, 0x0, 0x4, 0x6, 0x97)
        /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_linux_common.go:412 +0x5b
github.com/elastic/beats/vendor/github.com/elastic/gosigar.(*ProcEnv).Get(0xc42026f400, 0x4a2f, 0x0, 0xc420322000)
        /go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_linux_common.go:312 +0x55
github.com/elastic/beats/metricbeat/module/system/process.getProcEnv(0x4a2f, 0xc4204480c0, 0xc42026f708, 0x0, 0xc4204480c0)
        /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:165 +0x53
github.com/elastic/beats/metricbeat/module/system/process.(*Process).getDetails(0xc42059ef00, 0xc42026f708, 0xf, 0x1)
        /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:120 +0x520
github.com/elastic/beats/metricbeat/module/system/process.(*ProcStats).GetProcStats(0xc4202c0480, 0xc42026fda4, 0xc4202d0640, 0xc42026fda0, 0xc4202d04e0, 0xc42026fd70)
        /go/src/github.com/elastic/beats/metricbeat/module/system/process/helper.go:376 +0x397
github.com/elastic/beats/metricbeat/module/system/process.(*MetricSet).Fetch(0xc42017c0a0, 0x0, 0x0, 0xc42018c030, 0x1, 0xc42026fbd0)
        /go/src/github.com/elastic/beats/metricbeat/module/system/process/process.go:82 +0x47
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).multiEventFetch(0xc4203c6e40, 0x7fc7a5c083f8, 0xc42017c0a0, 0x38f7be0, 0xc420268150)
        /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:238 +0x4f
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).fetch(0xc4203c6e40, 0x38f7be0, 0xc420268150)
        /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:222 +0x118
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).startPeriodicFetching(0xc4203c6e40, 0x38f7be0, 0xc420268150)
        /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:209 +0x1ea
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).run(0xc4203c6e40, 0xc42018ea20, 0xc42018ea80)
        /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:186 +0x5b8
github.com/elastic/beats/metricbeat/mb/module.(*Wrapper).Start.func1(0xc4203f3990, 0xc42018ea20, 0xc42018ea80, 0xc4203c6e40)
        /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:110 +0xbd
created by github.com/elastic/beats/metricbeat/mb/module.(*Wrapper).Start
        /go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:111 +0x14e

2018-01-24T08:21:06+01:00 INFO metricbeat stopped.

@andrewkroh
Member

@arkadicolson This was fixed in v6.1. Please update to a newer version.

@arkadicolson

We are running version 6.2.2 and are still seeing the same error...

@arkadicolson

arkadicolson commented Mar 28, 2018

2018-03-26T01:54:52.640+0200 ERROR ioutil/ioutil.go:30 recovered from panic while fetching 'system/process' for host ''. Recovering, but please report this: runtime error: slice bounds out of range.

{"stack": "github.com/elastic/beats/libbeat/logp.Recover
	/go/src/github.com/elastic/beats/libbeat/logp/global.go:88
runtime.call32
	/usr/local/go/src/runtime/asm_amd64.s:509
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:491
io/ioutil.readAll.func1
	/usr/local/go/src/io/ioutil/ioutil.go:30
runtime.call32
	/usr/local/go/src/runtime/asm_amd64.s:509
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:491
runtime.panicslice
	/usr/local/go/src/runtime/panic.go:35
bytes.(*Buffer).ReadFrom
	/usr/local/go/src/bytes/buffer.go:210
io/ioutil.readAll
	/usr/local/go/src/io/ioutil/ioutil.go:33
io/ioutil.ReadFile
	/usr/local/go/src/io/ioutil/ioutil.go:70
github.com/elastic/beats/vendor/github.com/elastic/gosigar.readProcFile
	/go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_linux_common.go:425
github.com/elastic/beats/vendor/github.com/elastic/gosigar.(*ProcEnv).Get
	/go/src/github.com/elastic/beats/vendor/github.com/elastic/gosigar/sigar_linux_common.go:325
github.com/elastic/beats/libbeat/metric/system/process.getProcEnv
	/go/src/github.com/elastic/beats/libbeat/metric/system/process/process.go:177
github.com/elastic/beats/libbeat/metric/system/process.(*Process).getDetails
	/go/src/github.com/elastic/beats/libbeat/metric/system/process/process.go:132
github.com/elastic/beats/libbeat/metric/system/process.(*Stats).getSingleProcess
	/go/src/github.com/elastic/beats/libbeat/metric/system/process/process.go:452
github.com/elastic/beats/libbeat/metric/system/process.(*Stats).Get
	/go/src/github.com/elastic/beats/libbeat/metric/system/process/process.go:393
github.com/elastic/beats/metricbeat/module/system/process.(*MetricSet).Fetch
	/go/src/github.com/elastic/beats/metricbeat/module/system/process/process.go:83
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).multiEventFetch
	/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:227
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).fetch
	/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:207
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).startPeriodicFetching
	/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:194
github.com/elastic/beats/metricbeat/mb/module.(*metricSetWrapper).run
	/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:171
github.com/elastic/beats/metricbeat/mb/module.(*Wrapper).Start.func1
	/go/src/github.com/elastic/beats/metricbeat/mb/module/wrapper.go:112"}

@andrewkroh
Member

> We are having the same problem.

@arkadicolson The stack trace you posted indicates this is not the same problem (your issue is on Linux, while this issue is about a macOS bug). I wish I had noticed this when you first posted. Can you please open a new issue?

@arkadicolson

arkadicolson commented Mar 29, 2018 via email
