Telegraf 0.11.1 (Influx Backend) - Stops Gathering Metrics #1067
Comments
Note: A simple restart of the service gets things going again.
Does this only happen on one server? Any chance you could send a SIGQUIT (
@sparrc It doesn't appear to happen on just one server, the next time it happens I'll try to get a dump for you.
This help?
yes, very helpful, it looks like the ping command is hanging, I'll see what I can do about setting a timeout for that.
actually it looks like the timeout is settable, but right now it defaults to "no timeout", could you try setting timeout to something like 5s and see if the problem is solved? I'm going to change the ping plugin to default to 5s as well.
I can, it doesn't show up very often so may take a week or 2 to actually notice the diff.
Looks like we already had it set :(
that's strange, my only guess is that your system's ping command is hanging for some reason when this happens? I think I can implement a timeout for this in Telegraf but it might leave behind a zombie process. Do you have any relevant system logs when this occurs?
Nothing obvious.
First is to write an internal CombinedOutput and Run function with a timeout. Second, the following instances of command runners need to have timeouts:
plugins/inputs/ping/ping.go 125: out, err := c.CombinedOutput()
plugins/inputs/exec/exec.go 91: if err := cmd.Run(); err != nil {
plugins/inputs/ipmi_sensor/command.go 31: err := cmd.Run()
plugins/inputs/sysstat/sysstat.go 194: out, err := cmd.CombinedOutput()
plugins/inputs/leofs/leofs.go 185: defer cmd.Wait()
plugins/inputs/sysstat/sysstat.go 282: if err := cmd.Wait(); err != nil {
closes #1067
Thanks for the detailed report @sharkannon! This fix will be available in Telegraf 0.13.
We have Telegraf running on hundreds of servers, but I've been keeping an eye on one specific server because we keep losing data for it. It appears that every once in a while Telegraf just stops gathering metrics. There doesn't appear to be anything in the logs other than:
Just suddenly stops "Gathering".
we have the following inputs: