Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

procstat input shouldn't cache PID #1636

Closed
aabouzaid opened this issue Aug 16, 2016 · 6 comments · Fixed by #2540
Closed

procstat input shouldn't cache PID #1636

aabouzaid opened this issue Aug 16, 2016 · 6 comments · Fixed by #2540
Assignees
Labels
bug unexpected problem or unintended behavior
Milestone

Comments

@aabouzaid
Copy link

procstat reports for processes aren't in the conf file ... and they look totally random from the system.

Here is an example for some procs name:

udevadm
telegraf
sudo
sleep
pgrep
kworker/2:2
cleanup

It collect data for these procs for sometime (also a random time) and then it stops!

System info:

System: Ubuntu 14.04.4 LTS
Telegraf: 0.13.1
pgrep: 3.3.9 (default one that comes with Ubuntu)

Steps to reproduce:

It happens randomly.

Expected behavior:

It should collect data for procs in config file only.

Actual behavior:

It collects data for arbitrary procs (and no errors related to procstat in telegraf logs).

Additional info:

Conf file:
/etc/telegraf/telegraf.d/procstat.conf

###############################################################################
#                                  OUTPUTS                                    #
###############################################################################
[[outputs.influxdb]]
  # The full HTTP or UDP endpoint URL for your InfluxDB instance.
  # Multiple urls can be specified but it is assumed that they are part of the same
  # cluster, this means that only ONE of the urls will be written to each interval.
  # urls = ["udp://localhost:8089"] # UDP endpoint example
  urls = ["http://10.10.10.10:8086"] # required
  # The target database for metrics (telegraf will create it if not exists)
  database = "proc_status" # required
  # Precision of writes, valid values are n, u, ms, s, m, and h
  # note: using second precision greatly helps InfluxDB compression
  precision = "s"

  # Connection timeout (for the connection with InfluxDB), formatted as a string.
  # If not provided, will default to 0 (no timeout)
  timeout = "30s"
  username = "proc_status"
  password = "*****************"
  # Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  # Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512

  # This metrics only will go to processes database.
  namepass = ["procstat*"]


###############################################################################
#                                  INPUTS                                     #
###############################################################################
[[inputs.procstat]]
  exe = "salt-minion"

[[inputs.procstat]]
  exe = "sshd"
@sparrc
Copy link
Contributor

sparrc commented Aug 16, 2016

that is very odd, do you have any steps to reproduce? Are you sure there are no other conflicting configuration files? do you see any patterns, such as PID overlap or anything?

@aabouzaid
Copy link
Author

aabouzaid commented Aug 16, 2016

I tried to find any pattern, but I couldn't find any (e.g. if there is another Telegraf plugin is enabled with procstat or so, but it happens in different servers and envs).

But while I'm checking if there is anything with PIDs, I found the exe name is different of process_name for the same PID (in different times).

time,                  exe,     host,       pid,      process_name
-----------------------------------------------------------------------
2016-08-11T04:03:20Z,  "sshd",  "xserver",  "13778",  "sshd"
2016-08-11T04:03:30Z,  "sshd",  "xserver",  "13778",  "java"
--
2016-08-16T01:04:50Z,  "sshd",  "xserver",  "13778",  "java"
2016-08-16T12:10:00Z,  "sshd",  "xserver",  "13778",  "telegraf"

Reuse of the PID when a process finishes is normal. So I think there is a time space between procstat gets/assigns name of process and its PID.

@sparrc
Copy link
Contributor

sparrc commented Aug 16, 2016

I see, in that case I think I know what the bug is. The procstat plugin caches the PID because it assumes that it's not going to be changing. It does this to avoid having to query the system every interval for the pid with the given filter.

I think the only solution will be to query every interval for the PID.

@sparrc sparrc added the bug unexpected problem or unintended behavior label Aug 16, 2016
@sparrc sparrc changed the title procstat - reports arbitrary processes. procstat input shouldn't cache PID Aug 16, 2016
@aabouzaid
Copy link
Author

Great! Thanks @sparrc :-)

jarondl added a commit to jarondl/telegraf that referenced this issue Dec 28, 2016
Changed the procstat input plugin to not cache PIDs. Solves influxdata#1636.
The logic of creating a process by pid was moved from `procstat.go` to
`spec_processor.go`.
@sparrc sparrc added this to the 1.3.0 milestone Feb 1, 2017
sparrc pushed a commit that referenced this issue Feb 2, 2017
* Procstat: don't cache PIDs

Changed the procstat input plugin to not cache PIDs. Solves #1636.
The logic of creating a process by pid was moved from `procstat.go` to
`spec_processor.go`.

* Procstat: go fmt

* procstat: modify changelog for #2206
@sparrc
Copy link
Contributor

sparrc commented Feb 2, 2017

closed by #2206

@sparrc sparrc closed this as completed Feb 2, 2017
mlindes pushed a commit to Comcast/telegraf that referenced this issue Feb 6, 2017
* Procstat: don't cache PIDs

Changed the procstat input plugin to not cache PIDs. Solves influxdata#1636.
The logic of creating a process by pid was moved from `procstat.go` to
`spec_processor.go`.

* Procstat: go fmt

* procstat: modify changelog for influxdata#2206
@sparrc
Copy link
Contributor

sparrc commented Mar 1, 2017

reopening because #2206 is being reverted because it breaks the cpu_usage metric

see #2479

@sparrc sparrc reopened this Mar 1, 2017
@danielnelson danielnelson self-assigned this Mar 13, 2017
danielnelson added a commit that referenced this issue Mar 17, 2017
ssorathia pushed a commit to ssorathia/telegraf that referenced this issue Mar 25, 2017
calerogers pushed a commit to calerogers/telegraf that referenced this issue Apr 5, 2017
vlamug pushed a commit to vlamug/telegraf that referenced this issue May 30, 2017
maxunt pushed a commit that referenced this issue Jun 26, 2018
* Procstat: don't cache PIDs

Changed the procstat input plugin to not cache PIDs. Solves #1636.
The logic of creating a process by pid was moved from `procstat.go` to
`spec_processor.go`.

* Procstat: go fmt

* procstat: modify changelog for #2206
maxunt pushed a commit that referenced this issue Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants