Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Telegraph Docker Input Plugin : per docker running status support #4259

Merged
merged 8 commits into from
Jun 18, 2018
Merged

Telegraph Docker Input Plugin : per docker running status support #4259

merged 8 commits into from
Jun 18, 2018

Conversation

prashanthjbabu
Copy link
Contributor

@prashanthjbabu prashanthjbabu commented Jun 9, 2018

Currently the docker input plugin does not provide the running status of the docker . It does provide health status of the docker but that is only retrieved if the docker application supports it . This enhancement extends the plugin to give out the running status of the docker which is taken from the Docker Container Inspect API.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@@ -435,6 +435,22 @@ func (d *Docker) gatherContainer(
}
}
}
if info.State != nil {
statefields := map[string]interface{}{
"status": info.State.Status,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have this value as a tag on all docker_container_* measurements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be too much of duplication since the same information is across multiple measurements?

Copy link
Contributor

@danielnelson danielnelson Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little bit of duplication, but it seems useful to be able to "group by" status on any of the container measurements.

Side note: If I could do it all over from scratch I would just have a single docker_container measurement, but now it's not worth the disruption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the code to add status for each container . It can get tricky for cases like CPU(where there is a per-cpu option) and Network(where there is a per interface option). Also since these collectors are all over the place , we would have to make a separate "docker inspect" API call at each of these places . It seems a bit inefficient considering how it currently is designed . Would you still like me to do it that way?
To handle the "group by" feature you requested , we could plug in "container_id" into this which allows you to match with it.

Copy link
Contributor

@danielnelson danielnelson Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not if we need to make extra API calls. Let's just add it to the new measurement as a tag and we can try to fish it through to other measurements later on, if needed.

Copy link
Contributor Author

@prashanthjbabu prashanthjbabu Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this one ! I've added it as a tag.. So now it comes not only for docker_container_status , but also for cpu,mem etc

Attached some logs :

docker_container_status,container_image=ubuntu,container_name=vibrant_benz,container_status=running,container_version=unknown,deviceid=mydeviceid,engine_host=raspberrypi,server_version=18.05.0-ce started_at="2018-06-14T05:48:53.266176036Z",finished_at="0001-01-01T00:00:00Z",oomkilled=false,pid=5278i,exitcode=0i 1528956922000000000
docker_container_mem,container_image=ubuntu,container_name=vibrant_benz,container_status=running,container_version=unknown,deviceid=mydeviceid,engine_host=raspberrypi,server_version=18.05.0-ce limit=0i,usage=0i,max_usage=0i,usage_percent=0,container_id="f85ef98b08bde3ad6a93fadcacfa33a57a1f833119cce0309cadb15c45df5cf4" 1528956921000000000
docker_container_cpu,container_image=ubuntu,container_name=vibrant_benz,container_status=running,container_version=unknown,cpu=cpu-total,deviceid=mydeviceid,engine_host=raspberrypi,server_version=18.05.0-ce usage_total=102424740i,usage_in_usermode=60000000i,usage_system=10927430000000i,throttling_periods=0i,throttling_throttled_time=0i,usage_in_kernelmode=20000000i,throttling_throttled_periods=0i,container_id="f85ef98b08bde3ad6a93fadcacfa33a57a1f833119cce0309cadb15c45df5cf4",usage_percent=0 1528956921000000000
docker_container_cpu,container_image=ubuntu,container_name=vibrant_benz,container_status=running,container_version=unknown,cpu=cpu0,deviceid=mydeviceid,engine_host=raspberrypi,server_version=18.05.0-ce usage_total=7975104i,container_id="f85ef98b08bde3ad6a93fadcacfa33a57a1f833119cce0309cadb15c45df5cf4" 1528956921000000000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit tests seem to be failing after adding "container_status" as a tag . I'm not quite sure how to deal with this. Any ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this . All tests pass now.

"status": info.State.Status,
"running": info.State.Running,
"paused": info.State.Paused,
"restarting": info.State.Restarting,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to save the running, paused, restarting, dead values, since this is all encompassed by the status field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point . I will remove them....

"exitcode": info.State.ExitCode,
"error": info.State.Error,
"startedat": info.State.StartedAt,
"finishedat": info.State.FinishedAt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if error, startedat, finishedat are interesting enough to include, I feel like many people will want to exclude them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel startedat,finishedat is important since it gives useful information on the uptime of the container or even how long ago the container died(based on "status") . If you still think its not needed , we can take it off!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding error . I will remove it..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, lets just rename them started_at and finished_at and convert them to unixnano format (unix epoch timestamp in nanosecond precision). This is the format we have been using recently for encoding timestamps into a field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed this to started_at and finished_at .
I have a query regarding converting the time from RFC3339 format (Example : 2018-06-14T05:48:53.266176036Z ) to Unix Nano format . Golang has the option to convert using the following code :
t1, err := time.Parse(time.RFC3339,"0001-01-01T00:00:00Z")
and then use
t1.UnixNano()
Here's some sample output of the timestamps :

docker_container_status,container_image=ubuntu,container_name=vibrant_benz,container_status=running,container_version=unknown,deviceid=mydeviceid,engine_host=raspberrypi,server_version=18.05.0-ce started_at="2018-06-14T05:48:53.266176036Z",finished_at="0001-01-01T00:00:00Z",oomkilled=false,pid=5278i,exitcode=0i 1528956922000000000

In the above case the docker is still running so it has a valid start time , although its endtime is invalid which is "0001-01-01T00:00:00Z" . If i use the above go code to convert this to unix nano , I get "-6795364578871345152" . As you can see it returns a long negative invalid value . Would this be okay? How else would you like me to handle this ?If this is fine then I will go ahead and commit this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to use time.IsZero(), and if true skip adding the field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! I've done that.

@danielnelson danielnelson added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin area/docker labels Jun 13, 2018
@danielnelson danielnelson added this to the 1.8.0 milestone Jun 18, 2018
@danielnelson danielnelson merged commit 98d86df into influxdata:master Jun 18, 2018
@danielnelson
Copy link
Contributor

I updated the README in c98b58d

mkboudreau added a commit to mkboudreau/telegraf that referenced this pull request Jun 22, 2018
* origin: (39 commits)
  Document path tag in tail input
  Update changelog
  Added path tag to tail input plugin (influxdata#4292)
  Run windows tests with -short
  Fix postfix input handling of multi-level queues (influxdata#4333)
  Update changelog
  Add support for comma in logparser timestamp format (influxdata#4311)
  Update vendoring to dep from gdm (influxdata#4314)
  Update changelog
  Add new measurement with results of pgrep lookup to procstat input (influxdata#4307)
  Update changelog
  Add valuecounter aggregator plugin (influxdata#3523)
  Update changelog
  Update docker input documentation for container status
  Add container status tag to docker input (influxdata#4259)
  Drop CI support for Go 1.8 (influxdata#4309)
  Update changelog
  Fix selection of tags under nested objects in the JSON parser (influxdata#4284)
  Update changelog
  Add owner tag on partitions in burrow input (influxdata#4281)
  ...
@Puneeth-n
Copy link

Puneeth-n commented Aug 21, 2018

Hi,

I am unable to find the container_status tags in Chronograf. Im using the telegraf docker version 1.7.

telegraf.conf

[[inputs.docker]]
  interval = "10s"
  endpoint = "unix:///var/run/docker.sock"
  container_state_include = ["running", "dead", "restarting"]
  timeout = "5s"
  perdevice = true
  total = false

Docker version

Client:
 Version:           18.06.0-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        0ffa825
 Built:             Wed Jul 18 19:10:22 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.0-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       0ffa825
  Built:            Wed Jul 18 19:08:26 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Chronograf

@danielnelson
Copy link
Contributor

@Puneeth-n This feature will be included in Telegraf 1.8.0, but if you are okay with a slightly higher chance of encountering bugs you can try using the nightly development builds to demo this feature now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docker feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants