Wytze Hazenberg edited this page Aug 6, 2019 · 38 revisions

Version control

'master' of this repository is the latest version of the sts-agent. When we want updates from the upstream dd-agent, we merge those from the upstream master into the master branch in this repository.

By convention, we git remote add upstream git@github.com:DataDog/sts-agent.

Current differences between the dd-agent and the sts-agent can be seen via git diff remotes/upstream/master master (run git fetch --all first to refresh your data).

Branch references can be found throughout the build files in docker-sts-agent-build-deb-x64 and sts-agent-omnibus. Here we maintain a list of branch references which have to be updated when forking a feature branch and merging.

  • docker-sts-agent-build-deb-x64/deb-x86/Dockerfile
  • docker-sts-agent-build-deb-x64/deb-x86/docker_build.sh
  • sts-agent-omnibus/software/datadog-agent.rb:23

Differences

The difference between sts-agent and the upstream dd-agent is that we want customers to come to us for support first instead of bothering upstream, so most customer-facing output should show 'sts' and 'StackState' rather than 'dd' and 'Datadog'. Changes include:

  • /etc/dd-agent should be /etc/sts-agent
  • /opt/datadog-agent should be /opt/stackstate-agent
  • /var/log/datadog should be /var/log/stackstate
  • config file should be stackstate.conf
  • url config parameter should be sts_url

Automating these updates reliably has proven hard, so our current strategy is to make the changes manually, aided by some scripts that make new violations easier to detect.
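A detection script of this kind could look roughly like the sketch below. It is not one of the actual scripts: the function name is hypothetical, and the upstream names datadog.conf and dd_url are assumed to be the counterparts of stackstate.conf and sts_url.

```python
# Upstream names mapped to the StackState names from the list above.
# "datadog.conf" and "dd_url" are assumed upstream counterparts.
REPLACEMENTS = {
    "/etc/dd-agent": "/etc/sts-agent",
    "/opt/datadog-agent": "/opt/stackstate-agent",
    "/var/log/datadog": "/var/log/stackstate",
    "datadog.conf": "stackstate.conf",
    "dd_url": "sts_url",
}


def find_violations(text):
    """Return (line_number, upstream_name, suggested_name) tuples."""
    violations = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for upstream, replacement in REPLACEMENTS.items():
            if upstream in line:
                violations.append((line_number, upstream, replacement))
    return violations
```

Running find_violations over the text of each file after an upstream merge gives a list of places to review by hand; it only reports and never rewrites anything.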

Packaging


Debian-package

To create a Debian package, you need the docker-sts-agent-build-deb-x64 docker container.

  1. Clone https://github.com/StackVista/docker-sts-agent-build-deb-x64
  2. Build: docker build --no-cache -t docker-sts-agent-build-deb-x64 .
     • Add stsbuild_rsa to the current directory. It should be an RSA key with access to the StackVista GitHub repos.
     • Set its permissions: chmod 600 ./stsbuild_rsa
  3. Run the builder (two strategies, for OSX or Linux users)

Run the builder for Linux users

docker run -e OMNIBUS_SOFTWARE_BRANCH=master \
           -e OMNIBUS_BRANCH=withConnBeat \
           -e AGENT_BRANCH=withConnBeat \
           -e PROJECT_DIR=sts-agent-omnibus \
           -v <local_dir>/pkg:/sts-agent-omnibus/pkg \
           -v <local_dir>/cache:/var/cache/omnibus \
           -v <local_dir>/sts-build-keys:/sts-build-keys \
           docker-sts-agent-build-deb-x64

Run the builder for OSX users

There seems to be an issue with volume sharing in Docker on OSX when compiling datadog, which results in unstable compilation or hanging builds. To avoid this problem, use a Docker data container to keep the cache:

  1. Create a cache container: docker create -v /var/cache/omnibus --name omnicache debian:wheezy /bin/true
  2. Run the builder with the volumes of the cache container:

docker run -e OMNIBUS_SOFTWARE_BRANCH=master \
           -e OMNIBUS_BRANCH=withConnBeat \
           -e AGENT_BRANCH=withConnBeat \
           -e PROJECT_DIR=sts-agent-omnibus \
           -v <customdir>/pkg:/sts-agent-omnibus/pkg \
           -v <customdir>/sts-build-keys:/sts-build-keys \
           --volumes-from omnicache \
           docker-sts-agent-build-deb-x64

The commands above use docker-sts-agent-build-deb-x64 to build the 'withConnBeat' branch of the agent as an example; replace the local paths with paths valid on your machine.

Changing the version requires you to set a tag on the sts-agent branch and edit config.py's AGENT_VERSION variable. The build should detect the latest tag and use that. Also, change build_docker.sh in the docker-sts-agent-build-deb-x64 repo to reflect the correct version.

Demo plugin

Demo plugin that emits one event and one gauge per set interval.

~/.datadog-agent/checks.d/eventspammer.py

import time

from checks import AgentCheck


class EventSpammer(AgentCheck):
    """Demo check: emits one gauge and one event on every run."""

    def check(self, instance):
        self.log.info("emitting dummy gauge")
        self.gauge('eventspammer.gauge', 1)

        self.log.info("emitting dummy event")
        self.event({
            'timestamp': int(time.time()),
            'source_type_name': "event_spammer_source",
            'api_key': "ABCDEFGHI",
            'msg_title': "test title",
            'msg_text': "test msg",
            'tags': [
                'test:tag',
                'tag:1'
            ]
        })

~/.datadog-agent/conf.d/eventspammer.yaml

init_config:
    min_collection_interval: 5

instances:
    [{}]

You may need to restart the Datadog agent for the new check to be picked up.

API Endpoints

Endpoint: intake

HTTP POST to: /intake/?api=<api_key>

Note: the processes and metrics lists in this example are truncated.

{
	"agentVersion": "5.8.0",
	"apiKey": "5f98193e83ece68c811df22174859355",
	"collection_timestamp": 1467037580.595086,
	"cpuIdle": 38.0,
	"cpuStolen": 0,
	"cpuSystem": 54.0,
	"cpuUser": 8.0,
	"cpuWait": 0,
	"events": {
		"eventspammer": [{
			"api_key": "ABCDEFGHI",
			"msg_text": "test msg",
			"msg_title": "test title",
			"source_type_name": "event_spammer_source",
			"tags": [
				"test:tag",
				"tag:1"
			],
			"timestamp": 1467037584
		}]
	},
	"external_host_tags": {},
	"host-tags": {},
	"internalHostname": "mac-wytze",
	"ioStats": {
		"disk0": {
			"system.io.bytes_per_s": 660602.88
		},
		"disk2": {
			"system.io.bytes_per_s": 0.0
		},
		"disk3": {
			"system.io.bytes_per_s": 0.0
		},
		"disk4": {
			"system.io.bytes_per_s": 0.0
		}
	},
	"memBuffers": null,
	"memCached": null,
	"memPageTables": null,
	"memPhysFree": 649.0625,
	"memPhysPctUsable": 0.41200000000000003,
	"memPhysTotal": null,
	"memPhysUsable": 6744.97265625,
	"memPhysUsed": 14711.33984375,
	"memShared": null,
	"memSlab": null,
	"memSwapCached": null,
	"memSwapFree": 786.0,
	"memSwapPctFree": null,
	"memSwapTotal": null,
	"memSwapUsed": 1262.0,
	"metrics": [
		[
			"system.disk.total",
			1467037584,
			487374848.0, {
				"device_name": "/dev/disk1",
				"hostname": "mac-wytze",
				"type": "gauge"
			}
		],
		[
			"system.disk.free",
			1467037584,
			413437364.0, {
				"device_name": "/dev/disk1",
				"hostname": "mac-wytze",
				"type": "gauge"
			}
		],
		[
			"eventspammer.gauge",
			1467037584,
			1, {
				"hostname": "mac-wytze",
				"type": "gauge"
			}
		],
		[
			"system.net.packets_out.count",
			1467037584,
			470.4736842105263, {
				"device_name": "en0",
				"hostname": "mac-wytze",
				"type": "gauge"
			}
		],
		[
			"system.net.packets_in.count",
			1467037584,
			554.4736842105264, {
				"device_name": "en0",
				"hostname": "mac-wytze",
				"type": "gauge"
			}
		]
	],
	"os": "mac",
	"processes": {
		"apiKey": "5f98193e83ece68c811df22174859355",
		"host": "mac-wytze",
		"processes": [
			[
				"wytzehazenberg",
				"80987",
				"0.0",
				"0.1",
				"2503228",
				"9880",
				"s004",
				"S+",
				"2:12PM",
				"0:23.20",
				"vim"
			],
			[
				"wytzehazenberg",
				"79040",
				"0.0",
				"0.2",
				"2527308",
				"37844",
				"??",
				"S",
				"2:09PM",
				"0:15.82",
				"/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/agent.py foreground --use-local-forwarder"
			],
			[
				"wytzehazenberg",
				"79039",
				"0.0",
				"0.2",
				"2540112",
				"29924",
				"??",
				"S",
				"2:09PM",
				"0:11.55",
				"/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/ddagent.py"
			],
			[
				"wytzehazenberg",
				"79038",
				"0.0",
				"0.2",
				"2530504",
				"32852",
				"??",
				"S",
				"2:09PM",
				"0:11.14",
				"/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/dogstatsd.py --use-local-forwarder"
			],
			[
				"wytzehazenberg",
				"79035",
				"0.0",
				"0.0",
				"2478400",
				"5540",
				"??",
				"Ss",
				"2:09PM",
				"0:01.99",
				"/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/bin/supervisord -c /opt/datadog-agent/etc/supervisor.conf"
			],
			[
				"wytzehazenberg",
				"58859",
				"0.0",
				"0.0",
				"2435856",
				"680",
				"s009",
				"S+",
				"1:32PM",
				"0:00.05",
				"tail -f ./collector.log"
			]
		]
	},
	"python": "2.7.11 (default, May 16 2016, 13:19:07) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)]",
	"resources": {},
	"service_checks": [{
		"check": "datadog.agent.check_status",
		"host_name": "mac-wytze",
		"id": 2091,
		"message": null,
		"status": 0,
		"tags": [
			"check:ntp"
		],
		"timestamp": 1467037584.680259
	}, {
		"check": "datadog.agent.check_status",
		"host_name": "mac-wytze",
		"id": 2092,
		"message": null,
		"status": 0,
		"tags": [
			"check:disk"
		],
		"timestamp": 1467037584.681405
	}, {
		"check": "datadog.agent.check_status",
		"host_name": "mac-wytze",
		"id": 2093,
		"message": null,
		"status": 0,
		"tags": [
			"check:eventspammer"
		],
		"timestamp": 1467037584.682277
	}, {
		"check": "datadog.agent.check_status",
		"host_name": "mac-wytze",
		"id": 2094,
		"message": null,
		"status": 0,
		"tags": [
			"check:network"
		],
		"timestamp": 1467037584.709078
	}, {
		"check": "datadog.agent.up",
		"host_name": "mac-wytze",
		"id": 2095,
		"message": null,
		"status": 0,
		"tags": null,
		"timestamp": 1467037584.709122
	}],
	"system.load.1": 3.31,
	"system.load.15": 2.25,
	"system.load.5": 2.62,
	"system.load.norm.1": 0.41375,
	"system.load.norm.15": 0.28125,
	"system.load.norm.5": 0.3275,
	"system.uptime": 862103.6049060822,
	"uuid": "f8c341c7a50a53759d8726c8c918c0a2",
	"topologies": [{
		"start_snapshot": true,
		"stop_snapshot": true,
		"instance": {
			"type": "mesos",
			"url": "http://localhost:5050"
		},
		"components": [{
			"externalId": "nginx3.e5dda204-d1b2-11e6-a015-0242ac110005",
			"type": {
				"name": "docker"
			},
			"data": {
				"tags": ["mytag"],
				"ip_addresses": ["172.17.0.8"],
				"labels": [{
					"key": "label1"
				}],
				"framework_id": "fc998b77-e2d1-4be5-b15c-1af7cddabfed-0000",
				"docker": {
					"image": "nginx",
					"network": "BRIDGE",
					"port_mappings": [{
						"container_port": 31945,
						"host_port": 31945,
						"protocol": "tcp"
					}],
					"privileged": false
				},
				"task_name": "nginx3",
				"slave_id": "fc998b77-e2d1-4be5-b15c-1af7cddabfed-S0"
			}
		}],
		"relations": [{
			"externalId": "nginxapp",
			"type": {
				"name": "docker"
			},
			"sourceId": "nginx3.e5dda204-d1b2-11e6-a015-0242ac110005",
			"targetId": "nginx3.e5dda204-d1b2-11e6-a015-0242ac110006",
			"data": {}
		}]
	}]
}

Minimal JSON schema of this data type

{
  "type": "object",
  "properties": {
    "internalHostname": {
      "type": "string"
    },
    "collection_timestamp": {
      "type": "double (epoch in seconds)"
    },
    "events": {
      "type": "object",
      "properties": {
        "": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "msg_text": {
                "type": "string"
              },
              "msg_title": {
                "type": "string"
              },
              "tags": {
                "type": "array",
                "items": {
                  "type": "string"
                }
              },
              "timestamp": {
                "type": "int (epoch in seconds)"
              }
            },
            "required": ["msg_text", "msg_title", "tags", "timestamp"]
          }
        }
      }
    },
    "metrics": [
      "string",
      "int (epoch in seconds)",
      "double",
      {
        "type": "object",
        "properties": {
          "device_name": {
            "type": "string"
          },
          "hostname": {
            "type": "string"
          },
          "type": {
            "enum": ["gauge", "count", "rate", "counter"]
          }
          }
        },
        "required": ["hostname", "type"]
      }
    ],
    "service_checks": {
      "type": "array"
    }
  },
  "required": [
    "internalHostname",
    "collection_timestamp",
    "events",
    "metrics",
    "service_checks"
  ]
}
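As a rough illustration of the schema, the sketch below checks a payload for the required top-level keys and for the four-item shape of each metric. The function name is hypothetical and the checks are deliberately minimal; it treats hostname and type as the attributes every metric carries, as in the example payload, and it is not how the receiver validates data.

```python
# Required top-level keys, following the minimal schema above.
REQUIRED_KEYS = (
    "internalHostname",
    "collection_timestamp",
    "events",
    "metrics",
    "service_checks",
)


def validate_intake_payload(payload):
    """Return a list of problems; an empty list means nothing was found."""
    problems = ["missing key: %s" % key for key in REQUIRED_KEYS if key not in payload]

    # Each metric is a 4-item list: [name, timestamp, value, attributes].
    for index, metric in enumerate(payload.get("metrics", [])):
        if len(metric) != 4:
            problems.append("metric %d: expected 4 items, got %d" % (index, len(metric)))
            continue
        attributes = metric[3]
        for key in ("hostname", "type"):
            if key not in attributes:
                problems.append("metric %d: missing attribute %s" % (index, key))
    return problems
```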

Endpoints: intake/metrics and intake/metadata

The data going to intake/ can be split into intake/metrics and intake/metadata by setting merge_payloads=False in the emit function of class AgentPayload. By default the data is merged and posted to intake/.

The split is based on payload keys; for reference, see class AgentPayload in collector.py.

Example excerpt:

{
    "agentVersion": "5.12.0",
    "agent_checks": [
        [
            "redisdb",
            "redis",
            0,
            "OK",
            "",
            {
                "version": "3.2.8"
            }
        ]
    ],
    "apiKey": "none",
    "external_host_tags": {},
    "gohai": "{\"cpu\":{\"cache_size\":\"6144 KB\",\"cpu_cores\":\"4\",\"cpu_logical_processors\":\"4\",\"family\":\"6\",\"mhz\":\"2793.066\",\"model\":\"70\",\"model_name\":\"Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz\",\"stepping\":\"1\",\"vendor_id\":\"GenuineIntel\"},\"filesystem\":[{\"kb_size\":\"61891364\",\"mounted_on\":\"/\",\"name\":\"none\"},{\"kb_size\":\"1023432\",\"mounted_on\":\"/dev\",\"name\":\"tmpfs\"},{\"kb_size\":\"1023432\",\"mounted_on\":\"/sys/fs/cgroup\",\"name\":\"tmpfs\"},{\"kb_size\":\"61891364\",\"mounted_on\":\"/etc/hosts\",\"name\":\"/dev/sda2\"},{\"kb_size\":\"65536\",\"mounted_on\":\"/dev/shm\",\"name\":\"shm\"},{\"kb_size\":\"1023432\",\"mounted_on\":\"/sys/firmware\",\"name\":\"tmpfs\"}],\"gohai\":{\"build_date\":\"Mon Mar 20 15:41:36 UTC 2017\",\"git_branch\":\"last-stable\",\"git_hash\":\"bff26dd\",\"go_version\":\"go version go1.3.3 linux/amd64\"},\"memory\":{\"swap_total\":\"4095996kB\",\"total\":\"2046864kB\"},\"network\":{\"ipaddress\":\"172.17.0.2\",\"ipaddressv6\":\"fe80::42:acff:fe11:2\",\"macaddress\":\"02:42:ac:11:00:02\"},\"platform\":{\"GOOARCH\":\"amd64\",\"GOOS\":\"linux\",\"goV\":\"1.3.3\",\"hardware_platform\":\"x86_64\",\"hostname\":\"f3a70779512e\",\"kernel_name\":\"Linux\",\"kernel_release\":\"4.9.8-moby\",\"kernel_version\":\"#1 SMP Wed Feb 8 09:56:43 UTC 2017\",\"machine\":\"x86_64\",\"os\":\"GNU/Linux\",\"processor\":\"x86_64\",\"pythonV\":\"2.7.13\"}}",
    "host-tags": {},
    "meta": {
        "hostname": "f3a70779512e",
        "socket-fqdn": "f3a70779512e",
        "socket-hostname": "f3a70779512e",
        "timezones": [
            "UTC",
            "UTC"
        ]
    },
    "systemStats": {
        "cpuCores": 4,
        "machine": "x86_64",
        "nixV": [
            "Ubuntu",
            "16.04",
            "xenial"
        ],
        "platform": "linux2",
        "processor": "x86_64",
        "pythonV": "2.7.13"
    }
}
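The key-based split can be pictured with the sketch below. The two key sets are assumptions inferred from the example excerpt, and split_payload is a hypothetical helper; check class AgentPayload in collector.py for the actual keys and logic.

```python
# Assumed key sets, inferred from the example excerpt above.
METADATA_KEYS = frozenset([
    "meta", "host-tags", "systemStats", "agent_checks",
    "gohai", "external_host_tags",
])
# Keys assumed to be needed by both endpoints.
DUPLICATE_KEYS = frozenset(["apiKey", "agentVersion"])


def split_payload(payload):
    """Split one merged payload into (metrics_part, metadata_part)."""
    metrics_part, metadata_part = {}, {}
    for key, value in payload.items():
        if key in DUPLICATE_KEYS:
            metrics_part[key] = value
            metadata_part[key] = value
        elif key in METADATA_KEYS:
            metadata_part[key] = value
        else:
            metrics_part[key] = value
    return metrics_part, metadata_part
```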

Endpoint: series

HTTP POST: /api/v1/series/?api_key=<api_key>

{
    "series": [
        {
            "device_name": null,
            "host": "mac-wytze",
            "interval": 10.0,
            "metric": "datadog.dogstatsd.packet.count",
            "points": [
                [
                    1467037580.0,
                    0
                ]
            ],
            "tags": null,
            "type": "gauge"
        },
        {
            "device_name": null,
            "host": "mac-wytze",
            "interval": 10.0,
            "metric": "datadog.dogstatsd.serialization_status",
            "points": [
                [
                    1467037593.243472,
                    0.1
                ]
            ],
            "tags": [
                "status:success"
            ],
            "type": "rate"
        }
    ]
}
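A body in this shape can be assembled with a few lines of Python. The sketch below only builds the JSON document; the helper name and its defaults are hypothetical, and sending the request is left out.

```python
import json


def build_series_payload(host, metric, value, timestamp,
                         metric_type="gauge", tags=None, interval=10.0):
    """Build a single-metric body for the series endpoint."""
    return {
        "series": [{
            "device_name": None,
            "host": host,
            "interval": interval,
            "metric": metric,
            "points": [[timestamp, value]],
            "tags": tags,
            "type": metric_type,
        }]
    }


body = json.dumps(build_series_payload(
    "mac-wytze", "datadog.dogstatsd.packet.count", 0, 1467037580.0))
```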

Endpoint: check_run / service check

HTTP POST: /api/v1/check_run/?api_key=<api_key>

[
    {
        "check": "tomcat.can_connect",
        "host_name": "osboxes",
        "status": 0,
        "tags": [
            "instance:tomcat-localhost-12345",
            "jmx_server:localhost"
        ],
        "timestamp": 1490951770
    }
]
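The sketch below builds a service check body in the same shape. The helper name is hypothetical, and the status constants follow the usual 0 to 3 convention for service checks (OK, WARNING, CRITICAL, UNKNOWN), which the example above does not spell out.

```python
import time

# Conventional service check statuses; assumed, not taken from this page.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def build_check_run(check, host_name, status, tags=None, timestamp=None):
    """Build a one-element body for the check_run endpoint."""
    if status not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError("unknown status: %r" % (status,))
    return [{
        "check": check,
        "host_name": host_name,
        "status": status,
        "tags": tags or [],
        # Default to the current time in whole seconds.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }]
```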

StatsD

The output below was generated by running the following command several times against a running StatsD instance with the datadog backend installed:

echo "statsd_test:1|c" | nc -u -w0 127.0.0.1 8125
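The same counter can be sent from Python with a plain UDP socket. This is a minimal sketch with hypothetical function names; it assumes StatsD listens on 127.0.0.1:8125 as in the nc command above.

```python
import socket


def format_counter(name, value=1):
    """Format a StatsD counter line, e.g. 'statsd_test:1|c'."""
    return "%s:%d|c" % (name, value)


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Send one counter increment to StatsD over UDP (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, value).encode("ascii"), (host, port))
    finally:
        sock.close()
```

Calling send_counter("statsd_test") a few times has the same effect as repeating the nc command.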

The StatsD backend can be found here: https://github.com/DataDog/statsd-datadog-backend/. The backend sends StatsD's output directly to the API.

Backend tag used: statsd:testtag

HTTP POST: /api/v1/series?api_key=<api_key>

{
    "series": [
        {
            "host": "mac-wytze",
            "metric": ".statsd.bad_lines_seen",
            "points": [
                [
                    1467105304,
                    0
                ]
            ],
            "tags": [
                "statsd:testtag"
            ],
            "type": "gauge"
        },
        {
            "host": "mac-wytze",
            "metric": ".statsd.packets_received",
            "points": [
                [
                    1467105304,
                    1.5
                ]
            ],
            "tags": [
                "statsd:testtag"
            ],
            "type": "gauge"
        },
        {
            "host": "mac-wytze",
            "metric": ".statsd.metrics_received",
            "points": [
                [
                    1467105304,
                    1.5
                ]
            ],
            "tags": [
                "statsd:testtag"
            ],
            "type": "gauge"
        },
        {
            "host": "mac-wytze",
            "metric": ".statsd_test",
            "points": [
                [
                    1467105304,
                    1.5
                ]
            ],
            "tags": [
                "statsd:testtag"
            ],
            "type": "gauge"
        },
        {
            "host": "mac-wytze",
            "metric": ".statsd.timestamp_lag",
            "points": [
                [
                    1467105304,
                    0
                ]
            ],
            "tags": [
                "statsd:testtag"
            ],
            "type": "gauge"
        }
    ]
}

Forwarder status page

The agent's forwarder exposes a status page at 127.0.0.1:17123/status/. This page lists, in HTML, the transactions that still need to be forwarded.

Example:

Id  Size  Error count  Next flush
1   2063  1            2017-04-06 09:07:58
2   91    0            2017-04-06 09:07:18.918453
3   930   0            2017-04-06 09:07:18.920516
4   91    0            2017-04-06 09:07:18.922276
5   457   0            2017-04-06 09:07:33.762445
6   1686  0            2017-04-06 09:07:37.969215
7   91    0            2017-04-06 09:07:37.971299
8   455   1            2017-04-06 09:08:23
9   457   0            2017-04-06 09:07:53.774737
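If the rows need to be processed, for instance to spot transactions with errors, they can be parsed as in the sketch below. It assumes the rows have already been extracted from the HTML as whitespace-separated text like the example above; the function names are hypothetical. Note that the "Next flush" column appears both with and without microseconds.

```python
from datetime import datetime


def parse_next_flush(text):
    """Parse a 'Next flush' value, with or without microseconds."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % text)


def parse_status_row(row):
    """Turn one text row into a dict with typed values."""
    transaction_id, size, error_count, next_flush = row.strip().split(None, 3)
    return {
        "id": int(transaction_id),
        "size": int(size),
        "error_count": int(error_count),
        "next_flush": parse_next_flush(next_flush),
    }
```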

Exceptions in checks

Exceptions thrown in checks are logged in /var/log/collector.log. Checks are executed periodically, at an interval set in the check's configuration. An exception in one run does not stop the check from being run again at the next interval.
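The behaviour can be pictured with the simplified sketch below. It is not the collector's actual code; run_checks_once and the dict of callables are hypothetical stand-ins for the collector loop and its checks.

```python
import logging

log = logging.getLogger("collector")


def run_checks_once(checks):
    """Run every check once; log failures instead of propagating them.

    `checks` maps a check name to a zero-argument callable.
    Returns the names of the checks that raised.
    """
    failed = []
    for name, check in checks.items():
        try:
            check()
        except Exception:
            # Logged with traceback; the loop moves on to the next check.
            log.exception("check %s failed", name)
            failed.append(name)
    return failed
```

Because the exception is caught per check and per run, a failing check is simply tried again on the next round.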

Topology information in the Agent

Inside a check, the agent exposes the following functions to the user:

# Create or update.
self.component(instance_key, id, type, data={})
self.relation(instance_key, source_id, target_id, type, data={})

The information provided through these functions is collected in batches and pushed to StackState's receiver as part of the agent data, in the "topologies" field. The following format is used between the Agent's collector and the StackState receiver:

{
  topologies: [
    {
      start_snapshot: optional[boolean],
      stop_snapshot: optional[boolean],
      instance: {
        type: string,   // e.g. mesos
        url: string     // e.g. http://localhost:8080/
        ... // unique instance identification; in the case of mesos, a url: string field is added
      },
      components: [
        {
          externalId: string,
          type: {
            name: string // {docker | kubernetes}
          },
          data: {
            ... // dict / struct
          }
        },
        ...
      ],
      relations: [
        {
          externalId: string,
          type: {
            name: string
          },
          sourceId: string,
          targetId: string,
          data: {
            ... // dict / struct
          }
        },
        ...
      ]
    }
  ]
}
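To make the mapping from the two functions to this format concrete, here is a self-contained sketch. TopologyBatch is a hypothetical stand-in for the agent's internal batching, instance_key is assumed to be a dict with type and url, and the way the relation's externalId is built is made up for the example.

```python
class TopologyBatch(object):
    """Hypothetical collector of component/relation calls per instance."""

    def __init__(self):
        self._instances = {}

    def _entry(self, instance_key):
        # One topology entry per distinct instance (assumed to be a dict).
        key = tuple(sorted(instance_key.items()))
        return self._instances.setdefault(key, {
            "instance": dict(instance_key),
            "components": [],
            "relations": [],
        })

    def component(self, instance_key, id, type, data=None):
        self._entry(instance_key)["components"].append({
            "externalId": id,
            "type": {"name": type},
            "data": data or {},
        })

    def relation(self, instance_key, source_id, target_id, type, data=None):
        self._entry(instance_key)["relations"].append({
            # Assumption: the real agent may derive this id differently.
            "externalId": "%s-%s-%s" % (source_id, type, target_id),
            "type": {"name": type},
            "sourceId": source_id,
            "targetId": target_id,
            "data": data or {},
        })

    def payload(self):
        """Return the batched data in the 'topologies' format."""
        return {"topologies": list(self._instances.values())}
```

In a real check the calls would be self.component(...) and self.relation(...); the batch object here only shows how those calls could end up as components and relations under the matching instance.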