Home
The 'master' branch of this repository is the latest version of the sts-agent. When we want updates from the upstream dd-agent, we merge the upstream master into the master branch of this repository.
By convention, we `git remote add upstream git@github.com:DataDog/dd-agent`.
Current differences between the dd-agent and the sts-agent can be seen via `git diff remotes/upstream/master master` (run `git fetch --all` first to refresh your data).
Branch references can be found throughout the build files in docker-sts-agent-build-deb-x64 and sts-agent-omnibus. Here we maintain a list of the branch references which have to be updated when forking a feature branch and merging:
- docker-sts-agent-build-deb-x64/deb-x86/Dockerfile
- docker-sts-agent-build-deb-x64/deb-x86/docker_build.sh
- sts-agent-omnibus/software/datadog-agent.rb:23
The sts-agent differs from the upstream dd-agent in branding: we want customers to come to us for support first instead of bothering upstream, so in most customer-facing output we show 'sts' and 'StackState' rather than 'dd' and 'Datadog'. Changes include:
- `/etc/dd-agent` should be `/etc/sts-agent`
- `/opt/datadog-agent` should be `/opt/stackstate-agent`
- `/var/log/datadog` should be `/var/log/stackstate`
- the config file should be `stackstate.conf`
- the url config parameter should be `sts_url`
These updates have proven hard to automate reliably, so our current strategy is to perform the changes manually, aided by some scripts that make detecting new violations easier.
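As an illustration, a minimal sketch of such a detection script (the file name, patterns, and structure here are assumptions for illustration, not the actual scripts we use):

```python
# detect_branding.py -- hypothetical helper that flags leftover upstream
# branding which should have been renamed to the StackState equivalents.
import os
import re

# Assumed patterns; the real scripts may check a different list.
FORBIDDEN = re.compile(r"/etc/dd-agent|/opt/datadog-agent|/var/log/datadog"
                       r"|datadog\.conf|\bdd_url\b")

def scan(root="."):
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path) as f:
                    for lineno, line in enumerate(f, 1):
                        if FORBIDDEN.search(line):
                            print("%s:%d: %s" % (path, lineno, line.strip()))
            except IOError:
                pass  # unreadable file, skip

if __name__ == "__main__":
    scan()
```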
To create a Debian package, you need the docker-sts-agent-build-deb-x64 docker container.
- Clone https://github.com/StackVista/docker-sts-agent-build-deb-x64
- Build: `docker build --no-cache -t docker-sts-agent-build-deb-x64 .`
- Add `stsbuild_rsa` to the current directory. This should be an RSA key with access to the StackVista GitHub repositories.
- Set permissions: `chmod 600 ./stsbuild_rsa`
- Run the builder (two strategies: one for Linux users and one for OSX users)
Run the builder for Linux users

```
docker run -e OMNIBUS_SOFTWARE_BRANCH=master \
  -e OMNIBUS_BRANCH=withConnBeat \
  -e AGENT_BRANCH=withConnBeat \
  -e PROJECT_DIR=sts-agent-omnibus \
  -v <local_dir>/pkg:/sts-agent-omnibus/pkg \
  -v <local_dir>/cache:/var/cache/omnibus \
  -v <local_dir>/sts-build-keys:/sts-build-keys \
  docker-sts-agent-build-deb-x64
```
Run the builder for OSX users

There seems to be an issue with volume sharing on Docker for OSX when compiling the agent, which results in unstable compilation or hanging builds. To avoid this problem, use Docker container data sharing to keep the cache:

- Create a cache container:

```
docker create -v /var/cache/omnibus --name omnicache debian:wheezy /bin/true
```
- Run the builder, attaching the cache container:

```
docker run -e OMNIBUS_SOFTWARE_BRANCH=master \
  -e OMNIBUS_BRANCH=withConnBeat \
  -e AGENT_BRANCH=withConnBeat \
  -e PROJECT_DIR=sts-agent-omnibus \
  -v <customdir>/pkg:/sts-agent-omnibus/pkg \
  -v <customdir>/sts-build-keys:/sts-build-keys \
  --volumes-from omnicache \
  docker-sts-agent-build-deb-x64
```
Once you have the docker-sts-agent-build-deb-x64 image, you can use it to build, for example, the 'withConnBeat' branch of the agent, as the commands above do (replace the local paths with paths valid on your machine).
Changing the version requires you to set a tag on the sts-agent branch and edit the AGENT_VERSION variable in `config.py`. The build should detect the latest tag and use that. Also, change `build_docker.sh` in the docker-sts-agent-build-deb-x64 repo to reflect the correct version.
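For reference, AGENT_VERSION is a plain string constant in `config.py`; a sketch of the relevant line (the value shown is an example, check the file for the current one):

```python
# config.py (excerpt) -- keep in sync with the latest git tag.
AGENT_VERSION = "5.8.0"
```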
Demo plugin that emits one event and one gauge per configured interval.

`~/.datadog-agent/checks.d/eventspammer.py`:

```python
from checks import AgentCheck
import time

class EventSpammer(AgentCheck):
    def check(self, instance):
        self.log.info("emitting dummy gauge")
        self.gauge('eventspammer.gauge', 1)
        self.log.info("emitting dummy event")
        self.event({
            'timestamp': int(time.time()),
            'source_type_name': "event_spammer_source",
            'api_key': "ABCDEFGHI",
            'msg_title': "test title",
            'msg_text': "test msg",
            'tags': ['test:tag', 'tag:1']
        })
```
`~/.datadog-agent/conf.d/eventspammer.yaml`:

```yaml
init_config:
  min_collection_interval: 5

instances: [{}]
```
You may want to restart the Datadog agent to pick up the new check.
HTTP POST to: /intake/?api=<api_key>
Note: the processes and metrics sections are truncated.
```json
{
  "agentVersion": "5.8.0",
  "apiKey": "5f98193e83ece68c811df22174859355",
  "collection_timestamp": 1467037580.595086,
  "cpuIdle": 38.0,
  "cpuStolen": 0,
  "cpuSystem": 54.0,
  "cpuUser": 8.0,
  "cpuWait": 0,
  "events": {
    "eventspammer": [{
      "api_key": "ABCDEFGHI",
      "msg_text": "test msg",
      "msg_title": "test title",
      "source_type_name": "event_spammer_source",
      "tags": ["test:tag", "tag:1"],
      "timestamp": 1467037584
    }]
  },
  "external_host_tags": {},
  "host-tags": {},
  "internalHostname": "mac-wytze",
  "ioStats": {
    "disk0": { "system.io.bytes_per_s": 660602.88 },
    "disk2": { "system.io.bytes_per_s": 0.0 },
    "disk3": { "system.io.bytes_per_s": 0.0 },
    "disk4": { "system.io.bytes_per_s": 0.0 }
  },
  "memBuffers": null,
  "memCached": null,
  "memPageTables": null,
  "memPhysFree": 649.0625,
  "memPhysPctUsable": 0.41200000000000003,
  "memPhysTotal": null,
  "memPhysUsable": 6744.97265625,
  "memPhysUsed": 14711.33984375,
  "memShared": null,
  "memSlab": null,
  "memSwapCached": null,
  "memSwapFree": 786.0,
  "memSwapPctFree": null,
  "memSwapTotal": null,
  "memSwapUsed": 1262.0,
  "metrics": [
    ["system.disk.total", 1467037584, 487374848.0, { "device_name": "/dev/disk1", "hostname": "mac-wytze", "type": "gauge" }],
    ["system.disk.free", 1467037584, 413437364.0, { "device_name": "/dev/disk1", "hostname": "mac-wytze", "type": "gauge" }],
    ["eventspammer.gauge", 1467037584, 1, { "hostname": "mac-wytze", "type": "gauge" }],
    ["system.net.packets_out.count", 1467037584, 470.4736842105263, { "device_name": "en0", "hostname": "mac-wytze", "type": "gauge" }],
    ["system.net.packets_in.count", 1467037584, 554.4736842105264, { "device_name": "en0", "hostname": "mac-wytze", "type": "gauge" }]
  ],
  "os": "mac",
  "processes": {
    "apiKey": "5f98193e83ece68c811df22174859355",
    "host": "mac-wytze",
    "processes": [
      ["wytzehazenberg", "80987", "0.0", "0.1", "2503228", "9880", "s004", "S+", "2:12PM", "0:23.20", "vim"],
      ["wytzehazenberg", "79040", "0.0", "0.2", "2527308", "37844", "??", "S", "2:09PM", "0:15.82", "/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/agent.py foreground --use-local-forwarder"],
      ["wytzehazenberg", "79039", "0.0", "0.2", "2540112", "29924", "??", "S", "2:09PM", "0:11.55", "/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/ddagent.py"],
      ["wytzehazenberg", "79038", "0.0", "0.2", "2530504", "32852", "??", "S", "2:09PM", "0:11.14", "/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/agent/dogstatsd.py --use-local-forwarder"],
      ["wytzehazenberg", "79035", "0.0", "0.0", "2478400", "5540", "??", "Ss", "2:09PM", "0:01.99", "/opt/datadog-agent/embedded/bin/python /opt/datadog-agent/bin/supervisord -c /opt/datadog-agent/etc/supervisor.conf"],
      ["wytzehazenberg", "58859", "0.0", "0.0", "2435856", "680", "s009", "S+", "1:32PM", "0:00.05", "tail -f ./collector.log"]
    ]
  },
  "python": "2.7.11 (default, May 16 2016, 13:19:07) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)]",
  "resources": {},
  "service_checks": [
    { "check": "datadog.agent.check_status", "host_name": "mac-wytze", "id": 2091, "message": null, "status": 0, "tags": ["check:ntp"], "timestamp": 1467037584.680259 },
    { "check": "datadog.agent.check_status", "host_name": "mac-wytze", "id": 2092, "message": null, "status": 0, "tags": ["check:disk"], "timestamp": 1467037584.681405 },
    { "check": "datadog.agent.check_status", "host_name": "mac-wytze", "id": 2093, "message": null, "status": 0, "tags": ["check:eventspammer"], "timestamp": 1467037584.682277 },
    { "check": "datadog.agent.check_status", "host_name": "mac-wytze", "id": 2094, "message": null, "status": 0, "tags": ["check:network"], "timestamp": 1467037584.709078 },
    { "check": "datadog.agent.up", "host_name": "mac-wytze", "id": 2095, "message": null, "status": 0, "tags": null, "timestamp": 1467037584.709122 }
  ],
  "system.load.1": 3.31,
  "system.load.15": 2.25,
  "system.load.5": 2.62,
  "system.load.norm.1": 0.41375,
  "system.load.norm.15": 0.28125,
  "system.load.norm.5": 0.3275,
  "system.uptime": 862103.6049060822,
  "uuid": "f8c341c7a50a53759d8726c8c918c0a2",
  "topologies": [{
    "start_snapshot": true,
    "stop_snapshot": true,
    "instance": { "type": "mesos", "url": "http://localhost:5050" },
    "components": [{
      "externalId": "nginx3.e5dda204-d1b2-11e6-a015-0242ac110005",
      "type": { "name": "docker" },
      "data": {
        "tags": ["mytag"],
        "ip_addresses": ["172.17.0.8"],
        "labels": [{ "key": "label1" }],
        "framework_id": "fc998b77-e2d1-4be5-b15c-1af7cddabfed-0000",
        "docker": {
          "image": "nginx",
          "network": "BRIDGE",
          "port_mappings": [{ "container_port": 31945, "host_port": 31945, "protocol": "tcp" }],
          "privileged": false
        },
        "task_name": "nginx3",
        "slave_id": "fc998b77-e2d1-4be5-b15c-1af7cddabfed-S0"
      }
    }],
    "relations": [{
      "externalId": "nginxapp",
      "type": { "name": "docker" },
      "sourceId": "nginx3.e5dda204-d1b2-11e6-a015-0242ac110005",
      "targetId": "nginx3.e5dda204-d1b2-11e6-a015-0242ac110006",
      "data": {}
    }]
  }]
}
```
Minimal JSON schema of this data type:
```json
{
  "type": "object",
  "properties": {
    "internalHostname": { "type": "string" },
    "collection_timestamp": { "type": "double (epoch in seconds)" },
    "events": {
      "type": "object",
      "properties": {
        "": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "msg_text": { "type": "string" },
              "msg_title": { "type": "string" },
              "tags": { "type": "array", "items": { "type": "string" } },
              "timestamp": { "type": "int (epoch in seconds)" }
            },
            "required": ["msg_text", "msg_title", "tags", "timestamp"]
          }
        }
      }
    },
    "metrics": [
      "string",
      "int (epoch in seconds)",
      "double",
      {
        "type": "object",
        "properties": {
          "device_name": { "type": "string" },
          "hostname": { "type": "string" },
          "type": { "enum": ["gauge", "count", "rate", "counter"] }
        },
        "required": ["device_name", "hostname", "type"]
      }
    ],
    "service_checks": { "type": "array" }
  },
  "required": ["internalHostname", "collection_timestamp", "events", "metrics", "service_checks"]
}
```
The data going to `intake/` can be split into `intake/metrics` and `intake/metadata` by setting `merge_payloads=False` in the `emit` function of class `AgentPayload`. The default is that the data is merged and posted to `intake/`.

The split is based on payload keys; for reference, see class `AgentPayload` in `collector.py`.
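Conceptually, the split works like the sketch below; the authoritative key sets and emit logic live in class `AgentPayload` in `collector.py`, so the key list here is only an illustrative assumption:

```python
# Simplified sketch of splitting a merged payload into a metrics part and
# a metadata part; the real key sets are defined in AgentPayload.
METADATA_KEYS = frozenset(["meta", "host-tags", "external_host_tags",
                           "gohai", "systemStats", "agent_checks"])

def split_payload(payload):
    """Return (metrics_part, metadata_part) of a full agent payload."""
    metadata = dict((k, v) for k, v in payload.items() if k in METADATA_KEYS)
    metrics = dict((k, v) for k, v in payload.items()
                   if k not in METADATA_KEYS)
    return metrics, metadata
```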
Example excerpt:
```json
{
  "agentVersion": "5.12.0",
  "agent_checks": [
    ["redisdb", "redis", 0, "OK", "", { "version": "3.2.8" }]
  ],
  "apiKey": "none",
  "external_host_tags": {},
  "gohai": "{\"cpu\":{\"cache_size\":\"6144 KB\",\"cpu_cores\":\"4\",\"cpu_logical_processors\":\"4\",\"family\":\"6\",\"mhz\":\"2793.066\",\"model\":\"70\",\"model_name\":\"Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz\",\"stepping\":\"1\",\"vendor_id\":\"GenuineIntel\"},\"filesystem\":[{\"kb_size\":\"61891364\",\"mounted_on\":\"/\",\"name\":\"none\"},{\"kb_size\":\"1023432\",\"mounted_on\":\"/dev\",\"name\":\"tmpfs\"},{\"kb_size\":\"1023432\",\"mounted_on\":\"/sys/fs/cgroup\",\"name\":\"tmpfs\"},{\"kb_size\":\"61891364\",\"mounted_on\":\"/etc/hosts\",\"name\":\"/dev/sda2\"},{\"kb_size\":\"65536\",\"mounted_on\":\"/dev/shm\",\"name\":\"shm\"},{\"kb_size\":\"1023432\",\"mounted_on\":\"/sys/firmware\",\"name\":\"tmpfs\"}],\"gohai\":{\"build_date\":\"Mon Mar 20 15:41:36 UTC 2017\",\"git_branch\":\"last-stable\",\"git_hash\":\"bff26dd\",\"go_version\":\"go version go1.3.3 linux/amd64\"},\"memory\":{\"swap_total\":\"4095996kB\",\"total\":\"2046864kB\"},\"network\":{\"ipaddress\":\"172.17.0.2\",\"ipaddressv6\":\"fe80::42:acff:fe11:2\",\"macaddress\":\"02:42:ac:11:00:02\"},\"platform\":{\"GOOARCH\":\"amd64\",\"GOOS\":\"linux\",\"goV\":\"1.3.3\",\"hardware_platform\":\"x86_64\",\"hostname\":\"f3a70779512e\",\"kernel_name\":\"Linux\",\"kernel_release\":\"4.9.8-moby\",\"kernel_version\":\"#1 SMP Wed Feb 8 09:56:43 UTC 2017\",\"machine\":\"x86_64\",\"os\":\"GNU/Linux\",\"processor\":\"x86_64\",\"pythonV\":\"2.7.13\"}}",
  "host-tags": {},
  "meta": {
    "hostname": "f3a70779512e",
    "socket-fqdn": "f3a70779512e",
    "socket-hostname": "f3a70779512e",
    "timezones": ["UTC", "UTC"]
  },
  "systemStats": {
    "cpuCores": 4,
    "machine": "x86_64",
    "nixV": ["Ubuntu", "16.04", "xenial"],
    "platform": "linux2",
    "processor": "x86_64",
    "pythonV": "2.7.13"
  }
}
```
HTTP POST: /api/v1/series/?api_key=<api_key>
```json
{
  "series": [
    {
      "device_name": null,
      "host": "mac-wytze",
      "interval": 10.0,
      "metric": "datadog.dogstatsd.packet.count",
      "points": [[1467037580.0, 0]],
      "tags": null,
      "type": "gauge"
    },
    {
      "device_name": null,
      "host": "mac-wytze",
      "interval": 10.0,
      "metric": "datadog.dogstatsd.serialization_status",
      "points": [[1467037593.243472, 0.1]],
      "tags": ["status:success"],
      "type": "rate"
    }
  ]
}
```
HTTP POST: /api/v1/check_run/?api_key=<api_key>
```json
[
  {
    "check": "tomcat.can_connect",
    "host_name": "osboxes",
    "status": 0,
    "tags": ["instance:tomcat-localhost-12345", "jmx_server:localhost"],
    "timestamp": 1490951770
  }
]
```
The following output is generated by executing the following command multiple times against a running StatsD instance with the Datadog backend installed:

```
echo "statsd_test:1|c" | nc -u -w0 127.0.0.1 8125
```
The StatsD backend can be found here: https://github.com/DataDog/statsd-datadog-backend/. The backend sends StatsD's output directly to the API.

Backend tag used: `statsd:testtag`
HTTP POST: /api/v1/series?api_key=<api_key>
```json
{
  "series": [
    { "host": "mac-wytze", "metric": ".statsd.bad_lines_seen", "points": [[1467105304, 0]], "tags": ["statsd:testtag"], "type": "gauge" },
    { "host": "mac-wytze", "metric": ".statsd.packets_received", "points": [[1467105304, 1.5]], "tags": ["statsd:testtag"], "type": "gauge" },
    { "host": "mac-wytze", "metric": ".statsd.metrics_received", "points": [[1467105304, 1.5]], "tags": ["statsd:testtag"], "type": "gauge" },
    { "host": "mac-wytze", "metric": ".statsd_test", "points": [[1467105304, 1.5]], "tags": ["statsd:testtag"], "type": "gauge" },
    { "host": "mac-wytze", "metric": ".statsd.timestamp_lag", "points": [[1467105304, 0]], "tags": ["statsd:testtag"], "type": "gauge" }
  ]
}
```
The agent's forwarder exposes a status page at `127.0.0.1:17123/status/`. This page shows, in HTML, the transactions that still need to be forwarded.
Example:
Id | Size | Error count | Next flush |
---|---|---|---|
1 | 2063 | 1 | 2017-04-06 09:07:58 |
2 | 91 | 0 | 2017-04-06 09:07:18.918453 |
3 | 930 | 0 | 2017-04-06 09:07:18.920516 |
4 | 91 | 0 | 2017-04-06 09:07:18.922276 |
5 | 457 | 0 | 2017-04-06 09:07:33.762445 |
6 | 1686 | 0 | 2017-04-06 09:07:37.969215 |
7 | 91 | 0 | 2017-04-06 09:07:37.971299 |
8 | 455 | 1 | 2017-04-06 09:08:23 |
9 | 457 | 0 | 2017-04-06 09:07:53.774737 |
Exceptions thrown in checks are logged in `/var/log/collector.log`. Checks are executed periodically, with an interval that is configurable in the check's configuration. Exceptions do not affect the periodic execution of a check.
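As an illustration, a hypothetical check whose run always fails; the collector catches the exception, writes the traceback to the collector log, and runs the check again at the next interval:

```python
from checks import AgentCheck

class FlakyCheck(AgentCheck):
    def check(self, instance):
        # This traceback ends up in collector.log; the check keeps
        # running at its configured interval regardless.
        raise Exception("something went wrong during this run")
```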
The agent's check interface exposes the following functions to the user:

```python
# Create or update a component / relation.
self.component(instance_key, id, type, data={})
self.relation(instance_key, source_id, target_id, type, data={})
```
The information provided by these functions is collected in batches and pushed to StackState's receiver as part of the agent data in the "topologies" field. The following format is used between the agent's collector and the StackState receiver:
```
{
  topologies: [
    {
      start_snapshot: optional[boolean],
      stop_snapshot: optional[boolean],
      instance: {
        type: string,  // e.g. "mesos"
        url: string,   // e.g. "http://localhost:8080/"
        ...            // unique instance identification; in the case of mesos a url: string field is added
      },
      components: [
        {
          externalId: string,
          type: {
            name: string  // {docker | kubernetes}
          },
          data: {
            ...           // dict / struct
          }
        },
        ...
      ],
      relations: [
        {
          externalId: string,
          type: {
            name: string
          },
          sourceId: string,
          targetId: string,
          data: {
            ...           // dict / struct
          }
        },
        ...
      ]
    }
  ]
}
```
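As an illustration, a minimal hypothetical check that would produce one component and one relation in this format (the instance key, the identifiers, and the exact shape of the type argument are assumptions based on the format above, not confirmed API details):

```python
from checks import AgentCheck

class TopologyDemo(AgentCheck):
    def check(self, instance):
        # Assumed instance key identifying the monitored system.
        instance_key = {'type': 'mesos', 'url': 'http://localhost:5050'}
        # Two components and a relation between them; the collector
        # batches these into the "topologies" field described above.
        self.component(instance_key, 'container-1', {'name': 'docker'},
                       data={'ip_addresses': ['172.17.0.8']})
        self.component(instance_key, 'container-2', {'name': 'docker'})
        self.relation(instance_key, 'container-1', 'container-2',
                      {'name': 'docker'}, data={})
```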