Agent Developer Mode
The Agent Developer Mode allows the user to collect a wide array of metrics concerning the performance of the agent itself. It provides visibility into bottlenecks when writing an AgentCheck and when making changes to the collector core.
Developer mode can be enabled by adding the following line to your datadog.conf file:
developer_mode: yes
Be sure to restart the agent after modifying the configuration file.
There is also an option to override the datadog.conf setting with the --profile command-line flag (e.g. python agent.py start --profile). When in developer mode, the following functionality is enabled in the agent:
- Metrics for collection time, emit time and CPU used are sent to Datadog on every collector run.
- The collector loop is profiled using cProfile. At an interval specified by collector_profile_interval in the configuration file, the pstats output for the collector loop is dumped to log.debug as well as to the file ./collector-stats.dmp.
- An additional check, agent_metrics, is run at the end of every collector loop. This check collects a variety of metrics about the collector's performance, and can be configured with the same interface used to configure regular AgentChecks. Source code for this check can be found under checks.d/agent_metrics.py.
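The periodic profiling described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual collector code: the function name run_profiled_loop, its parameters, and the choice to count the interval in loop iterations are all assumptions made for the example.

```python
import cProfile
import io
import pstats


def run_profiled_loop(work, iterations, profile_interval, dump_path=None):
    """Run `work` repeatedly under cProfile (hypothetical helper).

    Every `profile_interval` iterations the accumulated pstats output is
    rendered to text and, if `dump_path` is given, also dumped to that file.
    Returns the list of text reports produced.
    """
    reports = []
    profiler = cProfile.Profile()
    for run in range(1, iterations + 1):
        profiler.enable()
        try:
            work()
        finally:
            profiler.disable()

        if run % profile_interval == 0:
            buf = io.StringIO()
            stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
            stats.print_stats()
            reports.append(buf.getvalue())
            if dump_path is not None:
                # Binary dump that can be reloaded later with pstats.Stats(path).
                stats.dump_stats(dump_path)
            # Start a fresh profiler so each report covers one interval only.
            profiler = cProfile.Profile()
    return reports
```

In this sketch the text report plays the role of what would be written to log.debug, and dump_path plays the role of ./collector-stats.dmp.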
Here is an example configuration for the agent_metrics check:
init_config:
  process_metrics:
    - name: get_memory_info
      type: gauge
      active: yes
    - name: get_io_counters
      type: rate
      active: yes
    - name: get_connections
      type: gauge
      active: no

instances:
  [{}]
Each element in the process_metrics list represents a single psutil.Process method that will be executed against the running collector process. The name field specifies the name of the method, the type field specifies the metric type (currently only gauge and rate are supported), and the active field is a utility flag to activate/deactivate certain method calls during the check. Note that the method specified in name is executed only when:
- The method is available on the psutil.Process class as of psutil==2.1.1
- The underlying OS supports the execution of that method (e.g. get_io_counters is not available for OS X processes)
If the agent_metrics check cannot execute a particular method, it logs a warning and continues with its business. For debugging, the list of metrics collected in this check is available in the log (grep for AGENT STATS).
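The "skip inactive entries, warn and continue on failure" behaviour can be sketched as below. This is an illustrative approximation rather than the code in checks.d/agent_metrics.py: collect_process_metrics and FakeProcess are hypothetical names, and FakeProcess stands in for psutil.Process so the example runs without psutil installed.

```python
import logging

log = logging.getLogger("agent_metrics_sketch")


def collect_process_metrics(process, process_metrics):
    """Call each active method named in `process_metrics` on `process`.

    Returns {method_name: (metric_type, raw_result)}. Methods that are
    missing or that fail are logged as warnings and skipped.
    """
    results = {}
    for entry in process_metrics:
        if not entry.get("active", False):
            continue
        name, metric_type = entry["name"], entry["type"]
        if metric_type not in ("gauge", "rate"):
            log.warning("Unsupported metric type %r for %s", metric_type, name)
            continue
        method = getattr(process, name, None)
        if not callable(method):
            log.warning("%s is not available on %s", name, type(process).__name__)
            continue
        try:
            results[name] = (metric_type, method())
        except Exception as exc:  # e.g. the OS does not support this call
            log.warning("Could not execute %s: %s", name, exc)
    return results


class FakeProcess:
    """Hypothetical stand-in for psutil.Process."""

    def get_memory_info(self):
        return (1024, 4096)

    def get_io_counters(self):
        raise NotImplementedError("not supported on this platform")
```

Passing a process_metrics list shaped like the configuration above (with yes/no already parsed to booleans) returns only the methods that were active, present, and executed successfully.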
Metrics collected via the psutil methods are parsed and aggregated in a namespace derived from the method name and its output. E.g. get_memory_info is parsed to datadog.agent.collector.memory_info.rss and datadog.agent.collector.memory_info.vms. The logic for this parsing lives here and here. Once computed, these metrics are then aggregated and forwarded to Datadog as with any other AgentCheck.
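As a rough illustration of how a method name and its output might map onto metric names, consider the sketch below. It only reproduces the get_memory_info example given above; the helper metric_names, the rule of stripping a leading get_, and the handling of non-tuple results are assumptions, not the agent's actual parsing logic.

```python
from collections import namedtuple

NAMESPACE = "datadog.agent.collector"


def metric_names(method_name, result):
    """Map a method name and its result to {metric_name: value} (hypothetical)."""
    # Drop a leading "get_" so get_memory_info becomes memory_info.
    if method_name.startswith("get_"):
        base = method_name[len("get_"):]
    else:
        base = method_name
    prefix = "{}.{}".format(NAMESPACE, base)

    # Named tuples (as returned by many psutil methods) yield one metric per field.
    if hasattr(result, "_asdict"):
        return {
            "{}.{}".format(prefix, field): value
            for field, value in result._asdict().items()
        }
    # Plain values yield a single metric under the prefix.
    return {prefix: result}


# Stand-in for the named tuple returned by the memory-info method.
MemInfo = namedtuple("MemInfo", ["rss", "vms"])
names = metric_names("get_memory_info", MemInfo(rss=1024, vms=4096))
# names == {"datadog.agent.collector.memory_info.rss": 1024,
#           "datadog.agent.collector.memory_info.vms": 4096}
```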
It is sometimes useful to profile individual checks to spot bottlenecks and critical paths in agent performance. When used with agent.py check, the --profile flag dumps some interesting profiling information to stdout. Presently this consists of the following:
- Check runtime
- Memory use and Disk I/O if available
- Pstats output restricted to 20 calls.
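A single-run profile of this kind can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not what agent.py does internally: profile_check is a hypothetical helper, it covers only runtime and the restricted pstats output (not memory or disk I/O), and sorting by cumulative time is a choice made for the example.

```python
import cProfile
import io
import pstats
import time


def profile_check(run_check, max_calls=20):
    """Run `run_check` once under cProfile (hypothetical helper).

    Returns (elapsed_seconds, report), where report is the pstats output
    restricted to the first `max_calls` entries by cumulative time.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        run_check()
    finally:
        profiler.disable()
    elapsed = time.perf_counter() - start

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
    stats.print_stats(max_calls)  # an integer argument limits the lines printed
    return elapsed, buf.getvalue()
```

Calling profile_check with any zero-argument callable, for example a lambda wrapping one run of a check, returns the runtime and a text report suitable for printing to stdout.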
Here is an example of what you see when profiling the network check: