Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] service metadata collection #1611

Merged
merged 2 commits into from
Jun 9, 2015
Merged

Conversation

yannmh
Copy link
Member

@yannmh yannmh commented May 11, 2015

Service metadata collection

Update 05/18

AgentCheck.service_metadata(metadata_name, value) replaces AgentCheck.svc_metadata(metadata_dict) method.
Under the hood, metadata are stored in a list, and rolled-up to a dictionnary for every instance.
Usage:

self.service_metadata('foo', "bar")

Changes

AgentCheck interface

Following methods were added to AgentCheck interface to support service metadata collection:

self.service_metadata(name, value) # Check purpose - Send metadata
self.get_service_metadata() # Collector purpose - Return a list of metadata dictionaries saved by check -if any-
                            # and clears them out of the instance's service_checks list

To send metadata from an Agent Check, decorate it with self.svc_metadata(...) statements.

def check(...)
    ...
    # Send integration metadata
    self.service_metadata('foo', "bar")
    ...

Payload refactoring

Collector payload was re-factored behind AgentPayload interface: it offers the same single payload style interface but manages, distinguishes two payloads:

  • A metadata payload related to hostname, system and services
  • A data payload that contains metrics, events, service_checks and more

Each of these payloads can be automatically submitted to a specific endpoint.

Before

payload
+------------+
|'metrics'   |
+--+---------+
|'events'    |
+------------+
|'svc_cheks' |        submit
+--+---------+          ------->     /intake
|'gohai'     |
+------------+
|'systm_stts'|
+------------+
|'meta'      |
+------------+
|'agent_chks'|
+--+---------+
|'svc_meta'  |
+--+---------+
|............|
|            |
+            +
 ............

Now:

 payload
+---------------------------------------+
| payload_data           payload_meta   |
| +------------+         +------------+ |
| |'metrics'   |         |'systm_stts'| |
| +--+---------+         +------------+ |
| |'events'    |         |'meta'      | |
| +------------+         +------------+ |
| |'svc_cheks' |         |'agent_chks'| |
| +--+---------+         +--+---------+ |
| |............|         |'svc_meta'  | |
| |            |         +--+---------+ |
| +            +         |............| |
|  ............          |            | |
| +            +         |            | |
| +------------+         +------------+ |
+-+------------+---------+------------+-+

                       submit
                        + +
                        | |
                        | |
/intake/metrics  <------+ +------->  /intake/metadata

Collector

Collector run method was reorganised to successively populate the payload by data type (it used to be more random).

Steps are:

  1. Payload skeleton
    * self._build_payload(...): only builds the payload skeleton, so it contains all of the generic payload data (no metadata).
  2. Data payload
    * Metrics
    * Events
    * Service checks
  3. Metadata payload
    * self._populate_payload_metadata(...): periodically populates the payload with metadata related to the system, host, and/or checks.

Displaying metadata

agent checkcommand

Agent check command now displays collected metadata, in addition to metrics, events and service checks.

agent info command

Set display_service_metadata to True in datadog.conf file.

...
display_service_metadata: True
...

A new Service metadata section is now displayed in the info command output whenever service metadata are collected.
screenshot 2015-05-12 10 05 35

Thoughts

Split data and metadata ?

Currently data and metadata from the agent are part of the same payload, which is processed by a single handler in Sobotka.
Metadata are extracted, directly written to Postgres when data are forwarded to Kafka.

As the volume of metadata is growing, it would be wiser to split data and metadata processing, so that postgres operations would not slow down the agent data pipeline.

First suggestion made was to have two different Sobotka handlers specific for data and metadata:
https://docs.google.com/document/d/10LAKQ1E_ez4jgTn_czFJwDkEIghGFUFsDF03LgY8QRE

Include host tags in the data payload ?

Host tags are now listed in the metadata payload only.
As it is possibly sent in a different packet (to a different endpoint), it may lead to a 'no tag' situation when a data payload is received before any metadata's.

A workaround would be to add/duplicate host tags in the data payload, but it breaks the Sobotka handler logic described in the section above.

For now

Single payload/endpoint for metrics and metadata

Let's keep it simple for now, so we can test and play with metadata support internally with the 5.4 agent release.
As the volume of metadata is still low, I can set AgentPayload to send a single data+metadata payload to the current /intake endpoint.

TODO

  • Fix flaky tests
  • Submit only one data+metadata payload to /intake endpoint (c.f. 'Single endpoint for metrics and metadata' section)
  • Write a RFC on why and how we would split agent data and metadata

@yannmh yannmh self-assigned this May 11, 2015
@yannmh yannmh added this to the 5.4.0 milestone May 11, 2015
@yannmh yannmh changed the title [core] integration metadata collection [core] service metadata collection May 12, 2015
@yannmh yannmh force-pushed the yann/integration-metadata-5.4 branch 2 times, most recently from 31bde51 to 8fc6528 Compare May 13, 2015 23:24
@@ -564,6 +586,14 @@ def run(self):
tb=traceback.format_exc()
)
instance_statuses.append(instance_status)
# Generate empty metadata if nonexistent
service_metadata_count = len(self.service_metadata)
if service_metadata_count > i + 1:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it matter ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.service_metadata is indexing integration metadata dictionnary by instance.
This mechanism makes sure that the number of metadata dictionnaries generated matches the number of instances: if an instance does not produce any meta (purposely or because it's failing), we complete self.service_metadata adding some padding.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, i don't really like the fact that it prevents from calling

self.svc_metadata

more than once per instance. This is bad pattern i think.

@yannmh yannmh force-pushed the yann/integration-metadata-5.4 branch 2 times, most recently from 78458b3 to 9d80cb3 Compare May 18, 2015 17:58
@yannmh yannmh force-pushed the yann/integration-metadata-5.4 branch from 9d80cb3 to 98457ed Compare May 26, 2015 18:34
@degemer
Copy link
Member

degemer commented May 26, 2015

Do you have a "standard" metadata payload ?
I noticed that on your info command screenshot, one of the version is a list of int, and the other one is a list of string.
I think it would be better to enforce a "standard" metadata payload (as we already had some weird type issues with the collector), or at least to add explicitly the advised types in the documentation.

@remh
Copy link

remh commented May 29, 2015

Failing on Flake8, can you fix please ?

@yannmh yannmh force-pushed the yann/integration-metadata-5.4 branch 8 times, most recently from ef87bfa to c751e15 Compare June 5, 2015 17:06
@yannmh
Copy link
Member Author

yannmh commented Jun 5, 2015

Test are passing. @remh Shall we merge it ?

@remh remh assigned remh and unassigned yannmh Jun 9, 2015
yannmh added 2 commits June 9, 2015 11:43
* Interface for collecting metadata
* Splitting payload to distinguish data and metadata
* Two endpoints: /intake/metrics and /intake/metadata
* '_collect_metadata' functions for a few checks
* Option to display metadata locally in `info` command
* Tests

Rebased on 5.2.1 agent release
`AgentCheck.service_metadata(metadata_name, value)` replaces `AgentCheck.svc_metadata(metadata_dict)` method.
Under the hood, metadata are stored in a list, and rolled-up to a dictionnary for every instance.
Usage:
```
self.service_metadata('foo', "bar")
```
@yannmh yannmh force-pushed the yann/integration-metadata-5.4 branch from c751e15 to 5ab8473 Compare June 9, 2015 15:44
yannmh added a commit that referenced this pull request Jun 9, 2015
@yannmh yannmh merged commit eb19ad5 into master Jun 9, 2015
@yannmh yannmh deleted the yann/integration-metadata-5.4 branch June 9, 2015 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants