Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add service tag to health metrics #2154

Closed
wants to merge 1 commit into from
Closed

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Jul 13, 2022

Currently health metrics are sent without a service tag, with only the env tag.

This means that all services in the same env will get their health metrics merged together. This makes these metrics not useful.

This PR adds the service tag, collected from c.service, to all statsd metrics generated for health metrics.

No service name being set (c.service == nil) is still supported (metrics will be send successfully without a service), but not very useful in practice.

For Runtime Metrics, this currently works because runtime metrics collect service names from spans being flushed, thus populating an internal list of services.

@marcotc marcotc added the core Involves Datadog core libraries label Jul 13, 2022
@marcotc marcotc self-assigned this Jul 13, 2022
@marcotc marcotc requested a review from a team July 13, 2022 23:58
@marcotc marcotc force-pushed the health-metrics-servie branch from 0f86f75 to 1a8ecc8 Compare July 14, 2022 00:00
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Seems reasonable codewise, but as I said in my note, I don't understand why this ever was a feature without the service tag lol.

Comment on lines +44 to +47
# Return dupes, so that the constant isn't modified,
# and defaults are unfrozen for mutation in Statsd.
super.tap do |options|
options[:tags] = options[:tags].dup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed we also do the dup in Core::Runtime::Metrics#default_metric_options but is this really needed?

As far as I see it, super here is referring to Datadog::Core::Metrics::Options#default_metric_options which already dups the frozen tags so I don't think we need to on any of the child classes...

Comment on lines +48 to +59
options[:tags] << @tags
end
end

private

# Cache tag strings to avoid recreating them on every flush operation.
def compile_tags!
return unless @service

"#{Environment::Ext::TAG_SERVICE}:#{@service}".freeze
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm is it me or is this code looking deceptively generic, but not actually? E.g. from a high level, there's @tags = compile_tags!, which looks generic.

But then compile_tags! is not a list but a single tag; and compile_tags! doesn't even run if no @service is around.

I would suggest picking either making this specific (use a @service_tag) or really generic (actually have a list in compile_tags!).

(I actually somewhat favor just doing @service_tag = ... in the constructor and then referring it directly in #default_metric_options).

Copy link
Contributor

@TonyCTHsu TonyCTHsu Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the chain of condition introduce unnecessary cognitive burden and this method does not deserved to be extracted as private method, since it is just a simple conditional being invoke only once during initialisation.

Comment on lines 21 to 29
def build_health_metrics(settings)
settings = settings.diagnostics.health_metrics
options = { enabled: settings.enabled }
options[:statsd] = settings.statsd unless settings.statsd.nil?
health_settings = settings.diagnostics.health_metrics

Core::Diagnostics::Health::Metrics.new(**options)
Core::Diagnostics::Health::Metrics.new(
service: settings.service,
enabled: health_settings.enabled,
statsd: health_settings.statsd
)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR baffles me in a way. That is, this feature is SO obvious, that I cannot think why we didn't have it from the get-go. Am I missing something? How would this be usable without a service tag, unless customers had exactly 1 service reporting to their account?

let(:service) { nil }

it 'does not set the statsd service tag' do
expect(statsd).to_not receive(:count).with(any_args, tags: array_including(include('service:')))
Copy link
Contributor

@TonyCTHsu TonyCTHsu Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, I would avoid using to_not with detail constraints since this leads to weaker assertion.

I would prefer simply a more deterministic assertion to prove the non-existence of the missing element.

expect(statsd).to receive(:count).with('api_requests', 1, tags: ['...', '...'])

RSpec does provided hash_excluding but not array_excluding. There is work around with negation matcher is working but not satisfying in terms of semantic.

RSpec::Matchers.define_negated_matcher :exclude, :include
expect(statsd).to receive(:count).with(any_args, tags: array_including(exclude('service:')))

Yeah... asserting something's non-existence is always kinda awkward, since the appearance could be in various form(such as 'service:nil').

@@ -3,9 +3,12 @@
require 'spec_helper'
require 'ddtrace'
require 'datadog/core/diagnostics/health'
require 'datadog/statsd'

RSpec.describe Datadog::Core::Diagnostics::Health::Metrics do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file being rename to spec/datadog/core/diagnostics/health/metrics_spec.rb?

@@ -30,6 +37,26 @@ class Metrics < Core::Metrics::Client
gauge :queue_max_length, Ext::Health::Metrics::METRIC_QUEUE_MAX_LENGTH
gauge :queue_spans, Ext::Health::Metrics::METRIC_QUEUE_SPANS
gauge :sampling_service_cache_length, Ext::Health::Metrics::METRIC_SAMPLING_SERVICE_CACHE_LENGTH

def default_metric_options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this public method tested somewhere?

I supposed it would be a better candidate for verifying service tag, compared a more integrated #api_requests test.

@marcotc marcotc closed this Oct 26, 2023
@GustavoCaso GustavoCaso deleted the health-metrics-servie branch October 27, 2023 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants