Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaggedMetricRegistry composability #81

Merged
merged 9 commits into from
Jun 26, 2018
Merged

Conversation

j-baker
Copy link
Contributor

@j-baker j-baker commented Jun 18, 2018

Frequently libraries produce metrics, but the context of usage is not known (e.g. Atlas metrics are coupled to an Atlas client, Spark metrics are tied to a job). In this case, if a caller has multiple metric registries, they can add the child metrics to the parent with a tag, so that they can be distinguished.

This is a very quick thing to throw up, but basically internally
we have a bunch of producers of the same set of metrics, and we'd
like to be able to differentiate them (with tags).

With this PR, we'd be able to have multiple metric registries,
and merge them into a single metric registry.

Please don't merge, since there are a few things I need to think
through, but wanted to make sure that this doesn't sound super
unreasonable.

@Override
public Map<MetricName, Metric> getMetrics() {
return metricRegistry.getMetrics().entrySet().stream()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could avoid the copy here if we think it's important by implementing ForwardingMap.

@@ -36,6 +36,7 @@
private static final TaggedMetricRegistry DEFAULT = new DefaultTaggedMetricRegistry();

private final Map<MetricName, Metric> registry = new ConcurrentHashMap<>();
private final Map<Map<String, String>, TaggedMetricSet> taggedRegistries = new ConcurrentHashMap<>();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not quite right, since will want to shove probably two MetricSets in for each AtlasDB, also probably shouldn't be passing in an empty map here. Maybe just a Set of Pair type structure is the right thing here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this feels odd and I worry about someone putting a mutable Map<String, String> in as a key and causing all kinds of chaos.

Copy link
Contributor

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey James, interested to see how you're planning on using this, maybe sketch out some tests showing intent? I'm wondering if we want to have a single canonical metric registry that other partitions can feed into and auto adjust MetricName to include that partition's tags.

For example if we have service Foo with a single registry, and Atlas clients Bar and Baz, we'd inject a registry that auto adds tag client=Bar and client=Baz respectively, Atlas adds a counter qux and the underlying Foo registry has two counters qux;client=Bar and qux;client=Baz.

@@ -36,6 +36,7 @@
private static final TaggedMetricRegistry DEFAULT = new DefaultTaggedMetricRegistry();

private final Map<MetricName, Metric> registry = new ConcurrentHashMap<>();
private final Map<Map<String, String>, TaggedMetricSet> taggedRegistries = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this feels odd and I worry about someone putting a mutable Map<String, String> in as a key and causing all kinds of chaos.

@Override
public Map<MetricName, Metric> getMetrics() {
return metricRegistry.getMetrics().entrySet().stream()
.collect(toMap(entry -> MetricName.builder().safeName(entry.getKey()).build(), Map.Entry::getValue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we ok to assume all metric names are safe here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - they'll get logged internally in the same way as they currently do. The reason this was called safe name was to make clear that they have to be safe - but that's a baseline requirement.

*
* @return map of registered metrics
*/
Map<MetricName, Metric> getMetrics();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at how painful the API break would be, do you have a sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not actually an API break; it's just moved into the super interface

@j-baker
Copy link
Contributor Author

j-baker commented Jun 25, 2018

hey dave - the motivation here is basically: we have multiple atlas clients and want to be able to control how they're reported.

I don't think libraries should be using static metrics. This is for a few reasons, but I've grown conviction in this workflow over time. It's painful for producers because they have to understand the metrics with regards to a lifecycle. For example, with AtlasDB they have to go through the workflows of removing metrics upon failed initialisation tasks, so as to not leave now-unused gauges around, and the complexity of this is nontrivial.

For consumers, it's painful because now the metrics that are being recorded are defined by the producer. For example, AtlasDB made the bad assumption that no one has more than one AtlasDB hanging around, and this was very very painful to change (2000 lines +-), because the metrics needed to be plumbed through (even though it was only like 100 lines of additions overall).

It's also painful for aggregators, since it makes the incremental cost of adding metrics almost zero, which means that tracking which metrics exist and are useful is essentially impossible. For example, AtlasDB by default adds around 1500 metrics. This makes it very difficult to have a coherent metric reporting story (which metrics are even valuable? which should I use?) because they're so numerous.

Copy link
Contributor

@schlosna schlosna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey James, makes sense and agreed AtlasDB should have been injecting the appropriately scoped metrics registry from the beginning.

Had one minor API question before merging.

/**
* Removes a TaggedMetricsSet added via addMetrics from this metrics set.
*/
void removeMetrics(String safeTagName, String safeTagValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API question: should we return either a boolean letting consumer know if it was removed, or the removed TaggedMetricSet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now return an optional taggedmetricset to be consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with remove()

entry.getKey().getKey(),
entry.getKey().getValue(),
entry.getValue().getMetrics().entrySet().stream())))
.collect(toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you got checkstyled

ParameterNameCheck - com.palantir.tritium.metrics.registry.DefaultTaggedMetricRegistry - checkstyleMain
DefaultTaggedMetricRegistry.java:105: Parameter name 'b' must match pattern '^[a-z][a-zA-Z0-9]+$'.

ParameterNameCheck - com.palantir.tritium.metrics.registry.DefaultTaggedMetricRegistry - checkstyleMain
DefaultTaggedMetricRegistry.java:105: Parameter name 'a' must match pattern '^[a-z][a-zA-Z0-9]+$'.

@schlosna
Copy link
Contributor

Approved pending checkstyle fix

@schlosna
Copy link
Contributor

@j-baker go ahead and update the description when you're happy to squash & merge

@schlosna schlosna merged commit 586e66c into palantir:develop Jun 26, 2018
@jiahuijiang
Copy link

@j-baker Why do we only allow one tag for each metricSet here not a map of tags and their values?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants