-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TaggedMetricRegistry composability #81
Conversation
This is a very quick thing to throw up, but basically internally we have a bunch of producers of the same set of metrics, and we'd like to be able to differentiate them (with tags). With this PR, we'd be able to have multiple metric registries, and merge them into a single metric registry. Please don't merge, since there are a few things I need to think through, but wanted to make sure that this doesn't sound super unreasonable.
|
||
@Override | ||
public Map<MetricName, Metric> getMetrics() { | ||
return metricRegistry.getMetrics().entrySet().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could avoid the copy here if we think it's important by implementing ForwardingMap.
@@ -36,6 +36,7 @@ | |||
private static final TaggedMetricRegistry DEFAULT = new DefaultTaggedMetricRegistry(); | |||
|
|||
private final Map<MetricName, Metric> registry = new ConcurrentHashMap<>(); | |||
private final Map<Map<String, String>, TaggedMetricSet> taggedRegistries = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not quite right, since will want to shove probably two MetricSets in for each AtlasDB, also probably shouldn't be passing in an empty map here. Maybe just a Set of Pair type structure is the right thing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this feels odd and I worry about someone putting a mutable Map<String, String>
in as a key and causing all kinds of chaos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey James, interested to see how you're planning on using this, maybe sketch out some tests showing intent? I'm wondering if we want to have a single canonical metric registry that other partitions can feed into and auto adjust MetricName to include that partition's tags.
For example if we have service Foo with a single registry, and Atlas clients Bar and Baz, we'd inject a registry that auto adds tag client=Bar
and client=Baz
respectively, Atlas adds a counter qux
and the underlying Foo registry has two counters qux;client=Bar
and qux;client=Baz
.
@@ -36,6 +36,7 @@ | |||
private static final TaggedMetricRegistry DEFAULT = new DefaultTaggedMetricRegistry(); | |||
|
|||
private final Map<MetricName, Metric> registry = new ConcurrentHashMap<>(); | |||
private final Map<Map<String, String>, TaggedMetricSet> taggedRegistries = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this feels odd and I worry about someone putting a mutable Map<String, String>
in as a key and causing all kinds of chaos.
@Override | ||
public Map<MetricName, Metric> getMetrics() { | ||
return metricRegistry.getMetrics().entrySet().stream() | ||
.collect(toMap(entry -> MetricName.builder().safeName(entry.getKey()).build(), Map.Entry::getValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we ok to assume all metric names are safe here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - they'll get logged internally in the same way as they currently do. The reason this was called safe name was to make clear that they have to be safe - but that's a baseline requirement.
* | ||
* @return map of registered metrics | ||
*/ | ||
Map<MetricName, Metric> getMetrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at how painful the API break would be, do you have a sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not actually an API break; it's just moved into the super interface
hey dave - the motivation here is basically: we have multiple atlas clients and want to be able to control how they're reported. I don't think libraries should be using static metrics. This is for a few reasons, but I've grown conviction in this workflow over time. It's painful for producers because they have to understand the metrics with regards to a lifecycle. For example, with AtlasDB they have to go through the workflows of removing metrics upon failed initialisation tasks, so as to not leave now-unused gauges around, and the complexity of this is nontrivial. For consumers, it's painful because now the metrics that are being recorded are defined by the producer. For example, AtlasDB made the bad assumption that no one has more than one AtlasDB hanging around, and this was very very painful to change (2000 lines +-), because the metrics needed to be plumbed through (even though it was only like 100 lines of additions overall). It's also painful for aggregators, since it makes the incremental cost of adding metrics almost zero, which means that tracking which metrics exist and are useful is essentially impossible. For example, AtlasDB by default adds around 1500 metrics. This makes it very difficult to have a coherent metric reporting story (which metrics are even valuable? which should I use?) because they're so numerous. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey James, makes sense and agreed AtlasDB should have been injecting the appropriately scoped metrics registry from the beginning.
Had one minor API question before merging.
/** | ||
* Removes a TaggedMetricsSet added via addMetrics from this metrics set. | ||
*/ | ||
void removeMetrics(String safeTagName, String safeTagValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API question: should we return either a boolean
letting consumer know if it was removed, or the removed TaggedMetricSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now return an optional taggedmetricset to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with remove()
entry.getKey().getKey(), | ||
entry.getKey().getValue(), | ||
entry.getValue().getMetrics().entrySet().stream()))) | ||
.collect(toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you got checkstyled
ParameterNameCheck - com.palantir.tritium.metrics.registry.DefaultTaggedMetricRegistry - checkstyleMain
DefaultTaggedMetricRegistry.java:105: Parameter name 'b' must match pattern '^[a-z][a-zA-Z0-9]+$'.
ParameterNameCheck - com.palantir.tritium.metrics.registry.DefaultTaggedMetricRegistry - checkstyleMain
DefaultTaggedMetricRegistry.java:105: Parameter name 'a' must match pattern '^[a-z][a-zA-Z0-9]+$'.
Approved pending checkstyle fix |
@j-baker go ahead and update the description when you're happy to squash & merge |
@j-baker Why do we only allow one tag for each metricSet here not a map of tags and their values? |
Frequently libraries produce metrics, but the context of usage is not known (e.g. Atlas metrics are coupled to an Atlas client, Spark metrics are tied to a job). In this case, if a caller has multiple metric registries, they can add the child metrics to the parent with a tag, so that they can be distinguished.