Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use metric-schema for instrumentation metrics #1962

Merged
merged 2 commits into from
Jun 15, 2024

Conversation

pkoenig10
Copy link
Member

@pkoenig10 pkoenig10 commented Jun 13, 2024

Before this PR

The tagged metrics produced by instrumented classes are cumbersome to use because each instrumented class uses a unique metric name. These metrics names are the fully qualified class names, which can be quite long making them difficult to use in metric query UIs.

It is also particularly cumbersome given our internal metrics filtering, since metrics for newly instrumented classes won't be available until they are used in a dashboard.

After this PR

Instrumentation metrics are defined in metric-schema. There is a single instrumentation metric and the interface name, method name, and result are tags.

Possible downsides?

Dashboards will need to be updated. I don't think users tend to have these instrumentation metrics on dashboards - they are more commonly used for one-off investigations.

@changelog-app
Copy link

changelog-app bot commented Jun 13, 2024

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Instrumentation metrics are now defined using metric-schema. There is a single metric name for all instrumented classes with tags for the service name, endpoint, and result.

Check the box to generate changelog(s)

  • Generate changelog entry

String serviceName = Strings.isNullOrEmpty(prefix) ? interfaceClass.getName() : prefix;
this.handlers.add(new TaggedMetricsServiceInvocationEventHandler(metricRegistry, serviceName));
this.handlers.add(new TaggedMetricsServiceInvocationEventHandler(
metricRegistry, Strings.isNullOrEmpty(serviceName) ? interfaceClass.getSimpleName() : serviceName));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've intentionally changed the default service name from interfaceClass.getName() to interfaceClass.getSimpleName() to more closely match the behavior of similar instrumentation elsewhere, such as the Conjure endpoint instrumentation.


@ExtendWith(SystemStubsExtension.class)
@SuppressWarnings("NullAway")
public class TritiumTest {
Copy link
Member Author

@pkoenig10 pkoenig10 Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem duplicative, so I've removed them.

@schlosna
Copy link
Contributor

Directionally I'm supportive of this. @pkoenig10 do you have a sense of the migration lift for shifting from existing to new metric names & tags for monitors & queries?

@pkoenig10
Copy link
Member Author

pkoenig10 commented Jun 14, 2024

It will definitely be manual work for dashboards.

I'm fairly confident that no one is using these metrics in monitors. Doing so would be fairly brittle, as the metric names could change after a refactor.

Searching internally I don't see any monitor definitions that appear to be using one of these metrics. It's a bit difficult since there are a number of legacy metrics that use class names, but which aren't these instrumented metrics. I used the following Sourcegraph query (case-sensitive and regex):

metric.*com\.palantir\..*\.[A-Z].+\.\w.+ and endpoint and file:.*\.hype

@bulldozer-bot bulldozer-bot bot merged commit 371e61b into develop Jun 15, 2024
5 checks passed
@bulldozer-bot bulldozer-bot bot deleted the pkoenig/instrumentationMetrics branch June 15, 2024 12:45
@svc-autorelease
Copy link
Collaborator

Released 0.89.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants