Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[all] OpenTelemetry metrics integration in venice-router #1303

Merged
merged 19 commits into from
Nov 26, 2024

Conversation

m-nagarajan
Copy link
Contributor

@m-nagarajan m-nagarajan commented Nov 14, 2024

Summary

Initial integration of openTelemetry in venice-router.

Changes:

  1. Before this change, MetricsRepository holding MetricsConfig was passed around from service creation throughout the service code to carrying the config and the metrics. Introduced VeniceMetricsRepository to extend MetricsRepository and also hold VeniceOpenTelemetryMetricsRepository. Introduced VeniceMetricsConfig to include tehuti's MetricConfig and Otel config. Either MetricsRepository or VeniceMetricsRepository can be passed in during router service creation and if VeniceMetricsRepository is used and if otel configs are enabled, otel metrics will be used along with tehuti metrics.
  2. VeniceOpenTelemetryMetricsRepository holds histogramMap and counterMap to create a counter only once regardless for multiple request types and store name: Those should be dimensions instead.
  3. Added new dimensions:
venice.store.name
venice.cluster.name
venice.request.method
http.response.status_code
http.response.status_code_category
venice.response.status_code_category
venice.request.validation_outcome
venice.client.type
venice.request.retry_type
venice.request.retry_abort_reason
  1. added new otel metrics for some of the existing router metrics:
venice.router.call_time
venice.router.incoming_call_count
venice.router.call_count
venice.router.retry_count
venice.router.aborted_retry_count
venice.router.retry_delay
venice.router.call_key_count
  1. introduced configs to enable/disable this feature and to export the metrics
    otel.venice.enabled => To enable/disable opentelemetry in venice
    otel.venice.export.to.log => Export the metrics to logs, for debug purposes
    otel.venice.export.to.endpoint => Export metrics to the configured endpoint

otel.venice.metrics.naming.format => VeniceOpenTelemetryMetricNamingFormat => snake_case (default), pascal_case, camel_case
otel.venice.custom.dimensions.map => Can be configured to pass in custom dimensions or system dimensions.
It also uses existing otel configs to configure the endpoints like below and other opentelemetry supported configs

otel.exporter.otlp.metrics.protocol
otel.exporter.otlp.metrics.endpoint
otel.exporter.otlp.metrics.temporality.preference
otel.exporter.otlp.metrics.headers
otel.exporter.otlp.metrics.default.histogram.aggregation
otel.exporter.otlp.metrics.default.histogram.aggregation.max.scale
otel.exporter.otlp.metrics.default.histogram.aggregation.max.buckets

Optimizations in line for the next iteration:

  1. Prepopulate the additional dimensions rather than populating it during record()
  2. Transformer for the metric/dimension names rather than just predefined formats
  3. Revisit the structuring of MetricEntityState to not have to do a lookup of tehuti metric everytime record() is called
  4. Revisit otel histogram default maxScale and maxBuckets config
  5. Skip or Noop for otel record() for pre aggregations in code. For instance: if storeName is total or total.<clusterName>

How was this PR tested?

unit tests

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.
    Introduces OpenTelemetry metrics into the Venice world. Tehuti Instrumentation will work as it is and if Otel is enabled, Otel metrics will also be emitted as per config on top of the existing tehuti metrics.

@nisargthakkar
Copy link
Contributor

Should the PR title (and commit message) change logging to metrics?

@m-nagarajan m-nagarajan changed the title [router] OpenTelemetry logging integration in venice-router [router] OpenTelemetry metrics integration in venice-router Nov 14, 2024
Copy link
Contributor

@nisargthakkar nisargthakkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did one pass of this. Super excited for this. Great job 🚀

Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished a first pass on this PR... There may be some comments that we choose to not address, or maybe not address yet, but that we still want to get to later. Please take a look and LMK what you think.

@m-nagarajan m-nagarajan changed the title [router] OpenTelemetry metrics integration in venice-router [all] OpenTelemetry metrics integration in venice-router Nov 18, 2024
Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another quick pass on the latest update... thanks for iterating!

Copy link
Contributor

@KaiSernLim KaiSernLim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited for this change to go in.

Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more pass. I think this is moving in the right direction! Thanks for iterating!

Copy link
Contributor Author

@m-nagarajan m-nagarajan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews @nisargthakkar @FelixGV @KaiSernLim. Addressed some of the changes and left some replies.

Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more round... Thanks for continuing to refine this PR. I think it would be good to meet, which might help speed up iterations.

Copy link
Contributor Author

@m-nagarajan m-nagarajan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continuous feedback. Addressed some comments.

Copy link
Contributor

@nisargthakkar nisargthakkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some comments. But I still need to go through the changes

Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Manoj. Just finished one more pass on the latest diff...

I think the most important thing to fix/refactor now is the enum carrying static mutable state. Besides this, it seems we are pretty close to merging for v1 of this project.

Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more round... getting closer and closer! Thanks for continuing to iterate!

Copy link
Contributor Author

@m-nagarajan m-nagarajan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Felix, for the reviews.

FelixGV
FelixGV previously approved these changes Nov 26, 2024
Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good to go! Thanks for all the effort going into this first milestone of OTel integration! Looking forward to the next steps!

Copy link
Contributor Author

@m-nagarajan m-nagarajan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @FelixGV for the review. Addressed the last nit comments.

@m-nagarajan m-nagarajan requested a review from FelixGV November 26, 2024 21:30
Copy link
Contributor

@FelixGV FelixGV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks Manoj!

@m-nagarajan m-nagarajan merged commit ce23e38 into linkedin:main Nov 26, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants