Add OpenTelemetry to increase observability #5155

Closed
girishc13 opened this issue Sep 9, 2022 · 1 comment · Fixed by #5175

@girishc13

Describe the feature

Improve observability using OpenTelemetry and the available SDK implementations to:

  • enable tracing of requests
  • standardize the existing Prometheus metrics collection

The OpenTelemetry API is needed to take measurements, and the SDK is required to collect and export metrics.

Standard APIs are available for monitoring different operations at varying granularity, which reduces the effort of working with different metrics collection and aggregation vendors. The feature is optional and can be configured independently by users, depending on their telemetry collector implementation.

Your proposal

Add OpenTelemetry APIs and standard SDKs to:

  • trace network requests within the Flow ecosystem
  • convert the existing Prometheus collection and export to the new OpenTelemetry-compatible metrics collector
  • allow users to better track operations in the Executors

Available packages:


Environment

Screenshots

@girishc13

girishc13 commented Sep 19, 2022

Below are some points to consider when introducing the first version of OpenTelemetry support.

Configuration Options

1. Use JINA_ENABLE_OTEL_TRACING to enable tracing everywhere.
2. Use JINA_ENABLE_OTEL_METRICS to enable metrics everywhere.
3. The user can override the environment variables at the Gateway or Executor level.

  1. Provide pod level parser options:
    • '--opentelemetry-tracing' to enable tracing.
    • '--opentelemetry-metrics' to enable metrics.
  2. Add the above two options to the client parser to enable OpenTelemetry tracing and metrics at the client level.
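
A minimal sketch of how the environment variables and parser options above could fit together, assuming the environment variable supplies the default and the command-line flag enables the feature for a single pod. The helper names `_env_flag` and `build_parser` and the set of accepted truthy strings are hypothetical, not part of the proposal.

```python
import argparse
import os


def _env_flag(name: str) -> bool:
    """Read an environment variable as a boolean switch.

    The accepted truthy strings are an assumption made for this sketch.
    """
    return os.environ.get(name, '').strip().lower() in {'1', 'true', 'yes', 'on'}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose defaults come from the global environment variables."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--opentelemetry-tracing',
        action='store_true',
        default=_env_flag('JINA_ENABLE_OTEL_TRACING'),
        help='Enable OpenTelemetry tracing for this pod.',
    )
    parser.add_argument(
        '--opentelemetry-metrics',
        action='store_true',
        default=_env_flag('JINA_ENABLE_OTEL_METRICS'),
        help='Enable OpenTelemetry metrics for this pod.',
    )
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args()
    print(args.opentelemetry_tracing, args.opentelemetry_metrics)
```

With `store_true`, a flag can only switch a feature on for a pod when the environment variable is unset. Switching it off per pod would need an additional negative option, which the proposal does not define.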

Package

  • Name the package instrumentation to provide a clear separation from the existing telemetry and Prometheus client.
  • This package will provide the global TRACER and METER classes that will:
    • provide helper methods to create a span from a parent span if one exists, otherwise create a standalone span.
    • provide helper methods to create instruments from the metrics provider.
  • The package will contain the InstrumentationMixin, which instantiates the tracer and metrics providers based on the self.args argument. This mixin can be added to any class whose methods need to create a trace or measure an operation.
  • The InstrumentationMixin has been added to BaseClient and AsyncNewLoopRuntime, which are used as bases for the Client, Gateway and Runtime abstractions.
  • Further, the InstrumentationMixin will provide static objects and methods for grpc.aio interceptors for tracing gRPC servers and channels. The grpc.aio interceptors are provided in the instrumentation package because the official opentelemetry-python-contrib doesn't yet support grpc.aio.Server abstractions. These can be removed once the contrib package adds the required support.
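
A rough sketch of what such a mixin could look like, assuming `self.args` carries the two boolean options from the parser. The method names `_setup_instrumentation` and `start_span`, the `DemoRuntime` class and the attribute names are hypothetical. The OpenTelemetry SDK is imported lazily, so the dependency stays optional when both options are off.

```python
import contextlib
from types import SimpleNamespace


class InstrumentationMixin:
    """Sketch: create tracer and meter providers based on self.args."""

    def _setup_instrumentation(self) -> None:
        self.tracer_provider = None
        self.meter_provider = None
        self.tracer = None
        self.meter = None

        if getattr(self.args, 'opentelemetry_tracing', False):
            # Lazy import: only needed when tracing is enabled.
            from opentelemetry.sdk.trace import TracerProvider

            self.tracer_provider = TracerProvider()
            self.tracer = self.tracer_provider.get_tracer(type(self).__name__)

        if getattr(self.args, 'opentelemetry_metrics', False):
            # Lazy import: only needed when metrics are enabled.
            from opentelemetry.sdk.metrics import MeterProvider

            self.meter_provider = MeterProvider()
            self.meter = self.meter_provider.get_meter(type(self).__name__)

    def start_span(self, name, parent_context=None):
        """Return a context manager for a span, or a no-op if tracing is off.

        With parent_context=None the span attaches to the currently active
        span if there is one, and otherwise becomes a root span.
        """
        if self.tracer is None:
            return contextlib.nullcontext()
        return self.tracer.start_as_current_span(name, context=parent_context)


class DemoRuntime(InstrumentationMixin):
    """Hypothetical class standing in for a runtime or client."""

    def __init__(self, args):
        self.args = args
        self._setup_instrumentation()


runtime = DemoRuntime(
    SimpleNamespace(opentelemetry_tracing=False, opentelemetry_metrics=False)
)
with runtime.start_span('process_request'):
    pass  # the traced operation would go here
```

Returning a no-op context manager when tracing is disabled lets calling code use the same `with` block either way.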

Default Attributes

  1. Add OTEL defined semantic attributes by default.
    1. https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/http/
    2. https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/rpc/
    3. https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/exceptions/
  2. For later:
    1. we can configure some default jcloud cluster deployment identifiers.
    2. we can allow users to add additional global attributes by parsing environment variables with a prefix like OTEL_CUSTOM_ATTRIBUTE_APPLICATION_ID=lottiefiles which will add a global tracing attribute as APPLICATION_ID=lottiefiles.
    3. docker image name and tag?
  3. Should I include the current telemetry info in the tracing and metrics providers? This information would then be added automatically to all spans created from the TRACER and METER objects.
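
The environment variable idea from the "For later" item could be sketched as follows. The function name `custom_attributes_from_env` and the `PREFIX` constant are hypothetical, and the choice to ignore a bare prefix with no attribute name is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Prefix taken from the example in the proposal.
PREFIX = 'OTEL_CUSTOM_ATTRIBUTE_'


def custom_attributes_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect global attributes from prefixed environment variables."""
    environ = os.environ if environ is None else environ
    return {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        # Skip unrelated variables and a bare prefix without a name.
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }


attributes = custom_attributes_from_env(
    {'OTEL_CUSTOM_ATTRIBUTE_APPLICATION_ID': 'lottiefiles', 'HOME': '/root'}
)
print(attributes)
```

The resulting dictionary could then be attached to the providers as global attributes, for example through the SDK's resource mechanism.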

Default Tracing in a Flow

  1. Trace requests in the request handler by ensuring that the parent span is properly propagated to the Executor. Gateway → Executor and Executor → Executor communication must be covered by default.
  2. Ensure that the attributes of HTTP, gRPC and WebSocket requests (depending on the Gateway protocol) from Client → Gateway are correctly propagated.
  3. Provide the parent span from the request handler to the request method. The user must add code to cover any additional operations within the request method, using the provided helper methods to ensure correct propagation.

Exporter Configuration

  1. Default OTEL trace exporter configurations can be provided as per OTEL recommendations.
  2. Default Prometheus metrics exporter can be provided as per OTEL recommendations.
  3. Use standard yaml parsers provided in the sdk.

Documentation

  • New page?
