Add extended key performance indicators metrics support #3021

Merged
merged 2 commits into from
May 20, 2021


@tjquinno tjquinno commented May 15, 2021

Resolves #2688
Resolves #2689

Overview

A key performance indicator (KPI) can be useful not only for measuring the performance of a server but also for building application-specific health checks using the KPI.

Existing ("basic") KPI metrics and implementation

Previously, Helidon published two KPI metrics whenever an app included the metrics/metrics module:

  1. counter - Counter of the total number of requests received by the server
  2. meter - Meter (rate) for above count.

Each full metric name consists of the routing's name (if any), the prefix requests, and the metric name, so users typically see requests.count and requests.meter.

Recall that a Meter reports an overall count, mean rate, and rates over the previous 1, 5, and 15-minute intervals.

These were created in MetricsSupport#configureVendorMetrics and a handler lambda declared in that same method would update the metrics before invoking request.next().
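The basic behavior described above can be pictured with a minimal, self-contained sketch. It is not the code in MetricsSupport: BasicKpi, handle, and the Runnable that stands in for request.next() are hypothetical names, and only the count and mean-rate parts of a Meter are modeled.

```java
import java.util.concurrent.atomic.AtomicLong;

class BasicKpiSketch {
    /** Hypothetical stand-in for the basic KPI metrics: a total count plus a mean rate. */
    static final class BasicKpi {
        private final AtomicLong count = new AtomicLong();
        private final long startNanos;

        BasicKpi(long startNanos) {
            this.startNanos = startNanos;
        }

        void onRequestReceived() {
            count.incrementAndGet();
        }

        long count() {
            return count.get();
        }

        /** Mean rate in requests per second since startNanos. */
        double meanRatePerSecond(long nowNanos) {
            long elapsed = nowNanos - startNanos;
            return elapsed <= 0 ? 0.0 : count.get() * 1_000_000_000.0 / elapsed;
        }
    }

    /** Mimics the handler: update the metrics first, then pass the request along. */
    static void handle(BasicKpi kpi, Runnable next) {
        kpi.onRequestReceived();
        next.run(); // stands in for request.next()
    }

    public static void main(String[] args) {
        BasicKpi kpi = new BasicKpi(0L);
        for (int i = 0; i < 3; i++) {
            handle(kpi, () -> { });
        }
        System.out.println(kpi.count());                           // 3
        System.out.println(kpi.meanRatePerSecond(2_000_000_000L)); // 1.5
    }
}
```

Because the update happens only before next.run(), this shape has no hook for work that finishes later, which is the limitation the extended metrics have to address.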

New ("extended") KPI metrics

The extended metrics are disabled by default and can be enabled through configuration or builder methods.

  1. inFlight - ConcurrentGauge of requests actively being processed at a given moment
  2. load - Meter of inFlight
  3. longRunning - Meter of requests which take longer than a configurable threshold to complete their processing
  4. deferred - Meter of received requests which were not processed immediately

Some requests (for example, but not limited to, those executed by Jersey) might be deferred: received by Helidon but then delayed for some reason (e.g., queued due to a shortage of available threads) before actually being processed. This means there can be a difference between when Helidon receives a request and when it starts processing the request.

Similarly, some requests (again, for example, but not limited to, Jersey ones) might complete their work asynchronously, after the user code has invoked request.next() and, therefore, separate from when the MetricsSupport handler's call to request.next() returns.

Some of the new KPI metrics require action after as well as before each request is handled and processed.

These complexities lead to code changes that are a little more extensive and complicated than might be expected at first glance.
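To illustrate why the extended metrics need action both before and after processing, here is a minimal sketch. ExtendedKpi and its methods are hypothetical, plain counters stand in for the Meter and ConcurrentGauge types, and the exact points at which Helidon updates each metric are an assumption rather than something taken from this PR's code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class ExtendedKpiSketch {
    /** Hypothetical stand-in for the extended KPI metrics, using plain counters. */
    static final class ExtendedKpi {
        private final long longRunningThresholdMs;
        private final AtomicInteger inFlight = new AtomicInteger();
        private final AtomicLong deferred = new AtomicLong();
        private final AtomicLong longRunning = new AtomicLong();

        ExtendedKpi(long longRunningThresholdMs) {
            this.longRunningThresholdMs = longRunningThresholdMs;
        }

        /** Before processing: the request becomes in-flight; a wait since receipt counts as a deferral. */
        void processingStarted(long receivedAtMs, long startedAtMs) {
            inFlight.incrementAndGet();
            if (startedAtMs > receivedAtMs) {
                deferred.incrementAndGet();
            }
        }

        /** After processing: the request leaves in-flight; slow requests count as long-running. */
        void processingCompleted(long startedAtMs, long completedAtMs) {
            inFlight.decrementAndGet();
            if (completedAtMs - startedAtMs > longRunningThresholdMs) {
                longRunning.incrementAndGet();
            }
        }

        int inFlight() { return inFlight.get(); }
        long deferred() { return deferred.get(); }
        long longRunning() { return longRunning.get(); }
    }

    public static void main(String[] args) {
        ExtendedKpi kpi = new ExtendedKpi(1_000);
        kpi.processingStarted(0, 0);       // processed immediately on receipt
        kpi.processingStarted(0, 250);     // waited 250 ms before processing
        System.out.println(kpi.inFlight());    // 2
        kpi.processingCompleted(0, 1_500);     // took 1500 ms, above the threshold
        kpi.processingCompleted(250, 400);     // took 150 ms
        System.out.println(kpi.inFlight());    // 0
        System.out.println(kpi.deferred());    // 1
        System.out.println(kpi.longRunning()); // 1
    }
}
```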

Design

Introduce two abstractions:

  • A KPI metrics-related Context which tracks the key events in a request's life cycle that are relevant to KPI metrics
  • A KPI Metrics interface which encapsulates whether the basic or extended KPI metrics are set up and how to update them at various stages in the request's lifecycle

Make some small changes in the current request workflow:

  • microprofile/server ServerCdiExtension places a new instance of the KPI Context into the request's context store.
  • webserver/jersey JerseySupport, just before it submits each request to Jersey for processing, gets the KPI Context and, if found, reports to it that request processing is starting.
  • MetricsSupport, in its existing KPI-related handler:
    • retrieves the KPI Context from the request's context or, if no KPI Context is present, creates one;
    • obtains the correct impl of the Metrics abstraction mentioned above;
    • invokes methods on the KPI Context to report the handling and processing of the request, passing the Metrics object mentioned just above.

This way, with no change needed in an MP app's code, the KPI metrics distinguish between Helidon's handling vs. processing of JAX-RS requests. By default, in SE apps Helidon treats the start of handling the request as the start of processing the request as well. An SE developer has the option of changing the app code slightly so the KPI metrics reflect the distinction between handling and processing. (See the Other Notes section below.)

The basic or extended KPI metrics are registered in the vendor registry when MetricsSupport instantiates the Metrics abstraction.

Changed components

webserver/webserver

  • New interface KeyPerformanceIndicatorSupport - defines the KPI behavior through nested interfaces
    • Metrics - abstracts the creation and collection of original (simple) and new (extended) KPI metrics
    • Context - abstracts the stages in a request's life cycle that are relevant to KPI metrics
      • Methods invoked when a request is received, work started, work completed
      • Static factory method
    • DeferrableRequestContext - sub-interface of Context
      • Adds a method invoked when processing actually begins on a deferrable request
      • Static factory method
  • New package-private class KeyPerformanceIndicatorContextImpls - impls for the above Context interfaces.

The specific type of Context needed for a request varies from request to request.
Callers of the Context factory methods always know which type they need.
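The relationship between these nested interfaces can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the method names on Context follow the ones mentioned in this description, but the Metrics method names, all signatures, the implementation classes, and RecordingMetrics are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class KpiContextSketch {
    /** Hypothetical metrics abstraction; basic and extended impls would differ here. */
    interface Metrics {
        void onRequestReceived();
        void onRequestStarted();
        void onRequestCompleted();
    }

    /** Tracks the life-cycle stages of one request that matter to KPI metrics. */
    interface Context {
        void requestHandlingStarted(Metrics metrics);
        void requestProcessingCompleted();

        static Context create() {
            return new ImmediateContext();
        }
    }

    /** Adds the explicit "processing has actually begun" notification. */
    interface DeferrableRequestContext extends Context {
        void requestProcessingStarted();

        static DeferrableRequestContext create() {
            return new DeferrableContext();
        }
    }

    /** Treats the start of handling as the start of processing. */
    static final class ImmediateContext implements Context {
        private Metrics metrics;

        @Override
        public void requestHandlingStarted(Metrics metrics) {
            this.metrics = metrics;
            metrics.onRequestReceived();
            metrics.onRequestStarted();
        }

        @Override
        public void requestProcessingCompleted() {
            metrics.onRequestCompleted();
        }
    }

    /** Waits for requestProcessingStarted before reporting the start of processing. */
    static final class DeferrableContext implements DeferrableRequestContext {
        private Metrics metrics;

        @Override
        public void requestHandlingStarted(Metrics metrics) {
            this.metrics = metrics;
            metrics.onRequestReceived();
        }

        @Override
        public void requestProcessingStarted() {
            if (metrics != null) { // no-op if the metrics handler never saw this request
                metrics.onRequestStarted();
            }
        }

        @Override
        public void requestProcessingCompleted() {
            metrics.onRequestCompleted();
        }
    }

    /** Records the notifications so the ordering can be inspected. */
    static final class RecordingMetrics implements Metrics {
        final List<String> events = new ArrayList<>();

        @Override public void onRequestReceived() { events.add("received"); }
        @Override public void onRequestStarted() { events.add("started"); }
        @Override public void onRequestCompleted() { events.add("completed"); }
    }

    public static void main(String[] args) {
        RecordingMetrics metrics = new RecordingMetrics();
        DeferrableRequestContext ctx = DeferrableRequestContext.create();
        ctx.requestHandlingStarted(metrics);
        System.out.println(metrics.events); // [received]
        ctx.requestProcessingStarted();
        ctx.requestProcessingCompleted();
        System.out.println(metrics.events); // [received, started, completed]
    }
}
```

In this sketch the caller picks the factory method that matches the kind of request it is handling, mirroring the note above that callers always know which type of Context they need.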

webserver/jersey

  • Existing JerseySupport class now, just before starting the processing of the request, retrieves the KPI DeferrableRequestContext from the request Context and invokes requestProcessingStarted.

JerseySupport always invokes the DeferrableRequestContext if it's present. If metrics/metrics is not on the class or module path, the DeferrableRequestContext#requestProcessingStarted invocation done here acts as a no-op.

metrics/metrics

  • New interface KeyPerformanceIndicatorMetricsSettings
    Holds all the configurable settings related to KPI metrics, currently:

    • whether the extended KPI metrics should be used, and
    • what threshold defines a "long-running" request

    The settings object is buildable; its builder is used as a "sub-builder" for MetricsSupport.

  • KeyPerformanceIndicatorMetricsSettingsImpl - impl for above

  • KeyPerformanceIndicatorMetricsImpls

    • Basic and extended implementations of the KPI Metrics abstraction
    • A factory method for the above
  • MetricsSupport

    • Enhance Builder and constructor to incorporate KPI metrics config settings
    • In configureVendorMetrics
      • Gets the correct instance of Metrics via the KeyPerformanceIndicatorMetricsImpls factory method
      • In the existing metrics-updating handler in configureVendorMetrics:
        • Retrieves the KPI Context from the request Context or creates a suitable one
        • Invokes requestHandlingStarted, requestProcessingCompleted, and requestHandlingCompleted from appropriate points. This handler cannot invoke requestProcessingStarted because the request's processing might be deferred by a handler later in the chain.

microprofile/metrics

Revise and add tests to check the extended KPI metrics.

microprofile/server

ServerCdiExtension now adds a new handler to each app's handler chain, in a position earlier than the handler added by MetricsSupport. The new handler adds a new DeferrableRequestContext to each request's context.

This means:

  • The handler added by MetricsSupport will find that Context and use it, rather than creating and using a non-extended one.
  • JerseySupport will also find that DeferrableRequestContext and use it to track when the processing of the request begins.
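The effect of handler ordering can be illustrated with a small simulation. RequestStore, KpiContext, earlyHandler, and metricsHandler are hypothetical stand-ins for the request's context store, the KPI Context types, the handler added by ServerCdiExtension, and the handler added by MetricsSupport; none of them is Helidon API.

```java
import java.util.HashMap;
import java.util.Map;

class HandlerOrderSketch {
    /** Hypothetical stand-in for a per-request context store keyed by type. */
    static final class RequestStore {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void register(Class<T> type, T value) {
            entries.put(type, value);
        }

        <T> T get(Class<T> type) {
            return type.cast(entries.get(type)); // null when nothing is registered
        }
    }

    /** Hypothetical KPI context; deferrable ones wait for an explicit "processing started". */
    static final class KpiContext {
        final boolean deferrable;

        KpiContext(boolean deferrable) {
            this.deferrable = deferrable;
        }
    }

    /** Early handler: registers a deferrable context before the metrics handler runs. */
    static void earlyHandler(RequestStore store) {
        store.register(KpiContext.class, new KpiContext(true));
    }

    /** Metrics handler: reuses an existing context, else falls back to a non-deferrable one. */
    static KpiContext metricsHandler(RequestStore store) {
        KpiContext ctx = store.get(KpiContext.class);
        if (ctx == null) {
            ctx = new KpiContext(false);
            store.register(KpiContext.class, ctx);
        }
        return ctx;
    }

    public static void main(String[] args) {
        RequestStore withEarlyHandler = new RequestStore();
        earlyHandler(withEarlyHandler);
        System.out.println(metricsHandler(withEarlyHandler).deferrable); // true

        RequestStore withoutEarlyHandler = new RequestStore();
        System.out.println(metricsHandler(withoutEarlyHandler).deferrable); // false
    }
}
```

If the early handler ran after the metrics handler instead, the metrics handler would already have created its own non-deferrable context, which is why the position in the chain matters.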

examples/metrics/kpi

New example app showing config and explicit set-up of extended KPI metrics.

docs

Updates to the SE and MP metrics guides, involving refactoring duplicated text into a new common file.

Other notes

Non-Jersey apps with deferred requests

Users can write their own non-Jersey apps which defer processing of requests. To work correctly with the enhanced KPI support, the developer needs to add code to:

  • For each relevant routing path, insert an early handler which adds a DeferrableRequestContext to the request's Context (so MetricsSupport will use that KPI Context when it updates the KPI metrics), and
  • During request processing, retrieve that context and invoke its requestProcessingStarted() method at the appropriate moment.

Note that if developers do not make these changes, nothing fails. The KPI metrics will not accurately reflect deferred requests; it will seem as if all requests are processed immediately upon receipt--the default SE behavior.

SE apps using JerseySupport

Although we do not promote it (and coming changes in other technologies might remove this use case), developers can currently use JerseySupport in SE apps. Such apps can benefit from the KPI metrics improvements.

To do so, these apps must add a handler, as described above, to each routing that should participate in extended KPI metrics; the handler adds a new DeferrableRequestContext to the request's context store. The application must add that handler earlier in the handler chain than it registers the MetricsSupport instance as a service.

@tjquinno tjquinno self-assigned this May 15, 2021
@tjquinno tjquinno changed the title Add extended KPI support, test, example, doc Add extended key performance indicators metrics support May 15, 2021
----

NOTE: You cannot get the individual fields of a metric. For example, you cannot target http://localhost:8080/metrics/vendor/requests.meter.count.
// end::get-single-metric[]

This note is a good addition. I remember seeing a question from a user about this.

which to use.
You would typically write any given application to use only one of the approaches.

## Build and run

[not a specific comment for this readme, just an observation]
Title case is typically used for H1 and H2 while sentence case is reserved for H3/4. I see that this is done differently all over the site and repo, but I think we could implement some standards going forward. Need to speak to Joe, Romain and Dmitry.

@tjquinno tjquinno merged commit 3de77e6 into helidon-io:master May 20, 2021
@tjquinno tjquinno deleted the kpi branch May 20, 2021 17:37