Metrics and Monitoring

David Gross edited this page Feb 26, 2015 · 15 revisions

Hystrix captures metrics using the HystrixRollingNumber and HystrixRollingPercentile classes in rolling windows. The rolling windows allow Hystrix to use low-latency moving windows of metrics for circuit breaker health checks and operations.

Direct Access

You can access metrics programmatically with the following calls:

HystrixCommandMetrics.getInstances()
HystrixThreadPoolMetrics.getInstances()

Metrics Event Stream

You can use the hystrix-metrics-event-stream module to power the dashboard, real-time alerting, and similar use cases.

Metrics Publisher

You can publish metrics by using an implementation of HystrixMetricsPublisher.

Register your HystrixMetricsPublisher implementation by calling HystrixPlugins.getInstance().registerMetricsPublisher(HystrixMetricsPublisher impl).

Hystrix includes the following implementations as hystrix-contrib modules:

The following sections explain the metrics published with those implementations:

Command Metrics

Each HystrixCommand publishes metrics with the following tags:

  • Servo Tag: "instance", Value: HystrixCommandKey.name()
  • Servo Tag: "type", Value: "HystrixCommand"

Informational and Status

  • Boolean isCircuitBreakerOpen
  • Number errorPercentage
  • Number executionSemaphorePermitsInUse
  • String commandGroup
  • Number currentTime

Cumulative and Rolling Event Counts

Cumulative counts (Counter) represent the number of events since the start of the application.

Rolling counts (Gauge) are configured by the metrics.rollingStats.* properties. They are “point in time” counts covering the last x seconds (for example, 10 seconds).

Event                 Cumulative Count (Long)   Rolling Count (Number)
BAD_REQUEST           countBadRequests          rollingCountBadRequests
COLLAPSED             countCollapsedRequests    rollingCountCollapsedRequests
EMIT                  countEmit                 rollingCountEmit
EXCEPTION_THROWN      countExceptionsThrown     rollingCountExceptionsThrown
FAILURE               countFailure              rollingCountFailure
FALLBACK_EMIT         countFallbackEmit         rollingCountFallbackEmit
FALLBACK_FAILURE      countFallbackFailure      rollingCountFallbackFailure
FALLBACK_REJECTION    countFallbackRejection    rollingCountFallbackRejection
FALLBACK_SUCCESS      countFallbackSuccess      rollingCountFallbackSuccess
RESPONSE_FROM_CACHE   countResponsesFromCache   rollingCountResponsesFromCache
SEMAPHORE_REJECTED    countSemaphoreRejected    rollingCountSemaphoreRejected
SHORT_CIRCUITED       countShortCircuited       rollingCountShortCircuited
SUCCESS               countSuccess              rollingCountSuccess
THREAD_POOL_REJECTED  countThreadPoolRejected   rollingCountThreadPoolRejected
TIMEOUT               countTimeout              rollingCountTimeout
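
To make the difference between cumulative and rolling counts concrete, here is a minimal sketch of a counter that tracks both for a single event type. It is not the HystrixRollingNumber implementation: the class RollingCounterSketch, its method names, and the choice of a 10-second window split into 10 buckets are assumptions made for illustration. Timestamps are passed in explicitly so the behaviour is easy to reason about.

```java
import java.util.Arrays;

/** Illustrative sketch only; not how HystrixRollingNumber is implemented. */
class RollingCounterSketch {
    private final long bucketSizeMs;
    private final long[] bucketStart;  // start time of the data held in each slot
    private final long[] bucketCount;  // events recorded in each slot
    private long cumulative;           // events since this counter was created

    RollingCounterSketch(long windowMs, int numBuckets) {
        if (numBuckets <= 0 || windowMs <= 0 || windowMs % numBuckets != 0) {
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
        this.bucketSizeMs = windowMs / numBuckets;
        this.bucketStart = new long[numBuckets];
        this.bucketCount = new long[numBuckets];
        Arrays.fill(bucketStart, -1L);  // -1 marks a slot that has never been used
    }

    /** Records one event at the given (non-negative) timestamp. */
    synchronized void increment(long nowMs) {
        long start = nowMs - (nowMs % bucketSizeMs);
        int slot = (int) ((nowMs / bucketSizeMs) % bucketStart.length);
        if (bucketStart[slot] != start) {  // slot holds stale data, so reuse it
            bucketStart[slot] = start;
            bucketCount[slot] = 0;
        }
        bucketCount[slot]++;
        cumulative++;
    }

    /** Cumulative count: never decreases. */
    synchronized long cumulativeCount() {
        return cumulative;
    }

    /** Rolling count: sum of the buckets that fall inside the window ending at nowMs. */
    synchronized long rollingCount(long nowMs) {
        long currentStart = nowMs - (nowMs % bucketSizeMs);
        long oldestStart = currentStart - bucketSizeMs * (bucketStart.length - 1);
        long sum = 0;
        for (int i = 0; i < bucketStart.length; i++) {
            if (bucketStart[i] >= oldestStart && bucketStart[i] <= currentStart) {
                sum += bucketCount[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        RollingCounterSketch timeouts = new RollingCounterSketch(10_000, 10);
        long now = System.currentTimeMillis();
        timeouts.increment(now);
        System.out.println("countTimeout=" + timeouts.cumulativeCount()
                + " rollingCountTimeout=" + timeouts.rollingCount(now));
    }
}
```

In this sketch the cumulative value only ever grows, while the rolling value drops back toward zero once events age out of the window, which is why the former suits a Counter and the latter a Gauge.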

Latency Percentiles: HystrixCommand.run() Execution (Gauge)

These metrics represent percentiles of execution times for the HystrixCommand.run() method (on the child thread if using thread isolation).

These are rolling percentiles as configured by metrics.rollingPercentile.* properties.

  • Number latencyExecute_mean
  • Number latencyExecute_percentile_5
  • Number latencyExecute_percentile_25
  • Number latencyExecute_percentile_50
  • Number latencyExecute_percentile_75
  • Number latencyExecute_percentile_90
  • Number latencyExecute_percentile_99
  • Number latencyExecute_percentile_995
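
As a rough illustration of what the values above summarize, the following sketch computes a mean and nearest-rank percentiles over a fixed set of latency samples. It assumes that the suffix 995 denotes the 99.5th percentile. The class PercentileSketch and the nearest-rank method are choices made for this example and should not be read as the algorithm used by HystrixRollingPercentile.

```java
import java.util.Arrays;

/** Illustrative sketch only; nearest-rank percentiles over a sample array. */
class PercentileSketch {

    /** Returns the nearest-rank percentile; p must be in (0, 100]. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone();  // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean of the samples. */
    static double mean(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    public static void main(String[] args) {
        long[] latenciesMs = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println("mean=" + mean(latenciesMs));
        for (double p : new double[] {5, 25, 50, 75, 90, 99, 99.5}) {
            System.out.println("percentile " + p + " = " + percentile(latenciesMs, p));
        }
    }
}
```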

Latency Percentiles: End-to-End Execution (Gauge)

These metrics represent percentiles of execution times for the end-to-end execution of HystrixCommand.execute() or HystrixCommand.queue() until a response is returned (or is ready to return, in the case of queue()).

Comparing these with the latencyExecute* percentiles lets you measure the cost of thread queuing, scheduling, and execution, semaphores, circuit breaker logic, and other overhead (including metrics capture itself).

These are rolling percentiles as configured by metrics.rollingPercentile.* properties.

  • Number latencyTotal_mean
  • Number latencyTotal_percentile_5
  • Number latencyTotal_percentile_25
  • Number latencyTotal_percentile_50
  • Number latencyTotal_percentile_75
  • Number latencyTotal_percentile_90
  • Number latencyTotal_percentile_99
  • Number latencyTotal_percentile_995
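
The relationship between the latencyTotal* and latencyExecute* measurements can be sketched with plain java.util.concurrent types. The example below times one task from the caller's side (queuing, scheduling, and execution) and from inside the worker thread (execution only). The class LatencyOverheadSketch and its measure method are hypothetical and do not use any Hystrix API; they only mirror the idea that the difference between the two timings is overhead.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch only; plain executor timing, no Hystrix classes involved. */
class LatencyOverheadSketch {

    /** Returns {executeNanos, totalNanos} for one task run on the given pool. */
    static long[] measure(ExecutorService pool, Runnable work) {
        AtomicLong executeNanos = new AtomicLong();
        long totalStart = System.nanoTime();      // caller-side clock starts before queuing
        Future<?> future = pool.submit(() -> {
            long runStart = System.nanoTime();    // worker-side clock starts when the task runs
            try {
                work.run();
            } finally {
                executeNanos.set(System.nanoTime() - runStart);
            }
        });
        try {
            future.get();                         // block until the task has finished
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        long totalNanos = System.nanoTime() - totalStart;
        return new long[] {executeNanos.get(), totalNanos};
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long[] t = measure(pool, () -> {
                long x = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    x += i;
                }
            });
            System.out.println("execute=" + t[0] + "ns total=" + t[1]
                    + "ns overhead=" + (t[1] - t[0]) + "ns");
        } finally {
            pool.shutdown();
        }
    }
}
```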

Property Values (Informational)

These informational metrics report the actual property values being used by the HystrixCommand. This enables you to see when a dynamic property takes effect and to confirm a property is set as expected.

  • Number propertyValue_rollingStatisticalWindowInMilliseconds
  • Number propertyValue_circuitBreakerRequestVolumeThreshold
  • Number propertyValue_circuitBreakerSleepWindowInMilliseconds
  • Number propertyValue_circuitBreakerErrorThresholdPercentage
  • Boolean propertyValue_circuitBreakerForceOpen
  • Boolean propertyValue_circuitBreakerForceClosed
  • Number propertyValue_executionIsolationThreadTimeoutInMilliseconds
  • String propertyValue_executionIsolationStrategy
  • Boolean propertyValue_metricsRollingPercentileEnabled
  • Boolean propertyValue_requestCacheEnabled
  • Boolean propertyValue_requestLogEnabled
  • Number propertyValue_executionIsolationSemaphoreMaxConcurrentRequests
  • Number propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests

ThreadPool Metrics

Each HystrixThreadPool publishes metrics with the following tags:

  • Servo Tag: "instance", Value: HystrixThreadPoolKey.name()
  • Servo Tag: "type", Value: "HystrixThreadPool"

Informational and Status

  • String name
  • Number currentTime

Rolling Counts (Gauge)

  • Number rollingMaxActiveThreads
  • Number rollingCountThreadsExecuted

Cumulative Counts (Counter)

  • Long countThreadsExecuted

ThreadPool State (Gauge)

  • Number threadActiveCount
  • Number completedTaskCount
  • Number largestPoolSize
  • Number totalTaskCount
  • Number queueSize
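
The gauge names above line up closely with getters on java.util.concurrent.ThreadPoolExecutor. The sketch below reads such a snapshot from a plain executor. The mapping from each gauge name to a getter is an assumption made for illustration, not a statement of how the Hystrix publishers obtain these values, and the class ThreadPoolStateSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; reads point-in-time state from a ThreadPoolExecutor. */
class ThreadPoolStateSketch {

    /** Returns a snapshot keyed by the gauge names listed above (assumed mapping). */
    static Map<String, Number> snapshot(ThreadPoolExecutor pool) {
        Map<String, Number> gauges = new LinkedHashMap<>();
        gauges.put("threadActiveCount", pool.getActiveCount());
        gauges.put("completedTaskCount", pool.getCompletedTaskCount());
        gauges.put("largestPoolSize", pool.getLargestPoolSize());
        gauges.put("totalTaskCount", pool.getTaskCount());
        gauges.put("queueSize", pool.getQueue().size());
        return gauges;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 1, TimeUnit.MINUTES, new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < 5; i++) {
            pool.execute(() -> { });  // trivial tasks, just to move the counters
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(snapshot(pool));
    }
}
```

Because these values are sampled at the moment of the call, they behave like gauges: two snapshots taken moments apart on a busy pool can differ.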

Property Values (Informational)

  • Number propertyValue_corePoolSize
  • Number propertyValue_keepAliveTimeInMinutes
  • Number propertyValue_queueSizeRejectionThreshold
  • Number propertyValue_maxQueueSize