Metrics and Monitoring
Hystrix captures metrics in rolling windows using the HystrixRollingNumber and HystrixRollingPercentile classes. The rolling windows give Hystrix low-latency moving windows of metrics for circuit breaker health checks and operations.
You can access metrics programmatically with the following calls:
HystrixCommandMetrics.getInstances()
HystrixThreadPoolMetrics.getInstances()
You can use the hystrix-metrics-event-stream to power the dashboard, real-time alerting, and other such use cases.
You can publish metrics by using an implementation of HystrixMetricsPublisher.
Register your HystrixMetricsPublisher implementation by calling HystrixPlugins.getInstance().registerMetricsPublisher(HystrixMetricsPublisher impl).
Hystrix includes the following implementations as hystrix-contrib modules:
- Netflix Servo: hystrix-servo-metrics-publisher
- Yammer Metrics: hystrix-yammer-metrics-publisher
The following sections explain the metrics published with those implementations:
Each HystrixCommand publishes metrics with the following tags:
- Servo Tag: "instance", Value: HystrixCommandKey.name()
- Servo Tag: "type", Value: "HystrixCommand"
- Boolean isCircuitBreakerOpen
- Number errorPercentage
- Number executionSemaphorePermitsInUse
- String commandGroup
- Number currentTime
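As a rough illustration of what errorPercentage expresses, the sketch below computes a whole-number percentage of failed requests out of all requests seen in the rolling window. The class and method names are hypothetical, and which events count as errors is an assumption here, not a statement of how Hystrix computes the value internally.

```java
/**
 * Illustrative sketch only; not part of Hystrix.
 * Computes an integer error percentage from two rolling counts.
 */
class ErrorPercentageSketch {

    /**
     * @param errorCount number of requests treated as errors in the window (assumed definition)
     * @param totalCount number of all requests in the same window
     * @return errors as a whole-number percentage of the total, or 0 when there is no traffic
     */
    static int errorPercentage(long errorCount, long totalCount) {
        if (totalCount <= 0) {
            return 0; // no requests in the window, so there is nothing to report
        }
        // Integer arithmetic avoids floating-point truncation surprises.
        return (int) (errorCount * 100 / totalCount);
    }
}
```

For example, 25 errors out of 100 requests yields 25, and a window with no requests yields 0.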
Cumulative counts (Counter) represent the number of events since the start of the application.
Rolling counts (Gauge) are configured by metrics.rollingStats.* properties. They are "point in time" counts representing the last x seconds (for example, 10 seconds).
Event | Cumulative Count (Long) | Rolling Count (Number) |
---|---|---|
BAD_REQUEST | countBadRequests | rollingCountBadRequests |
COLLAPSED | countCollapsedRequests | rollingCountCollapsedRequests |
EMIT | countEmit | rollingCountEmit |
EXCEPTION_THROWN | countExceptionsThrown | rollingCountExceptionsThrown |
FAILURE | countFailure | rollingCountFailure |
FALLBACK_EMIT | countFallbackEmit | rollingCountFallbackEmit |
FALLBACK_FAILURE | countFallbackFailure | rollingCountFallbackFailure |
FALLBACK_REJECTION | countFallbackRejection | rollingCountFallbackRejection |
FALLBACK_SUCCESS | countFallbackSuccess | rollingCountFallbackSuccess |
RESPONSE_FROM_CACHE | countResponsesFromCache | rollingCountResponsesFromCache |
SEMAPHORE_REJECTED | countSemaphoreRejected | rollingCountSemaphoreRejected |
SHORT_CIRCUITED | countShortCircuited | rollingCountShortCircuited |
SUCCESS | countSuccess | rollingCountSuccess |
THREAD_POOL_REJECTED | countThreadPoolRejected | rollingCountThreadPoolRejected |
TIMEOUT | countTimeout | rollingCountTimeout |
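The difference between a cumulative count and a rolling count can be sketched with a small bucketed counter. This is a simplified illustration under stated assumptions, not Hystrix's HystrixRollingNumber: the class name, the explicit timestamp parameters, and the 10-second window split into 10 buckets used in the example are all hypothetical.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only; not Hystrix's HystrixRollingNumber.
 * Tracks one event type as a cumulative total and as a rolling total
 * over a fixed window divided into equal-sized time buckets.
 */
class RollingCounterSketch {
    private final long bucketSizeMs;
    private final long[] bucketStart; // start time of each bucket, -1 if never used
    private final long[] bucketCount; // events recorded in each bucket
    private long cumulative;          // events since this counter was created

    RollingCounterSketch(long windowMs, int numBuckets) {
        if (numBuckets <= 0 || windowMs <= 0 || windowMs % numBuckets != 0) {
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
        this.bucketSizeMs = windowMs / numBuckets;
        this.bucketStart = new long[numBuckets];
        this.bucketCount = new long[numBuckets];
        Arrays.fill(bucketStart, -1L);
    }

    /** Records one event at the given time (milliseconds, non-negative). */
    synchronized void increment(long nowMs) {
        long start = nowMs - (nowMs % bucketSizeMs);
        int idx = (int) ((nowMs / bucketSizeMs) % bucketCount.length);
        if (bucketStart[idx] != start) {
            // The slot holds an expired bucket; reuse it for the current interval.
            bucketStart[idx] = start;
            bucketCount[idx] = 0;
        }
        bucketCount[idx]++;
        cumulative++;
    }

    /** Count since creation; never decreases. */
    synchronized long cumulativeCount() {
        return cumulative;
    }

    /** Count over the buckets that still fall inside the window ending at nowMs. */
    synchronized long rollingCount(long nowMs) {
        long currentStart = nowMs - (nowMs % bucketSizeMs);
        long oldestStart = currentStart - bucketSizeMs * (bucketCount.length - 1);
        long sum = 0;
        for (int i = 0; i < bucketCount.length; i++) {
            if (bucketStart[i] >= oldestStart && bucketStart[i] <= currentStart) {
                sum += bucketCount[i];
            }
        }
        return sum;
    }
}
```

With a 10,000 ms window and 10 buckets, events recorded at 500 ms, 1,500 ms, and 9,500 ms give a rolling count of 3 at 9,999 ms and 2 at 10,500 ms, because the oldest bucket has aged out, while the cumulative count stays at 3.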
Latency Percentiles: HystrixCommand.run() Execution (Gauge)
These metrics represent percentiles of execution times for the HystrixCommand.run() method (on the child thread if using thread isolation).
These are rolling percentiles as configured by metrics.rollingPercentile.* properties.
- Number latencyExecute_mean
- Number latencyExecute_percentile_5
- Number latencyExecute_percentile_25
- Number latencyExecute_percentile_50
- Number latencyExecute_percentile_75
- Number latencyExecute_percentile_90
- Number latencyExecute_percentile_99
- Number latencyExecute_percentile_995
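To make the percentile gauges above concrete, the following sketch computes a mean and nearest-rank percentiles from a sample of execution times. It is a simplification for illustration: the class name and sample data are hypothetical, and the nearest-rank method used here may differ from the interpolation HystrixRollingPercentile applies.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only; not Hystrix's HystrixRollingPercentile.
 * Computes nearest-rank percentiles and the mean of a latency sample.
 */
class LatencyPercentileSketch {

    /** Returns the nearest-rank percentile (0 to 100) of the sample, or 0 if it is empty. */
    static int percentile(int[] latenciesMs, double percentile) {
        if (latenciesMs.length == 0) {
            return 0;
        }
        int[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        if (percentile <= 0.0) {
            return sorted[0];
        }
        if (percentile >= 100.0) {
            return sorted[sorted.length - 1];
        }
        // Nearest rank: the smallest value with at least `percentile` percent of samples at or below it.
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Returns the arithmetic mean of the sample, or 0 if it is empty. */
    static double mean(int[] latenciesMs) {
        if (latenciesMs.length == 0) {
            return 0.0;
        }
        long sum = 0;
        for (int v : latenciesMs) {
            sum += v;
        }
        return (double) sum / latenciesMs.length;
    }
}
```

For the sample {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000}, the 50th percentile is 50 and the 99th percentile is 1000, while the mean is 145.0. A single slow call barely moves the median but dominates the tail and the mean, which is why the percentiles are published alongside the mean.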
Latency Percentiles: End-to-End Execution (Gauge)
These metrics represent percentiles of execution times for the end-to-end execution of HystrixCommand.execute() or HystrixCommand.queue() until a response is returned (or is ready to return in case of queue()).
Comparing these with the latencyExecute* percentiles measures the cost of thread queuing, scheduling, and execution, semaphores, circuit breaker logic, and other aspects of overhead (including metrics capture itself).
These are rolling percentiles as configured by metrics.rollingPercentile.* properties.
- Number latencyTotal_mean
- Number latencyTotal_percentile_5
- Number latencyTotal_percentile_25
- Number latencyTotal_percentile_50
- Number latencyTotal_percentile_75
- Number latencyTotal_percentile_90
- Number latencyTotal_percentile_99
- Number latencyTotal_percentile_995
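One way to read the two latency families together is to subtract the execution figure from the end-to-end figure. The sketch below does this for a pair of values such as latencyTotal_mean and latencyExecute_mean. The class and method names are hypothetical, and the result is only a rough estimate, since the two gauges are not guaranteed to be computed over exactly the same set of requests.

```java
/**
 * Illustrative sketch only; not part of Hystrix.
 * Estimates overhead as end-to-end latency minus run() latency.
 */
class LatencyOverheadSketch {

    /**
     * @param latencyTotalMs   an end-to-end figure, for example latencyTotal_mean
     * @param latencyExecuteMs the matching run() figure, for example latencyExecute_mean
     * @return the estimated overhead in milliseconds, never negative
     */
    static double overheadMs(double latencyTotalMs, double latencyExecuteMs) {
        // Clamp at zero: sampling differences can make the raw difference slightly negative.
        return Math.max(0.0, latencyTotalMs - latencyExecuteMs);
    }
}
```

For example, an end-to-end mean of 12.5 ms against an execution mean of 10.0 ms suggests roughly 2.5 ms spent on queuing, scheduling, and other overhead.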
Property Values (Informational)
These informational metrics report the actual property values being used by the HystrixCommand. This enables you to see when a dynamic property takes effect and to confirm a property is set as expected.
- Number propertyValue_rollingStatisticalWindowInMilliseconds
- Number propertyValue_circuitBreakerRequestVolumeThreshold
- Number propertyValue_circuitBreakerSleepWindowInMilliseconds
- Number propertyValue_circuitBreakerErrorThresholdPercentage
- Boolean propertyValue_circuitBreakerForceOpen
- Boolean propertyValue_circuitBreakerForceClosed
- Number propertyValue_executionIsolationThreadTimeoutInMilliseconds
- String propertyValue_executionIsolationStrategy
- Boolean propertyValue_metricsRollingPercentileEnabled
- Boolean propertyValue_requestCacheEnabled
- Boolean propertyValue_requestLogEnabled
- Number propertyValue_executionIsolationSemaphoreMaxConcurrentRequests
- Number propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests
Each HystrixThreadPool publishes metrics with the following tags:
- Servo Tag: "instance", Value: HystrixThreadPoolKey.name()
- Servo Tag: "type", Value: "HystrixThreadPool"
- String name
- Number currentTime
Rolling Counts (Gauge)
- Number rollingMaxActiveThreads
- Number rollingCountThreadsExecuted
Cumulative Counts (Counter)
- Long countThreadsExecuted
ThreadPool State (Gauge)
- Number threadActiveCount
- Number completedTaskCount
- Number largestPoolSize
- Number totalTaskCount
- Number queueSize
Property Values (Informational)
- Number propertyValue_corePoolSize
- Number propertyValue_keepAliveTimeInMinutes
- Number propertyValue_queueSizeRejectionThreshold
- Number propertyValue_maxQueueSize