More Flexible Metrics strategy #854

Closed
mattrjacobs opened this issue Aug 6, 2015 · 2 comments

Comments

@mattrjacobs
Contributor

Currently, HystrixCommandMetrics / HystrixThreadPoolMetrics / HystrixCollapserMetrics are all concrete classes which prescribe a single metrics strategy. In each case, they do in-memory summarization of counts and latencies. This is generally fine, but it provides no flexibility if anything else is desired. See #333 for an example.
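To make the request concrete, here is a rough sketch of what a pluggable strategy could look like. Every type and method name below (`CommandMetricsStrategy`, `InMemoryCountingStrategy`, `markEvent`) is hypothetical and not part of Hystrix; it assumes Java 8+ for `LongAdder`. The point is only that in-memory summarization becomes one implementation behind an interface rather than the only option.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical abstraction: none of these types exist in Hystrix.
interface CommandMetricsStrategy {
    // Called once per command execution with its outcome and latency.
    void markEvent(String commandKey, String eventType, long latencyMs);
}

// One possible implementation: in-memory summarization of counts and latencies.
final class InMemoryCountingStrategy implements CommandMetricsStrategy {
    private final ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, LongAdder> totalLatencyMs = new ConcurrentHashMap<>();

    @Override
    public void markEvent(String commandKey, String eventType, long latencyMs) {
        String key = commandKey + ":" + eventType;
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
        totalLatencyMs.computeIfAbsent(key, k -> new LongAdder()).add(latencyMs);
    }

    // Number of events seen for this command and event type.
    long count(String commandKey, String eventType) {
        LongAdder adder = counts.get(commandKey + ":" + eventType);
        return adder == null ? 0L : adder.sum();
    }

    // Sum of latencies for this command and event type, in milliseconds.
    long totalLatencyMs(String commandKey, String eventType) {
        LongAdder adder = totalLatencyMs.get(commandKey + ":" + eventType);
        return adder == null ? 0L : adder.sum();
    }
}
```

A different implementation of the same interface could forward each event off-box or drop latency tracking entirely, without callers changing.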

@mattrjacobs
Contributor Author

#843 should solve this issue. This will be the first commit after I branch off 1.4.x.

@mattrjacobs
Contributor Author

I've spent the last 2 weeks prototyping using this change, and that has refined my thinking. I'm leaving this open until I get a few more tasks done.

I would like Hystrix 1.5.0 to support multiple modes of operation w.r.t. metrics. Here are a few examples:

A) Work as-is today, which is to aggregate into in-memory data structures for commands / threadpools / collapsers. Circuit-breakers are based on a rolling window of command outcomes. A metrics publisher plugin gets the metrics off-box on some sort of interval.

B) Shift as much metrics aggregation off-box as possible. Per-request, flush all state that got built over a request (command executions / collapser executions). Provide a way to access longer-lived metrics, such as thread pool / queue utilization or concurrency experienced by a command. The only reason to keep on-box data structures is for circuit-breaking. This has the advantage of never losing any data. Interesting data can be directly computed by the off-box aggregator, such as a true histogram of command latency / thread-pool utilization / interarrival time of a command.

C) Keep metrics aggregation on-box, but allow for different representations. Circuit-breaking should still be supported, so a rolling window of command outcomes should still be there, but everything else is up for grabs. Collapser metrics may be dropped, for instance. Or you could store each Command / Collapser event in a List and publish that List periodically for downstream processing.
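The "store each event in a List and publish that List" idea from (C), and the per-request flush in (B), could both be built on something like the buffer below. This is a minimal sketch, not Hystrix code: `CommandEvent`, `EventBuffer`, `record` and `drain` are made-up names, and the fields on the event are just the ones mentioned in this thread (outcome, start time, latency).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable record of one command execution.
final class CommandEvent {
    final String commandKey;
    final String outcome;
    final long startTimeMs;
    final long latencyMs;

    CommandEvent(String commandKey, String outcome, long startTimeMs, long latencyMs) {
        this.commandKey = commandKey;
        this.outcome = outcome;
        this.startTimeMs = startTimeMs;
        this.latencyMs = latencyMs;
    }
}

// Hypothetical buffer: writers append events, a publisher drains them.
final class EventBuffer {
    private final Object lock = new Object();
    private List<CommandEvent> events = new ArrayList<>();

    void record(CommandEvent event) {
        synchronized (lock) {
            events.add(event);
        }
    }

    // Hands back everything recorded so far and starts a fresh list,
    // so no event is published twice and none is lost.
    List<CommandEvent> drain() {
        synchronized (lock) {
            List<CommandEvent> drained = events;
            events = new ArrayList<>();
            return drained;
        }
    }
}
```

For (B), `drain()` would be called at the end of each request; for (C), a periodic publisher would call it on an interval. Either way the rolling window used for circuit-breaking would live separately.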

From this, I'm creating some concrete tasks to get done for 1.5.0

  • Add collapser executions to request state (supports B)
  • Add command startTime and distinct latency info to request state (supports B)
  • Create text and binary representation of full Hystrix data on per-request-basis (supports B)
  • Add semantic metric type to metrics (Rolling Sum / Cumulative Sum / Snapshot / etc). This allows for more generic code to be written in each of the metrics publisher plugins. (supports A, B, C)
  • Give HystrixRollingNumber and HystrixRollingPercentile a way to share the logic for bucket-rolling (supports A, C)
  • Cache commonly-read values for HystrixRollingPercentile (A, C)
  • Tie HealthCounts to bucket-rolling for HystrixCommandMetrics. I don't see a ton of value in allowing these to be computed independently. (A, B, C)
  • Evaluate performance impact of a background thread performing the bucket-rolling algorithm. This would save every metrics write/read from having to check the current time to determine if it should do a bucket-roll. (A, B, C)
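Two of the tasks above (a semantic metric type, and bucket-rolling driven by the clock) can be illustrated together. The sketch below is an assumption-laden toy, not the HystrixRollingNumber implementation: `MetricKind` and `RollingCounter` are invented names, only two of the semantic types are modelled, and the clock is injected as a `LongSupplier` so the roll logic can be exercised without sleeping. It does the time check on every read and write, which is exactly the cost a background rolling thread would remove.

```java
import java.util.function.LongSupplier;

// Hypothetical semantic tag a publisher plugin could switch on.
enum MetricKind { ROLLING_SUM, CUMULATIVE_SUM }

// Hypothetical counter with a fixed number of time buckets.
final class RollingCounter {
    private final long[] buckets;
    private final long bucketMs;
    private final LongSupplier clock;
    private long currentBucketStart;
    private int head;
    private long cumulative;

    RollingCounter(int numBuckets, long bucketMs, LongSupplier clock) {
        this.buckets = new long[numBuckets];
        this.bucketMs = bucketMs;
        this.clock = clock;
        this.currentBucketStart = clock.getAsLong();
    }

    synchronized void increment() {
        roll();
        buckets[head]++;
        cumulative++;
    }

    synchronized long value(MetricKind kind) {
        roll();
        if (kind == MetricKind.CUMULATIVE_SUM) {
            return cumulative;
        }
        long sum = 0;
        for (long b : buckets) {
            sum += b;
        }
        return sum;
    }

    // Advance the window to the current time, clearing buckets that expired.
    private void roll() {
        long elapsedBuckets = (clock.getAsLong() - currentBucketStart) / bucketMs;
        if (elapsedBuckets <= 0) {
            return;
        }
        // Never clear more buckets than exist, however long the gap was.
        long toClear = Math.min(elapsedBuckets, buckets.length);
        for (long i = 0; i < toClear; i++) {
            head = (head + 1) % buckets.length;
            buckets[head] = 0;
        }
        currentBucketStart += elapsedBuckets * bucketMs;
    }
}
```

With the kind exposed on the read path, a publisher can treat every metric generically instead of hard-coding which ones are rolling and which are cumulative.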

If there are other cases to consider, or any concerns the above does not address, I'd love to hear them.
