-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Histogram support aggregation #1364
Conversation
I just moved the Histogram code to new type of plugins (which I called filter) this will make it easier to write test cases against it and will allow future add of these types of plugins which act like middleware between input and the output, some of use cases for middleware/filter plugins will be adding sampling, bandwidth control, control tag and fields included in metrics, shaping metrics like adding/renaming tags, fields and even metric names plus editing metric values . adding aggregation support , split metrics to multi metric based on fields, combine metrics to one metrics and a lot more. |
b0c2322
to
710f97c
Compare
I will add more error handling and Unit test once we agree one filter implementation. |
b95f146
to
d1d0adf
Compare
@sparrc any idea on this ? what you think about the filter plugins ? |
@alimousazy you'll need to give me more time, I'm very swamped with other features at the moment. |
sure. take your time. |
Will this also handle sum/aggregate count? |
@jadbox it does handle Here is an example usage if you can merge pull and compile it from source
You can add more percentile if you want like [0.90, 0.50] I'm planning to add more filter in the future (which intercept metrics between input and output plugins) some of these filters might be [[filter.rename]] ,[filter.rename.tags],[[filter.must_have]], [[filter.sample_metrics]], [[filter.only_pass_if]], [[filter.bandwidth]] which help shape metrics before emitting for final distention. |
@alimousazy "Sum can be added easily if you want I can do that." YES, please! My data "firehose" looks like the following (abstractly):
and I want to aggregate in telegraph before sending to Influx:
My pipeline does 100k/s in messages so micro-aggregates before going into Influx is needed. |
if joe is used as name of metric then configuration should be like this
*Note: you can tone aggregation interval by modifying flush_interval (I may change flush interval to aggregation interval) , If you don't need percentile just leave the array empty. The output will be similar to this
|
@alimousazy Thanks, that's helpful. I guess what I'm really looking for is just aggregation... not histogram aggregation. I'll take this concern to #380 |
96b0948
to
3008ce0
Compare
I'm planing to add support for all the metric types mentioned here http://metrics.dropwizard.io/3.1.0/getting-started/, once this get reviewed and we agree on first version |
fafb725
to
8f7bbdb
Compare
@alimousazy the histogram filter should do all by default. Since Filters in general should have a plugin structure and should thus appear in the default config file. Can you explain how that is going to look and exactly how one would configure the filters for generically aggregating on fields, tags, measurement names, etc.? |
@sparrc Filters are middleware which stand between inputs and outputs (Pipes) . here is how histogram filter configuration. (Note that filter are independent from input and output ) Filters will look like this in configuration files (Note the configuration is flexible and we can change it)
here is the interface that filters should implemented
we may add other filters (middleware) in the future like adding sampling, bandwidth control, control tag and fields included in metrics, shaping metrics like adding/renaming tags, fields and even metric names plus editing metric values , split metrics to multi metric based on fields, combine metrics to one, alerting and anomaly detection. Please check the code for more detail 8f7bbdb |
I'll need to do a fuller review, but like I said initially this is going to need to support matching metrics on more than just measurement name, and all name matching will need to support glob filters, much of which we already have implemented in |
filter := a.Config.Filters[name] | ||
log.Printf("Filter %s is enabled", name) | ||
metricStream = filter.Pipe(metricStream) | ||
go func(f telegraf.Filter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this in a goroutine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filter suppose to work in is own context like other plugins it may have timer inside and i don't want to block the agent code from collecting metrics or flushing them. consider filter as pipe with input and output (black box). Histogram filter for example has timer for flushing metrics on specific intervals
I have to add command line option to generate default filters (Disabled) when generating configuration file ) please let me know what command line option you prefer for such operation ? also I'm going to work on adding glob matching on metric value and tags value ( I think it is not possible to do matching on field value since it is numeric ), is that what you expect ? |
829b0d5
to
38849c9
Compare
Command line option added
I will work on adding globing and unit test |
997a54a
to
88bea9e
Compare
@sparrc I added a new way to match metric, User can now specify Tags or Measurement to match against (Glob enabled) , rollup name, or function to apply on data set beside if they want to keep the old metric @jadbox. User also can pass the original metric and only create rollup for aggregation like in second example
Please tell me what you think |
I'm closing this in favor of #1777 |
Required for all PRs:
Adding Histogram (aggregation) support. User can list down metric for aggregation, supported methods
1- Percentile 90 80 ... 20 etc.
2- Count.
3- Mean.
4- Variance.
Method used to calculate these values has a very low memory print and specially designed for streaming data. More information can be found here http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf and here https://www.vividcortex.com/blog/2013/07/08/streaming-approximate-histograms/.
Note: This method of Streaming Approximate Histograms result quantile approximations there for it may not lead for exact values. User can optimize accuracy by increase bucket size.
I'm thinking about introducing middleware plugins which process data between inputs and output this will make the agent code clean and allow user to provide additional processing to the pipeline, If you agree on this I will move the code and put it inside a folder called middleware. Creating interface for middle ware to add and retrieve metrics using channels