Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Top k processor #3691

Closed
wants to merge 546 commits into from
Closed

Top k processor #3691

wants to merge 546 commits into from

Conversation

mirath
Copy link
Contributor

@mirath mirath commented Jan 18, 2018

This processor objective is to provide the possibility to pick the top k metrics from a set of metrics. It aims to solve #3192. It also solves #3562, as that issue was preventing the test to run succesfully

The pluging uses:

  • Selectors to pick the metrics that will be filtered
  • A group by clause to group the metrics based on metric names and tags
  • A list of fields
  • An aggregator function to agreggate each field of the grouped metrics
  • Once the aggregations for each group are calculated, the plugin orders the groups by each aggregation and returns all the metrics of each group in the top k spots

PR #3400 closed in favor of this one, as I mangled the branch through several irresponsible rebases

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@mirath
Copy link
Contributor Author

mirath commented Jan 18, 2018

@danielnelson Here is the new PR. Much better XD

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, here are a few comments and ideas I had:

return topk
}

func NewTopKProcessor() telegraf.Processor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need this if you return a *TopK from NewTopK, since it is a telegraf.Processor by virtue of the functions begin defined. Since you just have a single new function, you should probably call it New, to match Go convention.

}

var sampleConfig = `
[[processors.topk]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch this to match commenting style of other plugins with the comments above the options. Options with a default value that do not need to be set should be commented out, but make sure it works if they are unset which can be hard to detect due to "zero values", use ## for the comment lines and wrap them at 78 chars.

example:

[[processors.topk]]
  ## How many seconds between aggregations. Default: 10
  # period = 10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielnelson I've corrected this and several other issues. I've not pushed because in fixing this issue I changed lines tied to several open discussions, and I don't want them to show as updated when they are not yet resolved. When we come to a conclusion regarding the outstanding comments I will push my latest changes.

field string
}

func (ags Aggregations) Len() int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a pointer receiver for this type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't. The sort interface asks for a direct value:

plugins/processors/topk/topk.go:165: cannot use aggs (type Aggregations) as type sort.Interface in argument to sort.Sort:
	Aggregations does not implement sort.Interface (Len method has pointer receiver)
plugins/processors/topk/topk.go:167: cannot use aggs (type Aggregations) as type sort.Interface in argument to sort.Reverse:
	Aggregations does not implement sort.Interface (Len method has pointer receiver)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, BTW I have been heard from my colleagues that it is now preferred to use sort.SliceStable with a less func instead of defining ordering on the type.

aggregation = "avg" # What aggregation to use. Default: "avg". Options: sum, avg, min, max

bottomk = false # Instead of the top k largest metrics, return the bottom k lowest metrics. Default: false
simple_topk = false # If true, this will override any GroupBy options and assign each metric its own individual group. Default: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we need either group_by to be non-empty or simple_topk to be true, else we have nothing to group by. What if we treated an empty group_by as enable simple_topk and remove it as an option?

Copy link
Contributor Author

@mirath mirath Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should treat an empty group_by to aggregate over all the tags of the metric. Speaking of which, we could allow regexes in the group_by list, so users with dynamically generated tags can make better use of the processor.

# this setting) to each metric with the value of the calculated GroupBy tag.
# Useful for debugging. Default: ""

position_field = "" # This settings provides a way to know the position of each metric in the top k.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be called rank and the generated field would be something like: memory_rss_rank.

}

var sampleConfig = `
[[processors.topk]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the sampleConfig only, remove the [[processors.topk]] line, this gets added by Telegraf when generating the default config, test with telegraf config | less

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved


# Metrics are grouped based on their tags and name. The plugin aggregates the selected fields of
# these groups of metrics and sorts the groups based these aggregations
group_by = ["process_name"] # Over which tags should the aggregation be done. Default: []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would someone want to specify less than the full set of tags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have metrics with different tag sets that you want to aggregate together. For example, I want the top 10 processes by RSS size across all the instances in the cluster0.

Are you thinking that there should be a way to say "I want to aggregate across all tags" without having to specify tag by tag? That would certainly help in environments with dynamic tags

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and aggregate across all tags should probably be the default, I would do what you mentioned with empty list matching all. Use the filter.Filter type which support globs (not regex), which will make it behave like the measurement filtering options.

return []telegraf.Metric{}
}

func min(a, b int) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where this function is called, remove it if you can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called in line 221

// Create a one dimentional list with the top K metrics of each key
for i, ag := range aggregations[0:min(t.K, len(aggregations))] {

# The plugin returns a metric if it is in a group in the top k groups ordered by any of the aggregations of the selected fields
# This effectively means that more than K metrics may be returned. If you need to return only the top k metrics regardless of grouping, use the simple_topk setting
fields = ["memory_rss"] # Over which fields are the top k are calculated. Default: ["value"]
aggregation = "avg" # What aggregation to use. Default: "avg". Options: sum, avg, min, max
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call avg -> mean to be consistent with basicstats aggregator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved

# The plugin returns a metric if it is in a group in the top k groups ordered by any of the aggregations of the selected fields
# This effectively means that more than K metrics may be returned. If you need to return only the top k metrics regardless of grouping, use the simple_topk setting
fields = ["memory_rss"] # Over which fields are the top k are calculated. Default: ["value"]
aggregation = "avg" # What aggregation to use. Default: "avg". Options: sum, avg, min, max
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to have support for different aggregations per field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good question. I thought about it and decided that someone down the line could contribute such a feature. Do you think is a valuable enough capability to delay the release and work on it before merging?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add this in later.

@mirath
Copy link
Contributor Author

mirath commented Feb 15, 2018

@danielnelson I finished addressing the issues you brought up. I also made group_by_metric_name true by default, as I figured that that way the processor is most likely to work out of the box

Let me know if you have any more comments, or if we are ready to merge (and if the plugin would be available for Telegraf 1.6)

@danielnelson
Copy link
Contributor

I'll try to take a look as soon as I can, but this will have to be a 1.7 addition, sorry.

@mirath
Copy link
Contributor Author

mirath commented Feb 21, 2018

No worries, thx

Germán Jaber added 27 commits May 2, 2018 21:46
@mirath
Copy link
Contributor Author

mirath commented May 3, 2018

So I messed up the rebase again X.X

I'm closing this PR and opening a new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.