Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multi-field keys to terms aggs #65623

Closed
imotov opened this issue Nov 30, 2020 · 7 comments · Fixed by #67597
Closed

Add support for multi-field keys to terms aggs #65623

imotov opened this issue Nov 30, 2020 · 7 comments · Fixed by #67597
Assignees
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@imotov
Copy link
Contributor

imotov commented Nov 30, 2020

There is a need in kibana (see elastic/kibana#77632) to dynamically form a key from multiple ordered fields, similar to composite aggs. The example that kibana team provided is to find top 10 cpu users, where each user is identified by a compound key consisting of datacenter id, host id, and container id. The required result is 3 keys and cpu usage metric with everything sorted by cpu usage descending:

datacenter host name container id cpu (sorted 🔽)
us-east-1 f80a10 b91e34 55.20
us-east-2 6dd558 493fb5 42.00
us-west-2 ba4d4 9675e5 40.73
us-west-1 60413e 0574ef 39.92
us-east-1 f80a10 b91e34 36.21
us-east-2 6dd558 47b9c81 34.58
us-west-1 ba4d4 9675e5 24.01
us-east-1 f80a10 b91e34 22.40
us-east-2 6dd558 493fb5 15.91
us-west-1 27380b f44031 12.33

The current solution is really awkward and limiting concatenation of fields using a script. We would like to offer a more streamlined solution for Kibana here.

@imotov imotov self-assigned this Nov 30, 2020
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 30, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

imotov added a commit to imotov/elasticsearch that referenced this issue Dec 17, 2020
In elastic#65623 we are adding a new aggregation that can reuse some of the
non-trivial logic of terms aggs such as reduction. This refactoring
moves some of this logic into a parent class where it can be reused.

Relates to elastic#65623
imotov added a commit that referenced this issue Dec 18, 2020
In #65623 we are adding a new aggregation that can reuse some of the
non-trivial logic of terms aggs such as reduction. This refactoring
moves some of this logic into a parent class where it can be reused.

Relates to #65623
imotov added a commit that referenced this issue Dec 18, 2020
In #65623 we are adding a new aggregation that can reuse some of the
non-trivial logic of terms aggs such as reduction. This refactoring
moves some of this logic into a parent class where it can be reused.

Relates to #65623
@dgieselaar
Copy link
Member

dgieselaar commented Dec 23, 2020

I think this will be very useful for APM as well. There's a lot of data that usually should only be aggregated together based on a set of fields. E.g., for transaction duration data, we want to aggregate on service.name, service.environment and transaction.type. Or service.name and service.node.name for JVM memory usage.

imotov added a commit to imotov/elasticsearch that referenced this issue Jan 15, 2021
Adds a multi_terms aggregation support. The multi terms aggregation works
very similarly to the terms aggregation but supports multiple terms. The goal
of this PR is to add the basic functionality so it is not optimized at the
moment. It will be done in follow up PRs.

Closes elastic#65623
@jimczi
Copy link
Contributor

jimczi commented Jan 18, 2021

How does it relates to #66247 ? I can see why it would be useful to group things by several dimensions but that's only one side of the issue. The real complexity imo is to use this information correctly when computing the average cpu per host for instance, like described in #66247 ? Is it a different problem that requires another solution ?

@imotov
Copy link
Contributor Author

imotov commented Jan 19, 2021

Is it a different problem that requires another solution ?

We are not trying to tackle #66247 here. As you said, it has a different set of challenges. I think the multi term agg might be used as part of the overall solution, but we will definitely need a different part of the solution that deals with downsampling aspects. As far as I know we didn't discuss #66247 in details yet, but I have a feeling that we might run into the same type of issues we ran in #60619.

@dgieselaar
Copy link
Member

@jimczi FWIW, I'm not aware that we have the same issue in the APM app. We use avg and max for our metrics aggregations, at least as far as I know.

Re: multi-field keys, we're currently using both nested terms aggregations and composite aggregations to get aggregate results over multiple fields. Both have their downsides (num buckets, sorting, performance, etc). I'm not sure what the performance characteristics of multi_terms will be but if it comes close to a terms agg or is faster than a composite agg it might be very appealing to us.

imotov added a commit that referenced this issue Feb 3, 2021
Adds a multi_terms aggregation support. The multi terms aggregation works
very similarly to the terms aggregation but supports multiple terms. The goal
of this PR is to add the basic functionality so it is not optimized at the
moment. It will be done in follow up PRs.

Closes #65623
@YohanSciubukgian
Copy link

YohanSciubukgian commented Feb 3, 2021

Could it be backported to 7.11 ?

@imotov
Copy link
Contributor Author

imotov commented Feb 3, 2021

@YohanSciubukgian no, it's too late to add new features to 7.11. This one will go to 7.12. Sorry.

imotov added a commit to imotov/elasticsearch that referenced this issue Feb 3, 2021
Adds a multi_terms aggregation support. The multi terms aggregation works
very similarly to the terms aggregation but supports multiple terms. The goal
of this PR is to add the basic functionality so it is not optimized at the
moment. It will be done in follow up PRs.

Closes elastic#65623
imotov added a commit that referenced this issue Feb 4, 2021
Adds a multi_terms aggregation support. The multi terms aggregation works
very similarly to the terms aggregation but supports multiple terms. The goal
of this PR is to add the basic functionality so it is not optimized at the
moment. It will be done in follow up PRs.

Closes #65623
imotov added a commit to imotov/elasticsearch that referenced this issue Feb 22, 2021
This PR clarifies when multi_terms aggs should be used instead of composite
aggs or nested term aggs.

Relates to elastic#65623
imotov added a commit that referenced this issue Feb 23, 2021
This PR clarifies when multi_terms aggs should be used instead of composite
aggs or nested term aggs.

Relates to #65623
imotov added a commit to imotov/elasticsearch that referenced this issue Feb 23, 2021
This PR clarifies when multi_terms aggs should be used instead of composite
aggs or nested term aggs.

Relates to elastic#65623
imotov added a commit to imotov/elasticsearch that referenced this issue Feb 23, 2021
This PR clarifies when multi_terms aggs should be used instead of composite
aggs or nested term aggs.

Relates to elastic#65623
imotov added a commit that referenced this issue Feb 23, 2021
This PR clarifies when multi_terms aggs should be used instead of composite
aggs or nested term aggs.

Relates to #65623
imotov added a commit that referenced this issue Feb 23, 2021
This PR clarifies when multi_terms aggs should be used instead of composite
aggs or nested term aggs.

Relates to #65623
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants