Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow administrators to limit the depth of nested aggregations #67479

Open
williamrandolph opened this issue Jan 13, 2021 · 8 comments
Open

Allow administrators to limit the depth of nested aggregations #67479

williamrandolph opened this issue Jan 13, 2021 · 8 comments
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@williamrandolph
Copy link
Contributor

Instead of allowing a user to nest aggregations 10+ levels, this could be tunable, and shut down before the request ever hits a shard. Admins could set a limit on the number of nested aggregations.

This could be implemented as a circuit breaker, or perhaps as a search setting.

This issue was originally raised as part of #62457.

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 13, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@polyfractal
Copy link
Contributor

Adding team-discuss to this for our next team meeting.

I suspect this would be implemented this as a cluster setting since it's relatively static and can be checked when the query is parsed rather than during agg runtime.

That said, it's a pretty coarse hammer; you can construct non-abusive aggs that are also quite deep (or shallow aggs that are very abusive). But I do agree that depth tends to be a good metric for abuse potential, and allowing admins control over that could be a good safety net.

@Rupeshiya

This comment has been minimized.

@rjernst

This comment has been minimized.

@imotov
Copy link
Contributor

imotov commented Feb 11, 2021

We have discussed this in our team meeting and we thought that an estimated overall number of the buckets in aggregations might be a better metric to prevent very large queries from running. Due to the recent optimization effort that @nik9000 was doing, we already have some idea about dimension of some of our aggregation early in the query phase. So we can perhaps combine #68504 and this issue and add a cluster level parameter that will end requests that can result in an excessive number of buckets earlier.

@maosuhan
Copy link

@imotov It seems you prefer to use overall number of the buckets to early break the request. That's what I thought in #68504. Then I find some new issues #67474, #67478, it seems they track aggregation memory instead of bucket count and I begin to believe it is a better idea.
I'm a little curious why do you think the metric of bucket count is better.

@imotov
Copy link
Contributor

imotov commented Feb 22, 2021

@maosuhan the main issue with breaking on exceeding aggregation memory is that we cancel the query when it exhausts enough memory to trip the breaker. By this time, some damage is already done, so to speak. So, it might be beneficial to cancel the query as early as possible in order to prevent the query from wasting resources before being cancelled.

@maosuhan
Copy link

@imotov Agree with you about that and it is what we met in our production environment. In #67474, I think breaker.search.aggregation_memory.limit is a per-request memory control and is different from circuit breaker. I think it is similar to search.max_buckets but more reasonable.

breaker.search.aggregation_memory.limit is a per-request setting, we can set it to a relative small number like 10% of the heap so it can still early break the huge request. And if you set search.max_buckets to more than 1M buckets, it still slow to react. So in my opinion, they are both similar things.

If we use breaker.search.aggregation_memory.limit, it can solve the problem of size variant of each bucket so I think it is more reasonable. Also in #67478, it also use response size to try to early break the request which is very helpful in our case and may work better with #67474.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

7 participants