[Feature Request] Dynamically update `maxClauseCount` based on resources on a node #12549

harshavamsi · 2024-03-06T21:58:48Z

Is your feature request related to a problem? Please describe

Today lucene has a concept of max_clause_count which prevents the number of clauses from exceeding when expanding wildcard and prefix queries. OpenSearch inherited ES 7.10 which used a static index.query.bool.max_clause_count setting that is set to a max of 1024 by default.

In https://issues.apache.org/jira/browse/LUCENE-8811, lucene changed the clause count from affecting only boolean queries to all queries on the index by adding the constraint at the indexSearcher level. This means that the expansion effects all the fields and is now very easy to cross the limit.

A workaround was to use a rewrite to avoid hitting that limit, but it no longer works since the number of clauses after a rewrite is now what is counted.

A lot of users trying to migrate from OS 1.x to 2.x have reported this problem where their queries would previously not hit the clause limit, but since upgrading have now run into the limits. ElasticSearch themselves have updated the way that the limit is computed and moved away from a static limit of 1024 to a more dynamic resource based(thread pool and heap size) limit.

Graylog2/graylog2-server#14272 is an example of people running into this issue.

Describe the solution you'd like

Rather than relying on INDICES_MAX_CLAUSE_COUNT_SETTING from cluster settings, compute threadpool and heap sizes from jvm stats and use them to determine clause limit.

Is there a better metric to compute them? The limit was introduced to prevent rouge queries from consuming too many threads, CPU, and memory. It would make sense to update the limit based on these attributes.

Related component

Search:Resiliency

Describe alternatives you've considered

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

harshavamsi · 2024-03-08T00:33:00Z

@msfroh @reta @andrross thoughts?

reta · 2024-03-08T20:07:29Z

Thanks @harshavamsi

Rather than relying on INDICES_MAX_CLAUSE_COUNT_SETTING from cluster settings, compute threadpool and heap sizes from jvm stats and use them to determine clause limit

AFAIK this is not a cluster but node level setting

Is there a better metric to compute them? The limit was introduced to prevent rouge queries from consuming too many threads, CPU, and memory. It would make sense to update the limit based on these attributes.

I think it would make sense to explore this path, but it also not clear to me how to align this max number with cpu / heap (that would vary per node), @msfroh any heuristics you may be aware of? thank you

andrross · 2024-03-13T15:34:58Z

[Triage - attendees 1 2 3]
@harshavamsi Thanks for filing this issue.

msfroh · 2024-03-29T21:02:46Z

@msfroh any heuristics you may be aware of? thank you

I have a few thoughts about this:

I'm not convinced that a good heuristic exists for how many clauses is "too many" for some hardware. In particular, the cost of a clause really depends on the clause itself. What kind of clause is it? How many docs does the clause match? Which index structure is the clause hitting (postings, point tree, doc values)?
Any estimate that we come up with may be wrong. We should allow users to adjust the value without needing to restart their node.
In a perfect world, I would make it an index-level setting, since we can theoretically connect the index with the workload running on it. Unfortunately, while the setting was moved into IndexSearcher, it's a static variable (probably for backward compatibility?). Before Lucene 10 is released, maybe we can turn it into a member variable?

AFAIK this is not a cluster but node level setting

@reta -- do you know if there's any backward-compatibility issue with turning it into a dynamic cluster-level setting? I suppose it would break if different nodes have different values in their opensearch.yml file.

reta · 2024-03-30T02:51:24Z

@reta -- do you know if there's any backward-compatibility issue with turning it into a dynamic cluster-level setting? I suppose it would break if different nodes have different values in their opensearch.yml file.

Thanks @msfroh , at high level I don't see why this settings couldn't be dynamic and at a cluster level, but the evil is in details I think: the IndexSearcher::maxClauseCount is not supposed to be changed dynamically at runtime, it is not volatile nor designed to be managed safely in a presence of multiple threads. I was looking into original pull request as well [1] for the clues at it seems like it was the concern too [2] (with BQ but IndexSearcher inherited same implementation).

[1] elastic/elasticsearch#18341
[2] elastic/elasticsearch#18341 (comment)

harshavamsi added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 6, 2024

github-actions bot added the Search:Resiliency label Mar 6, 2024

github-project-automation bot added this to Search Project Board Mar 6, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Mar 6, 2024

andrross removed the untriaged label Mar 13, 2024

harshavamsi added this to Performance Roadmap Apr 1, 2024

github-project-automation bot moved this to Todo in Performance Roadmap Apr 1, 2024

harshavamsi moved this from Todo to Now (This Quarter) in Performance Roadmap Apr 1, 2024

getsaurabh02 moved this from Now (This Quarter) to In Progress in Performance Roadmap Apr 29, 2024

harshavamsi mentioned this issue May 7, 2024

Set INDICES_MAX_CLAUSE_COUNT dynamically #13568

Merged

8 tasks

harshavamsi moved this from In Progress to In-Review in Performance Roadmap May 13, 2024

reta added v3.0.0 Issues and PRs related to version 3.0.0 v2.16.0 Issues and PRs related to version 2.16.0 labels Jun 13, 2024

harshavamsi moved this from In-Review to Done in Performance Roadmap Jul 1, 2024

harshavamsi closed this as completed Jul 19, 2024

github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jul 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] Dynamically update `maxClauseCount` based on resources on a node #12549

[Feature Request] Dynamically update `maxClauseCount` based on resources on a node #12549

harshavamsi commented Mar 6, 2024

harshavamsi commented Mar 8, 2024

reta commented Mar 8, 2024

andrross commented Mar 13, 2024

msfroh commented Mar 29, 2024

reta commented Mar 30, 2024 •

edited

Loading

[Feature Request] Dynamically update maxClauseCount based on resources on a node #12549

[Feature Request] Dynamically update maxClauseCount based on resources on a node #12549

Comments

harshavamsi commented Mar 6, 2024

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Related component

Describe alternatives you've considered

Additional context

harshavamsi commented Mar 8, 2024

reta commented Mar 8, 2024

andrross commented Mar 13, 2024

msfroh commented Mar 29, 2024

reta commented Mar 30, 2024 • edited Loading

[Feature Request] Dynamically update `maxClauseCount` based on resources on a node #12549

[Feature Request] Dynamically update `maxClauseCount` based on resources on a node #12549

reta commented Mar 30, 2024 •

edited

Loading