Set terminate_early to trackTotalHitsUpTo on non-scoring boolean queries for performance #18842

peteralfonsi · 2025-07-25T22:22:49Z

Description

On boolean queries with only filter or must_not clauses, all matching documents have equal scores. We only track the total number of hits up to trackTotalHitsUpTo, which is 10k by default. Since all documents are tied in score and lower doc IDs are used as the tiebreaker, documents collected after the first trackTotalHitsUpTo matches are never returned and do not affect the reported hit count. (This assumes no aggregations, sorting, pagination, or scrolling.) Therefore, we should set terminate_after on these queries to avoid scanning further documents once 10k hits are gathered. This can produce significant speedups. For more explanation, see the issue.
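A minimal sketch of the eligibility check described above, using simplified stand-in types rather than the real Lucene/OpenSearch classes. `Occur`, `Clause`, `EarlyTerminationSketch`, and `resolveTerminateAfter` are hypothetical names chosen for illustration, and the value 0 for "terminate_after not set" is an assumption; the actual change in DefaultSearchContext may differ in detail.

```java
import java.util.List;

// Simplified stand-ins for illustration only; these are not the Lucene/OpenSearch types.
enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

record Clause(Occur occur, String description) {}

final class EarlyTerminationSketch {
    // Assumed sentinel meaning "terminate_after was not set by the user".
    static final int DEFAULT_TERMINATE_AFTER = 0;

    /**
     * Returns the terminate_after value to use for a boolean query:
     * the user's value if set, max(size, trackTotalHitsUpTo) if the query
     * is eligible, and the default (disabled) otherwise.
     */
    static int resolveTerminateAfter(
        List<Clause> clauses,
        int userTerminateAfter,
        int size,
        int trackTotalHitsUpTo,
        boolean hasAggregations,
        boolean hasSort,
        int from,
        boolean hasSearchAfter
    ) {
        // Respect an explicit user setting.
        if (userTerminateAfter != DEFAULT_TERMINATE_AFTER) return userTerminateAfter;
        // Aggregations, sorting, and pagination need more than the first N matches.
        if (hasAggregations || hasSort || from > 0 || hasSearchAfter) return DEFAULT_TERMINATE_AFTER;
        // Any MUST or SHOULD clause can contribute to the score, so docs may not be tied.
        boolean scoring = clauses.stream()
            .anyMatch(c -> c.occur() == Occur.MUST || c.occur() == Occur.SHOULD);
        if (scoring) return DEFAULT_TERMINATE_AFTER;
        // All matches are tied: stop once there are enough hits for the page and the hit count.
        return Math.max(size, trackTotalHitsUpTo);
    }
}
```

With two filter clauses, `size` of 10, and the default `trackTotalHitsUpTo` of 10,000, this sketch returns 10,000; adding a single must clause makes it return 0, leaving early termination disabled.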

This is similar to ApproximatePointRangeQuery in terms of its effect. However, each approach has pros and cons, and after discussion with @harshavamsi we think the two can proceed in parallel:

The terminate_after method is much simpler and requires no changes to Query/Scorer logic
Extending it to other field types happens automatically, and extending it to other constant-scoring query types is as simple as an instanceof check for that query type
terminate_after's performance is slightly better, mostly because collectors operate on the shard level rather than the segment level, so we stop searching after 10k documents instead of segment_count * 10k documents as we would with approximate queries. Custom query logic overhead is probably also a factor.
However, approximate queries can be used on sorted queries, while terminate_after cannot.
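To make the shard-level versus segment-level point concrete, here is a small arithmetic sketch of the two upper bounds. The class and method names are hypothetical, and the segment count of 20 is an invented example value, not a measurement.

```java
final class CollectedDocsBound {
    /** Upper bound when one limit applies to the whole shard (terminate_after). */
    static long shardLevelBound(long limit) {
        return limit;
    }

    /** Upper bound when each segment can independently collect up to the limit. */
    static long perSegmentBound(int segmentCount, long limit) {
        return Math.multiplyExact((long) segmentCount, limit);
    }

    public static void main(String[] args) {
        int segmentCount = 20; // hypothetical shard with 20 segments
        long limit = 10_000;   // default trackTotalHitsUpTo
        System.out.println("shard-level bound: " + shardLevelBound(limit));
        System.out.println("per-segment bound: " + perSegmentBound(segmentCount, limit));
    }
}
```

These are worst-case bounds only; a segment with fewer matches than the limit contributes fewer documents.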

I ran this branch of OSB using http_logs. Each query is a boolean query with 2 filter clauses. The selectivity of the queries varies a lot.

Results with concurrent segment search off:

Query	Original p50 (ms)	Contender p50 (ms)	Speedup (original / contender)
"request" matches "images" & "status" matches "200"	783	12.6	62x
"request" matches "images" & "status" matches "500"	12.5	9.8	1.3x
"request" matches "images" & "status" matches "400"	39.9	15.4	2.6x
"request" matches "images" & "timestamp" from 6/10-6/13	432	8.8	49x
"timestamp" from 6/10-6/13 & "request.raw" in list of top 10 terms	235	9.1	26x

The speedup grows with the number of docs the query matches, which makes sense: the more matching docs there are, the more work early termination skips.
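For reference, the speedup column is the original p50 divided by the contender p50. A minimal sketch of that calculation, with a hypothetical class name and the p50 values taken from the first two rows of the table above:

```java
final class SpeedupSketch {
    /** Speedup as a multiple: how many times faster the contender is than the original. */
    static double speedup(double originalP50Ms, double contenderP50Ms) {
        if (contenderP50Ms <= 0) {
            throw new IllegalArgumentException("contender latency must be positive");
        }
        return originalP50Ms / contenderP50Ms;
    }

    public static void main(String[] args) {
        // "request" matches "images" & "status" matches "200"
        System.out.printf("%.1fx%n", speedup(783, 12.6));
        // "request" matches "images" & "status" matches "500"
        System.out.printf("%.1fx%n", speedup(12.5, 9.8));
    }
}
```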

Results with concurrent segment search on:

Query	Original p50 (ms)	Contender p50 (ms)	Speedup (original / contender)
"request" matches "images" & "status" matches "200"	596	17.8	33x
"request" matches "images" & "status" matches "500"	11.8	11.3	1.04x
"request" matches "images" & "status" matches "404"	31.6	18.3	1.7x
"request" matches "images" & "timestamp" from 6/10-6/13	346	11.1	31x
"timestamp" from 6/10-6/13 & "request.raw" in list of top 10 terms	173	12.9	13x

Note that there were concerns in the past about combining concurrent segment search (CSS) with terminate_after. However, after discussion with @jed326, it seems fine in this case. The main concern was a user manually setting terminate_after to some value like 1k. If they then enabled CSS with, for example, 4 slices, they would suddenly be gathering 4 * 1k docs, increasing resource usage. This isn't a concern in our case, because we only apply this change if terminate_after is not set on the query, so the number of docs processed can only decrease or stay the same. Enabling this optimization will therefore never increase resource usage with CSS.
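The "can only decrease or stay the same" argument can be illustrated with a simplified model in which each CSS slice stops independently once it has collected the limit. This is an assumption made for illustration, not a description of the collector implementation; the class name and the per-slice match counts are hypothetical.

```java
import java.util.stream.LongStream;

final class CssDocsProcessedSketch {
    /** Docs collected when every slice stops independently after `limit` hits. */
    static long withLimit(long[] matchesPerSlice, long limit) {
        return LongStream.of(matchesPerSlice).map(m -> Math.min(m, limit)).sum();
    }

    /** Docs collected when terminate_after is not set: every match is visited. */
    static long withoutLimit(long[] matchesPerSlice) {
        return LongStream.of(matchesPerSlice).sum();
    }

    public static void main(String[] args) {
        long[] matchesPerSlice = { 250_000, 4_000, 90_000, 12 }; // hypothetical 4-slice shard
        long limit = 10_000;
        System.out.println("without limit: " + withoutLimit(matchesPerSlice));
        System.out.println("with limit:    " + withLimit(matchesPerSlice, limit));
    }
}
```

Because each slice contributes `min(matches, limit)`, the total with the limit can never exceed the total without it, which is the case this change covers since it only applies when terminate_after was previously unset.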

Note that once terminate_after is set to 10k, CSS actually performs a little worse than non-CSS, which is expected because of the slight overhead of running more than one slice. In future we could disable CSS when this optimization enables terminate_after, but as of today CSS can't be enabled or disabled at this point in query processing. The speedup is still significant, so I think it's ok to leave this for later.

Related Issues

Resolves #18510

Check List

Functionality includes testing.
[N/A] API changes companion pull request created, if applicable.
[N/A] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

github-actions · 2025-07-28T22:29:32Z

❌ Gradle check result for 0066d64: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

github-actions · 2025-07-28T23:06:14Z

❌ Gradle check result for 52ee010: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

github-actions · 2025-07-29T17:58:06Z

❌ Gradle check result for 44c6677: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

peteralfonsi · 2025-07-29T20:13:17Z

Flaky tests: #14509, #18490, #18157

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

github-actions · 2025-07-29T21:33:06Z

❌ Gradle check result for 9fccadc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

peteralfonsi · 2025-07-29T22:11:28Z

Flaky test: #17271

Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>

github-actions · 2025-07-29T22:56:30Z

❌ Gradle check result for 8929416: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

peteralfonsi · 2025-07-30T16:23:46Z

#18872

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

github-actions · 2025-07-30T17:47:52Z

✅ Gradle check result for 6eec6df: SUCCESS

codecov · 2025-07-30T17:48:15Z

Codecov Report

❌ Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.70%. Comparing base (3b7603e) to head (6eec6df).
⚠️ Report is 442 commits behind head on main.

Files with missing lines	Patch %	Lines
...va/org/opensearch/search/DefaultSearchContext.java	78.57%	0 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #18842      +/-   ##
============================================
+ Coverage     72.67%   72.70%   +0.02%     
+ Complexity    68610    68600      -10     
============================================
  Files          5577     5577              
  Lines        315375   315391      +16     
  Branches      45772    45784      +12     
============================================
+ Hits         229209   229300      +91     
+ Misses        67613    67518      -95     
- Partials      18553    18573      +20


atris · 2025-08-04T17:52:20Z

server/src/main/java/org/opensearch/search/DefaultSearchContext.java

+            // but at this point the CSS logic already assumes it can't be changed. We can revisit this in future.
+        }
+        // TODO: In future we can do the same for any constant-score query
+        return false;


Should we just disable CSS when early termination is enabled?

atris · 2025-08-04T18:23:14Z

server/src/main/java/org/opensearch/search/DefaultSearchContext.java

+        assert from != -1 : "Cannot call `tryEnablingEarlyTermination` until after `from` has been set";
+        if (query == null) return false;
+
+        if (terminateAfter != DEFAULT_TERMINATE_AFTER) return false;


Can we check for deeper queries, such as

Nested BooleanQueries inside FILTER clauses

ConstantScoreQuery wrapping (common in filter contexts)

MinimumShouldMatch requirements not exposed in BooleanQuery API
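One conservative way the deeper checks listed above could be approached is a recursive walk over the query tree. The sketch below uses a hypothetical, simplified query model (`QueryNode`, `TermNode`, `ConstantScoreNode`, `BoolNode`) rather than the Lucene classes, and it does not model minimum-should-match; it is an illustration of the idea, not the PR's implementation.

```java
import java.util.List;

// Hypothetical, simplified query model; these are not the Lucene classes.
interface QueryNode {}

record TermNode(String field, String value) implements QueryNode {}

record ConstantScoreNode(QueryNode inner) implements QueryNode {}

record BoolNode(List<QueryNode> must, List<QueryNode> should, List<QueryNode> filter, List<QueryNode> mustNot)
    implements QueryNode {}

final class ConstantScoreCheck {
    /** Returns true only when every matching document is guaranteed the same score. */
    static boolean isConstantScoring(QueryNode q) {
        // A constant-score wrapper discards whatever score its inner query produces.
        if (q instanceof ConstantScoreNode) return true;
        if (q instanceof BoolNode b) {
            // SHOULD clauses let docs match different subsets, so scores can differ.
            if (!b.should().isEmpty()) return false;
            // FILTER and MUST_NOT never score, so their contents (including nested
            // boolean queries) need no inspection; every MUST clause must be constant.
            return b.must().stream().allMatch(ConstantScoreCheck::isConstantScoring);
        }
        // Conservatively treat anything else (for example a bare term query) as scoring.
        return false;
    }
}
```

Under this model, a nested boolean query inside a filter clause never disqualifies the outer query, while a must clause qualifies only if it is itself constant-scoring.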

atris · 2025-08-04T18:24:03Z

server/src/main/java/org/opensearch/search/DefaultSearchContext.java

+
+        // We can only set terminateAfter to trackTotalHitsUpTo if we only have filter and must_not clauses
+        if (bq.getClauses(Occur.MUST).isEmpty() && bq.getClauses(Occur.SHOULD).isEmpty()) {
+            terminateAfter = Math.max(size, trackTotalHitsUpTo);


Should we not be checking for null query here? Returning false for null query is wrong. A null query is equivalent to MatchAllQuery in OpenSearch, which IS a constant-scoring query suitable for early termination.

atris · 2025-08-04T18:24:33Z

server/src/main/java/org/opensearch/search/DefaultSearchContext.java

+
+        if (terminateAfter != DEFAULT_TERMINATE_AFTER) return false;
+        if (!(query instanceof BooleanQuery bq)) return false;
+


No check for search profiling. Early termination will produce misleading profiler output showing only partial query execution.

atris · 2025-08-04T18:25:25Z

server/src/internalClusterTest/java/org/opensearch/search/query/BooleanQueryIT.java

+        int trackTotalHitsUpTo = 500;
+
+        // Enforce 1 shard per node, so that no shard has < trackTotalHitsUpTo matching docs and cannot actually terminate early
+        assertAcked(


Where are we validating multi-shard behaviour?

atris · 2025-08-04T18:25:59Z

server/src/internalClusterTest/java/org/opensearch/search/query/BooleanQueryIT.java

+            .setTerminateAfter(1_000_000)
+            .get();
+        assertFalse(originalQueryResponse.isTerminatedEarly()); // Returns false not null when TA was set but not reached
+        assertHitCount(originalQueryResponse, trackTotalHitsUpTo, GREATER_THAN_OR_EQUAL_TO);


Should we also test for mixed clause types that look like filters?

atris · 2025-08-04T18:26:24Z

server/src/internalClusterTest/java/org/opensearch/search/query/BooleanQueryIT.java

+            .get();
+        assertTrue(response.isTerminatedEarly());
+        // Note: queries that have finished early with terminate_after will return "eq" for hit relation
+        assertHitCount(response, trackTotalHitsUpTo, EQUAL_TO);


Should we validate the correctness of the returned documents as well?

atris · 2025-08-04T18:27:20Z

server/src/main/java/org/opensearch/search/DefaultSearchContext.java

+        if (!(query instanceof BooleanQuery bq)) return false;
+
+        if (aggregations() != null) return false;
+        if (from > 0 || searchAfter != null) return false;


Why can't early termination work with search_after?

opensearch-trigger-bot · 2025-09-04T15:22:31Z

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot · 2025-10-09T15:30:57Z

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot · 2025-11-16T15:22:27Z

This PR is stalled because it has been open for 30 days with no activity.

Peter Alfonsi added 6 commits July 1, 2025 14:09

add early termination

381b531

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Fix test

288e66d

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

More test cases

dbb86ae

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Add UT

1b9323e

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

cleanup

aacded3

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Improve IT

3260538

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

peteralfonsi requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, VachaShah, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners July 25, 2025 22:22

github-actions bot added enhancement Enhancement or improvement to existing feature or request lucene Search:Performance labels Jul 25, 2025

Fix suggest yaml test

0066d64

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

spotless apply

52ee010

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Rerun gradle

44c6677

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

This was referenced Jul 29, 2025

[AUTOCUT] Gradle Check Flaky Test Report for FullRollingRestartIT #18490

Closed

[AUTOCUT] Gradle Check Flaky Test Report for RecoveryWhileUnderLoadIT #14509

Open

[AUTOCUT] Gradle Check Flaky Test Report for WarmIndexSegmentReplicationIT #18157

Open

rerun gradle

9fccadc

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Merge branch 'main' into filter-terminate-early

8929416

Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>

opensearch-ci-bot mentioned this pull request Jul 30, 2025

[AUTOCUT] Gradle Check Flaky Test Report for TransferManagerBlobContainerReaderTests #18872

Open

rerun gradle

6eec6df

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

atris requested changes Aug 4, 2025

View reviewed changes

opensearch-ci-bot mentioned this pull request Aug 12, 2025

[AUTOCUT] Gradle Check Flaky Test Report for ConcurrentSeqNoVersioningIT #17271

Open

opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Sep 4, 2025

opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Oct 9, 2025

opensearch-trigger-bot bot added the stalled Issues that have stalled label Nov 16, 2025

