Extend query size limit using scroll #716
public Integer getMaxResultWindow() {
  // system index doesn't need this function
  // the magic number is the number of fields of the mapping table
  return 27;
Need a better way to do this
Codecov Report
@@             Coverage Diff              @@
##               main     #716      +/-   ##
============================================
- Coverage     97.76%   94.83%   -2.94%
- Complexity     2880     2898      +18
============================================
  Files           276      287      +11
  Lines          7077     7802     +725
  Branches        447      568     +121
============================================
+ Hits           6919     7399     +480
- Misses          157      349     +192
- Partials          1       54      +53
 */
public void pushDownLimit(Integer limit, Integer offset) {
  SearchSourceBuilder sourceBuilder = request.getSourceBuilder();
  if (limit + offset > maxResultWindow) {
    limit = maxResultWindow - offset;
  }
This is for scrolling.
OpenSearch requires that from (offset) + size (limit) <= index.max_result_window. A limit operator with limit + offset > index.max_result_window will invoke the scroll request multiple times, each with batch size index.max_result_window.
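The rule above can be sketched as a small calculation. This is only an illustration, not code from this PR: the class and method names (ScrollPlanSketch, needsScroll, requestCount) are hypothetical, and it assumes that in scroll mode the offset rows are fetched and then discarded by the caller, so offset + limit rows must be read in total. A real scroll may also issue one more request that comes back empty to detect the end; that request is not counted here.

```java
/** Hypothetical helper (not part of the PR): decides between a regular query and scroll. */
final class ScrollPlanSketch {

  /** True when from(offset) + size(limit) would exceed index.max_result_window. */
  static boolean needsScroll(int limit, int offset, int maxResultWindow) {
    // Widen to long so that limit + offset cannot overflow int.
    return (long) limit + offset > maxResultWindow;
  }

  /**
   * Number of data-bearing requests needed to read offset + limit rows,
   * assuming every scroll batch holds maxResultWindow rows.
   */
  static int requestCount(int limit, int offset, int maxResultWindow) {
    if (limit < 0 || offset < 0 || maxResultWindow <= 0) {
      throw new IllegalArgumentException("limit/offset must be >= 0 and window > 0");
    }
    if (!needsScroll(limit, offset, maxResultWindow)) {
      return 1; // a single regular query with from = offset, size = limit
    }
    long total = (long) limit + offset;
    // Ceiling division: the last batch may be only partially used.
    return (int) ((total + maxResultWindow - 1) / maxResultWindow);
  }
}
```

For example, with a window of 10000, head 100000 needs ten scroll batches, while head 100 stays a single regular query.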
The condition limit + offset > maxResultWindow is used in several places; could we simplify it?
Integer limit = node.getLimit();
Integer offset = node.getOffset();
PlanContext planContext = context.getPlanContext();
if (limit + offset > planContext.getMaxResultWindow()) {
  planContext.setIndexScanType(PlanContext.IndexScanType.SCROLL);
}
Try to reduce the duplicated code in visitLimit and visitHead.
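One way to address the duplication, sketched under assumptions: move the comparison into a single method that both visitLimit and visitHead call. PlanContextSketch below is a minimal stand-in for the PR's PlanContext, not its real definition; the selectScanType method and the QUERY enum constant are hypothetical (only SCROLL appears in the diff).

```java
/** Minimal stand-in for the PR's PlanContext; only the members used in the snippet are modeled. */
final class PlanContextSketch {

  /** QUERY is an assumed name for the non-scroll scan type; SCROLL is from the diff. */
  enum IndexScanType { QUERY, SCROLL }

  private final int maxResultWindow;
  private IndexScanType indexScanType = IndexScanType.QUERY;

  PlanContextSketch(int maxResultWindow) {
    this.maxResultWindow = maxResultWindow;
  }

  int getMaxResultWindow() {
    return maxResultWindow;
  }

  IndexScanType getIndexScanType() {
    return indexScanType;
  }

  /**
   * Hypothetical shared helper: both visitLimit and visitHead could call this
   * instead of repeating the limit + offset comparison.
   */
  void selectScanType(int limit, int offset) {
    // Widen to long so that limit + offset cannot overflow int.
    if ((long) limit + offset > maxResultWindow) {
      indexScanType = IndexScanType.SCROLL;
    }
  }
}
```

Keeping the comparison in one place also gives the overflow guard a single home.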
} else {
  fail("Search request after empty response returned already");
when(response.isEmpty()).thenReturn(true);
Since we now fetch results in batches, additional search requests can be made when calling hasNext()
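To illustrate why hasNext() can now trigger a request, here is a minimal sketch of a batch-backed iterator. It is not the PR's OpenSearchIndexScan: BatchedIterator, the Supplier-based fetcher, and fetchCount are all hypothetical names, and an empty batch is assumed to signal the end of the results.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/** Hypothetical iterator that pulls rows batch by batch from a fetcher. */
final class BatchedIterator<T> implements Iterator<T> {
  private final Supplier<List<T>> fetchNextBatch;
  private Iterator<T> current = Collections.emptyIterator();
  private boolean exhausted = false;
  private int fetchCount = 0;

  BatchedIterator(Supplier<List<T>> fetchNextBatch) {
    this.fetchNextBatch = fetchNextBatch;
  }

  @Override
  public boolean hasNext() {
    // When the current batch is drained, hasNext() itself issues the next fetch.
    while (!current.hasNext() && !exhausted) {
      List<T> batch = fetchNextBatch.get();
      fetchCount++;
      if (batch.isEmpty()) {
        exhausted = true; // assumed end-of-results signal; no fetch after this
      } else {
        current = batch.iterator();
      }
    }
    return current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }

  /** Number of fetches issued so far, exposed for illustration. */
  int fetchCount() {
    return fetchCount;
  }
}
```

With this shape, a test that previously expected one search request has to allow for further requests until an empty response is returned, and none after that.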
@@ -86,6 +88,19 @@ public Map<String, IndexMapping> getIndexMappings(String... indexExpression) {
    }
  }

  @Override
  public Integer getIndexMaxResultWindow(String... indexExpression) {
Add comments to explain the min(index...).
@@ -30,6 +30,8 @@ public interface OpenSearchClient {
   */
  Map<String, IndexMapping> getIndexMappings(String... indexExpression);

  Integer getIndexMaxResultWindow(String... indexExpression);
Add docs.
this.request = new OpenSearchQueryRequest(indexName,
    settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), exprValueFactory);
this.maxResultWindow = context.getMaxResultWindow();
switch (context.getIndexScanType()) {
I'm not sure how the PlanContext helps in this case. It seems we could build the OpenSearchRequest when visiting the logical plan tree.
@@ -38,7 +39,11 @@ public MergeLimitAndRelation() {
  }

  @Override
- public LogicalPlan apply(LogicalLimit plan, Captures captures) {
+ public LogicalPlan apply(LogicalLimit plan, Captures captures, PlanContext context) {
Why keep the Limit node?
Signed-off-by: Sean Kao <seankao@amazon.com>
}

@Override
public boolean hasNext() {
  if (isAggregation) {
Aggregation should also support scroll, right?

Yes, I'll work on it next. Aggregation works quite differently from non-aggregation queries. If there's a lot to change, I'll consider making a separate PR for it.

Could you open an issue if it is not covered in this PR?
@@ -54,6 +58,36 @@ public Map<String, IndexMapping> getIndexMappings(String... indexExpression) {
    }
  }

  @Override
  public Map<String, Integer> getIndexMaxResultWindows(String... indexExpression) {
Do we need a mapping, or just a single maxResultWindow?

I approached this the same way getIndexMappings handles it.
For the client, getIndexMappings and getIndexMaxResultWindows return a map from the index name to the corresponding result, without extra business logic.
In OpenSearchDescribeIndexRequest, getMaxResultWindow chooses the minimum of these values. This is specific to our need: if there are multiple indices, take the minimum max_result_window. Similarly, getFieldTypes unions the mappings of multiple indices.
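A minimal sketch of the reduction described above, assuming the client returns a map from index name to its max_result_window. MaxResultWindowSketch and effectiveMaxResultWindow are hypothetical names, and rejecting an empty map is an assumption rather than the PR's behavior.

```java
import java.util.Map;

/** Hypothetical illustration of reducing per-index windows to a single limit. */
final class MaxResultWindowSketch {

  /**
   * Returns the smallest max_result_window among the given indices, so that a
   * query spanning all of them stays within every index's limit.
   */
  static int effectiveMaxResultWindow(Map<String, Integer> windowsByIndex) {
    return windowsByIndex.values().stream()
        .mapToInt(Integer::intValue)
        .min()
        // Assumption: an empty map is treated as a caller error.
        .orElseThrow(() -> new IllegalArgumentException("no indices given"));
  }
}
```

Taking the minimum is the conservative choice: a request sized for the largest window would be rejected by any index with a smaller one.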
LGTM
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-716-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f3c9c29538abdf93780e156edba3558fd43479da
# Push it to GitHub
git push --set-upstream origin backport/backport-716-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
Description
#703 for non-aggregation queries.
With this feature, we can run
- source=index
  and get back query.size_limit results
- source=index | head x
  where x <= index.max_result_window, and get back x results by a regular OpenSearch query request
- source=index | head x
  where x > index.max_result_window, and get back x results by scrolling, essentially bypassing the OpenSearch restriction
This can be combined with other commands that post-process the fetched data rows, e.g.
source=index | head 100000 | top <field>
source=index | head 100000 | <ml command>
This works similarly for the SQL LIMIT operator.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.