Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend query size limit using scroll #716

Merged
merged 36 commits into from
Aug 12, 2022
Merged

Extend query size limit using scroll #716

merged 36 commits into from
Aug 12, 2022

Conversation

seankao-az
Copy link
Collaborator

@seankao-az seankao-az commented Aug 1, 2022

Description

#703 for non aggregation queries.

With this feature, we can

  • run source=index and get back query.size_limit results
  • run source=index | head x where x <= index.max_result_window and get back x results by regular OpenSearch query request
  • run source=index | head x where x > index.max_result_window and get back x results by scrolling, bypassing the OpenSearch restriction, essentially

This can be combined with other commands that post-processes the data rows fetched, e.g.

  • source=index | head 100000 | top <field>
  • source=index | head 100000 | <ml command>

Similar also works for SQL LIMIT operator.

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

public Integer getMaxResultWindow() {
// system index doesn't need this function
// the magic number is the number of fields of the mapping table
return 27;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a better way to do this

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2022

Codecov Report

Merging #716 (52ba001) into main (792d3ff) will decrease coverage by 2.93%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main     #716      +/-   ##
============================================
- Coverage     97.76%   94.83%   -2.94%     
- Complexity     2880     2898      +18     
============================================
  Files           276      287      +11     
  Lines          7077     7802     +725     
  Branches        447      568     +121     
============================================
+ Hits           6919     7399     +480     
- Misses          157      349     +192     
- Partials          1       54      +53     
Flag Coverage Δ
query-workbench 62.76% <ø> (?)
sql-engine 97.78% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...opensearch/sql/planner/logical/LogicalPlanDSL.java 100.00% <ø> (ø)
...ch/sql/opensearch/client/OpenSearchNodeClient.java 100.00% <100.00%> (ø)
...ch/sql/opensearch/client/OpenSearchRestClient.java 100.00% <100.00%> (ø)
...sql/opensearch/request/OpenSearchQueryRequest.java 100.00% <100.00%> (ø)
...l/opensearch/request/OpenSearchRequestBuilder.java 100.00% <100.00%> (ø)
...ql/opensearch/request/OpenSearchScrollRequest.java 100.00% <100.00%> (ø)
...request/system/OpenSearchDescribeIndexRequest.java 100.00% <100.00%> (ø)
...search/sql/opensearch/storage/OpenSearchIndex.java 100.00% <100.00%> (ø)
...ch/sql/opensearch/storage/OpenSearchIndexScan.java 100.00% <100.00%> (ø)
workbench/public/components/Header/Header.tsx 100.00% <0.00%> (ø)
... and 9 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

*/
public void pushDownLimit(Integer limit, Integer offset) {
SearchSourceBuilder sourceBuilder = request.getSourceBuilder();
if (limit + offset > maxResultWindow) {
limit = maxResultWindow - offset;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for scrolling.
OpenSearch requires that from(offset) + size(limit) <= index.max_result_window. A limit operator with limit+offset > index.max_result_window will invoke scroll request multiple times, each with batch size index.max_result_window

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if condition limit + offset > maxResultWindow is used in different place, could we simply it?

Comment on lines 150 to 155
Integer limit = node.getLimit();
Integer offset = node.getOffset();
PlanContext planContext = context.getPlanContext();
if (limit + offset > planContext.getMaxResultWindow()) {
planContext.setIndexScanType(PlanContext.IndexScanType.SCROLL);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to reduce duplicate code. visitLimit and visitHead

} else {
fail("Search request after empty response returned already");
when(response.isEmpty()).thenReturn(true);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now fetch results in batches, additional search requests can be made when calling hasNext()

@@ -86,6 +88,19 @@ public Map<String, IndexMapping> getIndexMappings(String... indexExpression) {
}
}

@Override
public Integer getIndexMaxResultWindow(String... indexExpression) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments to explain the min(index...).

@@ -30,6 +30,8 @@ public interface OpenSearchClient {
*/
Map<String, IndexMapping> getIndexMappings(String... indexExpression);

Integer getIndexMaxResultWindow(String... indexExpression);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docs.

this.request = new OpenSearchQueryRequest(indexName,
settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), exprValueFactory);
this.maxResultWindow = context.getMaxResultWindow();
switch (context.getIndexScanType()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i not sure how the PlanContext help in this case. seems we could build OpenSerachRequest when visit the logical plan tree.

@@ -38,7 +39,11 @@ public MergeLimitAndRelation() {
}

@Override
public LogicalPlan apply(LogicalLimit plan, Captures captures) {
public LogicalPlan apply(LogicalLimit plan, Captures captures, PlanContext context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keeping the Limit node?

Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
}

@Override
public boolean hasNext() {
if (isAggregation) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregation should also support scroll, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I'll work on it next. Aggregation works quite different from non-agg queries. If there's a lot to change I'll consider making a separate PR for it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could open a issue if it is not covered in this PR?

Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az seankao-az marked this pull request as ready for review August 8, 2022 23:10
@seankao-az seankao-az requested a review from a team as a code owner August 8, 2022 23:10
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az seankao-az requested a review from penghuo August 9, 2022 21:14
@@ -54,6 +58,36 @@ public Map<String, IndexMapping> getIndexMappings(String... indexExpression) {
}
}

@Override
public Map<String, Integer> getIndexMaxResultWindows(String... indexExpression) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need mapping or just single maxResultWindow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approach this in a way similar to how getIndexMappings handles it.
For the client, getIndexMappings and getIndexMaxResultWindows return a Mapping from the index name to the corresponding result, without extra business logic.
In OpenSearchDescribeIndexRequest, getMaxResultWindow chooses the minimum of these values. This is specific to our need: if there's multiple indices, get the minimum max_result_window. Similarly, getFieldTypes unions the mappings of multiple indices here.

Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az seankao-az requested a review from penghuo August 11, 2022 18:50
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@seankao-az seankao-az merged commit f3c9c29 into opensearch-project:main Aug 12, 2022
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-716-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f3c9c29538abdf93780e156edba3558fd43479da
# Push it to GitHub
git push --set-upstream origin backport/backport-716-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-716-to-2.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants