-
Notifications
You must be signed in to change notification settings - Fork 180
Support pushdown physical sort operator to speedup SortMergeJoin #3864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
24a0488
9ebdb92
7142b9a
d6b1b67
1edb107
dd8334d
7773d04
7c4717e
d5ae3ab
fa0fb77
23e179e
3e9e60b
2d0b7c7
4ecd58f
511400f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -134,7 +134,7 @@ public void testMultiSortPushDownExplain() throws IOException { | |||||||||||||||||||||||||||||||
| explainQueryToString( | ||||||||||||||||||||||||||||||||
| "source=opensearch-sql_test_index_account " | ||||||||||||||||||||||||||||||||
| + "| sort account_number, firstname, address, balance " | ||||||||||||||||||||||||||||||||
| + "| sort - balance, - gender, address " | ||||||||||||||||||||||||||||||||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do not sort on text type without keyword subfield in v3. Even sorting on text type with fielddata=true will get unexpected order: Check following queries and their outputs in v2. index=test1:
index=test2:
The logic of sorting a fielddata is 1) tokenization 2) internal sort by token 3) sort by first token |
||||||||||||||||||||||||||||||||
| + "| sort - balance, - gender, account_number " | ||||||||||||||||||||||||||||||||
| + "| fields account_number, firstname, address, balance, gender")); | ||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -59,7 +59,7 @@ public void testRareWithGroup() throws IOException { | |||||||
| rows("F", "OK", 7), | ||||||||
| rows("F", "KS", 7), | ||||||||
| rows("F", "CO", 7), | ||||||||
| rows("F", "NV", 8), | ||||||||
| isPushdownEnabled() ? rows("F", "AR", 8) : rows("F", "NV", 8), | ||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do we introduce this disparity between with&without pushdown?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added "keyword" subfield for the
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why pushdown enable impact results?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This result difference is not introduced by code changes in this PR. You can reproduce it in main branch when change the schema from to The three results (v2 non-pushdown, v3 non-pushdown, v3 pushdown) are different.
But they are all correct. The rare command is not a deterministic command, it result depends on the order of return from OpenSearch and implementation (for v2). When the bucket pushdown works (by gender), the fetched data order is different with data order in non-pushdown. |
||||||||
| rows("M", "NE", 5), | ||||||||
| rows("M", "RI", 5), | ||||||||
| rows("M", "NV", 5), | ||||||||
|
|
||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,7 +11,6 @@ | |
| import static org.opensearch.sql.util.MatcherUtils.rows; | ||
| import static org.opensearch.sql.util.MatcherUtils.schema; | ||
| import static org.opensearch.sql.util.MatcherUtils.verifyDataRows; | ||
| import static org.opensearch.sql.util.MatcherUtils.verifyDataRowsInOrder; | ||
| import static org.opensearch.sql.util.MatcherUtils.verifySchema; | ||
| import static org.opensearch.sql.util.MatcherUtils.verifySchemaInOrder; | ||
|
|
||
|
|
@@ -358,27 +357,15 @@ public void testStatsBySpanAndMultipleFields() throws IOException { | |
| schema("span(age,10)", null, "int"), | ||
| schema("gender", null, "string"), | ||
| schema("state", null, "string")); | ||
| if (isCalciteEnabled()) { | ||
| verifyDataRows( | ||
| response, | ||
| rows(1, 20, "F", "VA"), | ||
| rows(1, 30, "F", "IN"), | ||
| rows(1, 30, "F", "PA"), | ||
| rows(1, 30, "M", "IL"), | ||
| rows(1, 30, "M", "MD"), | ||
| rows(1, 30, "M", "TN"), | ||
| rows(1, 30, "M", "WA")); | ||
| } else { | ||
| verifyDataRowsInOrder( | ||
| response, | ||
| rows(1, 20, "f", "VA"), | ||
| rows(1, 30, "f", "IN"), | ||
| rows(1, 30, "f", "PA"), | ||
| rows(1, 30, "m", "IL"), | ||
| rows(1, 30, "m", "MD"), | ||
| rows(1, 30, "m", "TN"), | ||
| rows(1, 30, "m", "WA")); | ||
|
Comment on lines
-374
to
-380
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this lower case values in v2 were caused by grouping by a |
||
| } | ||
| verifyDataRows( | ||
| response, | ||
| rows(1, 20, "F", "VA"), | ||
| rows(1, 30, "F", "IN"), | ||
| rows(1, 30, "F", "PA"), | ||
| rows(1, 30, "M", "IL"), | ||
| rows(1, 30, "M", "MD"), | ||
| rows(1, 30, "M", "TN"), | ||
| rows(1, 30, "M", "WA")); | ||
| } | ||
|
|
||
| @Test | ||
|
|
@@ -395,27 +382,15 @@ public void testStatsByMultipleFieldsAndSpan() throws IOException { | |
| schema("span(age,10)", null, "int"), | ||
| schema("gender", null, "string"), | ||
| schema("state", null, "string")); | ||
| if (isCalciteEnabled()) { | ||
| verifyDataRows( | ||
| response, | ||
| rows(1, 20, "F", "VA"), | ||
| rows(1, 30, "F", "IN"), | ||
| rows(1, 30, "F", "PA"), | ||
| rows(1, 30, "M", "IL"), | ||
| rows(1, 30, "M", "MD"), | ||
| rows(1, 30, "M", "TN"), | ||
| rows(1, 30, "M", "WA")); | ||
| } else { | ||
| verifyDataRowsInOrder( | ||
| response, | ||
| rows(1, 20, "f", "VA"), | ||
| rows(1, 30, "f", "IN"), | ||
| rows(1, 30, "f", "PA"), | ||
| rows(1, 30, "m", "IL"), | ||
| rows(1, 30, "m", "MD"), | ||
| rows(1, 30, "m", "TN"), | ||
| rows(1, 30, "m", "WA")); | ||
| } | ||
| verifyDataRows( | ||
| response, | ||
| rows(1, 20, "F", "VA"), | ||
| rows(1, 30, "F", "IN"), | ||
| rows(1, 30, "F", "PA"), | ||
| rows(1, 30, "M", "IL"), | ||
| rows(1, 30, "M", "MD"), | ||
| rows(1, 30, "M", "TN"), | ||
| rows(1, 30, "M", "WA")); | ||
| } | ||
|
|
||
| @Test | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalProject(account_number=[$0], firstname=[$1], address=[$2], birthdate=[$3], gender=[$4], city=[$5], lastname=[$6], balance=[$7], employer=[$8], state=[$9], age=[$10], email=[$11], male=[$12], r.account_number=[$13], r.firstname=[$14], r.address=[$15], r.birthdate=[$16], r.gender=[$17], r.city=[$18], r.lastname=[$19], r.balance=[$20], r.employer=[$21], r.state=[$22], r.age=[$23], r.email=[$24], r.male=[$25])\n LogicalJoin(condition=[=($0, $13)], joinType=[inner])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], birthdate=[$3], gender=[$4], city=[$5], lastname=[$6], balance=[$7], employer=[$8], state=[$9], age=[$10], email=[$11], male=[$12])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], birthdate=[$3], gender=[$4], city=[$5], lastname=[$6], balance=[$7], employer=[$8], state=[$9], age=[$10], email=[$11], male=[$12])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]])\n", | ||
| "physical": "EnumerableMergeJoin(condition=[=($0, $13)], joinType=[inner])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]], PushDownContext=[[PROJECT->[account_number, firstname, address, birthdate, gender, city, lastname, balance, employer, state, age, email, male], SORT->[{\n \"account_number\" : {\n \"order\" : \"asc\",\n \"missing\" : \"_last\"\n }\n}]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"birthdate\",\"gender\",\"city\",\"lastname\",\"balance\",\"employer\",\"state\",\"age\",\"email\",\"male\"],\"excludes\":[]},\"sort\":[{\"account_number\":{\"order\":\"asc\",\"missing\":\"_last\"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]], PushDownContext=[[PROJECT->[account_number, firstname, address, birthdate, gender, city, lastname, balance, employer, state, age, email, male], SORT->[{\n \"account_number\" : {\n \"order\" : \"asc\",\n \"missing\" : \"_last\"\n }\n}]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"birthdate\",\"gender\",\"city\",\"lastname\",\"balance\",\"employer\",\"state\",\"age\",\"email\",\"male\"],\"excludes\":[]},\"sort\":[{\"account_number\":{\"order\":\"asc\",\"missing\":\"_last\"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalSort(sort0=[$3], sort1=[$4], sort2=[$2], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4])\n LogicalSort(sort0=[$3], sort1=[$4], sort2=[$2], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], sort3=[$3], dir0=[ASC-nulls-first], dir1=[ASC-nulls-first], dir2=[ASC-nulls-first], dir3=[ASC-nulls-first])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[account_number, firstname, address, balance, gender], SORT->[{\n \"balance\" : {\n \"order\" : \"desc\",\n \"missing\" : \"_last\"\n }\n}, {\n \"gender\" : {\n \"order\" : \"desc\",\n \"missing\" : \"_last\"\n }\n}, {\n \"address\" : {\n \"order\" : \"asc\",\n \"missing\" : \"_first\"\n }\n}]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\"],\"excludes\":[]},\"sort\":[{\"balance\":{\"order\":\"desc\",\"missing\":\"_last\"}},{\"gender\":{\"order\":\"desc\",\"missing\":\"_last\"}},{\"address\":{\"order\":\"asc\",\"missing\":\"_first\"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n" | ||
| "logical": "LogicalSort(sort0=[$3], sort1=[$4], sort2=[$0], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4])\n LogicalSort(sort0=[$3], sort1=[$4], sort2=[$0], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], sort3=[$3], dir0=[ASC-nulls-first], dir1=[ASC-nulls-first], dir2=[ASC-nulls-first], dir3=[ASC-nulls-first])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[account_number, firstname, address, balance, gender], SORT->[{\n \"balance\" : {\n \"order\" : \"desc\",\n \"missing\" : \"_last\"\n }\n}, {\n \"gender.keyword\" : {\n \"order\" : \"desc\",\n \"missing\" : \"_last\"\n }\n}, {\n \"account_number\" : {\n \"order\" : \"asc\",\n \"missing\" : \"_first\"\n }\n}]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\"],\"excludes\":[]},\"sort\":[{\"balance\":{\"order\":\"desc\",\"missing\":\"_last\"}},{\"gender.keyword\":{\"order\":\"desc\",\"missing\":\"_last\"}},{\"account_number\":{\"order\":\"asc\",\"missing\":\"_first\"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalProject(account_number=[$0], firstname=[$1], address=[$2], birthdate=[$3], gender=[$4], city=[$5], lastname=[$6], balance=[$7], employer=[$8], state=[$9], age=[$10], email=[$11], male=[$12], r.account_number=[$13], r.firstname=[$14], r.address=[$15], r.birthdate=[$16], r.gender=[$17], r.city=[$18], r.lastname=[$19], r.balance=[$20], r.employer=[$21], r.state=[$22], r.age=[$23], r.email=[$24], r.male=[$25])\n LogicalJoin(condition=[=($0, $13)], joinType=[inner])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], birthdate=[$3], gender=[$4], city=[$5], lastname=[$6], balance=[$7], employer=[$8], state=[$9], age=[$10], email=[$11], male=[$12])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], birthdate=[$3], gender=[$4], city=[$5], lastname=[$6], balance=[$7], employer=[$8], state=[$9], age=[$10], email=[$11], male=[$12])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]])\n", | ||
| "physical": "EnumerableMergeJoin(condition=[=($0, $13)], joinType=[inner])\n EnumerableSort(sort0=[$0], dir0=[ASC])\n EnumerableCalc(expr#0..18=[{inputs}], proj#0..12=[{exprs}])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]])\n EnumerableSort(sort0=[$0], dir0=[ASC])\n EnumerableCalc(expr#0..18=[{inputs}], proj#0..12=[{exprs}])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]])\n" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalSort(sort0=[$3], sort1=[$4], sort2=[$2], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4])\n LogicalSort(sort0=[$3], sort1=[$4], sort2=[$2], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], sort3=[$3], dir0=[ASC-nulls-first], dir1=[ASC-nulls-first], dir2=[ASC-nulls-first], dir3=[ASC-nulls-first])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "EnumerableSort(sort0=[$3], sort1=[$4], sort2=[$2], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n EnumerableCalc(expr#0..16=[{inputs}], proj#0..4=[{exprs}])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n" | ||
| "logical": "LogicalSort(sort0=[$3], sort1=[$4], sort2=[$0], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4])\n LogicalSort(sort0=[$3], sort1=[$4], sort2=[$0], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], sort3=[$3], dir0=[ASC-nulls-first], dir1=[ASC-nulls-first], dir2=[ASC-nulls-first], dir3=[ASC-nulls-first])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "EnumerableSort(sort0=[$3], sort1=[$4], sort2=[$0], dir0=[DESC-nulls-last], dir1=[DESC-nulls-last], dir2=[ASC-nulls-first])\n EnumerableCalc(expr#0..16=[{inputs}], proj#0..4=[{exprs}])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n" | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related. just fix a compile error in benchmarks module.