-
Notifications
You must be signed in to change notification settings - Fork 181
Support sort expression pushdown for SortMergeJoin #4830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support sort expression pushdown for SortMergeJoin #4830
Conversation
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
| public void testSimpleSortExprPushdownForSMJ() throws Exception { | ||
| String query = | ||
| "source=opensearch-sql_test_index_bank | join left=a right=b on a.age + 1 = b.balance - 20" | ||
| + " opensearch-sql_test_index_bank"; | ||
| var result = explainQueryYaml(query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that this query is not executable with the following error:
java.lang.RuntimeException: java.sql.SQLException: exception while executing query: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.lambda$execute$2(OpenSearchExecutionEngine.java:217) ~[opensearch-sql-3.4.0.0-SNAPSHOT.jar:3.4.0.0-SNAPSHOT]
It may result from the type difference between age and balance. I see quite a lot type cast to Integer in the generated code.
This should be another problem that's irrelevant to this optimization though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@songkant-aws does this issue be fixed? I don't see any IT for correctness. can you add more tests in CalcitePPLJoinIT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is caused by type mismatch at runtime due to different index mapping types. The plan explanation can still pass. I think it's not a scope of this PR. Dynamic type conversion at runtime or type validation checker could potentially address it.
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughRefactors sort-expression pushdown to operate on abstract scan types, adds pushdownSortExpr and copy() to AbstractCalciteIndexScan, implements copy() in CalciteEnumerableIndexScan, removes sort pushdown internals from CalciteLogicalIndexScan, updates planner rules to match Sort/Project/AbstractCalciteIndexScan with convention guards, and adds integration tests plus expected explain outputs for SMJ sort-pushdown scenarios. Changes
Sequence Diagram(s)sequenceDiagram
participant Rule as SortExprIndexScanRule
participant Project as Project
participant Scan as AbstractCalciteIndexScan
participant PushCtx as PushDownContext
Note over Rule,Project: Rule matches (Sort, Project, AbstractCalciteIndexScan)
Rule->>Rule: verify same convention
alt conventions match
Rule->>Project: mapThroughProject(...) // collect expressions
Rule->>Scan: extractSortExpressionInfos(...)
Rule->>Scan: pushdownSortExpr(sortExprDigests)
Scan->>Scan: build SortBuilders (field or script)
Scan->>PushCtx: clone/create PushDownContext with SORT/PROJECT entries
PushCtx-->>Scan: updated scan (pushdown registered)
Scan-->>Rule: return copy() of scan with new pushdown
else conventions differ
Rule-->>Rule: no transformation (early return)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (1)
418-492: Sort expression pushdown implementation is well-structured.The method correctly handles both simple field sorts and complex expression sorts through different code paths:
- Simple field references use
fieldSort(lines 446-456)- Complex expressions use
scriptSortwithPredicateAnalyzer.ScriptQueryExpression(lines 457-478)The use of
Supplierpattern for deferredSortBuildercreation is appropriate, and theMISSING_MAXparameter (lines 459-464) correctly controls null handling for differentNullDirectionvalues.Minor observation: The assertion on line 458 (
assert sortExpr instanceof RexCall) will be stripped in production builds. Consider whether an explicit runtime check with exception is more appropriate for production safety.Consider replacing the assertion with an explicit runtime check:
- assert sortExpr instanceof RexCall : "sort expression should be RexCall"; + if (!(sortExpr instanceof RexCall)) { + throw new PushDownUnSupportedException( + "Complex sort expression must be RexCall, got: " + sortExpr.getClass()); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_push.json(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_pushdown_for_smj.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_single_expr_output_push.json(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_simple_sort_expr_pushdown_for_smj.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_sort_pass_through_join_then_pushdown.yaml(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java(6 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java(2 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java(5 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Project.java (1)
Project(26-137)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Project.java (1)
Project(26-137)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchTextType.java (1)
OpenSearchTextType(21-106)opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java (1)
PushDownUnSupportedException(78-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (21)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_simple_sort_expr_pushdown_for_smj.yaml (1)
1-21: Expected output for non-pushdown SMJ scenario looks correct.The YAML correctly represents a SortMergeJoin plan where sort operations remain at the Enumerable level (
EnumerableSortat lines 15 and 18) rather than being pushed down into the index scans. This aligns with the PR's goal of testing both pushdown and non-pushdown scenarios.integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_single_expr_output_push.json (1)
4-4: PushDownContext ordering change looks intentional.The reordering of
PROJECTbeforeSORTin thePushDownContextrepresentation appears to be a consequence of the refactoring to abstract-level pushdown. The actual query behavior (projectingbalance, sorting bybalanceascending with nulls first, limit 10000) remains correct.integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_push.json (1)
4-4: Consistent with other expected output updates.The
PushDownContextordering change (PROJECT before SORT) is consistent with the pattern seen in other updated expected outputs. The underlying query semantics remain unchanged.integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_pushdown_for_smj.yaml (1)
1-28: Expected output correctly demonstrates sort pushdown for SortMergeJoin.The physical plan shows that sort operations are successfully pushed down into
CalciteEnumerableIndexScan(visible inPushDownContextat lines 16-21 and 23-28), eliminating theEnumerableSortoperators that appear in the no-pushdown variant. This validates the PR's core functionality.Key observations:
- First scan sorts by
age(matching+($10, 1)join key component)- Second scan sorts by
balance(matching-($7, 20)join key component)requestedTotalSize=2147483647on line 21 correctly represents an unbounded first inputrequestedTotalSize=50000on line 28 correctly reflects theJOIN_SUBSEARCH_MAXOUTlimitopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)
106-110: I'm unable to verify the review comment because the repository cloning is failing. However, examining the review comment itself reveals an important issue:The review comment is incomplete—it ends with a verification request `` but no verification results are provided.
The original review comment:
- Approves the
@Overrideannotation addition (lines 106-110)- Requests verification that
AbstractCalciteIndexScancontains thepushdownSortExpr()method- Includes an ast-grep shell script for verification
- But does not include the results of that verification
Since I cannot access the repository to complete the verification myself, and the review comment is presented as incomplete with an outstanding verification request, here is my rewritten review comment:
@Overrideannotation addition requires verification of parent class implementation.The
@Overrideannotation oncopy()(lines 106-110) is correct ifAbstractCalciteIndexScancontains thepushdownSortExpr()implementation. This refactoring should move pushdown logic to the abstract base class, enabling both logical and physical index scans to leverage the same SortMergeJoin pushdown capability. Verification needed: Confirm thatAbstractCalciteIndexScancontains thepushdownSortExpr()method before approving.opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java (1)
73-77: LGTM! Proper deep-copy implementation.The
copy()method correctly creates a new scan instance with a clonedpushDownContext, which is essential to prevent shared mutable state across different plan alternatives during optimization.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj.yaml (1)
1-20: Test resource properly structured for baseline validation.This YAML captures the expected plan when sort expression pushdown is disabled, showing
EnumerableSortoperators remaining above scans. This serves as an appropriate baseline for comparing against the pushdown-enabled variant.integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (1)
1-17: LGTM! Pushdown context properly demonstrates sort expression pushdown.The YAML correctly shows sort expression pushdown into the scan operator with detailed script parameters. The
PushDownContexton line 16 demonstrates the successful pushdown of theREX_EXTRACTsort expression with appropriateMISSING_MAX:falseforNULLS_FIRSThandling.integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml (1)
1-21: Excellent test coverage demonstrating mixed sort pushdown scenarios.This YAML effectively validates both complex (scripted
REX_EXTRACT) and simple (field-based) sort expression pushdown within a Sort-Merge Join. The different null handling approaches (MISSING_MAX:truevs.missing: "_last") appropriately reflect the sort requirements on each join side.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (2)
49-51: Essential convention compatibility check.The convention check correctly prevents the rule from transposing
SortandProjectoperators across convention boundaries, which is necessary since the rule now matches abstract operator types (both logical and physical).
133-140: Operand configuration correctly generalized to abstract types.The changes from
LogicalSort/LogicalProjecttoSort/Projectappropriately enable the rule to work with both logical and physical operators (e.g.,EnumerableSort), which is essential for sort expression pushdown in Sort-Merge Join scenarios. The updated predicate correctly referencesProject::containsOver.integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)
1013-1021: Test correctly validates complex sort expression pushdown for SMJ.The test exercises the scenario where a
REX_EXTRACTexpression is used in a join key, forcing a Sort-Merge Join that requires sorting. The pushdown optimization should push this complex sort expression into the scan operator.Note: A past review comment (lines 1024-1028) mentions a potential runtime type casting issue between
LongandInteger, though it's noted as unrelated to this optimization.
1023-1031: Test appropriately covers simple arithmetic expression pushdown.This test validates pushdown of simple arithmetic expressions (
age + 1,balance - 20) used as join keys in a Sort-Merge Join, complementing the complex expression test case.
1033-1042: Test validates sort pass-through optimization with join.This test correctly exercises the scenario where a sort operation after a join can be pushed down through the join and into the scan operator, validating the "pass-through" optimization path for left joins.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_sort_pass_through_join_then_pushdown.yaml (1)
1-20: Baseline test resource appropriately structured.This YAML correctly captures the expected plan without sort pushdown optimization, showing
EnumerableSort(line 15) remaining above the scan operator. This serves as the baseline for validating the pushdown optimization.opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (2)
265-265: Abstract method appropriately requires deep-copy implementation.The abstract
copy()method correctly enforces that subclasses implement proper deep-copy behavior, which is essential for the planner to duplicate relational expressions with independent pushdown state during optimization.
523-540: Based on the web search results, I can now verify the original review comment's claims:The review comment's suggestion is incorrect. OpenSearch's script-based sorting supports only "number" and "string" types. The web search confirms:
OpenSearch script sort limitation: OpenSearch
_scriptsort configuration accepts only type"number"(for numeric values) or"string"(for text values). These are the only two types supported by OpenSearch's scripting sort functionality.SQL type mapping: While the SQL plugin documents that DATE/date_nanos map to SQL TIMESTAMP types, this does not mean these types can be directly used in script-based sorts. They would require explicit conversion to string or numeric values first.
Current implementation correctness: The
getScriptSortType()method correctly:
- Maps
CHAR_TYPES→ScriptSortType.STRING- Maps
INT_TYPESandAPPROX_TYPES→ScriptSortType.NUMBER- Throws
PushDownUnSupportedExceptionfor unsupported types (DATE, TIMESTAMP, BINARY, etc.)The throw for unsupported types is the correct behavior since OpenSearch cannot handle date/timestamp values directly in script sorts without prior conversion.
The review comment does not reflect actual OpenSearch capabilities and should be corrected.
The original suggestion to extend type mapping for DATE and TIMESTAMP support is based on an incorrect assumption about OpenSearch's script sort functionality. The current implementation is appropriate and doesn't need modification.
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (4)
49-56: Convention consistency guard looks correctUsing
Sort,Project, andAbstractCalciteIndexScanin the operands and then explicitly checking that all three share the same convention is a clean way to avoid mismatched logical/physical patterns now that the rule is more generic. No changes needed here.
133-138: Helper signatures generalized appropriately to AbstractCalciteIndexScanUpdating
extractSortExpressionInfosandmapThroughProjectto takeAbstractCalciteIndexScanaligns them with the broader rule and keeps usage (getRowType().getFieldNames(), etc.) within the base API surface. The generalization looks type‑safe and consistent with the rest of the changes.Also applies to: 169-175
244-262: Operand pattern broadening matches the rule’s new scopeSwitching the config to:
- root operand
Sort.class,- middle
Project.class,- leaf
AbstractCalciteIndexScan.class,is consistent with the goal of supporting both logical and enumerable paths (e.g., SMJ‑inserted
EnumerableSort) and with theonMatchconvention check andnoAggregatePushedpredicate.Functionally this looks sound; just ensure your tests cover both logical (
LogicalSort) and physical (EnumerableSort) scenarios so we don’t miss regressions when new conventions are introduced.
96-96: I'm unable to access the repository directly due to cloning issues. To complete the verification of this review comment, I need to examine the actual code. Could you please provide:
- The code snippet from
SortExprIndexScanRule.java(lines 90-115) showing the context around the cast and theextractLimitValuelogic- The implementation of the
extractLimitValuemethod to understand what it returns for non-LogicalSortinstances- Any usages of
EnumerableSortin the codebase that might setfetchoroffsetvaluesAlternatively, if you can share the file content or the relevant diff from the PR, I can complete the verification and provide a definitive rewritten review comment.
| public void testSimpleSortExprPushdownForSMJ() throws Exception { | ||
| String query = | ||
| "source=opensearch-sql_test_index_bank | join left=a right=b on a.age + 1 = b.balance - 20" | ||
| + " opensearch-sql_test_index_bank"; | ||
| var result = explainQueryYaml(query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@songkant-aws does this issue be fixed? I don't see any IT for correctness. can you add more tests in CalcitePPLJoinIT?
| "source=opensearch-sql_test_index_bank | rex field=lastname \\\"(?<initial>^[A-Z])\\\" |" | ||
| + " join left=a right=b on a.initial = b.firstname opensearch-sql_test_index_bank"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a case of
source=opensearch-sql_test_index_bank
| rex field=lastname \\\"(?<initial>^[A-Z])\\\"
| join type=left max=1 initial opensearch-sql_test_index_bank
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this explain test, and some other IT test cases to verify correctness.
| if (SqlTypeName.CHAR_TYPES.contains(relDataType.getSqlTypeName())) { | ||
| return ScriptSortType.STRING; | ||
| } else if (SqlTypeName.INT_TYPES.contains(relDataType.getSqlTypeName()) | ||
| || SqlTypeName.APPROX_TYPES.contains(relDataType.getSqlTypeName())) { | ||
| return ScriptSortType.NUMBER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could it check type family?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SqlTypeFamily check is almost the same with above raw types check. Also, SqlTypeFamily check allows ANY type, which is not expected here.
Signed-off-by: Songkan Tang <songkant@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
1015-1055: Consider adding correctness verification tests.These four new test methods appropriately verify the explain plans and optimization paths for sort expression pushdown with SortMergeJoin. However, they only validate plan structure, not query execution correctness. Given past review comments noting runtime type-casting issues, consider adding integration tests in
CalcitePPLJoinITor similar that verify the actual query results to ensure the optimization produces correct output.Based on learnings, Calcite integration changes should test SQL generation and optimization paths—which these tests do. However, end-to-end correctness verification would strengthen confidence in the pushdown logic.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(1 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Verify SQL generation and optimization paths
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
🔇 Additional comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java (3)
960-975: LGTM! Validates simple expression-based join for SMJ.The test correctly validates SortMergeJoin with arithmetic expression join keys (
a.age + 3 = b.age - 2). The expected results are accurate (Jane's age 20+3 = John's age 25-2 = 23, and John's age 25+3 = Hello's age 30-2 = 28).
977-1003: LGTM! Validates complex expression-based join with derived fields.The test correctly validates SortMergeJoin with complex substring-based join keys. The expected results accurately reflect the cartesian product behavior when multiple records share the same derived field value (name2 and state2).
1005-1027: LGTM! Validates SMJ pushdown with max option and field list.The test correctly validates that sort expression pushdown for SortMergeJoin works in combination with the
max=1option and field list syntax. The expected results align with the max-limiting behavior (one match per left-side record) and the field list join semantics.
…t#4830) * Support sort expression pushdown for SortMergeJoin Signed-off-by: Songkan Tang <songkant@amazon.com> * Remove old duplicate methods in CalciteLogicalIndexScan Signed-off-by: Songkan Tang <songkant@amazon.com> * Add more tests Signed-off-by: Songkan Tang <songkant@amazon.com> --------- Signed-off-by: Songkan Tang <songkant@amazon.com>
…#4830) (#4928) * Support sort expression pushdown for SortMergeJoin (#4830) * Support sort expression pushdown for SortMergeJoin Signed-off-by: Songkan Tang <songkant@amazon.com> * Remove old duplicate methods in CalciteLogicalIndexScan Signed-off-by: Songkan Tang <songkant@amazon.com> * Add more tests Signed-off-by: Songkan Tang <songkant@amazon.com> --------- Signed-off-by: Songkan Tang <songkant@amazon.com> * Fix known issue of IT failure in 2.19-dev Signed-off-by: Songkan Tang <songkant@amazon.com> * Fix wrong pipe character Signed-off-by: Songkan Tang <songkant@amazon.com> --------- Signed-off-by: Songkan Tang <songkant@amazon.com>
Description
We observed there are some common user queries following the pattern like
source=tableA | rex '...<?rex_field>...' | join left=a right=b on a.rex_field=b.field tableB. Join keys reference to a expression output instead of original table field. This query will be transformed to SortMergeJoin where it sorts by join keys on each join child.SortMergeJoin inserts EnumerableSort operators at physical plan optimization. Previous sort expression pushdown optimization #4750 only provides ability of logical plan optimization, aka pushdown LogicalSort into scan. This PR expands the ability to push down EnumerableSort by simple or complex sort expressions in case of SortMergeJoin.
Related Issues
Resolves #4823
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.