Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Jul 10, 2025

Description

In Calcite + OpenSearch, the default physical Join operator is EnumerableMergeJoin:

EnumerableMergeJoin
\-EnumerableSort
  \-CalciteEnumerableIndexScan
\-EnumerableSort
  \-CalciteEnumerableIndexScan

After this patching, the EnumerableSort operators could be pushed down to DSL:

EnumerableMergeJoin
\-CalciteEnumerableIndexScan(PushDownContext=[SORT->"join_key"])
\-CalciteEnumerableIndexScan(PushDownContext=[SORT->"join_key"])

In v3, a string field can enable sort pushdown if

  • the field type is KEYWORD
  • or the field type is TEXT with keyword subfield

Related Issues

Resolves #3863

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin added enhancement New feature or request calcite calcite migration releated pushdown pushdown related issues labels Jul 10, 2025
…ta=true

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Map<String, OpenSearchDataType> finalResult = new HashMap<>();
for (Map<String, OpenSearchDataType> map : candidateMaps) {
OpenSearchDescribeIndexRequest.mergeObjectAndArrayInsideMap(finalResult, map);
MergeRuleHelper.merge(finalResult, map);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related. just fix a compile error in benchmarks module.

Comment on lines +6 to +12
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
Copy link
Member Author

@LantaoJin LantaoJin Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gender in test sample data should contain keyword subfield. If else, we cannot sorting, aggregate on it. Sorting on a fielddata field would get unstable result. ref link

Comment on lines -374 to -380
rows(1, 20, "f", "VA"),
rows(1, 30, "f", "IN"),
rows(1, 30, "f", "PA"),
rows(1, 30, "m", "IL"),
rows(1, 30, "m", "MD"),
rows(1, 30, "m", "TN"),
rows(1, 30, "m", "WA"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lower case values in v2 were caused by grouping by a fielddata field.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
explainQueryToString(
"source=opensearch-sql_test_index_account "
+ "| sort account_number, firstname, address, balance "
+ "| sort - balance, - gender, address "
Copy link
Member Author

@LantaoJin LantaoJin Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not sort on text type without keyword subfield in v3. Even sorting on text type with fielddata=true will get unexpected order:
For example, if gender is a text type field with fielddata=true.

Check following queries and their outputs in v2.

index=test1:

gender
M a 1
M b 1
F c 2
F d d 1

source = test1 | sort gender returns

gender
M a 1
M b 1
F d d 1
F c 2

source = test1 | sort - gender returns

gender
M a 1
M b 1
F c 2
F d d 1

index=test2:

gender
F d d 1
M a 1
F c 2
M b 1

source=test2 | sort gender returns

gender
F d d 1
M a 1
M b 1
F c 2

source = test2 | sort - gender returns

gender
M a 1
M b 1
F d d 1
F c 2

The logic of sorting a fielddata is 1) tokenization 2) internal sort by token 3) sort by first token
ASC: "F d d 1" =tokenization=> "f", "d", "d", "1" =internal sort by token=> "1", "f", "d", "d" =sort by first token=> "1"
DESC: "F d d 1" =tokenization=> "f", "d", "d", "1" =internal sort by token=> "f", "d", "d", "1" =sort by first token=> "f"

@LantaoJin
Copy link
Member Author

Could you review this again?@qianheng-aws @penghuo

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
rows("F", "KS", 7),
rows("F", "CO", 7),
rows("F", "NV", 8),
isPushdownEnabled() ? rows("F", "AR", 8) : rows("F", "NV", 8),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we introduce this disparity between with&without pushdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added "keyword" subfield for the gender, agg pushdown by gender works in caclite.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pushdown enable impact results?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pushdown enable impact results?

This result difference is not introduced by code changes in this PR. You can reproduce it in main branch when change the schema from

      "gender": {
        "type": "text",
        "fielddata": true
      },

to

      "gender": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },

The three results (v2 non-pushdown, v3 non-pushdown, v3 pushdown) are different.

v2 non-pushdown v3 non-pushdown v3 pushdown
"F", "VA", 8 "F", "NV", 8 "F", "AR", 8

But they are all correct. The rare command is not a deterministic command, it result depends on the order of return from OpenSearch and implementation (for v2). When the bucket pushdown works (by gender), the fetched data order is different with data order in non-pushdown.

rows("F", "KS", 7),
rows("F", "CO", 7),
rows("F", "NV", 8),
isPushdownEnabled() ? rows("F", "AR", 8) : rows("F", "NV", 8),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pushdown enable impact results?

public void onMatch(RelOptRuleCall call) {
final LogicalSort sort = call.rel(0);
final CalciteLogicalIndexScan scan = call.rel(1);
final Sort sort = call.rel(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin @qianheng-aws couple questions?

  1. Is OpenSearchSortIndexScanRule intended to be a physical tweak rule for EnumerableRel?
  2. Do we still need Logical plan optimization rule and CalciteLogicalIndexScan? Can all existing rules be expressed as physical tweak rules for EnumerableRel?

Copy link
Member Author

@LantaoJin LantaoJin Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnumerableSort can generate from logical sort operator and logical join operator with physical SortMerge implementation. So I changed the OpenSearchSortIndexScanRule rule from logical layer to physical layer. It equals keeping the OpenSearchSortIndexScanRule in logical layer and adding a new rule for EnumerableSort + SortMergeJoin in physical layer.

Almost pushdown rules can move to physical layer but I didn't see any benefit to do that. We need logical plan rules. some optimization should happen in logical layer instead of physical. For example a LogicalJoin operator may generate multiple physical Join operators. rule applying in logical should be more efficient.

@LantaoJin LantaoJin requested a review from penghuo July 17, 2025 03:19
@penghuo penghuo merged commit 71aa9ba into opensearch-project:main Jul 17, 2025
23 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-3864-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 71aa9ba0490305b81284eb369f212a8fc6e3936f
# Push it to GitHub
git push --set-upstream origin backport/backport-3864-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-3864-to-2.19-dev.

LantaoJin added a commit to LantaoJin/search-plugins-sql that referenced this pull request Jul 17, 2025
…nsearch-project#3864)

* Support pushdown physical sort operator to speedup SortMergeJoin

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Enable sort pushdown when the type is TEXT without fields and fielddata=true

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add OpenSearchTextType.toKeywordSubField()

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Only push down sort when order is ASC for fielddata text

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert CalciteNoPushdownIT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert the fielddata logic

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* 'gender' in test data should contain keyword subfield

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert the typo change in test

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix no pushdown IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit 71aa9ba)
penghuo pushed a commit that referenced this pull request Jul 21, 2025
…p SortMergeJoin (#3864) (#3889)

* Support pushdown physical sort operator to speedup SortMergeJoin (#3864)

* Support pushdown physical sort operator to speedup SortMergeJoin

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Enable sort pushdown when the type is TEXT without fields and fielddata=true

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add OpenSearchTextType.toKeywordSubField()

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Only push down sort when order is ASC for fielddata text

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert CalciteNoPushdownIT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert the fielddata logic

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* 'gender' in test data should contain keyword subfield

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert the typo change in test

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix no pushdown IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit 71aa9ba)

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert changes in build.gradle

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. calcite calcite migration releated enhancement New feature or request pushdown pushdown related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Pushdown the Sort operators of SortMergeJoin

3 participants