@qianheng-aws
Collaborator

@qianheng-aws qianheng-aws commented Jul 4, 2025

Description

Support partial filter push down

Related Issues

Resolves #3470

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Heng Qian <qianheng@amazon.com>
@qianheng-aws qianheng-aws enabled auto-merge (squash) July 5, 2025 04:53
@qianheng-aws qianheng-aws disabled auto-merge July 5, 2025 04:53
@qianheng-aws
Collaborator Author

ping @LantaoJin @penghuo

{
"calcite": {
"logical": "LogicalProject(age=[$8], address=[$2])\n LogicalFilter(condition=[AND(>=($8, 1), =($2, '880 Holmes Lane'))])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n",
"physical": "EnumerableCalc(expr#0..1=[{inputs}], expr#2=['880 Holmes Lane':VARCHAR], expr#3=[=($t0, $t2)], age=[$t1], address=[$t0], $condition=[$t3])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[address, age], FILTER->AND(>=($1, 1), =($0, '880 Holmes Lane'))], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"query\":{\"bool\":{\"must\":[{\"range\":{\"age\":{\"from\":1,\"to\":null,\"include_lower\":true,\"include_upper\":true,\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[\"address\",\"age\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n"
Member

Shouldn't the PushDownContext show FILTER->[>=($1, 1)] instead of FILTER->AND(>=($1, 1), =($0, '880 Holmes Lane'))?

Collaborator Author

It currently uses the original condition as its digest. To use the pushed condition as the digest instead, we need to store that RexNode as well, as we already do for the non-pushed condition. Both work functionally, but the latter is more appropriate for the explain output.

Will make that change.

@qianheng-aws
Collaborator Author

Found a good case for Calcite with partial push down:

Calcite with partial push down can execute this PPL, while v2 throws an exception:

source=opensearch-sql_test_index_beer | eval answerId= AcceptedAnswerId + 1 | where simple_query_string(['Tags'], 'taste') and answerId > 200

This works because Calcite can push only the relevance function into the scan, past the eval command, since that function does not depend on the field the eval command computes. FilterProjectTransposeRule together with the partial push down feature makes this possible. The final plan is:

EnumerableProject(ParentId=[$0], CreationDate=[$1], Title=[$2], ViewCount=[$3], LastEditorUserId=[$4], ContentLicense=[$5], OwnerUserId=[$6], Score=[$7], FavoriteCount=[$8], LastActivityDate=[$9], AnswerCount=[$10], CommentCount=[$11], ClosedDate=[$12], Id=[$13], LastEditDate=[$14], PostTypeId=[$15], AcceptedAnswerId=[$16], Body=[$17], Tags=[$18], $f19=[+($16, 1)])
  EnumerableFilter(condition=[>(+($16, 1), 200)])
    CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_beer]], PushDownContext=[[PROJECT->[ParentId, CreationDate, Title, ViewCount, LastEditorUserId, ContentLicense, OwnerUserId, Score, FavoriteCount, LastActivityDate, AnswerCount, CommentCount, ClosedDate, Id, LastEditDate, PostTypeId, AcceptedAnswerId, Body, Tags], FILTER->simple_query_string(MAP('fields', MAP('Tags':VARCHAR, 1.0E0:DOUBLE)), MAP('query', 'taste':VARCHAR))], OpenSearchRequestBuilder(sourceBuilder={"from":0,"timeout":"1m","query":{"bool":{"must":[{"simple_query_string":{"query":"taste","fields":["Tags^1.0"],"flags":-1,"default_operator":"or","analyze_wildcard":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"fuzzy_transpositions":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["ParentId","CreationDate","Title","ViewCount","LastEditorUserId","ContentLicense","OwnerUserId","Score","FavoriteCount","LastActivityDate","AnswerCount","CommentCount","ClosedDate","Id","LastEditDate","PostTypeId","AcceptedAnswerId","Body","Tags"],"excludes":[]},"sort":[{"_doc":{"order":"asc"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
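
To make the idea in the plan above concrete, here is a minimal sketch of partial push down over a conjunction. It is not the code in this PR: conditions are plain strings instead of RexNode, and the class PartialPushDownSketch, the Split holder, and the canPushDown predicate are hypothetical names invented for illustration. The rule used in main to decide what is pushable is only a stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: conditions are plain strings, not Calcite RexNodes. */
public class PartialPushDownSketch {

    /** Conjuncts the scan accepted, and conjuncts left for a Filter above the scan. */
    public static final class Split {
        public final List<String> pushed;
        public final List<String> remaining;

        Split(List<String> pushed, List<String> remaining) {
            this.pushed = pushed;
            this.remaining = remaining;
        }
    }

    /** Partitions the top-level AND conjuncts by whether they can be translated to a query. */
    public static Split split(List<String> conjuncts, Predicate<String> canPushDown) {
        List<String> pushed = new ArrayList<>();
        List<String> remaining = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (canPushDown.test(conjunct)) {
                pushed.add(conjunct);
            } else {
                remaining.add(conjunct);
            }
        }
        return new Split(pushed, remaining);
    }

    public static void main(String[] args) {
        List<String> conjuncts = Arrays.asList(
            "simple_query_string(['Tags'], 'taste')",
            ">(+(AcceptedAnswerId, 1), 200)");
        // Stand-in rule for illustration only: treat arithmetic expressions as not pushable.
        Split split = split(conjuncts, c -> !c.contains("+("));
        System.out.println("pushed:    " + split.pushed);
        System.out.println("remaining: " + split.remaining);
    }
}
```

Because the conjuncts are joined by AND, evaluating the pushed part in the scan and the remaining part in a Filter above it gives the same rows as evaluating the whole condition in one place.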

Signed-off-by: Heng Qian <qianheng@amazon.com>
LantaoJin previously approved these changes Jul 15, 2025
Comment on lines +125 to +130
if (queryExpression.isPartial()) {
  // Only CompoundQueryExpression could be partial.
  List<RexNode> conditions = queryExpression.getUnAnalyzableNodes();
  RexNode newCondition = constructCondition(conditions, getCluster().getRexBuilder());
  return filter.copy(filter.getTraitSet(), newScan, newCondition);
}
Collaborator

@penghuo penghuo Jul 16, 2025

Nit: pushDownFilter should return newScan. If the push down is partial, could it return a pair of newScan and the un-pushed filters, and let IndexScanRule create the new Filter node?

Collaborator Author

Yeah
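
The split of responsibilities suggested above could look roughly like the following sketch. It is an assumption about the shape of such a refactoring, not the merged code: plan nodes and conditions are plain strings, and PushDownResultSketch, PushDownResult, and onMatch are hypothetical names. Only pushDownFilter is borrowed from the discussion, and its signature here is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch of the suggested split of responsibilities; not the project's classes. */
public class PushDownResultSketch {

    /** What the scan hands back: the new scan plus whatever it could not push. */
    public static final class PushDownResult {
        public final String newScan;
        public final List<String> unpushed;

        PushDownResult(String newScan, List<String> unpushed) {
            this.newScan = newScan;
            this.unpushed = unpushed;
        }

        public boolean isPartial() {
            return !unpushed.isEmpty();
        }
    }

    /** Scan side: pushes what it can and reports the rest; empty if nothing is pushable. */
    public static Optional<PushDownResult> pushDownFilter(
            List<String> conjuncts, Predicate<String> canPush) {
        List<String> pushed = new ArrayList<>();
        List<String> unpushed = new ArrayList<>();
        for (String conjunct : conjuncts) {
            (canPush.test(conjunct) ? pushed : unpushed).add(conjunct);
        }
        if (pushed.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new PushDownResult("Scan(filter=" + pushed + ")", unpushed));
    }

    /** Rule side: the only place that decides whether a Filter node stays in the plan. */
    public static String onMatch(List<String> conjuncts, Predicate<String> canPush) {
        Optional<PushDownResult> result = pushDownFilter(conjuncts, canPush);
        if (!result.isPresent()) {
            return "Filter(" + conjuncts + ", Scan())"; // nothing pushed, plan unchanged
        }
        PushDownResult r = result.get();
        return r.isPartial() ? "Filter(" + r.unpushed + ", " + r.newScan + ")" : r.newScan;
    }
}
```

With this shape the scan never constructs plan nodes other than itself, which keeps the decision about the surrounding Filter inside the rule.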

…ushDown

# Conflicts:
#	opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java
Signed-off-by: Heng Qian <qianheng@amazon.com>
@penghuo penghuo merged commit 0b4423e into opensearch-project:main Jul 18, 2025
23 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-3850-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0b4423e9f3b50670af922a8608116d7182fd728f
# Push it to GitHub
git push --set-upstream origin backport/backport-3850-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-3850-to-2.19-dev.

@LantaoJin
Member

@qianheng-aws please manually backport via above instructions.

qianheng-aws added a commit that referenced this pull request Jul 21, 2025
* Support partial filter push down

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add doc for PushDownAction

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code to only keep non-push-down condition in the new filter

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Only show the pushed conditions in the PushDownContext

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Ignore test when push down disabled

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT after merging main

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT because mapping changed after merging main

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
(cherry picked from commit 0b4423e)
@qianheng-aws
Collaborator Author

> @qianheng-aws please manually backport via above instructions.

Backport PR is here: #3899

penghuo pushed a commit that referenced this pull request Jul 21, 2025
* Support partial filter push down

* Add doc for PushDownAction

* Fix IT

* Refine code to only keep non-push-down condition in the new filter

* Refine code

* Only show the pushed conditions in the PushDownContext

* Ignore test when push down disabled

* Fix IT after merging main

* Fix IT because mapping changed after merging main

---------

(cherry picked from commit 0b4423e)

Signed-off-by: Heng Qian <qianheng@amazon.com>
@LantaoJin LantaoJin added the backport-manually label Jul 22, 2025
Labels

  • backport 2.19-dev
  • backport-failed
  • backport-manually: Filed a PR to backport manually.
  • calcite: calcite migration related
  • enhancement: New feature or request
  • pushdown: pushdown related issues

Development

Successfully merging this pull request may close these issues.

[FEATURE] Partial pushdown with Calcite

3 participants