Convert like function call to wildcard query for Calcite filter pushdown #3915
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10])\n LogicalFilter(condition=[ILIKE($1, '%mbe%':VARCHAR, '\\')])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[account_number, firstname, address, balance, gender, city, employer, state, age, email, lastname], FILTER->ILIKE($1, '%mbe%':VARCHAR, '\\')], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"query\":{\"wildcard\":{\"firstname.keyword\":{\"wildcard\":\"*mbe*\",\"case_insensitive\":true,\"boost\":1.0}}},\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\",\"city\",\"employer\",\"state\",\"age\",\"email\",\"lastname\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10])\n LogicalFilter(condition=[ILIKE($2, '%Holmes%':VARCHAR, '\\')])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "EnumerableCalc(expr#0..10=[{inputs}], expr#11=['%Holmes%':VARCHAR], expr#12=['\\'], expr#13=[ILIKE($t2, $t11, $t12)], proj#0..10=[{exprs}], $condition=[$t13])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[account_number, firstname, address, balance, gender, city, employer, state, age, email, lastname]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\",\"city\",\"employer\",\"state\",\"age\",\"email\",\"lastname\"],\"excludes\":[]}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10])\n LogicalFilter(condition=[ILIKE($1, '%mbe%':VARCHAR, '\\')])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "EnumerableCalc(expr#0..16=[{inputs}], expr#17=['%mbe%':VARCHAR], expr#18=['\\'], expr#19=[ILIKE($t1, $t17, $t18)], proj#0..10=[{exprs}], $condition=[$t19])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "calcite": { | ||
| "logical": "LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10])\n LogicalFilter(condition=[ILIKE($2, '%Holmes%':VARCHAR, '\\')])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n", | ||
| "physical": "EnumerableCalc(expr#0..16=[{inputs}], expr#17=['%Holmes%':VARCHAR], expr#18=['\\'], expr#19=[ILIKE($t2, $t17, $t18)], proj#0..10=[{exprs}], $condition=[$t19])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n" | ||
| } | ||
| } |
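
In the first Calcite plan above, the `ILIKE` filter on `firstname` reaches the OpenSearch request as a wildcard query on `firstname.keyword`; in the second, the filter on `address` stays in `EnumerableCalc`. A minimal sketch of that decision follows. It assumes `address` has no keyword subfield; the class name, the in-memory `KEYWORD_SUBFIELDS` map, and the hand-built JSON are hypothetical stand-ins for illustration, not the project's API.

```java
import java.util.Map;
import java.util.Optional;

final class LikePushdownDecisionSketch {
  // Hypothetical mapping: field name -> keyword subfield. Text-only fields are absent.
  static final Map<String, String> KEYWORD_SUBFIELDS = Map.of("firstname", "firstname.keyword");

  /**
   * Returns the wildcard query body when the field has a keyword subfield,
   * or an empty Optional when the filter cannot be pushed down.
   * No JSON escaping is done; the pattern is assumed to contain no quotes.
   */
  static Optional<String> wildcardQueryJson(String field, String lucenePattern) {
    String keywordField = KEYWORD_SUBFIELDS.get(field);
    if (keywordField == null) {
      return Optional.empty(); // text-only field: leave the filter to the engine
    }
    return Optional.of(
        String.format(
            "{\"wildcard\":{\"%s\":{\"wildcard\":\"%s\",\"case_insensitive\":true}}}",
            keywordField, lucenePattern));
  }

  public static void main(String[] args) {
    System.out.println(wildcardQueryJson("firstname", "*mbe*"));
    System.out.println(wildcardQueryJson("address", "*Holmes*"));
  }
}
```
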
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| { | ||
| "root": { | ||
| "name": "ProjectOperator", | ||
| "description": { | ||
| "fields": "[account_number, firstname, address, balance, gender, city, employer, state, age, email, lastname]" | ||
| }, | ||
| "children": [{ | ||
| "name": "OpenSearchIndexScan", | ||
| "description": { | ||
| "request": "OpenSearchQueryRequest(indexName=opensearch-sql_test_index_account, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"query\":{\"wildcard\":{\"firstname.keyword\":{\"wildcard\":\"*mbe*\",\"case_insensitive\":true,\"boost\":1.0}}},\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\",\"city\",\"employer\",\"state\",\"age\",\"email\",\"lastname\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, needClean=true, searchDone=false, pitId=*, cursorKeepAlive=1m, searchAfter=null, searchResponse=null)" | ||
| }, | ||
| "children": [] | ||
| }] | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| { | ||
| "root": { | ||
| "name": "ProjectOperator", | ||
| "description": { | ||
| "fields": "[account_number, firstname, address, balance, gender, city, employer, state, age, email, lastname]" | ||
| }, | ||
| "children": [{ | ||
| "name": "FilterOperator", | ||
| "description": { | ||
| "conditions": "like(address, \"%Holmes%\")" | ||
| }, | ||
| "children": [{ | ||
| "name": "OpenSearchIndexScan", | ||
| "description": { | ||
| "request": "OpenSearchQueryRequest(indexName=opensearch-sql_test_index_account, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\"}, needClean=true, searchDone=false, pitId=*, cursorKeepAlive=1m, searchAfter=null, searchResponse=null)" | ||
| }, | ||
| "children": [] | ||
| }] | ||
| }] | ||
| } | ||
| } |
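
The expected plans above show the SQL pattern `'%mbe%'` arriving in the request as the wildcard pattern `*mbe*`, with `\` as the LIKE escape character. The sketch below shows one way such a conversion could work. `toLuceneWildcard` is a hypothetical helper written for illustration; it is not the project's `StringUtils.convertSqlWildcardToLuceneSafe`, whose exact behavior is not shown in this diff.

```java
final class LikeToWildcardSketch {
  /**
   * Hypothetical conversion of a SQL LIKE pattern to wildcard-query syntax:
   * '%' becomes '*', '_' becomes '?', and '\' escapes the next character.
   * Literal '*', '?' and '\' are backslash-escaped in the output.
   */
  static String toLuceneWildcard(String sqlPattern) {
    StringBuilder out = new StringBuilder(sqlPattern.length());
    boolean escaped = false;
    for (char c : sqlPattern.toCharArray()) {
      if (escaped) {
        // Previous character was the escape: emit this one as a literal.
        if (c == '*' || c == '?' || c == '\\') {
          out.append('\\');
        }
        out.append(c);
        escaped = false;
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '%') {
        out.append('*');
      } else if (c == '_') {
        out.append('?');
      } else if (c == '*' || c == '?') {
        out.append('\\').append(c); // literal in SQL, special in wildcard syntax
      } else {
        out.append(c);
      }
    }
    if (escaped) {
      out.append("\\\\"); // trailing lone escape kept as a literal backslash
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(toLuceneWildcard("%mbe%"));
  }
}
```
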
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,6 +37,7 @@ | |
| import static org.opensearch.index.query.QueryBuilders.regexpQuery; | ||
| import static org.opensearch.index.query.QueryBuilders.termQuery; | ||
| import static org.opensearch.index.query.QueryBuilders.termsQuery; | ||
| import static org.opensearch.index.query.QueryBuilders.wildcardQuery; | ||
| import static org.opensearch.script.Script.DEFAULT_SCRIPT_TYPE; | ||
| import static org.opensearch.sql.calcite.utils.UserDefinedFunctionUtils.MULTI_FIELDS_RELEVANCE_FUNCTION_SET; | ||
| import static org.opensearch.sql.calcite.utils.UserDefinedFunctionUtils.SINGLE_FIELD_RELEVANCE_FUNCTION_SET; | ||
|
|
@@ -93,6 +94,7 @@ | |
| import org.opensearch.sql.opensearch.storage.script.CalciteScriptEngine.ReferenceFieldVisitor; | ||
| import org.opensearch.sql.opensearch.storage.script.CalciteScriptEngine.UnsupportedScriptException; | ||
| import org.opensearch.sql.opensearch.storage.script.CompoundedScriptEngine.ScriptEngineType; | ||
| import org.opensearch.sql.opensearch.storage.script.StringUtils; | ||
| import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchBoolPrefixQuery; | ||
| import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchPhrasePrefixQuery; | ||
| import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchPhraseQuery; | ||
|
|
@@ -325,7 +327,8 @@ public Expression visitCall(RexCall call) { | |
| case SPECIAL: | ||
| return switch (call.getKind()) { | ||
| case CAST -> toCastExpression(call); | ||
| - case LIKE, CONTAINS -> binary(call); | ||
| case CONTAINS -> binary(call); | ||
| case LIKE -> like(call); | ||
| default -> { | ||
| String message = format(Locale.ROOT, "Unsupported call: [%s]", call); | ||
| throw new PredicateAnalyzerException(message); | ||
|
|
@@ -533,8 +536,6 @@ private QueryExpression binary(RexCall call) { | |
| switch (call.getKind()) { | ||
| case CONTAINS: | ||
| return QueryExpression.create(pair.getKey()).contains(pair.getValue()); | ||
| - case LIKE: | ||
| - throw new UnsupportedOperationException("LIKE not yet supported"); | ||
| case EQUALS: | ||
| return QueryExpression.create(pair.getKey()).equals(pair.getValue()); | ||
| case NOT_EQUALS: | ||
|
|
@@ -580,6 +581,16 @@ private QueryExpression binary(RexCall call) { | |
| throw new PredicateAnalyzerException(message); | ||
| } | ||
|
|
||
| private QueryExpression like(RexCall call) { | ||
| // The third default escape is not used here. It's handled by | ||
| // StringUtils.convertSqlWildcardToLucene | ||
| checkState(call.getOperands().size() == 3); | ||
| final Expression a = call.getOperands().get(0).accept(this); | ||
| final Expression b = call.getOperands().get(1).accept(this); | ||
| final SwapResult pair = swap(a, b); | ||
| return QueryExpression.create(pair.getKey()).like(pair.getValue()); | ||
| } | ||
|
|
||
| private static QueryExpression constructQueryExpressionForSearch( | ||
| RexCall call, SwapResult pair) { | ||
| if (isSearchWithComplementedPoints(call)) { | ||
|
|
@@ -1137,10 +1148,24 @@ public QueryExpression notExists() { | |
| return this; | ||
| } | ||
|
|
||
| /* | ||
| * Prefer to run wildcard query for keyword type field. For text type field, it doesn't support | ||
| * cross term match because OpenSearch internally break text to multiple terms and apply wildcard | ||
| * matching one by one, which is not same behavior with regular like function without pushdown. | ||
| */ | ||
| @Override | ||
| public QueryExpression like(LiteralExpression literal) { | ||
| - builder = regexpQuery(getFieldReference(), literal.stringValue()); | ||
| - return this; | ||
| String fieldName = getFieldReference(); | ||
| String keywordField = OpenSearchTextType.toKeywordSubField(fieldName, this.rel.getExprType()); | ||
| boolean isKeywordField = keywordField != null; | ||
| if (isKeywordField) { | ||
| builder = | ||
| wildcardQuery( | ||
| keywordField, StringUtils.convertSqlWildcardToLuceneSafe(literal.stringValue())) | ||
| .caseInsensitive(true); | ||
| return this; | ||
| } | ||
| throw new UnsupportedOperationException("Like query is not supported for text field"); | ||
|
Collaborator
Text fields can also be pushed down as a script. Can we track that as an enhancement issue? Please also add a note explaining the current limitation on text field support in LIKE.
Contributor
Author
Added the limitation to the LIKE function doc. I tried pushing down the LIKE function as a script for a text field. However, reading ScriptDocValues for a text field throws an exception that recommends against doing so.
Collaborator
Yes, text data does not have doc values; we should use source, e.g.
Contributor
Author
Makes sense. Created a tracking issue: #3950. Hopefully, we can find a way to read script values from source. |
||
| } | ||
|
|
||
| @Override | ||
|
|
||
Question:
With the patch, in both v2 and v3, the predicate `WHERE Like(TextKeywordBody, 'test%')` will trigger wildcard query pushdown, but the predicate `WHERE Like(TextBody, 'test%')` won't trigger any pushdown, right?
Yes
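
The code comment in the diff says a wildcard query on a text field cannot match across terms, which is why only keyword fields are pushed down. The sketch below illustrates the difference under simplifying assumptions: whitespace splitting stands in for the real analyzer, a regex stands in for the wildcard matcher, and the address value is made up for the example. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class TextVsKeywordWildcardSketch {
  /** Translates a wildcard pattern ('*' and '?' only) into a case-insensitive regex. */
  static Pattern wildcardToRegex(String wildcard) {
    StringBuilder re = new StringBuilder();
    for (char c : wildcard.toCharArray()) {
      if (c == '*') {
        re.append(".*");
      } else if (c == '?') {
        re.append('.');
      } else {
        re.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(re.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  }

  /** Keyword-like matching: the pattern is tested against the whole stored value. */
  static boolean matchesWholeValue(String value, String wildcard) {
    return wildcardToRegex(wildcard).matcher(value).matches();
  }

  /** Text-like matching: the value is split into terms and each term is tested alone. */
  static boolean matchesAnyTerm(String value, String wildcard) {
    Pattern p = wildcardToRegex(wildcard);
    return Arrays.stream(value.split("\\s+")).anyMatch(t -> p.matcher(t).matches());
  }

  public static void main(String[] args) {
    String address = "880 Holmes Lane"; // illustrative value
    System.out.println(matchesWholeValue(address, "*es La*"));
    System.out.println(matchesAnyTerm(address, "*es La*"));
  }
}
```

A pattern that stays within one term, such as `*Holmes*`, matches under both models. A pattern that spans a term boundary, such as `*es La*`, matches only the whole value, so pushing it down against a text field would return different rows than evaluating LIKE without pushdown.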