Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Make OpenSearch function registered dynamically #811

Open
dai-chen opened this issue Sep 8, 2022 · 5 comments · May be fixed by #2019
Open

[FEATURE] Make OpenSearch function registered dynamically #811

dai-chen opened this issue Sep 8, 2022 · 5 comments · May be fixed by #2019
Labels
maintenance Improves code quality, but not the product Priority-High

Comments

@dai-chen
Copy link
Collaborator

dai-chen commented Sep 8, 2022

Is your feature request related to a problem?
Currently all OpenSearch functions are "hardcoded" in ANTLR grammar file. As external storage like S3 and Prometheus added, this will cause problem. For example, in current implementation, query like "... FROM s3.logs WHERE QUERY_STRING(...)" can pass ANTLR parser but fail at execution time due to missing QUERY_STRING function support in S3 storage.

What solution would you like?
One approach is each storage engine registers specific function dynamically when plugin bootstraps. The ANTLR grammar only includes common ANSI SQL functions that has in-memory implementation along with a new udf rule that match unrecognized functions as User-Defined Function. Later analyzer can verify if the UDF is valid or not. This approach naturally provides UDF support for future.

What alternatives have you considered?
N/A

Do you have any additional context?
N/A

@dai-chen dai-chen added the enhancement New feature or request label Sep 8, 2022
@MaxKsyunz
Copy link
Collaborator

@dai-chen, v1 engine supports this syntax for relevance queries -- SELECT * FROM index WHERE field = match("string").

If we add it to v2 engine, this feature would need to support this syntax as well. Especially given #787.

@dai-chen
Copy link
Collaborator Author

dai-chen commented Sep 12, 2022

@dai-chen, v1 engine supports this syntax for relevance queries -- SELECT * FROM index WHERE field = match("string").

If we add it to v2 engine, this feature would need to support this syntax as well. Especially given #787.

Sure, we need to figure out a general way and allow each storage engine to register specific data type and function dynamically. Earlier our storage engine interface was referencing Apache Calcite. Probably we can explore further here: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/schema/Schema.java

@dai-chen dai-chen added maintenance Improves code quality, but not the product and removed enhancement New feature or request labels Oct 31, 2022
@dai-chen dai-chen added legacy Issues related to legacy query engine to be deprecated Priority-High and removed legacy Issues related to legacy query engine to be deprecated labels Nov 16, 2022
@biggoon
Copy link

biggoon commented Nov 23, 2022

Another think to add: we need to move the analyzers relevant to the user-defined functions (such as the HighlightAnalyzer) out of core and inject them into the Analyzer dynamically.
see: https://github.com/opensearch-project/sql/blob/2.x/core/src/main/java/org/opensearch/sql/analysis/Analyzer.java#L364

@dai-chen
Copy link
Collaborator Author

dai-chen commented Nov 23, 2022

Another think to add: we need to move the analyzers relevant to the user-defined functions (such as the HighlightAnalyzer) out of core and inject them into the Analyzer dynamically. see: https://github.com/opensearch-project/sql/blob/2.x/core/src/main/java/org/opensearch/sql/analysis/Analyzer.java#L364

That'd great be if we can do that for highlight as well. I've met this problem with highlight operator when refactor optimizer: https://github.com/opensearch-project/sql/pull/1091/files#diff-3334dc7821335d1c35cb35e3da00960106b6ac04c4ca092937c51c076056d272R103. I have to add a pushDownHighlight rule, however there is no fallback if push down not happen on other storage like S3/Prometheus. There is no in-memory implementation and highlight maybe only make sense for OpenSearch storage.

@acarbonetto
Copy link
Collaborator

@acarbonetto create issue to determine if we can split work up into the following:

  • PoC to split out an OpenSearch-specific function (like match) from the core ANTLR parser into a UDF-like function defined by the OpenSearch storage engine (vamsi has done this for promises and can be linked).
  • Second ticket to scope out the parser refactor and create follow-up tickets for related work.
  • For core refactor, define follow-up activities to determine other core classes that have OpenSearch-specific operations that should be moved to the OpenSearch module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Improves code quality, but not the product Priority-High
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants