-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#639: allow metadata fields and score opensearch function #228
Conversation
Codecov Report
@@ Coverage Diff @@
## integ-metadata-fields #228 +/- ##
========================================================
Coverage ? 98.47%
Complexity ? 3875
========================================================
Files ? 345
Lines ? 9648
Branches ? 626
========================================================
Hits ? 9501
Misses ? 142
Partials ? 5
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
...rc/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScanQueryBuilder.java
Outdated
Show resolved
Hide resolved
not sure if opensearch-project#788 can be resolved as well? |
This problem will occur for any field that shares a name with a function. The workaround for now will be to use backticks. |
I think the issue you're referring to is already fixed here: opensearch-project#1191. I tried to fix |
* @param context analysis context for the query | ||
* @return resolved relevance function | ||
*/ | ||
public Expression visitScoreFunction(ScoreFunction node, AnalysisContext context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add this to opensearch
module as storage-specific function. Personally I think we should prioritize opensearch-project#811. It will become more and more difficult as we keep adding more OpenSearch logic to core
engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've got it on our radar, and we can start to scope it out. As is, there's quite a bit of work to do to pull out the opensearch specific classes, but I think it's do-able.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had storage-specific function support in opensearch-project#1354. Can we start this now instead of adding more and more special logic to core
? Agreed there is quite lots of work to move all to opensearch
but maybe easy to do this for single PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a look. There's already a lot in this PR.
Would you mind if we move the score function out as a proof of concept for a set of OpenSearch storage engine functions?
public static final String METADATA_FIELD_SCORE = "_score"; | ||
public static final String METADATA_FIELD_MAXSCORE = "_maxscore"; | ||
public static final String METADATA_FIELD_SORT = "_sort"; | ||
public static final java.util.Map<String, ExprCoreType> METADATAFIELD_TYPE_MAP = new HashMap<>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not want to hardcode this in core
. Not sure if adding these fields in OpenSearchIndex.getFieldMapping()
can work for you, however we need to consider meta column in other storage engine. For example, S3 may have $path
, $partition
in each row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't use OpenSearchIndex as it would create a dependency on opensearch in core. Not ideal. We kind of need to wait until we have a user-defined functions that could override core visitors, like QualifiedName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't mean you need to depend on opensearch
module. OpenSearchIndex is subclass of our Table
interface. As I understand, Analyzer
will fetch field list from Table
during query analysis. I saw you've added isMetaField
flag, so I don't see why we want to hardcode metadata field list in core
module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Interesting. I'll see if I can hook that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see changes in:
- core/src/main/java/org/opensearch/sql/analysis/TypeEnvironment.java (for interface changes in the environment)
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java (specific changes for OpenSearch indexes), and
- core/src/main/java/org/opensearch/sql/analysis/ExpressionAnalyzer.java the visitQualifiedName function
b4912da
to
c8f0520
Compare
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
… function Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
@@ -280,7 +280,7 @@ void search() { | |||
|
|||
Iterator<ExprValue> hits = response1.iterator(); | |||
assertTrue(hits.hasNext()); | |||
assertEquals(exprTupleValue, hits.next()); | |||
assertEquals(exprTupleValue.tupleValue().get("id"), hits.next().tupleValue().get("id")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was matching the whole object - and failing because the Boolean object wasn't .equal. This could be removed (I think) if I change the Boolean to a boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - I got that wrong. We are now returning the metafields in the response from OpenSearch.. so they're appearing in the tuple. This way the spirit of the test remains the same.
sql/src/main/java/org/opensearch/sql/sql/parser/AstExpressionBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/ast/expression/QualifiedName.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Test | ||
public void testMetafieldIdentifierTest() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when getting a meta field from a prometheus data source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser will fail. There's no workaround for prometheus data sources for _id, _score, _index, _maxscore and _sort. But it will work for other fields that begin with an underscore (e.g. _routing or __fieldname) that is not strictly defined as an opensearch meta-field.
In other words: this is an improvement on the existing functionality... which would just fail for ALL fields starting with an underscore.
docs/user/dql/functions.rst
Outdated
``score_query(search_expression, boost)`` | ||
``scorequery(search_expression, boost)`` | ||
|
||
The score function returns the _score of any documents matching the enclosed relevance-search expression. The SCORE function expects two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a users it is unexpected that SELECT _score, phrase FROM phrases
returns _score
value yet SELECT _score, phrase FROM phrases WHERE match('phrase', 'my')
returns _score
of null
.
I'd expect _score
to be returned if requested since it can be calculated for any OpenSearch query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an OpenSearch value. I would think users need to understand the OpenSearch side of things. Once could ask why OpenSearch doesn't also just calculate the score for everything...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plugin could also set track_scores = true
if the user asks for _score
field and is using relevance search.
At the very least, it should be mentioned here that _score
will be null unless relevance query is wrapped in score(...)
.
...rc/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScanQueryBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/OpenSearchFunctions.java
Show resolved
Hide resolved
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/analysis/ExpressionAnalyzerTest.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
public TypeEnvironment(TypeEnvironment parent, SymbolTable symbolTable) { | ||
this.parent = parent; | ||
this.symbolTable = symbolTable; | ||
this.reservedSymbolTable = new SymbolTable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a separate symbol table needed? Why can't we add metafields to symbolTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To separate out 'reserved fields' (that are in all tables) from fields that are specific to the index. The code needs to know the difference between these two lists later on.
…ore OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
…ore OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com> (cherry picked from commit e805151)
…ore OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com> (cherry picked from commit e805151)
…e opensearch function (#228) (opensearch-project#1508) * opensearch-project#639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (opensearch-project#1456) Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com> Co-authored-by: Andrew Carbonetto <andrewc@bitquilltech.com>
…e opensearch function (#228) (opensearch-project#1509) * opensearch-project#639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (opensearch-project#1456) Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com> Co-authored-by: Andrew Carbonetto <andrewc@bitquilltech.com>
…elds and the score OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Description
OpenSearch reserved fields (_id, _index, _sort, _score, _max_score) are not allowed to be used in SQL clauses (SELECT, WHERE, ORDER BY) because the field format starting with underscore _ is not allowed.
This ticket allows for the
score()
,score_query()
andscorequery()
function to be used to wrap around relevance-search queries to force the_score
and_max_score
metadata fields to be returned with values. For some queries,_score
returnsnull
unless thescore()
function is included.score
function also allows for an optionalboost
argument to be included that boosts the score of the child relevance function.Example - metadata fields returned:
Example - Metadata fields not requested are not displayed:
Example - relevance search without and with score function
Example - boost score on the _sql/_explain call:
Issues Resolved
__
is missing in the returned result when query by SQL with extrafilter
opensearch-project/sql#783Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.