Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#639: allow metadata fields and score opensearch function (#228) #1456

Merged
merged 9 commits into from
Apr 10, 2023

Conversation

acarbonetto
Copy link
Collaborator

Description

OpenSearch reserved fields (_id, _index, _sort, _score, _max_score) are not allowed to be used in SQL clauses (SELECT, WHERE, ORDER BY) because the field format starting with underscore _ is not allowed.

  • This ticket adds specific identifiers to the language, and opens up support for OpenSearch reserved identifiers.
  • As an aside, identifiers with double underscore at the start (such as __myCoolField) is acceptable as an identifier.

This ticket allows for the score(), score_query() and scorequery() function to be used to wrap around relevance-search queries to force the _score and _max_score metadata fields to be returned with values. For some queries, _score returns null unless the score() function is included.

  • The score function also allows for an optional boost argument to be included that boosts the score of the child relevance function.

Example - metadata fields returned:

SELECT calcs.key, str0, _id, _sort, _score, _maxscore FROM calcs WHERE _id="5"
Result:
{ "key04", "OFFICE SUPPLIES", "5", -2, null, null }

Example - Metadata fields not requested are not displayed:

SELECT *, _id FROM bigint WHERE _id="2"
Result (assuming bigint only has one field):
[9223372026854775807, "2"]

Example - relevance search without and with score function

SELECT _id, _index, _score, _maxscore, str0, str1 FROM calcs WHERE QUERY_STRING([\"*\"], `BINDING SUPPLIES`);
result: 
[
    "7",
    "calcs",
    null,
    null,
    "OFFICE SUPPLIES",
    "BINDING SUPPLIES"
]

SELECT _id, _index, _score, _maxscore, str0, str1 FROM calcs WHERE SCORE(QUERY_STRING([\"*\"], `BINDING SUPPLIES`));
[
    "7",
    "calcs",
    2.4849067,
    2,
    "OFFICE SUPPLIES",
    "BINDING SUPPLIES"
]

Example - boost score on the _sql/_explain call:

SELECT _id, _score, _maxscore FROM calcs WHERE SCORE(QUERY_STRING([\"*\"], `BINDING SUPPLIES`, boost=2.5));
result: 
[
    "7",
    6.2122664,
    6
]

SELECT _id, _score, _maxscore FROM calcs WHERE SCOREQUERY(QUERY_STRING([\"*\"], `BINDING SUPPLIES`, boost=2.5), 2.0);
result: 
[
    "7",
    12.424533,
    12
]

SELECT _id, _index, _score, _maxscore, str0, str1 FROM calcs WHERE SCORE_QUERY(QUERY_STRING([\"*\"], `BINDING SUPPLIES`), 2.5);
result:
[
    "7",
    6.2122664,
    6
]

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…nction (#228)

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update to define and include metadata when visiting the expr node

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add specific metadata identifiers

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add IT tests and add parser changes

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update score function expression analyzer to return boosted relevance function

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update builder to track scores

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Remove ScoreExpression.java and cleanup checkstyle

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* cleanup checkstyle

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Cleanup and add alternative score function syntax

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Cleanup and add alternative score function syntax

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Fix some bugs and add Expression tests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add expresssion and analyzer tests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add score doctests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add score function doctests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Add metafield tests

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Move legacy test and mark old as ignore

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* fix checkstyle violations

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* fix checkstyle violations

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update tests and identifier to accept metafields

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Checkstyle fixes

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Rebase from main

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* fix checkstyle violations

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Revert bad conflict resolution

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Fix for review comments

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update IT tests and legacy tests for comments

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Minor comment

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Updates for whitespace

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update basics.rst to show OS result

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update basics.rst to show OS result

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update basics.rst description

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Change Score function to accept a double/integer not an unresolved

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update functions.rst

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Checkstyle update

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Move reserved world symbol table to OpenSearchTable

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Update functions.rst for review comments

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Removed parser meta tokens; Changes ImmutableMap to Map

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

* Removed parser meta tokens; Changes ImmutableMap to Map

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>

---------

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
@acarbonetto acarbonetto requested a review from a team as a code owner March 21, 2023 18:18
@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2023

Codecov Report

Merging #1456 (05c42f7) into main (23cc0f6) will increase coverage by 0.02%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #1456      +/-   ##
============================================
+ Coverage     98.46%   98.49%   +0.02%     
- Complexity     3869     3928      +59     
============================================
  Files           345      347       +2     
  Lines          9603     9771     +168     
  Branches        616      645      +29     
============================================
+ Hits           9456     9624     +168     
  Misses          142      142              
  Partials          5        5              
Flag Coverage Δ
sql-engine 98.49% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/opensearch/sql/analysis/Analyzer.java 100.00% <100.00%> (ø)
...rg/opensearch/sql/analysis/ExpressionAnalyzer.java 100.00% <100.00%> (ø)
...rch/sql/analysis/ExpressionReferenceOptimizer.java 100.00% <100.00%> (ø)
...a/org/opensearch/sql/analysis/TypeEnvironment.java 100.00% <100.00%> (ø)
...c/main/java/org/opensearch/sql/expression/DSL.java 100.00% <100.00%> (ø)
...h/sql/expression/function/BuiltinFunctionName.java 100.00% <100.00%> (ø)
...h/sql/expression/function/OpenSearchFunctions.java 100.00% <100.00%> (ø)
...rc/main/java/org/opensearch/sql/storage/Table.java 100.00% <100.00%> (ø)
...sql/opensearch/request/OpenSearchQueryRequest.java 100.00% <100.00%> (ø)
...l/opensearch/request/OpenSearchRequestBuilder.java 100.00% <100.00%> (ø)
... and 7 more

... and 4 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

MaxKsyunz
MaxKsyunz previously approved these changes Mar 21, 2023
Copy link
Collaborator

@MaxKsyunz MaxKsyunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@acarbonetto Do metafields work with aliases? E.g. SELECT h._id FROM tableA AS h

Is there a test that confirms this?

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
@acarbonetto
Copy link
Collaborator Author

@acarbonetto Do metafields work with aliases? E.g. SELECT h._id FROM tableA AS h

Is there a test that confirms this?

IT test for this added in f3d89b8

GumpacG
GumpacG previously approved these changes Mar 23, 2023
Yury-Fridlyand
Yury-Fridlyand previously approved these changes Mar 25, 2023
MaxKsyunz
MaxKsyunz previously approved these changes Mar 27, 2023
Comment on lines +475 to +476
// Identifiers cannot start with a single '_' since this an OpenSearch reserved
// metadata field. Two underscores (or more) is acceptable, such as '__field'.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment match the rule below? So ID_LITERAL won't accept token starting with single underscore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I need to update this comment since I've updated the pattern.
Thank you!

Yury-Fridlyand
Yury-Fridlyand previously approved these changes Apr 6, 2023
Comment on lines +100 to +103
FetchSourceContext fetchSource = this.sourceBuilder.fetchSource();
List<String> includes = fetchSource != null && fetchSource.includes() != null
? Arrays.asList(fetchSource.includes())
: List.of();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be do this in the query builder?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenSearchQueryRequest and OpenSearchScrollRequest needs a refactor before this makes any sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assist in creating an issue to monitor and manage future modifications.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
@MaxKsyunz MaxKsyunz merged commit e805151 into opensearch-project:main Apr 10, 2023
@MaxKsyunz MaxKsyunz deleted the integ-metadata-fields branch April 10, 2023 19:42
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 10, 2023
…ction (#228) (#1456)

Allow metadata fields and score OpenSearch function.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
(cherry picked from commit e805151)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 10, 2023
…ction (#228) (#1456)

Allow metadata fields and score OpenSearch function.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
(cherry picked from commit e805151)
penghuo pushed a commit that referenced this pull request Apr 12, 2023
…ion (#228) (#1509)

* #639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (#1456)

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Co-authored-by: Andrew Carbonetto <andrewc@bitquilltech.com>
penghuo pushed a commit that referenced this pull request Apr 12, 2023
…ion (#228) (#1508)

* #639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (#1456)

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Co-authored-by: Andrew Carbonetto <andrewc@bitquilltech.com>
acarbonetto added a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Apr 18, 2023
…elds and the score OpenSearch function (#228) (opensearch-project#1456)

Allow metadata fields and score OpenSearch function.

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants