Add match_phrase_prefix #661

Merged
d03f3fe
Add an integration test for match_phrase_prefix with required parameters
Jun 4, 2022
b5a6832
Update SQL ANTLR files to support match_phrase_prefix.
Jun 4, 2022
9d77381
SQL parser test for match_phrase_prefix with required arguments.
Jun 4, 2022
05e74fa
The rest of the match_phrase_prefix owl with required parameters.
Jun 4, 2022
898a31c
Checkstyle fix.
Jun 4, 2022
bb946c2
Add a license header.
Jun 4, 2022
0aec6ed
Merge pull request #68 from Bit-Quill/dev-match_phrase_prefix-#186-re…
MaxKsyunz Jun 7, 2022
f9c90a3
Make MATCH_PHRASE_PREFIX_MAX_NUM_PARAMETERS public.
Jun 7, 2022
885ba5b
Add support for boost parameter in match_phrase.
Jun 8, 2022
3b113e8
Add SQL parser unit tests for optional parameters.
Jun 8, 2022
ac3fdd5
Add support for optional parameters for match_phrase_prefix.
Jun 9, 2022
efaa0bf
Add unit test for AstExpressionBuilderTest that includes all parameters.
Jun 9, 2022
22a2d0c
Add DSL.namedArgument(String, String)
Jun 9, 2022
d92da11
ExpressionAnalyzer test for match_phrase_prefix with all parameters.
Jun 9, 2022
0c49775
Support correct max number of optional parameters.
Jun 9, 2022
f68eb9e
Address checkstyle issues.
Jun 10, 2022
b4177a6
Merge pull request #70 from Bit-Quill/dev-match_phrase_prefix-#186-sq…
MaxKsyunz Jun 10, 2022
7d4a953
Merge branch 'integ-match_phrase_prefix-#186' into dev-match_phrase_p…
Jun 16, 2022
a485765
Update getRelevanceFunctionResolver usage to pass field argument type.
Jun 16, 2022
7b7a096
match_phrase_prefix PPL required parameters integration test.
Jun 10, 2022
ea512f8
match_phrase_prefix with required parameters in PPL.
Jun 11, 2022
2c65764
Integration test for match_phrase_prefix in PPL with all parameters.
Jun 11, 2022
9f38d72
match_phrase_prefix SQL integration tests.
Jun 21, 2022
0fe8f78
Merge pull request #73 from Bit-Quill/dev-match_phrase_prefix-#186-pp…
MaxKsyunz Jun 21, 2022
0925c1d
Add FilterQueryBuilderTest test for match_phrase_prefix with analyzer
Jun 21, 2022
9dc2655
Fix flaky tests
Jun 22, 2022
5286831
match_phrase_prefix documentation for SQL and PPL
Jun 22, 2022
27bf42c
Merge pull request #76 from Bit-Quill/dev-match_phrase_prefix-#186-sq…
MaxKsyunz Jun 23, 2022
451b829
Improve PPL documentation for match_phrase_prefix
Jun 23, 2022
11b8617
Add integration tests for match_phrase_prefix in PPL
Jun 23, 2022
6e61c0d
Merge pull request #77 from Bit-Quill/dev-match_phrase_prefix-#186-docs
MaxKsyunz Jun 23, 2022
e01a287
Merge pull request #78 from Bit-Quill/dev-match_phrase_prefix-#186-pp…
MaxKsyunz Jun 23, 2022
bff94be
Merge branch 'integ-match_phrase_prefix-#186' into dev-match_phrase_p…
Jun 24, 2022
224432f
Remove an empty unit test.
Jun 24, 2022
e9f2af3
Updated incorrect references to match_phrase.
Jun 27, 2022
6562109
Merge pull request #79 from Bit-Quill/dev-match_phrase_prefix-#186
MaxKsyunz Jun 28, 2022
b51eae4
Merge remote-tracking branch 'upstream/main' into integ-match_phrase_…
Jun 28, 2022
8 changes: 8 additions & 0 deletions core/src/main/java/org/opensearch/sql/expression/DSL.java
@@ -118,6 +118,10 @@ public NamedArgumentExpression namedArgument(String argName, Expression value) {
return new NamedArgumentExpression(argName, value);
}

public NamedArgumentExpression namedArgument(String name, String value) {
return namedArgument(name, literal(value));
}

public static ParseExpression parsed(Expression expression, Expression pattern,
Expression identifier) {
return new ParseExpression(expression, pattern, identifier);
@@ -658,6 +662,10 @@ public FunctionExpression match_phrase(Expression... args) {
return compile(BuiltinFunctionName.MATCH_PHRASE, args);
}

public FunctionExpression match_phrase_prefix(Expression... args) {
return compile(BuiltinFunctionName.MATCH_PHRASE_PREFIX, args);
}

public FunctionExpression multi_match(Expression... args) {
return compile(BuiltinFunctionName.MULTI_MATCH, args);
}

@@ -190,7 +190,7 @@ public enum BuiltinFunctionName {
SIMPLE_QUERY_STRING(FunctionName.of("simple_query_string")),
MATCH_PHRASE(FunctionName.of("match_phrase")),
MATCHPHRASE(FunctionName.of("matchphrase")),

MATCH_PHRASE_PREFIX(FunctionName.of("match_phrase_prefix")),
/**
* Legacy Relevance Function.
*/

@@ -31,6 +31,7 @@ public class OpenSearchFunctions {
public static final int MIN_NUM_PARAMETERS = 2;
public static final int MULTI_MATCH_MAX_NUM_PARAMETERS = 17;
public static final int SIMPLE_QUERY_STRING_MAX_NUM_PARAMETERS = 14;
public static final int MATCH_PHRASE_PREFIX_MAX_NUM_PARAMETERS = 7;

/**
* Add functions specific to OpenSearch to repository.
@@ -43,13 +44,19 @@ public void register(BuiltinFunctionRepository repository) {
// compatibility.
repository.register(match_phrase(BuiltinFunctionName.MATCH_PHRASE));
repository.register(match_phrase(BuiltinFunctionName.MATCHPHRASE));
repository.register(match_phrase_prefix());
}

private static FunctionResolver match() {
FunctionName funcName = BuiltinFunctionName.MATCH.getName();
return getRelevanceFunctionResolver(funcName, MATCH_MAX_NUM_PARAMETERS, STRING);
}

private static FunctionResolver match_phrase_prefix() {
FunctionName funcName = BuiltinFunctionName.MATCH_PHRASE_PREFIX.getName();
return getRelevanceFunctionResolver(funcName, MATCH_PHRASE_PREFIX_MAX_NUM_PARAMETERS, STRING);
}

private static FunctionResolver match_phrase(BuiltinFunctionName matchPhrase) {
FunctionName funcName = matchPhrase.getName();
return getRelevanceFunctionResolver(funcName, MATCH_PHRASE_MAX_NUM_PARAMETERS, STRING);
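
The value 7 for MATCH_PHRASE_PREFIX_MAX_NUM_PARAMETERS is consistent with the two required arguments (field, query) plus the five options listed in the documentation added by this PR. The sketch below is only an illustration of that arithmetic. `MatchPhrasePrefixArgCheck` and its `check` method are hypothetical names and are not part of the plugin; how `getRelevanceFunctionResolver` actually uses the limit is not shown in this diff.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for illustration; not part of the OpenSearch SQL plugin.
class MatchPhrasePrefixArgCheck {
  // Same values as the constants in OpenSearchFunctions.
  static final int MIN_NUM_PARAMETERS = 2;
  static final int MATCH_PHRASE_PREFIX_MAX_NUM_PARAMETERS = 7;

  // Option names taken from the documentation added in this PR.
  static final Set<String> OPTIONS =
      Set.of("analyzer", "slop", "zero_terms_query", "max_expansions", "boost");

  /** Returns an error message, or empty if the argument names look acceptable. */
  static Optional<String> check(List<String> argNames) {
    int n = argNames.size();
    if (n < MIN_NUM_PARAMETERS || n > MATCH_PHRASE_PREFIX_MAX_NUM_PARAMETERS) {
      return Optional.of("expected between 2 and 7 arguments, got " + n);
    }
    // Assumption: the first two arguments are always field and query.
    if (!argNames.get(0).equals("field") || !argNames.get(1).equals("query")) {
      return Optional.of("first two arguments must be field and query");
    }
    for (String name : argNames.subList(2, n)) {
      if (!OPTIONS.contains(name)) {
        return Optional.of("unknown option: " + name);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(check(List.of("field", "query", "slop")));
    System.out.println(check(List.of("field")));
  }
}
```

The sketch does not check for repeated options; with five distinct options the argument count cannot exceed 7.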

@@ -15,6 +15,7 @@
import static org.opensearch.sql.ast.dsl.AstDSL.intLiteral;
import static org.opensearch.sql.ast.dsl.AstDSL.qualifiedName;
import static org.opensearch.sql.ast.dsl.AstDSL.stringLiteral;
import static org.opensearch.sql.ast.dsl.AstDSL.unresolvedArg;
import static org.opensearch.sql.data.model.ExprValueUtils.LITERAL_TRUE;
import static org.opensearch.sql.data.model.ExprValueUtils.integerValue;
import static org.opensearch.sql.data.type.ExprCoreType.BOOLEAN;
@@ -455,6 +456,30 @@ void simple_query_string_expression_two_fields() {
AstDSL.unresolvedArg("query", stringLiteral("sample query"))));
}

@Test
public void match_phrase_prefix_all_params() {
assertAnalyzeEqual(
dsl.match_phrase_prefix(
dsl.namedArgument("field", "test"),
dsl.namedArgument("query", "search query"),
dsl.namedArgument("slop", "3"),
dsl.namedArgument("boost", "1.5"),
dsl.namedArgument("analyzer", "standard"),
dsl.namedArgument("max_expansions", "4"),
dsl.namedArgument("zero_terms_query", "NONE")
),
AstDSL.function("match_phrase_prefix",
unresolvedArg("field", stringLiteral("test")),
unresolvedArg("query", stringLiteral("search query")),
unresolvedArg("slop", stringLiteral("3")),
unresolvedArg("boost", stringLiteral("1.5")),
unresolvedArg("analyzer", stringLiteral("standard")),
unresolvedArg("max_expansions", stringLiteral("4")),
unresolvedArg("zero_terms_query", stringLiteral("NONE"))
)
);
}

protected Expression analyze(UnresolvedExpression unresolvedExpression) {
return expressionAnalyzer.analyze(unresolvedExpression, analysisContext);
}

@@ -140,6 +140,19 @@ List<FunctionExpression> match_phrase_dsl_expressions() {
);
}

List<FunctionExpression> match_phrase_prefix_dsl_expressions() {
return List.of(
dsl.match_phrase_prefix(field, query)
);
}

@Test
public void match_phrase_prefix() {
for (FunctionExpression fe : match_phrase_prefix_dsl_expressions()) {
assertEquals(BOOLEAN, fe.type());
}
}

@Test
void match_in_memory() {
FunctionExpression expr = dsl.match(field, query);
41 changes: 41 additions & 0 deletions docs/user/dql/functions.rst
@@ -2233,6 +2233,47 @@ Another example to show how to set custom values for the optional parameters::
+----------------------+--------------------------+


MATCH_PHRASE_PREFIX
-------------------

Description
>>>>>>>>>>>

``match_phrase_prefix(field_expression, query_expression[, option=<option_value>]*)``

The match_phrase_prefix function maps to the match_phrase_prefix query in the search engine.
It returns documents in which the given field contains the query terms as a phrase, with the last term treated as a prefix. Available parameters include:

- analyzer
- slop
- zero_terms_query
- max_expansions
- boost


Example with only the ``field`` and ``query`` expressions; all other parameters take their default values::

    os> SELECT author, title FROM books WHERE match_phrase_prefix(author, 'Alexander Mil');
    fetched rows / total rows = 2/2
    +----------------------+--------------------------+
    | author               | title                    |
    |----------------------+--------------------------|
    | Alan Alexander Milne | The House at Pooh Corner |
    | Alan Alexander Milne | Winnie-the-Pooh          |
    +----------------------+--------------------------+

Another example to show how to set custom values for the optional parameters::

    os> SELECT author, title FROM books WHERE match_phrase_prefix(author, 'Alan Mil', slop = 2);
    fetched rows / total rows = 2/2
    +----------------------+--------------------------+
    | author               | title                    |
    |----------------------+--------------------------|
    | Alan Alexander Milne | The House at Pooh Corner |
    | Alan Alexander Milne | Winnie-the-Pooh          |
    +----------------------+--------------------------+
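
The two examples above can be read against a simplified model of phrase-prefix matching with slop 0: every query term except the last must equal a field term, in order and adjacent, and the last query term only needs to be a prefix. The Java sketch below is an illustration under that assumption. `PhrasePrefixSketch` is a hypothetical name, whitespace splitting stands in for a real analyzer, and this is not how OpenSearch or Lucene implement the query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified model for illustration only; slop is fixed at 0.
class PhrasePrefixSketch {
  // Lowercase and split on whitespace; a stand-in for a real analyzer.
  static List<String> tokens(String text) {
    return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
  }

  static boolean matches(String fieldValue, String query) {
    if (query.trim().isEmpty()) {
      return false; // Empty queries are out of scope for this sketch.
    }
    List<String> doc = tokens(fieldValue);
    List<String> q = tokens(query);
    // Try every position in the field where the phrase could start.
    for (int start = 0; start + q.size() <= doc.size(); start++) {
      boolean ok = true;
      for (int i = 0; i < q.size() && ok; i++) {
        String docTerm = doc.get(start + i);
        String queryTerm = q.get(i);
        boolean isLast = i == q.size() - 1;
        // Only the last query term is treated as a prefix.
        ok = isLast ? docTerm.startsWith(queryTerm) : docTerm.equals(queryTerm);
      }
      if (ok) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(matches("Alan Alexander Milne", "Alexander Mil"));
    System.out.println(matches("Alan Alexander Milne", "Alan Mil"));
  }
}
```

In this model 'Alexander Mil' matches "Alan Alexander Milne" while 'Alan Mil' does not, because "Alexander" sits between the two query terms. That is consistent with the second example setting ``slop`` to a non-zero value.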


MULTI_MATCH
-----------

43 changes: 43 additions & 0 deletions docs/user/ppl/functions/relevance.rst
@@ -98,6 +98,49 @@ Another example to show how to set custom values for the optional parameters::
+----------------------+--------------------------+



MATCH_PHRASE_PREFIX
-------------------

Description
>>>>>>>>>>>

``match_phrase_prefix(field_expression, query_expression[, option=<option_value>]*)``

The match_phrase_prefix function maps to the match_phrase_prefix query in the search engine. It returns documents in which the given field contains the query terms as a phrase, with the last term treated as a prefix. Available parameters include:

- analyzer
- slop
- max_expansions
- boost
- zero_terms_query

Example with only the ``field`` and ``query`` expressions; all other parameters take their default values::

    os> source=books | where match_phrase_prefix(author, 'Alexander Mil') | fields author, title
    fetched rows / total rows = 2/2
    +----------------------+--------------------------+
    | author               | title                    |
    |----------------------+--------------------------|
    | Alan Alexander Milne | The House at Pooh Corner |
    | Alan Alexander Milne | Winnie-the-Pooh          |
    +----------------------+--------------------------+



Another example to show how to set custom values for the optional parameters::

    os> source=books | where match_phrase_prefix(author, 'Alan Mil', slop = 2) | fields author, title
    fetched rows / total rows = 2/2
    +----------------------+--------------------------+
    | author               | title                    |
    |----------------------+--------------------------|
    | Alan Alexander Milne | The House at Pooh Corner |
    | Alan Alexander Milne | Winnie-the-Pooh          |
    +----------------------+--------------------------+



MULTI_MATCH
-----------

@@ -0,0 +1,116 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.ppl;

import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BEER;
import static org.opensearch.sql.util.MatcherUtils.rows;
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;

import java.io.IOException;
import org.json.JSONObject;
import org.junit.Test;

public class MatchPhrasePrefixWhereCommandIT extends PPLIntegTestCase {

@Override
public void init() throws IOException {
loadIndex(Index.BEER);
}

@Test
public void required_parameters() throws IOException {
String query = "source = %s | WHERE match_phrase_prefix(Title, 'champagne be') | fields Title";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
verifyDataRows(result,
rows("Can old flat champagne be used for vinegar?"),
rows("Elder flower champagne best to use natural yeast or add a wine yeast?"));
}


@Test
public void all_optional_parameters() throws IOException {
// The values for optional parameters are valid but arbitrary.
String query = "source = %s " +
"| WHERE match_phrase_prefix(Title, 'flat champ', boost = 1.0, " +
"zero_terms_query='ALL', max_expansions = 2, analyzer=standard, slop=0) " +
"| fields Title";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
verifyDataRows(result, rows("Can old flat champagne be used for vinegar?"));
}


@Test
public void max_expansions_is_3() throws IOException {
// max_expansions applies to the last term in the query -- 'bottl'
// It tells OpenSearch to consider only the first 3 terms that start with 'bottl'
// In this dataset these are 'bottle-conditioning', 'bottling', 'bottles'.

String query = "source = %s " +
"| WHERE match_phrase_prefix(Tags, 'draught bottl', max_expansions=3) | fields Tags";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
verifyDataRows(result, rows("brewing draught bottling"),
rows("draught bottles"));
}

@Test
public void analyzer_english() throws IOException {
// English analyzer removes 'in' and 'to' as they are common words.
// This results in an empty query.
String query = "source = %s " +
"| WHERE match_phrase_prefix(Title, 'in to', analyzer=english)" +
"| fields Title";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
assertTrue("Expect English analyzer to filter out common words 'in' and 'to'",
result.getInt("total") == 0);
}

@Test
public void analyzer_standard() throws IOException {
// Standard analyzer does not treat 'in' and 'to' as special terms.
// This results in 'to' being used as a phrase prefix, giving us 'Tokyo'.
String query = "source = %s " +
"| WHERE match_phrase_prefix(Title, 'in to', analyzer=standard)" +
"| fields Title";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
verifyDataRows(result, rows("Local microbreweries and craft beer in Tokyo"));
}

@Test
public void zero_term_query_all() throws IOException {
// English analyzer removes 'in' and 'to' as they are common words.
// zero_terms_query of 'ALL' causes all rows to be returned.
// sort ... head 1 keeps the test readable by checking a single row.
String query = "source = %s" +
"| WHERE match_phrase_prefix(Title, 'in to', analyzer=english, zero_terms_query='ALL') " +
"| sort -Title | head 1 | fields Title";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
verifyDataRows(result, rows("was working great, now all foam"));
}


@Test
public void slop_is_2() throws IOException {
// When slop is 2, results include phrases where the query terms are transposed.
// 'ta' is used to match the prefix of the last term ('taste').
String query = "source = %s" +
"| where match_phrase_prefix(Tags, 'gas ta', slop=2) " +
"| fields Tags";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
verifyDataRows(result, rows("taste gas"));
}

@Test
public void slop_is_3() throws IOException {
// When slop is 3, results also include transposed terms with another term between them.
String query = "source = %s" +
"| where match_phrase_prefix(Tags, 'gas ta', slop=3)" +
"| fields Tags";
JSONObject result = executeQuery(String.format(query, TEST_INDEX_BEER));
verifyDataRows(result,
rows("taste draught gas"),
rows("taste gas"));
}
}
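
The max_expansions comment in max_expansions_is_3 can be illustrated with a small model: the last query term is expanded to at most N terms from a sorted term dictionary that start with that prefix. The sketch below is a hedged illustration. `MaxExpansionsSketch` and the sample terms are hypothetical, and the assumption that terms are enumerated in lexicographic order is mine rather than something this PR states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative model of prefix expansion with a cap; not OpenSearch code.
class MaxExpansionsSketch {
  /** Returns up to maxExpansions dictionary terms that start with prefix. */
  static List<String> expand(SortedSet<String> termDictionary, String prefix, int maxExpansions) {
    List<String> expansions = new ArrayList<>();
    // tailSet(prefix) starts at the first term >= prefix; in natural String
    // order, all terms sharing the prefix are contiguous from that point.
    for (String term : termDictionary.tailSet(prefix)) {
      if (!term.startsWith(prefix) || expansions.size() >= maxExpansions) {
        break;
      }
      expansions.add(term);
    }
    return expansions;
  }

  public static void main(String[] args) {
    // Hypothetical terms, not the contents of the beer test index.
    SortedSet<String> terms =
        new TreeSet<>(List.of("bottle", "bottles", "bottling", "bottom", "brew"));
    System.out.println(expand(terms, "bottl", 2));
    System.out.println(expand(terms, "bottl", 10));
  }
}
```

With a cap of 2, a document containing only the third matching term would not be found in this model, which is why a small max_expansions value can drop results.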