Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Highlight Support in PPL #124

Merged

Conversation

forestmvey
Copy link

Signed-off-by: forestmvey forestv@bitquilltech.com

Description

Adding support for highlight in PPL with optional arguments and wildcard support in SQL and PPL.

Issues Resolved

Issue: 636

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov
Copy link

codecov bot commented Sep 27, 2022

Codecov Report

Merging #124 (e4eeeef) into integ-highlight-refactor (e04d6f8) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@                      Coverage Diff                       @@
##             integ-highlight-refactor     #124      +/-   ##
==============================================================
+ Coverage                       94.93%   94.96%   +0.03%     
- Complexity                       2973     2992      +19     
==============================================================
  Files                             291      291              
  Lines                            7949     8000      +51     
  Branches                          578      588      +10     
==============================================================
+ Hits                             7546     7597      +51     
  Misses                            349      349              
  Partials                           54       54              
Flag Coverage Δ
query-workbench 62.76% <ø> (ø)
sql-engine 97.84% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/opensearch/sql/analysis/Analyzer.java 100.00% <ø> (ø)
...h/sql/expression/function/OpenSearchFunctions.java 100.00% <ø> (ø)
...rg/opensearch/sql/analysis/ExpressionAnalyzer.java 100.00% <100.00%> (ø)
...org/opensearch/sql/analysis/HighlightAnalyzer.java 100.00% <100.00%> (ø)
...opensearch/sql/expression/HighlightExpression.java 100.00% <100.00%> (ø)
...ensearch/sql/planner/logical/LogicalHighlight.java 100.00% <100.00%> (ø)
...opensearch/sql/planner/logical/LogicalPlanDSL.java 100.00% <100.00%> (ø)
...l/opensearch/request/OpenSearchRequestBuilder.java 100.00% <100.00%> (ø)
...search/sql/opensearch/storage/OpenSearchIndex.java 100.00% <100.00%> (ø)
...java/org/opensearch/sql/ppl/parser/AstBuilder.java 100.00% <100.00%> (ø)
... and 4 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@forestmvey forestmvey force-pushed the internal-review-highlight-refactor branch 6 times, most recently from e618c8b to 69c1e63 Compare September 27, 2022 20:11
@forestmvey forestmvey requested a review from a team September 27, 2022 21:01
public UnresolvedExpression highlight(UnresolvedExpression fieldName) {
return new HighlightFunction(fieldName);
public UnresolvedExpression highlight(UnresolvedExpression fieldName,
java.util.Map<String, Literal> arguments) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java.util.Map could be imported.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.opensearch.sql.ast.expression.Map is already imported taking up that namespace unfortunately. This is less code changes than switching all the other Map instances.

Alias alias = AstDSL.alias("highlight(invalid_field)",
highlightFunction);

assertThrows(SemanticCheckException.class, () -> analyze(alias));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, on upstream, you may be asked to check the exception message here.

Copy link

@acarbonetto acarbonetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments only

return null;
}

HighlightFunction unresolved = (HighlightFunction) node.getDelegated();
HighlightFunction unresolved = (HighlightFunction) delegated;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a better pattern to doing this would be to catch a RuntimeException if the casting fails, instead of checking for the instanceof. That way you don't need to check each time, but instead try and catch for a failure.
Something like:

HighlightFunction unresolved;
try {
  unresolved = (HighlightFunction) delegated;
 } catch (RuntimeException runtimeException) {
  return null;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea to use this approach, though I do think the readability and verbosity of instanceof is nice too. It is also the more commonly found approach in the code base.

assertEquals(
highlight(AstDSL.stringLiteral("fieldA")),
highlight(AstDSL.stringLiteral("fieldA"), args),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if you create the HashMap here, you don't need to import Map :)

assertEquals(
highlight(AstDSL.qualifiedName("fieldA")),
highlight(AstDSL.qualifiedName("fieldA"), args),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@forestmvey forestmvey force-pushed the internal-review-highlight-refactor branch from 69c1e63 to 729afad Compare September 28, 2022 20:41
@Yury-Fridlyand
Copy link

Yury-Fridlyand commented Sep 29, 2022

I run this PPL query:

source=beer | where multi_match([Title, Body, Tags], 'taste') | fields highlight('Body'), highlight('T*');

A piece of the output:

-[ RECORD 3 ]-------------------------
highlight('Body') | null
highlight('T*')   | {'Tags': ['brewing <em>taste</em>']}
-[ RECORD 4 ]-------------------------
highlight('Body') | [away from direct sources of heat or light; too much exposure to light can cause beer to develop a foul <em>taste</em>]
highlight('T*')   | {}

There is NULL when regular highlight matches nothing, but empty STRUCT when wildcard highlight matches nothing.

I suppose this should be consistent - both should return NULL or both return empty STRUCT/ARRAY.
What to you think, @MaxKsyunz?

@Yury-Fridlyand
Copy link

Great job, Forest!
I don't have any concerns/objections except this one.
Please, add query samples when you will post PR on upstream.

@forestmvey forestmvey force-pushed the internal-review-highlight-refactor branch from 729afad to a8e26e0 Compare September 29, 2022 21:09
…support in SQL and PPL.

Signed-off-by: forestmvey <forestv@bitquilltech.com>
@forestmvey forestmvey force-pushed the internal-review-highlight-refactor branch from a8e26e0 to e4eeeef Compare September 29, 2022 21:31
@forestmvey forestmvey merged commit 310cafb into integ-highlight-refactor Sep 29, 2022
@forestmvey forestmvey deleted the internal-review-highlight-refactor branch September 29, 2022 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants