Support highlighting function in SQL/PPL query engine #636

Closed
5 of 6 tasks
acarbonetto opened this issue Jun 7, 2022 · 10 comments
Labels
enhancement New feature or request

Comments

@acarbonetto
Collaborator

acarbonetto commented Jun 7, 2022

Related design for relevance-based search in SQL/PPL engine - #182

TODO List

  • Support the highlight function by including it in the search engine
  • Enable highlight function in SQL syntax and parser, including parameters
  • Enable highlight function in PPL syntax and parser, including parameters
  • Add unit tests
  • Add integration tests for highlighting
  • Update the user manual - append to existing match search documentation

Function Details

(reference: https://opensearch.org/docs/latest/opensearch/ux/#highlight-query-matches)

The highlight function adds the matched search terms, wrapped in tags, to the service response for each field that matched. Highlighting only works in tandem with match-search queries. Every highlighted field must also appear in a relevance-based search function; otherwise the query results in a syntax error.

Syntax

highlight([field_list, …][, option=<option_value>]*) 

Available Options

  • pre_tags: tag inserted before each highlighted term, default is <em>
  • post_tags: tag inserted after each highlighted term, default is </em>

Sample query

GET shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": "life"
    }
  },
  "highlight": {
    "fields": {
      "text_entry": {}
    }
  }
}
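
The request body above could also be assembled programmatically. The following is a minimal Python sketch, not part of the plugin: the helper name build_highlight_request is hypothetical, and the pre_tags/post_tags entries are passed as lists on the assumption that the request follows the OpenSearch highlight DSL.

```python
import json


def build_highlight_request(field, term, pre_tag="<em>", post_tag="</em>"):
    """Build a match query that also asks for highlights on the same field.

    Hypothetical helper for illustration only; it mirrors the sample
    request shown in this issue.
    """
    return {
        "query": {"match": {field: term}},
        "highlight": {
            # Assumption: tags are given as single-element lists.
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }


body = build_highlight_request("text_entry", "life")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of GET shakespeare/_search.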

SQL

SELECT highlight("text_entry") as "highlight_text_entry" FROM shakespeare WHERE match("text_entry", "life");

PPL

source=shakespeare | where match("text_entry", "life") | highlight("text_entry")

Sample response

| highlight_text_entry                         |
|----------------------------------------------|
| "my <em>life</em>, except my <em>life</em>." |
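
A client consuming this response may want the matched terms without the surrounding tags. A minimal Python sketch, assuming the default <em> and </em> tags; the function name highlighted_terms is hypothetical:

```python
import re


def highlighted_terms(fragment, pre_tag="<em>", post_tag="</em>"):
    """Return the terms wrapped in pre_tag/post_tag within one fragment."""
    # Escape the tags so characters such as '<' are matched literally,
    # and match lazily so each tagged term is captured separately.
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, fragment)


fragment = "my <em>life</em>, except my <em>life</em>."
terms = highlighted_terms(fragment)
print(terms)  # ['life', 'life']
```
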
@forestmvey
Collaborator

forestmvey commented Aug 15, 2022

Looking for input on what JSON output the SQL plugin should return for a multi-field highlight query. OpenSearch responds to a multi-field highlight query with the JSON output below:

"hits": ... {
    "highlight": {
        "Field1": [
            "highlights <em>hl</em>"
        ],
        "Field2": [
            "<p>highlight"
        ]
    }
}

The SQL plugin can handle this response with the nested type, which allows the returned fields to be accessed using '.' notation. Option 1: should the SQL plugin nest all returned fields under a single column, as shown in the following output?

{
    "schema": [
      {
        "name": "highlight(\"*\")",
        "type": "nested"
      }
    ],
    "datarows": [
      [
       {
        "Field1": [
            "highlights <em>hl</em>"
        ],
        "Field2": [
            "<p>highlight"
        ]
       }
      ],
...
    ],
  }
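
To make the single-column shape concrete, here is a minimal Python sketch of how the highlight object of each hit could be mapped onto one nested column. It is an illustration only, not the plugin's implementation; the function name and the simplified hits list are assumptions.

```python
def to_nested_response(hits, column_name='highlight("*")'):
    """Place each hit's whole highlight object under a single nested column."""
    schema = [{"name": column_name, "type": "nested"}]
    # One row per hit, one cell per row: the untouched highlight object.
    datarows = [[hit.get("highlight", {})] for hit in hits]
    return {"schema": schema, "datarows": datarows}


# Simplified stand-in for the "hits" section of an OpenSearch response.
hits = [
    {
        "highlight": {
            "Field1": ["highlights <em>hl</em>"],
            "Field2": ["<p>highlight"],
        }
    }
]

nested_response = to_nested_response(hits)
print(nested_response)
```

With this shape the schema never changes with the number of highlighted fields; callers read individual fields from the nested cell.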

Option 2: alternatively, the SQL plugin could output each returned field as a separate column, similar to a "SELECT * from ..." SQL query. The following JSON output shows multiple returned fields as multiple columns:

{
    "schema": [
      {
        "name": "highlight(\"*\").field1",
        "type": "keyword"
      },
      {
        "name": "highlight(\"*\").field2",
        "type": "keyword"
      }
    ],
    "datarows": [
      [
        [
          "<em>field 1</em> result 1",
          "<em>field 2</em> result 2"
        ]
      ],
...
    ],
  }
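
For comparison, a minimal Python sketch of the multi-column shape. Again this is only an illustration under assumptions: the function name is hypothetical, the hits list is simplified, and filling a missing field with None is a choice made here, not something decided in this issue.

```python
def to_column_response(hits, prefix='highlight("*")'):
    """Expand each highlighted field into its own column."""
    # Collect field names in first-seen order so the column order is stable.
    fields = []
    for hit in hits:
        for name in hit.get("highlight", {}):
            if name not in fields:
                fields.append(name)

    schema = [{"name": f"{prefix}.{name}", "type": "keyword"} for name in fields]
    # Assumption: a hit without a highlight for some field gets None there.
    datarows = [
        [hit.get("highlight", {}).get(name) for name in fields] for hit in hits
    ]
    return {"schema": schema, "datarows": datarows}


hits = [
    {"highlight": {"Field1": ["highlights <em>hl</em>"], "Field2": ["<p>highlight"]}},
    {"highlight": {"Field1": ["only <em>one</em>"]}},
]

column_response = to_column_response(hits)
print(column_response)
```

Unlike the nested shape, the schema here depends on which fields were actually highlighted across the hits.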

@MaxKsyunz
Collaborator

@penghuo, @dai-chen, @joshuali925 which response structure do you think is more appropriate?

@dai-chen
Collaborator

@forestmvey Quick question: if we go for option #2, is it possible that there is more than one nested level to flatten? For example, highlight(*).field1.nestField1

@forestmvey
Collaborator

> @forestmvey Quick question: If we go for option #2, is it possible there are more than one nested level to flatten? ex. highlight(*).field1.nestField1

@dai-chen I would expect this to be possible if the data source has fields nested in this way. Perhaps option #1 would be more predictable in that case.

@dai-chen
Collaborator

Yeah, I'm thinking the same. If we go for option #2, I'm not sure about the complexity, or whether we would have to fall back to option #1 in certain cases. If we only support the simple, unnested field case, it should be fine.
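
The flattening concern can be illustrated with a short Python sketch. It assumes, purely for the sake of the example, that a highlight payload could contain nested objects; whether OpenSearch actually returns such a structure is not established in this thread. The function name flatten is hypothetical.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts into dotted column names, keeping leaf values."""
    columns = {}
    for key, child in value.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict):
            # Each extra nesting level adds one more dotted segment.
            columns.update(flatten(child, name))
        else:
            columns[name] = child
    return columns


# Hypothetical payload with one nested level under field1.
nested = {
    "field1": {"nestField1": ["<em>a</em>"]},
    "field2": ["<em>b</em>"],
}

flat = flatten(nested, "highlight(*)")
print(sorted(flat))
```

The number and names of columns depend on the data, which is what makes option #2 harder to predict than a single nested column.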

@MaxKsyunz
Collaborator

@acarbonetto The syntax highlight(...), which uses parentheses, is different from the syntax of existing PPL commands. Looking at the docs, the arguments are usually delimited with spaces. Shouldn't highlight be the same?

@forestmvey
Collaborator

forestmvey commented Oct 3, 2022

@dai-chen @joshuali925 @penghuo
Here's a quick demo on usage for highlight in SQL and PPL for #827.
(NOTE: highlight in PPL is still undergoing design. Discussion of the PPL syntax can take place here.)

highlight_demo.mp4

@dai-chen
Collaborator

dai-chen commented Oct 3, 2022

> @dai-chen @joshuali925 @penghuo Here's a quick demo on usage for highlight in SQL and PPL for #827.
>
> highlight_demo.mp4

@forestmvey Thanks for the work! Could we post it in the discussions as well, as @MaxKsyunz did in #850?

@forestmvey
Collaborator

> > @dai-chen @joshuali925 @penghuo Here's a quick demo on usage for highlight in SQL and PPL for #827.
> > highlight_demo.mp4
>
> @forestmvey Thanks for the work! We may post it in discussion as well like what @MaxKsyunz did? #850

I have posted it here, thanks for the suggestion: #879

@dai-chen
Collaborator

Closing this; the only remaining item is tracked in #916.


5 participants