Quoted phrase search behaviour (v0-14-0-2) #11398

Closed
mitchdawson1982 opened this issue Sep 17, 2024 · 2 comments

mitchdawson1982 commented Sep 17, 2024

Hello, search on the DataHub demo instance appears to be broken. I am looking at this example from the docs:

If you want to:

Exact match on term or phrase
"datahub_schema" Sample results
datahub_schema Sample results
Enclosing one or more terms with double quotes will enforce exact matching on these terms, preventing further tokenization.

Firstly, neither of the two links shown above returns any results on the DataHub demo instance any longer.

Since upgrading, the quoted phrase search behaviour has changed significantly. Please see the outputs below from the demo platform.

Unquoted search on the demo platform:
37 results returned, including some with the phrase in the description.
[screenshot]

Suppose I want to narrow the results to entities containing this exact phrase.

Quoted phrase search on the demo platform:
No results returned.
[screenshot]

Quoted phrase matching the entire description value:
1 result returned.
[screenshot]

We note that quoted searches are categorised as exact matching, but surely we should not be expected to enter the entire description to locate results. This does not appear consistent with the previous phrase matching behaviour.

mitchdawson1982 added the bug (Bug report) label Sep 17, 2024
david-leifker self-assigned this Sep 20, 2024

david-leifker (Collaborator) commented Sep 20, 2024

  1. I think the key regression is a boolean flag that disables prefix matching on quoted searches. This means that for certain fields this code is not triggered. Note that this is not a general phrase match; it is only a prefix phrase match. Your examples show this behavior: the match succeeds where the phrase is at the start of the matched string. If the phrase were internal or at the end, it would fail to match, and this is the historical behavior for search.

  2. The only place where search executes an actual phrase match is on certain fields designed to power autocomplete, indicated by the ngram analyzer on those fields. However, this doesn't cover the description.
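
To make the distinction in point 1 concrete, here is a minimal sketch in plain Python. It is not DataHub or Elasticsearch code: the function names, the whitespace tokenizer, and the sample description are all hypothetical and only illustrate why a phrase at the start of a field matches while an internal phrase does not.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def prefix_phrase_match(phrase: str, field_value: str) -> bool:
    """True only when the field value starts with the phrase's tokens."""
    p, f = tokenize(phrase), tokenize(field_value)
    return len(p) > 0 and f[: len(p)] == p


def phrase_match(phrase: str, field_value: str) -> bool:
    """True when the phrase's tokens appear contiguously anywhere in the field."""
    p, f = tokenize(phrase), tokenize(field_value)
    if not p:
        return False
    # Slide a window of len(p) tokens across the field value.
    return any(f[i : i + len(p)] == p for i in range(len(f) - len(p) + 1))


# Hypothetical description value, used only for illustration.
description = "Table containing all the users deleted on a single day"

starts_with = prefix_phrase_match("table containing", description)  # phrase at the start
internal_prefix = prefix_phrase_match("users deleted", description)  # phrase in the middle
internal_phrase = phrase_match("users deleted", description)         # general phrase match
```

Under these assumptions, a quoted query only finds the description when it begins with the quoted phrase, which is consistent with the full-description query returning a result while a shorter internal phrase returns none.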

Solutions:

  1. I will flip the flag in the search configuration since I believe it restores some of the expected behavior and doesn't negate the other functions in the referenced PR. This will be included in the next release.

  2. The search_config.yaml can be overridden for v0.14.x and earlier versions which support search configuration in the same way.

  3. Added a test case, plus a test to ensure that the default configuration and the test configuration don't deviate unexpectedly.

Ref: #11444

david-leifker (Collaborator) commented Sep 21, 2024

Solutions:
2b) If using Helm, the configuration can be overridden in the Helm chart values in this section.
