WildcardQuery do not work? #106

Open
willie68 opened this issue Apr 7, 2022 · 10 comments

@willie68

willie68 commented Apr 7, 2022

Just a short question: is the WildcardQuery already working?
I ask because I have a doc in the index with a text field called X-Tenant whose value is MCS.
Searching with X-Tenant: MCS finds this doc,
but X-Tenant: M* does not find anything.
Neither does MC? or M??

@willie68
Author

willie68 commented Apr 7, 2022

Just some new aspects:
mc* will find the docs. So it seems that if there is a wildcard in place, the term should be in lower case.

@mschoch
Member

mschoch commented Apr 7, 2022

if there is a wildcard in place, the term should be in lower case

That is not universally true, and would depend on the analyzer used for the field.

To debug specific issues of working/not-working it is most helpful to provide a runnable example. Too many behaviors depend on the actual data, and the configuration of the analyzers.

@willie68
Author

willie68 commented Apr 7, 2022

Hi,
Thanks for the answer.
I use the default for everything. Nothing special is configured for any field.

Here is some sample code:
main.go.txt

In line 61, change the code from "BL*" to "bl*" and you will find the doc.

@mschoch
Member

mschoch commented Apr 7, 2022

Great, so what is the issue now?

@willie68
Author

willie68 commented Apr 7, 2022

As you can see, the body contains the text bluge...
You will find the doc with bl* but not with BL*.
If the body contains "Bluge...", bl* will find the doc too; BL* will not.
Even if the body is "BLuge", BL* will find nothing.
So the wildcard search term is case-sensitive: you only get a hit if the search term is in lower case.
On the other hand, a MatchQuery is case-insensitive.
Both of these will find the right doc:
query := bluge.NewMatchQuery("BLUGE").SetField("Body")
query := bluge.NewMatchQuery("bluge").SetField("Body")

@willie68
Author

willie68 commented Apr 7, 2022

And even if I'm using
querystr.ParseQueryString(query, querystr.DefaultOptions())
to parse the query string into a bluge.Query structure, this does not work. Wildcard search terms must be in lower case to get a result, which is difficult for the end user to understand.

@mschoch
Member

mschoch commented Apr 7, 2022

There is a fundamental difference between how match queries and wildcard queries work, and it has to do with the application of an analyzer to the search term. This indirectly leads to the behavior you are seeing, but I will again say it is wrong to think that all wildcard searches are lower-case.

When you do the following match queries:

query := bluge.NewMatchQuery("BLUGE").SetField("Body")
query := bluge.NewMatchQuery("bluge").SetField("Body")

The search terms are analyzed, which means that in BOTH cases we are actually searching the index for bluge. However, consider the following wildcard queries:

query := bluge.NewWildcardQuery("BL*").SetField("Body")
query := bluge.NewWildcardQuery("bl*").SetField("Body")

The wildcard query does NOT analyze the search term. Because the analyzer you are using makes all index terms lower-case, the first query will never match anything. That is not a bug, that is working as expected.

Earlier, you had said, "if there is a wildcard in place, the term should be in lower case". That is not true in general; it is true here because you are using the standard analyzer on your text. If you had used a custom analyzer that did not lower-case all the input, and there were terms with upper-case letters, you could use them in your wildcard pattern, and they would work as expected.
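
To make the difference concrete, here is a minimal standard-library sketch of the behavior described above. It does not use bluge itself: analyze, matchQuery, wildcardQuery and wildcardToRegexp are hypothetical helpers that only mimic the idea (index terms are lower-cased, the match query analyzes its search term, the wildcard query does not), and the real analyzer does more than split on whitespace.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// analyze is a hypothetical stand-in for a lower-casing analyzer:
// it splits on whitespace and lower-cases every token.
func analyze(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// wildcardToRegexp translates * and ? into an anchored regular expression
// and quotes every other character so it is matched literally.
func wildcardToRegexp(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// matchQuery analyzes the search term first, then looks for an exact term.
func matchQuery(terms []string, search string) bool {
	for _, want := range analyze(search) {
		for _, t := range terms {
			if t == want {
				return true
			}
		}
	}
	return false
}

// wildcardQuery compares the raw, unanalyzed pattern with the indexed terms.
func wildcardQuery(terms []string, pattern string) bool {
	re := wildcardToRegexp(pattern)
	for _, t := range terms {
		if re.MatchString(t) {
			return true
		}
	}
	return false
}

func main() {
	// The "index" only ever contains lower-cased terms.
	terms := analyze("BLuge is a text indexing library")

	fmt.Println(matchQuery(terms, "BLUGE"))  // true: term is analyzed to "bluge"
	fmt.Println(wildcardQuery(terms, "BL*")) // false: no indexed term starts with "BL"
	fmt.Println(wildcardQuery(terms, "bl*")) // true
}
```

With an analyzer that kept the original case, the indexed term would be BLuge and the BL* pattern would match, which is the point made above.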

@willie68
Author

willie68 commented Apr 8, 2022

OK, I understand.
So if I use the Query API directly, I have to take the default analyzer into account. OK.
Then the only remaining thing is: when I use querystr.ParseQueryString(query, querystr.DefaultOptions()) (with the default options) to convert a user query directly, it leads to the same problem. If the user inputs Bl*, the search finds nothing. What do I have to do then?

@mschoch
Member

mschoch commented Apr 8, 2022

Basically, the wildcard search does not directly do what you want. I see 2 choices:

  • Use the standard analyzer to analyze the search term(s) manually, and then pass the result to the existing wildcard query. You'll have to decide how to handle cases when the analyzer produces more than one term.
  • Switch from using a wildcard query to a regular expression. This regex could match either case for the literal characters (internally we convert wildcard to regexp at the next step anyway).

Neither of these changes is easily used by the query string package, so you'd likely have to maintain your own version of that too.

@willie68
Author

Sorry I can only reply now. Easter holidays!
The upshot from the user's point of view is that even if I use the defaults for everything (querystr and the standard analyzer), the two do not work together in a user-friendly way. So I have to replace one of them.
Hmmm, since I don't really like the syntax of querystr for my application anyway, I'll probably write my own parser.
Thank you for your explanations and efforts.
