Inconsistent Results #328

Closed
TomFoyster opened this issue Jan 31, 2018 · 6 comments

Comments

@TomFoyster

Hi all, I'm having some issues with inconsistent results in Lunr.

Suppose I have one object in my store: { label: "Assistant" }

A search for assist yields one result, but a search for assista yields none. There's no result for assistan either, yet assistant returns the expected result.

I've created a JSFiddle that replicates the issue:

https://jsfiddle.net/mfzx2gq9/1/

There are similar issues with the words Nursing and Accommodation too.

What's going on? Is this an issue with stemming? If so how can it be resolved?

@hoelzro
Contributor

hoelzro commented Feb 1, 2018

@TomFoyster You're absolutely right - it is an issue with stemming. If you try stemming each of those variants with this.pipeline.runString from within the builder function, you get these results:

assist    assist
assista   assista
assistan  assistan
assistant assist

I'm guessing the stemmer recognizes the -ant suffix and strips it out. What exactly do you want to accomplish? Do you want any partial string to match?
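The mismatch in the table above can be reproduced with a minimal sketch in plain JavaScript. Note that `toyStem` is a hypothetical stand-in that only strips a trailing -ant; it is not Lunr's stemmer, which handles many more suffixes. The sketch only assumes what the table shows: the index stores the stemmed term, so a query that stems to something else has nothing to match.

```javascript
// Hypothetical stand-in for a stemmer: strips only a trailing "ant".
// This is NOT Lunr's stemmer; it just mimics the behaviour in the table above.
function toyStem(term) {
  return term.length > 5 && term.endsWith("ant") ? term.slice(0, -3) : term;
}

// The index stores the stemmed form of each document term.
const indexedTerms = ["assistant"].map(toyStem); // ["assist"]

// An exact-term lookup stems the query too, then compares it with the index.
function matches(query) {
  return indexedTerms.includes(toyStem(query));
}

// Pair each query with whether it finds the document.
const results = ["assist", "assista", "assistan", "assistant"].map(
  (query) => [query, matches(query)]
);
```

Here `assist` and `assistant` both end up as `assist` and match, while `assista` and `assistan` are left unchanged and match nothing.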

@TomFoyster
Author

@hoelzro Thanks for your reply.

Yes, essentially we want any partial to match - I think I'm struggling to get my head around stemming - this is legacy code written by a contractor that we're now needing to support.

ass, assi, assis, and assist all return a match on assistant - so we need it to follow the whole way through the word.

This is all part of an autocomplete system, so currently as the user types results appear and then disappear, and then reappear again when they finish typing the word.

Massive thanks for your help, it's greatly appreciated.

@TomFoyster
Author

@hoelzro I've made some progress, but I'm not quite there.

I've updated the fiddle to better show the issue:

https://jsfiddle.net/mfzx2gq9/5/

I've experimented with some code I found elsewhere, which has reduced the issue to a single input that returns no results: assistan. This can be solved by upping the editDistance value to 6, increasing the fuzziness of the search. That, though, would lead to some poor matches and has a noticeable negative impact on search time, even in this very small example.

If my understanding is correct, then based on the query methods in my example:

ass, assi, and assis are returned because they partially match (with no pipeline).
assist and assistant match because, once stemmed, they equal the stem of assistant, which is assist.
assista matches because it's caught by the third, fuzzy rule.

assistan falls outside all of these, though, as it isn't stemmed at all and the search isn't fuzzy enough.

I think, while Lunr is obviously very powerful - it should never have been chosen for this function - unless someone can show me a query method that will work in the way I need?

@hoelzro
Contributor

hoelzro commented Feb 1, 2018

@TomFoyster Since this is part of an autocomplete system, it sounds like you need prefix search, and I would agree with your assessment that lunr probably isn't a good fit for this task. If you don't need stemming itself (e.g. normalizing jumped, jumps, and jumping to jump), you could turn stemming off in the pipeline, and that might give you better results. I find that wildcards and stemming create a strange situation: assistan* won't match assistant, because the latter is stemmed down to assist. If you're looking for fuzzy searches, you may have better luck with a library like http://fusejs.io - I haven't used it myself, but it seems to be a more suitable fit. I don't know how tightly integrated lunr is into your application, though!
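As a point of comparison, plain prefix matching over unstemmed labels can be sketched without any search library. This is not Lunr and not the code from the fiddle; `prefixSearch` and the sample labels are hypothetical names used for illustration, and the sketch assumes a small in-memory list where a linear scan is acceptable.

```javascript
// Return the labels in which any word starts with the typed text.
// Matching is case-insensitive and applies no stemming at all.
function prefixSearch(labels, typed) {
  const needle = typed.trim().toLowerCase();
  if (needle === "") return [];
  return labels.filter((label) =>
    label
      .toLowerCase()
      .split(/\s+/)
      .some((word) => word.startsWith(needle))
  );
}

// Sample data using the words mentioned in this thread.
const labels = ["Assistant", "Nursing", "Accommodation"];

// Simulate the user typing "assistant" one character at a time.
const typedSoFar = ["ass", "assi", "assis", "assist", "assista", "assistan", "assistant"];
const hitsPerKeystroke = typedSoFar.map((typed) => prefixSearch(labels, typed));
```

Because nothing is stemmed, every entry in `hitsPerKeystroke` contains the same label, so results do not disappear and reappear as the user types. The cost is losing stemming and relevance scoring, which may or may not matter for an autocomplete box.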

@olivernn
Owner

olivernn commented Feb 5, 2018

As @hoelzro has suggested, removing the stemming is probably the right approach here. I've updated the fiddle to show how.

Hopefully, without stemming you should get results that make more sense in an autocomplete. Autocomplete wasn't the original intended use case for Lunr, but I would hope it is at least possible to get reasonable results with the right configuration.

> This can be solved by upping the editDistance value to 6

Yeah, that is going to be slow! That will result in 125,549 lookups against the index:

lunr.TokenSet.fromFuzzyString("assistan", 6).toArray().length

If speed is a concern, you can drop the leading wildcard from the search. It will be a tradeoff, though, between speed and result accuracy/recall.

I can put together a guide on the website about setting up queries/indexes for use in an autocomplete search that might help others in the future.

@olivernn olivernn closed this as completed Feb 5, 2018
@olivernn olivernn mentioned this issue Feb 19, 2018
@Frexuz

Frexuz commented Oct 23, 2018

Borrowing issue
I have the same issue. But I can't quite figure out how to turn off the stemmer? :)

EDIT: Found it :)

this.pipeline.remove(lunr.stemmer)
this.searchPipeline.remove(lunr.stemmer)
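For context, those two lines belong inside the builder function passed to `lunr(...)`. One way to package them, sketched here under the assumption of Lunr 2.x, is as a small builder plugin applied with `this.use`. The name `withoutStemmer` and the `id`/`label` field names in the usage comment are made up for illustration.

```javascript
// Hypothetical builder plugin: removes the stemmer from both the indexing
// pipeline and the search pipeline, so terms are stored and queried unstemmed.
// `lunrLib` is the lunr object itself, passed in so the function has no globals.
function withoutStemmer(builder, lunrLib) {
  builder.pipeline.remove(lunrLib.stemmer);
  builder.searchPipeline.remove(lunrLib.stemmer);
}

// Usage sketch, assuming Lunr 2.x is loaded as `lunr`:
//
//   const idx = lunr(function () {
//     this.use(withoutStemmer, lunr); // same effect as the two lines above
//     this.ref("id");
//     this.field("label");
//     this.add({ id: "1", label: "Assistant" });
//   });
//
//   idx.search("assistan*"); // trailing wildcard against the unstemmed term
```

With the stemmer removed, the indexed term should stay `assistant` rather than `assist`, so a trailing-wildcard query has the full word to match against.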


4 participants