Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace usage of peliasQueryPartialToken and peliasQueryFullToken analyzers #1330

Merged
merged 1 commit into from
Jul 30, 2019

Conversation

missinglink
Copy link
Member

@missinglink missinglink commented Jul 10, 2019

replace usage of peliasQueryPartialToken and peliasQueryFullToken analyzers with the new peliasQuery analyzer.
related schema PR: pelias/schema#370

this PR will ensure that query-time synonym substitution is disabled for clauses targeting the name.* fields.
see: the schema PR linked above for more info.

I am hoping to see a performance improvement as a result and expect to see 0 regressions 🤞

@orangejulius
Copy link
Member

I tested this out on a full planet build, and there are indeed no regressions! 🎉

Thinking about it a bit further though, I doubt this would result in any performance improvements. Since we still are now exclusively using synonym expansion at index time, the same number of documents will still be matched (any document that is a match for south, for example, is a match for s, so having only one, instead of both tokens in the query wouldn't change the result).

So unless there are big performance wins from not having to consider a couple extra tokens when the document hit count is the same, I wouldn't expect us to see a difference.

But it can't hurt performance and if it keeps our analyzers a bit cleaner then it's still a win.

@missinglink
Copy link
Member Author

I figured it would need to do fewer lookups on the inverted-index and fewer bitmask operations for the same hit count.
It seems to me that considering variations of the input is unnecessary, but not sure about how much of a perf benefit will result.

…nd peliasQueryFullToken analyzers with the new peliasQuery analyzer
@orangejulius
Copy link
Member

orangejulius commented Jul 30, 2019

Okay, we took another look at this, and we definitely want to merge it! By avoiding synonym handling at query time, we can massively reduce the number of elasticsearch hits to consider.

Here are some metrics showing the significance of the reduction in es hits:
Screen Shot 2019-07-30 at 2 23 41 PM

Additionally, while there aren't any acceptance test regressions, I noticed at least one improvement
Screen Shot 2019-07-29 at 5 09 41 PM
Screen Shot 2019-07-29 at 5 09 16 PM
https://pelias.github.io/compare/#/v1/autocomplete%3Fdebug=true&layers=street,address,venue,coarse&text=South%20Street,%20Augusta-Richmond%20County,%20GA,%20USA

We suspect that reducing the hit count dramatically (in this case from 75M to <5M) reduces the chance that a record would be erroneously scored above the "correct" result.

🎉

@orangejulius orangejulius merged commit 24fd1bf into master Jul 30, 2019
@orangejulius orangejulius deleted the peliasQueryAnalyzer branch July 30, 2019 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants