replace usage of peliasQueryPartialToken and peliasQueryFullToken analyzers #1330

missinglink · 2019-07-10T11:38:39Z

replace usage of peliasQueryPartialToken and peliasQueryFullToken analyzers with the new peliasQuery analyzer.
related schema PR: pelias/schema#370

this PR will ensure that query-time synonym substitution is disabled for clauses targeting the name.* fields.
see: the schema PR linked above for more info.
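
For readers unfamiliar with query-time analyzers, here is a rough sketch of what pinning the analyzer on a clause looks like in the Elasticsearch query DSL. The `match` query and its `analyzer` parameter are standard Elasticsearch; the helper name `name_match_clause` and the default field `name.default` are assumptions made for illustration, and the real clauses in this repo are built in JavaScript, not Python.

```python
def name_match_clause(text, field="name.default", analyzer="peliasQuery"):
    """Build a `match` clause that names its query-time analyzer explicitly.

    Hypothetical helper: it only illustrates the shape of the query body.
    """
    return {"match": {field: {"query": text, "analyzer": analyzer}}}


# The input text is analyzed with peliasQuery at search time,
# whatever analyzer the field was indexed with.
clause = name_match_clause("south street")
```

Because the analyzer is named on the clause itself, any synonym filters attached to the field's index-time analyzer are not applied to the query text.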

I am hoping to see a performance improvement as a result and expect to see 0 regressions 🤞

orangejulius · 2019-07-10T14:05:57Z

I tested this out on a full planet build, and there are indeed no regressions! 🎉

Thinking about it a bit further though, I doubt this would result in any performance improvements. Since we are now exclusively using synonym expansion at index time, the same number of documents will still be matched (any document that is a match for "south", for example, is also a match for "s", so having only one of those tokens in the query, instead of both, wouldn't change the result).
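
To make that argument concrete, here is a toy model of index-time synonym expansion. It is a sketch under simplifying assumptions, not how Elasticsearch is implemented: the synonym table, the documents and the function names are all made up for illustration.

```python
from collections import defaultdict

# Toy synonym table (assumption): "south" and "s" are interchangeable.
SYNONYMS = {"south": {"s"}, "s": {"south"}}


def build_index(docs):
    """Build an inverted index, adding each token's synonyms at index time."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
            for synonym in SYNONYMS.get(token, ()):
                index[synonym].add(doc_id)
    return index


def search_any(index, tokens):
    """Return the ids of documents matching at least one query token."""
    hits = set()
    for token in tokens:
        hits |= index.get(token, set())
    return hits


docs = {1: "South Street", 2: "S Street", 3: "North Street"}
index = build_index(docs)

one_token = search_any(index, ["south"])        # no query-time expansion
both_tokens = search_any(index, ["south", "s"])  # query-time expansion
```

In this model both queries return documents 1 and 2, because every document indexed under "south" was also indexed under "s".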

So unless there are big performance wins from not having to consider a couple extra tokens when the document hit count is the same, I wouldn't expect us to see a difference.

But it can't hurt performance and if it keeps our analyzers a bit cleaner then it's still a win.

missinglink · 2019-07-10T14:31:26Z

I figured it would need to do fewer lookups on the inverted index and fewer bitmask operations for the same hit count.
It seems to me that considering variations of the input is unnecessary, but I'm not sure how much of a perf benefit will result.
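
The point about doing less work for the same hit count can be sketched with a toy posting-list lookup that counts how many lists it touches. This is only an illustration: the postings and the cost model are assumptions, and real Lucene query execution is considerably more involved.

```python
# Toy postings (assumption): synonyms were already expanded at index time,
# so "south" and "s" point at the same documents.
postings = {"south": {1, 2}, "s": {1, 2}, "street": {1, 2, 3}}


def union_with_cost(postings, tokens):
    """Union the posting lists for `tokens`, counting the lists consulted."""
    hits = set()
    lookups = 0
    for token in tokens:
        lookups += 1
        hits |= postings.get(token, set())
    return hits, lookups


expanded_hits, expanded_lookups = union_with_cost(postings, ["south", "s"])
plain_hits, plain_lookups = union_with_cost(postings, ["south"])
```

Both calls produce the same hits, but the unexpanded query consults one posting list instead of two. Whether that saving is measurable in practice is exactly the open question here.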

orangejulius · 2019-07-30T18:20:46Z

Okay, we took another look at this, and we definitely want to merge it! By avoiding synonym handling at query time, we can massively reduce the number of Elasticsearch hits to consider.

Here are some metrics showing the significance of the reduction in es hits:

Additionally, while there aren't any acceptance test regressions, I noticed at least one improvement:

https://pelias.github.io/compare/#/v1/autocomplete%3Fdebug=true&layers=street,address,venue,coarse&text=South%20Street,%20Augusta-Richmond%20County,%20GA,%20USA

We suspect that reducing the hit count dramatically (in this case from 75M to <5M) reduces the chance that a record would be erroneously scored above the "correct" result.

🎉

missinglink requested a review from orangejulius July 10, 2019 11:38

feat(peliasQueryAnalyzer): replace usage of peliasQueryPartialToken and peliasQueryFullToken analyzers with the new peliasQuery analyzer

18ecfb2

orangejulius force-pushed the peliasQueryAnalyzer branch from 5f39580 to 18ecfb2 July 29, 2019 14:13

orangejulius approved these changes Jul 30, 2019

orangejulius merged commit 24fd1bf into master Jul 30, 2019

orangejulius deleted the peliasQueryAnalyzer branch July 30, 2019 18:25

missinglink mentioned this pull request Dec 12, 2019

Remove peliasQueryFullToken analyzer pelias/schema#407

Merged
