Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use peliasQuery analyzer for address_parts.street #1444

Merged
merged 1 commit into from
Jun 29, 2020

Conversation

orangejulius
Copy link
Member

@orangejulius orangejulius commented Jun 16, 2020

This changes the query-time analyzer for the address_parts.street field from peliasStreet to peliasQuery.

The latter does not generate synonyms, so it'll be much more performant and also likely produce better quality results, so win-win 🤞

@orangejulius orangejulius force-pushed the use_peliasQuery_analyzer_for_address_parts branch from 758b7d2 to d32e2a1 Compare June 16, 2020 18:10
@missinglink missinglink force-pushed the use_peliasQuery_analyzer_for_address_parts branch from d32e2a1 to 9d4c6db Compare June 23, 2020 13:39
@missinglink
Copy link
Member

I added another commit to enable this for autocomplete

@orangejulius
Copy link
Member Author

I ran our complete automated test suite against this PR and the results are positive. There are very few changes, and they nearly all look like improvements:

For example, 15 call str used to match Callstraße 15, Hückelhoven, Germany, but now it doesn't, removing one small case of autocomplete jitter:
Screenshot_2020-06-25_17-20-24

Another interesting one is that West Wrigley Field has now dropped out of the list of results in our still-failing Wrigley Field test:
Screenshot_2020-06-25_17-20-05

I think this PR is good to merge. We are hoping for a performance improvement here, and while we don't have any hard numbers, all the test suites did finish slightly faster (up to 10% faster) running on the same hardware and configuration.

@orangejulius orangejulius marked this pull request as ready for review June 26, 2020 00:25
@orangejulius orangejulius force-pushed the use_peliasQuery_analyzer_for_address_parts branch from 870c7be to 54127dd Compare June 26, 2020 13:57
@missinglink
Copy link
Member

Looks good to me!

@orangejulius
Copy link
Member Author

For the record, we tested this PR in a split test and there is no noticeable performance difference compared to the master branch. I suppose a huge performance win would have been preferable, but we'll take it!

@orangejulius orangejulius force-pushed the use_peliasQuery_analyzer_for_address_parts branch from 54127dd to 4e25f19 Compare June 29, 2020 14:46
This is an experimental change, but one that has good impact when paired
with pelias/schema#446.

Essentially, this change continues the trend we have started for the
`name.*` and `phrase.*` fields to generate synonyms _only_ at index
time.

It may be the case that variations on input vs data text (for example
`crt` vs `ct` vs `court`) may cause different synonyms to be generated
by the same analyzer.

Many queries, especially `match_phrase` queries, will require that _all_
of those generated synonym tokens must match. This is often not
desirable.
@orangejulius orangejulius force-pushed the use_peliasQuery_analyzer_for_address_parts branch from 4e25f19 to fc6e941 Compare June 29, 2020 14:50
@orangejulius orangejulius merged commit e6bbd5f into master Jun 29, 2020
@orangejulius orangejulius deleted the use_peliasQuery_analyzer_for_address_parts branch June 29, 2020 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants