autocomplete: extend additional name fields used in multimatch queries #1620

missinglink · 2022-03-31T18:38:12Z

this PR is an experiment with splitting up the name.* fields in order to avoid the negative effects of field norms due to field length, reported in pelias/openstreetmap#507 and better explained in pelias/pelias#862

in particular we see this issue in OSM and WOF due to those sources having more alt names than others, although it applies to all sources.
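To make the norms effect concrete, here is a rough sketch of a textbook BM25 term score, the family of similarity Elasticsearch uses by default. The numbers are purely illustrative and the function name `bm25_term_score` is hypothetical. It assumes that all values of a multi-valued name field count towards a single field length, so a record with many alt names looks like a "long" field.

```python
import math

def bm25_term_score(tf, field_len, avg_field_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term matching one field."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: fields longer than average are penalised.
    length_norm = 1 - b + b * (field_len / avg_field_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)

# Illustrative values only: same term, same frequency, different field lengths.
few_names = bm25_term_score(tf=1, field_len=2, avg_field_len=4, doc_count=1000, doc_freq=10)
many_names = bm25_term_score(tf=1, field_len=20, avg_field_len=4, doc_count=1000, doc_freq=10)

# The record with many alt names scores lower for an identical match.
print(few_names > many_names)
```

Under these assumptions, the only difference between the two scores is the field length, which is the penalty this PR tries to keep away from the primary name.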

as discussed on our call today, it might be that pelias/openstreetmap#435 exacerbated the issue (albeit unknown at the time) so reversing that method and moving back to multiple fields using a multi_match query should result in a significant reduction in the effects of the field norms issue on scoring.

although fairly arbitrary, I've identified 4 new fields to begin with:

alt - this field will contain all alternative names, so the norms penalty will no longer apply to the primary name. this includes variants, colloquialisms & other alternatives
abbr - abbreviations, i.e. succinct representations of the primary name
code - similar to above but distinct in the case of airports, stop IDs etc.
org - brands, operators etc.

we may very well change these, maybe abbr and code can be merged, or org omitted, that's up for discussion.
the main difference is that we attempt to have only a single token indexed per field.
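As a minimal sketch of what a query over the split fields could look like, the snippet below builds an Elasticsearch `multi_match` body as a plain Python dict. The suffixes `alt`, `abbr`, `code` and `org` come from the list above; the exact field paths (`name.alt` and so on) and the helper `build_name_query` are assumptions for illustration, not the names used in this PR.

```python
import json

# Assumed field paths: only the suffixes alt/abbr/code/org are taken from the PR text.
NAME_FIELDS = ["name.default", "name.alt", "name.abbr", "name.code", "name.org"]

def build_name_query(text, fields=NAME_FIELDS, match_type="best_fields"):
    """Build a multi_match clause that scores each name field separately."""
    return {
        "multi_match": {
            "query": text,
            "type": match_type,
            "fields": list(fields),
        }
    }

query = build_name_query("phoenix sky harbor")
print(json.dumps(query, indent=2))
```

Because each field carries its own norms, a long list of alternative names in `name.alt` would no longer lengthen the field holding the primary name.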

missinglink · 2022-03-31T18:53:05Z

I'm not sure if we want to keep using best_fields, maybe cross_fields is better if it doesn't suffer the same norms issue.

missinglink · 2022-04-01T09:40:16Z

missinglink · 2022-04-04T10:16:06Z

this looks very promising:

worth noting we will need to make similar changes to the /v1/search subqueries, otherwise some aliases which were previously searchable are now not (Phoenix Sky Harbor Intern.... in this example, no. 2 on the left)

missinglink · 2022-04-04T10:22:45Z

Interestingly, the popularity boosting may now be too strong (rather than too weak as proposed in #1619), or maybe this was always the case 🤔

For example, this /v1/search query has an exact matching result but the scoring of all top n items seems to be heavily influenced by the popularity value:
https://pelias.github.io/compare/#/v1/search?text=pyramids+of+giza&debug=1

These tests have not had a great pass rate until pelias/api#1620, so we didn't know that the `distanceThresh` value for the coordinate checks wasn't quite correct. Hopefully they will be passing soon!

missinglink · 2022-04-15T10:20:24Z

As discussed offline, I've pushed a new commit which changes this behaviour to use wildcards instead of explicit field names; I feel like this is more flexible. The _ delimiter is unfortunately required, otherwise the German wildcard would also match the default field (i.e. de* matches both de and default). Using - instead could potentially conflict with hyphenated language codes.
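The delimiter problem can be reproduced with simple glob matching. This sketch uses Python's `fnmatch` as a stand-in for wildcard field expansion; it only approximates how Elasticsearch resolves field patterns, and the field list (including names such as `name.de_alt`) is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical index fields; the real mapping may differ.
FIELDS = ["name.default", "name.de", "name.de_alt", "name.de_abbr", "name.en", "name.en_alt"]

def expand(pattern, fields=FIELDS):
    """Return the fields a glob pattern would select, in mapping order."""
    return [f for f in fields if fnmatchcase(f, pattern)]

# Without a delimiter the German pattern also picks up the default field.
print(expand("name.de*"))   # ['name.default', 'name.de', 'name.de_alt', 'name.de_abbr']

# With the _ delimiter only the German sub-fields are selected.
print(expand("name.de_*"))  # ['name.de_alt', 'name.de_abbr']
```

Under this scheme a query for German would list `name.de` explicitly alongside `name.de_*`, so the default field is only included when it is asked for.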

missinglink · 2022-04-15T10:22:51Z

as-is this PR is safe to merge since it's backward compatible.

feat(autocomplete): extend additional name fields used in multimatch queries

3c8792a

missinglink mentioned this pull request Apr 1, 2022

tags: map alternate names to individual fields pelias/openstreetmap#567

Draft

orangejulius mentioned this pull request Apr 14, 2022

Improve PHX airport POI tests pelias/acceptance-tests#562

Merged

feat(autocomplete): add wilcard field matching to multimatch queries

caaf8de

missinglink marked this pull request as ready for review April 15, 2022 10:22

This was referenced Apr 19, 2022

explode name fields pelias/model#149

Draft

add wildcard & lang field matching to multimatch queries pelias/query#131

Draft

missinglink added 2 commits April 21, 2022 11:43

feat(search): add wildcard and lang field matching to search queries

5e7e9af

feat(search): add wildcard and lang field matching to search queries

82afcd8
