run entire unsplit query as a should #1484

blackmad · 2020-08-11T15:32:49Z

This is a change to try to make POI scoring work better, starting with autocomplete.

I've noticed that for lots of queries, the way the parser splits the input makes us search in ways that can never find a POI match, even when a quite good one exists.

"Wells Fargo Bank" is getting split as "wells fargo" IN "bank".
"Bursa" [focus on SF] is always getting interpreted as Bursa the place, even though there's a restaurant in SF.
"Kells Irish Restaurant & Pub" is getting split with "Pub" as a place.

So this change tries to always search for the entire query as a should match against the name field. So far it works pretty well. In testing it improves ~10/500 POI queries at position 1 (likely more at lower positions; Bursa is fixed at position 3).
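To make the idea concrete, here is a minimal sketch of the query shape described above, not the PR's actual code. The helper `build_autocomplete_query`, the field names `name.default` and `parent.locality`, and the example parse are assumptions chosen for illustration. The only behavior relied on is standard Elasticsearch `bool` semantics: when `must` clauses are present, `should` clauses are optional and only add to the score.

```python
import json


def build_autocomplete_query(parsed_clauses, raw_text, name_field="name.default"):
    """Hypothetical helper: keep the parser-derived clauses and add the
    whole unsplit input as an optional, score-only clause."""
    return {
        "query": {
            "bool": {
                # Clauses derived from the parser's split of the input.
                "must": list(parsed_clauses),
                # The entire raw input, matched against the name field.
                # With must clauses present, this never excludes a document;
                # it only raises the score of documents that match it.
                "should": [
                    {"match": {name_field: {"query": raw_text}}}
                ],
            }
        }
    }


# Assumed parse of "Wells Fargo Bank" as "wells fargo" IN "bank".
parsed = [
    {"match": {"name.default": {"query": "wells fargo"}}},
    {"match": {"parent.locality": {"query": "bank"}}},
]

body = build_autocomplete_query(parsed, "Wells Fargo Bank")
print(json.dumps(body, indent=2))
```

A venue whose name matches all of "Wells Fargo Bank" picks up extra score from the should clause, while documents that only satisfy the parsed clauses are still returned.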

I see one loss in my initial testing that I am trying to better understand. It's against our custom data so I'm not sure it's worth pasting here.

This PR includes the restrictIds debugging change because I'm using that to debug this. I will take it out as we get closer to merging.

blackmad · 2020-09-28T21:51:13Z

I want to try this PR again - I think it would help a lot with weird parses like "VIP Auto Collision Repair". I'm going to fix the merge conflict, refresh my memory of this, and run a diff on some venue queries.

blackmad · 2020-09-28T23:03:47Z

In current testing, which I can't really share because it uses private data + private queries, I see a ~10% change in place queries. Of ~700 full-name POI queries, ~70 change: 33 positive, 5 negative, 33 neutral.

Of the negatives:

two are the repeated scoring issue we've run into in other places (ES is scoring freq=2 across fields)
one is a small loss (1 & 2 switch places), I think because this query might be double boosting exact matches in the name (so the query "walgreens pharmacy" with a focus near a place called "walgreens community pharmacy" is more likely to match a farther-away "walgreens pharmacy")
one I'm genuinely confused by (why is it picking a farther-away "union bank" when they are named the same?)
one is a debatable loss (the far-away place it's matching is such a better name match that I think the original query was bad)

Running this now on streets.

blackmad · 2020-09-28T23:22:30Z

On ~500 queries, a mix of fully qualified US addresses and first lines only, I get 20 diffs. 7 improved, 0 worse, 13 neutral.

Most of the wins are when a query has something tacked onto the end that autocomplete can't figure out:

{ "text": "1009 Murrieta Blvd #83" }
{ "text": "1111 8th Street (CCA/Adobe)" }

blackmad · 2020-10-29T19:10:31Z

This change has an issue: it makes the queries too loose. Because the admin fields have been moved to should instead of must, we get totally incorrect results that exactly match only the number+street part of a query.
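The looseness can be shown with a toy, in-memory imitation of `bool` matching. This is a sketch under stated assumptions: the documents, the `address` and `locality` field names, and the helpers `matches` and `bool_hit` are all hypothetical, and the "score" is just a count of matching clauses rather than real Elasticsearch scoring.

```python
def matches(doc, clause):
    """Toy match: every query token must appear in the document field."""
    field, text = clause
    doc_tokens = doc.get(field, "").lower().split()
    return all(tok in doc_tokens for tok in text.lower().split())


def bool_hit(doc, must, should):
    """Imitate bool semantics: every must clause is required, while
    should clauses only contribute to the (toy) score."""
    if not all(matches(doc, c) for c in must):
        return None  # document is filtered out
    return sum(1 for c in must + should if matches(doc, c))


# Hypothetical documents: same number+street, different towns.
docs = [
    {"id": "a", "address": "10 main st", "locality": "exampleville"},
    {"id": "b", "address": "10 main st", "locality": "otherton"},
]

address_clause = ("address", "10 main st")
admin_clause = ("locality", "exampleville")

# Admin field required: only the document in the right town survives.
strict = [d["id"] for d in docs
          if bool_hit(d, [address_clause, admin_clause], []) is not None]

# Admin field demoted to should: the wrong-town document is also a hit.
loose = [d["id"] for d in docs
         if bool_hit(d, [address_clause], [admin_clause]) is not None]

print(strict)
print(loose)
```

With the admin clause in must, only document "a" is returned. With it in should, document "b" also comes back on the strength of its number+street match alone, which is the kind of incorrect exact match described above.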

blackmad requested review from orangejulius and missinglink August 11, 2020 15:32

orangejulius mentioned this pull request Aug 12, 2020

Update Pelias Parser to 1.56.1 #1483

Closed

blackmad force-pushed the whole-query-autocomplete branch from 6d487a4 to c073bfd Compare August 12, 2020 21:49

blackmad and others added 7 commits September 28, 2020 17:52

kind works to optionalize whole query

84caaa9

use requested endpoint in esExplainUrl + lint fixes

9bd83a0

match over multi match

de63a0d

make it work again

2c81b56

clean up + make work, tests still broken

069c78f

looser

3ae0232

simplify change more

cf7f877

blackmad force-pushed the whole-query-autocomplete branch from 54d0dfe to cf7f877 Compare September 28, 2020 22:03

clean up tests, add tests

839441e

blackmad changed the title ~~WIP: run entire unsplit query as a should~~ run entire unsplit query as a should Sep 30, 2020

blackmad and others added 4 commits October 12, 2020 08:39

Merge branch 'master' into whole-query-autocomplete

e731cd4

fix tests

63c35c6

rename raw tokens query

7d3864d

rename again

81b7739

blackmad closed this Oct 29, 2020
