run entire unsplit query as a should #1484

Closed
wants to merge 12 commits into from


blackmad
Contributor

This is a change to try to make POI scoring work better, starting with autocomplete.

I've noticed that for lots of queries, the parser splits the input in a way that means we can never find a POI match, even when a very good one exists. For example:

  • "Wells Fargo Bank" is getting split as "wells fargo" IN "bank"
  • "Bursa" [focus on SF] is always getting interpreted as Bursa the place, even though there's a restaurant with that name in SF
  • "Kells Irish Restaurant & Pub" is getting split with "Pub" as a place

So this change tries to always search for the entire, unsplit query as a should match against the name field. So far it works pretty well: in testing it improves ~10 of 500 POI queries @1 (likely more at lower positions; Bursa is fixed @3).
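As a rough illustration of the idea (not the code in this PR), the sketch below adds one extra should clause that phrase-matches the whole input against a name field. The helper name `add_unsplit_should`, the field names `name.default` and `parent.locality`, and the shape of the parsed query are assumptions made for the example.

```python
import copy
from typing import Any, Dict


def add_unsplit_should(es_query: Dict[str, Any], raw_text: str,
                       name_field: str = "name.default",
                       boost: float = 1.0) -> Dict[str, Any]:
    """Return a copy of es_query with one extra should clause on the full input text."""
    query = copy.deepcopy(es_query)  # leave the caller's query untouched
    bool_clause = query.setdefault("query", {}).setdefault("bool", {})
    bool_clause.setdefault("should", []).append(
        {"match_phrase": {name_field: {"query": raw_text, "boost": boost}}}
    )
    return query


# Hypothetical parse of "Wells Fargo Bank": subject "wells fargo", admin "bank".
# Field names and the placement of the admin clause are assumptions for this sketch.
parsed = {
    "query": {
        "bool": {
            "must": [{"match": {"name.default": "wells fargo"}}],
            "should": [{"match": {"parent.locality": "bank"}}],
        }
    }
}

widened = add_unsplit_should(parsed, "Wells Fargo Bank")
```

In an Elasticsearch bool query that also has must clauses, should clauses do not decide whether a document matches; they only add to its score. That is why a clause like this can lift a venue named "Wells Fargo Bank" without excluding anything.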

I see one loss in my initial testing that I am trying to better understand. It's against our custom data so I'm not sure it's worth pasting here.

This PR includes the restrictIds debugging change because I'm using that to debug this. I will take it out as we get closer to merging.

@blackmad
Contributor Author

I want to try this PR again. I think it would help a lot with weird parses like "VIP Auto Collision Repair". I'm going to fix the merge conflict, refresh my memory of this, and run a diff on some venue queries.

@blackmad
Contributor Author

In current testing, which I can't really share because it uses private data and private queries, I see a ~10% change in place queries. Of ~700 full-name POI queries, ~70 change: 33 positive, 5 negative, 33 neutral.

Of the negatives:

  • two are the repeated-scoring issue we've run into in other places (ES is scoring freq=2 across fields)
  • one is a small loss (1 & 2 switch places), I think because this query might be double-boosting exact matches in the name (so the query "walgreens pharmacy", with a focus near a place called "walgreens community pharmacy", is more likely to match a farther-away "walgreens pharmacy")
  • one I'm genuinely confused by (why is it picking a farther-away "union bank" when they are named the same?)
  • one is a debatable loss (the far-away place it's matching is such a better name match that I think the original query was bad)
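The positive / negative / neutral tallies can be produced by comparing the rank of an expected result before and after the change. The sketch below is one way to do that, not the tooling actually used here; the function names, the idea of a single expected ID per query, and the "unchanged" label for identical result lists are all assumptions.

```python
from collections import Counter
from typing import List, Optional


def rank_of(expected_id: str, result_ids: List[str]) -> Optional[int]:
    """1-based rank of the expected result, or None if it is absent."""
    try:
        return result_ids.index(expected_id) + 1
    except ValueError:
        return None


def classify(expected_id: str, before: List[str], after: List[str]) -> str:
    """Label one query's diff by how the expected result moved."""
    if before == after:
        return "unchanged"
    rank_before = rank_of(expected_id, before)
    rank_after = rank_of(expected_id, after)
    # Treat a missing result as worse than any real rank.
    key_before = rank_before if rank_before is not None else float("inf")
    key_after = rank_after if rank_after is not None else float("inf")
    if key_after < key_before:
        return "positive"
    if key_after > key_before:
        return "negative"
    return "neutral"  # results changed, but the expected one did not move


# Made-up result lists, for illustration only.
runs = [
    ("a", ["b", "a"], ["a", "b"]),            # expected result moves up
    ("a", ["a", "b"], ["b", "a"]),            # expected result moves down
    ("a", ["a", "b", "c"], ["a", "c", "b"]),  # other results reorder
    ("a", ["a"], ["a"]),                      # nothing changes
]
tally = Counter(classify(expected, before, after) for expected, before, after in runs)
```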

Running this on street queries now.

@blackmad
Contributor Author

On ~500 queries, a mix of fully qualified US addresses and first lines only, I get 20 diffs. 7 improved, 0 worse, 13 neutral.

Most of the wins are when a query has something tacked onto the end that autocomplete can't figure out:

{ "text": "1009 Murrieta Blvd #83" }
{ "text": "1111 8th Street (CCA/Adobe)" }

@blackmad blackmad changed the title from "WIP: run entire unsplit query as a should" to "run entire unsplit query as a should" Sep 30, 2020
@blackmad
Contributor Author

blackmad commented Oct 29, 2020

This change has an issue: it makes the queries too loose. Because the admin fields have been moved from must to should, we get totally incorrect exact matches on the number+street part of a query.
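A toy model of bool-query matching shows why this happens. It is a simplified in-memory stand-in, not Elasticsearch itself, and the documents, field names, and helper names are made up for the example.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Clause = Callable[[Doc], bool]


def field_equals(field: str, value: str) -> Clause:
    """Clause that matches when the field equals the value, ignoring case."""
    return lambda doc: doc.get(field, "").lower() == value.lower()


def bool_match(doc: Doc, must: List[Clause], should: List[Clause]) -> bool:
    """Simplified bool semantics: once a must clause exists, should is optional."""
    if must:
        return all(clause(doc) for clause in must)
    if should:
        return any(clause(doc) for clause in should)
    return True


# Two made-up addresses with the same street line in different cities.
docs = [
    {"name": "100 main st", "locality": "Springfield"},
    {"name": "100 main st", "locality": "Shelbyville"},
]
street = field_equals("name", "100 Main St")
city = field_equals("locality", "Springfield")

# Admin as must: only the address in the requested city matches.
strict = [d["locality"] for d in docs if bool_match(d, must=[street, city], should=[])]
# Admin as should: the same street line in the wrong city matches too.
loose = [d["locality"] for d in docs if bool_match(d, must=[street], should=[city])]
```

With the admin clause as a should, the wrong-city document is only separated from the right one by score, so a strong exact match on the street line can outrank it.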

@blackmad blackmad closed this Oct 29, 2020