Use exponential scoring and consistent scale values for focus.point
#1209
I think this is great. The only concern I have is a potential performance regression; however, the other PRs we discussed today regarding performance will probably more than make up for it.
Our testing shows a small performance hit from this change (one query's average time went from around 700ms to around 800ms). Since some of our shorter-text autocomplete queries are already quite slow, we're going to look at performance improvements like #1219 or #1215 before merging this.
We have made great improvements to slow autocomplete queries in #1219; I think this can be merged now.
Linear scoring, by design, gives all records the same score past a certain distance. This has the disadvantage that identical records that are very far away cannot be sorted by distance. By using exponential scoring, we can achieve decent sorting of even very far away records. This is very helpful for cities and postal codes. Connects #1206
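To make the "same score past a certain distance" point concrete, here is a minimal sketch of the linear and exponential decay formulas as documented for Elasticsearch's `function_score` query. It assumes Elasticsearch's default `decay` of 0.5 and an `offset` of 0 (neither value is stated in this PR), and it treats distance as a plain number of kilometres instead of a real geo-distance computation. The function names are illustrative only.

```python
import math


def linear_decay(distance_km, scale_km, decay=0.5, offset_km=0.0):
    """Linear decay in the form documented for Elasticsearch's function_score.

    The score falls in a straight line and reaches exactly 0 at
    offset + scale / (1 - decay); every record beyond that distance ties at 0.
    """
    s = scale_km / (1.0 - decay)
    d = max(0.0, distance_km - offset_km)
    return max((s - d) / s, 0.0)


def exp_decay(distance_km, scale_km, decay=0.5, offset_km=0.0):
    """Exponential decay: the score is multiplied by `decay` every `scale_km`,
    so it keeps shrinking with distance and is never mathematically 0."""
    lam = math.log(decay) / scale_km
    d = max(0.0, distance_km - offset_km)
    return math.exp(lam * d)


# With scale=50km and decay=0.5, the linear score bottoms out at 100km,
# while the exponential score still differs between 1,000km and 5,000km.
for d in (50.0, 100.0, 1000.0, 5000.0):
    print(f"{d:>6.0f}km  linear={linear_decay(d, 50.0):.3g}  exp={exp_decay(d, 50.0):.3g}")
```

Under these assumptions, every record more than 100km away gets a linear distance score of 0, so distance can no longer break ties between them.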
The `scale` parameter controls how quickly scores decrease from the maximum as the distance from the `center_point` to the record in question increases. Set this to 50km for autocomplete, the same value used by search. Connects #1206
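As a rough illustration of what `scale` does, the sketch below compares the exponential score at a few distances for 50km (the search value) and 250km (the previous autocomplete value). It again assumes a `decay` of 0.5 and an `offset` of 0, which are assumptions rather than values taken from this PR.

```python
import math


def exp_decay(distance_km, scale_km, decay=0.5):
    """Exponential decay with offset 0: the score equals `decay` at distance == scale."""
    return math.exp(math.log(decay) / scale_km * distance_km)


distances = (10.0, 50.0, 100.0, 250.0)
for scale in (50.0, 250.0):
    row = ", ".join(f"{d:.0f}km={exp_decay(d, scale):.3f}" for d in distances)
    print(f"scale={scale:.0f}km: {row}")
```

With a 50km scale, a record 50km away already scores half of the maximum. With 250km, the same record still keeps about 87% of it, so distance does far less to separate nearby results.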
Passes as of pelias/api#1209
Our `scale` values for the `center_point` query used by Elasticsearch to score our `focus.point` queries were inconsistent between search and autocomplete (50km for search, 250km for autocomplete). This PR changes autocomplete to 50km, to be consistent with search. It's tough to judge which is the better value, but here's my reasoning:

A smaller `scale` means scores drop off faster. So with a smaller value, only very close records would have a high enough distance score to outweigh a far-away record with high importance or population.

Additionally, this PR changes the decay function from `linear` to exponential. I don't recall why we settled on linear, as it was very long ago, but I suspect it might have been an attempt to prevent very far away records from being scored at all. In that case, our understanding of Elasticsearch at the time was incorrect.

By using exponential scoring, records at any distance receive a non-zero score for the distance query. This means we can differentiate between otherwise identically-scoring records that are very far away from the focus point. This is a huge help when searching for administrative areas like localities, as well as postal codes.
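The tie-breaking argument can be sketched with two hypothetical localities that share a name and a text score but sit 300km and 900km from the focus point. The combination rule here (text score plus weighted distance score) is an assumption for illustration only. It is not Pelias' actual query, and real Elasticsearch scoring also depends on settings such as `score_mode` and `boost_mode`. The candidate names are made up.

```python
import math


def linear_decay(distance_km, scale_km, decay=0.5):
    s = scale_km / (1.0 - decay)
    return max((s - distance_km) / s, 0.0)


def exp_decay(distance_km, scale_km, decay=0.5):
    return math.exp(math.log(decay) / scale_km * distance_km)


# Hypothetical results: same name and text score, different distances (km).
CANDIDATES = [
    ("Springfield A", 10.0, 900.0),
    ("Springfield B", 10.0, 300.0),
]


def rank(decay_fn, scale_km=50.0, weight=1.0):
    """Return (total_score, name) pairs, best first.

    Assumption: total = text score + weight * distance score.
    """
    scored = [(text + weight * decay_fn(dist, scale_km), name)
              for name, text, dist in CANDIDATES]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


print("linear:", rank(linear_decay))  # both totals equal: distance cannot break the tie
print("exp:   ", rank(exp_decay))     # the closer Springfield B ranks first
```

With linear decay, both candidates get a distance score of 0 and end up tied, so their order is just whatever order they arrived in. With exponential decay, the nearer record comes out ahead.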
Looking at the acceptance tests, I could not find any cases where these changes cause a failure. However, some newly added acceptance tests now pass.
connects #1206