Improve geocoding fuzziness, remove street corners #5401

Merged

Conversation

leonardehrenfried
Member

@leonardehrenfried leonardehrenfried commented Oct 6, 2023

Summary

@miles-grant-ibigroup has requested the following changes to the sandbox geocoder:

  • Fuzziness is improved through a more elaborate query strategy.
  • An NGram index is added, which also allows matches in the middle of a word, so "exanderp" matches "Alexanderplatz".
  • The radius of the stop clusters, which group stops with the same name, is increased from 10 m to 100 m.
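
To illustrate why an n-gram index lets "exanderp" match "Alexanderplatz", here is a minimal plain-Java sketch. It is not the Lucene-based code from this PR: the class `NGramSketch`, its methods, and the gram size of 3 are all assumptions made for illustration. The idea is that the indexed name is split into overlapping character sequences, and a query matches when all of its own sequences are found among them.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: NGramSketch and its methods are hypothetical names,
// not part of the OTP geocoder or of Lucene.
final class NGramSketch {

  /** Returns all substrings of length n of the lower-cased input. */
  static Set<String> ngrams(String text, int n) {
    String s = text.toLowerCase(Locale.ROOT);
    Set<String> grams = new HashSet<>();
    for (int i = 0; i + n <= s.length(); i++) {
      grams.add(s.substring(i, i + n));
    }
    return grams;
  }

  /** True if every n-gram of the query also occurs in the indexed name. */
  static boolean matches(String query, String indexedName, int n) {
    Set<String> queryGrams = ngrams(query, n);
    // A query shorter than n produces no n-grams and therefore cannot match.
    return !queryGrams.isEmpty() && ngrams(indexedName, n).containsAll(queryGrams);
  }

  public static void main(String[] args) {
    System.out.println(matches("exanderp", "Alexanderplatz", 3));
    System.out.println(matches("exanderp", "Hauptbahnhof", 3));
  }
}
```

Running `main` prints `true` and then `false`. Note that requiring all query grams to be present is looser than a true substring test, since the grams need not be adjacent in the indexed name; a real index would typically combine this with scoring to rank the better matches first.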

Feature removal

This PR also removes the street corners from the geocoding index, which improves startup time. @flaktack, who originally added them, said that he would be fine with their removal.

Unit tests

Lots added.

Documentation

Updated.

@codecov

codecov bot commented Oct 6, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (00c0304) 66.59% compared to head (3dcbc98) 66.64%.
Report is 87 commits behind head on dev-2.x.

❗ Current head 3dcbc98 differs from pull request most recent head 2714adc. Consider uploading reports for the commit 2714adc to get more accurate results.

Additional details and impacted files
@@              Coverage Diff              @@
##             dev-2.x    #5401      +/-   ##
=============================================
+ Coverage      66.59%   66.64%   +0.05%     
- Complexity     15284    15290       +6     
=============================================
  Files           1790     1791       +1     
  Lines          69388    69389       +1     
  Branches        7308     7306       -2     
=============================================
+ Hits           46206    46244      +38     
+ Misses         20704    20668      -36     
+ Partials        2478     2477       -1     
Files | Coverage Δ
...tripplanner/ext/geocoder/EnglishNGramAnalyzer.java | 100.00% <100.00%> (ø)
...pentripplanner/ext/geocoder/StopClusterMapper.java | 96.55% <100.00%> (ø)
...ntripplanner/framework/geometry/WgsCoordinate.java | 90.56% <100.00%> (+0.56%) ⬆️
...entripplanner/openstreetmap/model/OSMWithTags.java | 88.73% <ø> (ø)
...opentripplanner/ext/geocoder/GeocoderResource.java | 0.00% <0.00%> (ø)
.../org/opentripplanner/ext/geocoder/LuceneIndex.java | 87.68% <91.66%> (+19.10%) ⬆️

... and 1 file with indirect coverage changes


@leonardehrenfried leonardehrenfried added the IBI Developed by or important for IBI Group label Oct 6, 2023
@leonardehrenfried leonardehrenfried changed the title Improve geocoding fuzziness Improve geocoding fuzziness, remove street corners Oct 10, 2023
@leonardehrenfried
Member Author

Add test for rounding.

@leonardehrenfried
Member Author

@flaktack Could you perhaps review this?

@t2gran t2gran added this to the 2.5 (next release) milestone Oct 10, 2023
Member

@t2gran t2gran left a comment

The non-Sandbox changes look good! I have not reviewed the code in the Sandbox.

Co-authored-by: Zsombor Welker <flaktack@users.noreply.github.com>
@leonardehrenfried leonardehrenfried merged commit 0c97ab9 into opentripplanner:dev-2.x Oct 20, 2023
5 checks passed
@leonardehrenfried leonardehrenfried deleted the geocoder-fuzziness branch October 25, 2023 13:57
Labels
IBI Developed by or important for IBI Group Improvement Sandbox Skip Changelog
3 participants