Postcode comparison template #1230
Test: test_2_rounds_1k_duckdb. Percentage change: -28.0%
Test: test_2_rounds_1k_sqlite. Percentage change: -24.0%
@zslade this looks good at first glance (I'm on my phone at the moment, so I'll review it properly tomorrow). One thing I think should be added is support for multiple km distance levels. You can see an example of how to deal with multiple levels at splink/splink/comparison_template_library.py, line 195 in 6e3a2e1.
So you will need to use ensure_is_iterable and then distance_threshold_comparison_levels to build the levels themselves.
The comparison description should also be extended, similar to splink/splink/comparison_template_library.py, line 275 in 6e3a2e1, using distance_threshold_description.
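To illustrate the multiple-threshold idea, here is a minimal plain-Python sketch. It does not call splink: the `ensure_is_iterable` below is a local stand-in that only borrows the name mentioned above, and `km_distance_levels`, the dictionary layout, and the choice to sort thresholds ascending are all assumptions made for illustration.

```python
from collections.abc import Iterable


def ensure_is_iterable(value):
    """Local stand-in (not splink's implementation): wrap a scalar in a
    list, and turn any non-string iterable into a list."""
    if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
        return list(value)
    return [value]


def km_distance_levels(km_thresholds):
    """Hypothetical helper: build one level description per km threshold.

    Thresholds are sorted ascending (an assumption) so the tightest
    distance band is checked first when levels are evaluated in order.
    """
    thresholds = sorted(ensure_is_iterable(km_thresholds))
    return [
        {"label": f"Distance less than {t}km", "km_threshold": t}
        for t in thresholds
    ]
```

With this shape, a caller can pass either a single number or a list of numbers and get back one level per threshold.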
Also, it would be good to update the postcode section of the feature engineering topic guide to use this function.
There are also postcode columns in some of the splink demos that should be updated to use this function instead.
…-services/splink into postcode_wrapper_template
…old_comparison_levels`
Hey @zslade, I was just having a skim through the feature engineering docs. Apologies, I should have been clearer on what I was thinking here. The focus of that topic guide is on what adding additional columns can do to improve a splink model, so the section on postcodes was mainly looking at how adding lat/long could add additional levels to match on. So here, all I was thinking was to replace the initial
Apologies for any confusion - that's my bad!
Great work! I just pushed some small changes, and now happy for you to merge! 🎉
Addresses issue #215
Comparison template for a postcode column. The default arguments give a comparison with the following levels:
- Exact match on full postcode
- Exact match on sector
- Exact match on district
- Exact match on area
- All other comparisons
with an optional 'distance in km' comparison level
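The level hierarchy above can be sketched in plain Python. This is an illustration of the idea only, not the template's implementation: the regular expression is a simplified approximation of the UK postcode format, the names `postcode_parts` and `postcode_level` are hypothetical, and treating unparseable input as "All other comparisons" is an assumption.

```python
import re

# Simplified UK postcode shape (an approximation, not a full validator):
# area letters, district digits/letter, optional space, sector digit, unit.
_POSTCODE = re.compile(
    r"^(?P<area>[A-Z]{1,2})(?P<district_tail>[0-9][A-Z0-9]?) ?"
    r"(?P<sector_digit>[0-9])(?P<unit>[A-Z]{2})$"
)

# Ordered from most to least specific, mirroring the list above.
_LEVELS = [
    ("full", "Exact match on full postcode"),
    ("sector", "Exact match on sector"),
    ("district", "Exact match on district"),
    ("area", "Exact match on area"),
]


def postcode_parts(postcode):
    """Split a postcode into full/sector/district/area, or return None."""
    m = _POSTCODE.match(postcode.strip().upper())
    if m is None:
        return None
    area = m["area"]
    district = area + m["district_tail"]
    sector = f"{district} {m['sector_digit']}"
    return {
        "full": sector + m["unit"],
        "sector": sector,
        "district": district,
        "area": area,
    }


def postcode_level(left, right):
    """Return the label of the first (most specific) level that matches."""
    parts_l, parts_r = postcode_parts(left), postcode_parts(right)
    if parts_l is None or parts_r is None:
        return "All other comparisons"
    for key, label in _LEVELS:
        if parts_l[key] == parts_r[key]:
            return label
    return "All other comparisons"
```

For example, `postcode_level("SW1A 1AA", "SW1A 2AA")` falls through the full and sector checks and stops at the district level, because both postcodes share the district SW1A.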