Add default postcode comparison function #215
Comments
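Would something like this work, or does it need to be fully in SQL format?

    def PCmatch(pc_l, pc_r):
        # Count how many leading characters the two postcodes share.
        count = 0
        # Stop at the end of the shorter string to avoid an IndexError.
        while count < min(len(pc_l), len(pc_r)):
            if pc_l[count] != pc_r[count]:
                break
            count += 1
        return count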
Yeah, it needs to be in the form of a SQL case expression.
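For illustration, here is a minimal sketch of how a Python helper could emit such a case expression. The function name `postcode_case_expression`, the column name `postcode`, the `_l`/`_r` suffixes, the prefix lengths and the level numbers are all assumptions made for this example, not the API that was eventually merged. It compares fixed-length prefixes, so it is only a rough stand-in for proper area/district/sector parsing.

```python
def postcode_case_expression(col="postcode", prefix_lengths=(4, 2)):
    """Build a SQL CASE expression comparing {col}_l with {col}_r.

    Hypothetical helper: levels are null (-1), exact match (highest),
    one level per prefix length (longest first), and 0 for everything else.
    """
    left, right = f"{col}_l", f"{col}_r"
    lines = [
        "CASE",
        f"  WHEN {left} IS NULL OR {right} IS NULL THEN -1",
        f"  WHEN {left} = {right} THEN {len(prefix_lengths) + 1}",
    ]
    # Longer prefixes are checked first, so they receive the higher level.
    for rank, n in zip(range(len(prefix_lengths), 0, -1), prefix_lengths):
        lines.append(
            f"  WHEN substr({left}, 1, {n}) = substr({right}, 1, {n}) THEN {rank}"
        )
    lines.append("  ELSE 0")
    lines.append("END")
    return "\n".join(lines)


if __name__ == "__main__":
    print(postcode_case_expression())
```

With the default arguments this gives four comparison levels plus a null level. The `substr` function is available in Spark SQL, DuckDB and SQLite, but the exact dialect you target is worth checking.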
Have a look on Slack for a ScalaUDF solution. Or is there an easier way and I am overcomplicating things?
Is that too naive? Wouldn't it be better to convert to a more general geographic indicator (rather than a postal area), like lat/lon, and then compare? With this sort of comparison you might have moved only one street yet have an entirely different postcode. There must be a lot of solutions to this already in UK gov codebases.
We have support for distance as the crow flies using lat and long input columns (not yet in the PyPI version, will land in the next release): splink/splink/comparison_level_library.py Line 228 in d80b0c7
It can just sometimes be a bit of a faff to join the geolocation on to the input data, so this would provide a quick and dirty solution that would get 80% of the way there.
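As a rough illustration of what "distance as the crow flies" means here, the sketch below computes the great-circle distance with the haversine formula. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions for this example; it is not a copy of the implementation referenced in comparison_level_library.py, which may differ.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 so rounding error cannot push asin outside its domain.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))
```

A comparison level could then bucket record pairs by thresholds on this distance (for example within 1 km, within 10 km, further apart), which needs lat/long columns joined on to the input data first.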
Could build on top of #1190
Closed by #1230
Generate a case expression with 3-5 levels: