Initial commit for email comparison level feature. #1277
…ison to comparison template libraries, as well as necessary changes in duckdb and spark comparison template libraries, and added rudimentary test for this feature.
One additional thing worth doing for this one is adding |
…s, but others using incorrect m_value parameters for specific levels. Have fixed these.
for gamma, id_pairs in size_gamma_lookup.items():
    for left, right in id_pairs:
        print(f"Checking IDs: {left}, {right}")
        assert (
            linker_output.loc[
                (linker_output.unique_id_l == left)
                & (linker_output.unique_id_r == right)
            ]["gamma_email"].values[0]
            == gamma
        )
We're using this all over the place now. Can you migrate it into its own function?
You could even add it to conftest.py so it can be reused across scripts more easily. Lmk if you need more info on conftest.py.
Yep agreed, this seems like a good idea
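For illustration, the repeated gamma check discussed in this thread could be pulled into a helper along the following lines. This is only a sketch, not the code that was merged: the function name `assert_gamma_levels`, its `gamma_column` parameter, and the toy DataFrame are assumptions made up for the example. Only the column names `unique_id_l`, `unique_id_r` and `gamma_email` and the `size_gamma_lookup` structure come from the snippet under review.

```python
import pandas as pd


def assert_gamma_levels(linker_output, size_gamma_lookup, gamma_column):
    """Check that each listed ID pair was assigned the expected gamma level.

    Hypothetical helper: in a real test suite it could live in conftest.py
    and be exposed to tests through a pytest fixture.
    """
    for gamma, id_pairs in size_gamma_lookup.items():
        for left, right in id_pairs:
            # Select the single comparison row for this pair of record IDs.
            rows = linker_output.loc[
                (linker_output.unique_id_l == left)
                & (linker_output.unique_id_r == right)
            ]
            assert len(rows) == 1, f"expected one row for pair ({left}, {right})"
            actual = rows[gamma_column].values[0]
            assert actual == gamma, (
                f"pair ({left}, {right}): expected {gamma_column}={gamma}, "
                f"got {actual}"
            )


# Toy stand-in for a predictions table (made-up values, not real output).
predictions = pd.DataFrame(
    {
        "unique_id_l": [1, 1, 2],
        "unique_id_r": [2, 3, 3],
        "gamma_email": [3, 0, 1],
    }
)

# Maps each expected gamma level to the ID pairs that should receive it.
expected = {3: [(1, 2)], 1: [(2, 3)], 0: [(1, 3)]}
assert_gamma_levels(predictions, expected, "gamma_email")
```

Passing the gamma column name as an argument, rather than hard-coding `"gamma_email"`, is what would let the same helper be reused by tests for other comparison templates.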
This seems to be working pretty well. One thing I did notice is that the chart labels for JW aren't very informative. I added a parameter to specify the column name manually for the exact match level function, and I will add the same parameter to the fuzzy level functions on this PR on Monday to improve this chart. I'm still not entirely sure how happy I am with running JW on the full email rather than on the username, but I will think about it a bit more to see if I can come up with an alternative.
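To make the concern about scoring the full email versus the username concrete, here is a small sketch. It does not use Jaro-Winkler: as a stand-in it uses the standard library's `difflib.SequenceMatcher` ratio, which is a different similarity measure, so the numbers are only indicative. The email addresses and the `username` helper are invented for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Generic string similarity in [0, 1] (a stand-in, NOT Jaro-Winkler)."""
    return SequenceMatcher(None, a, b).ratio()


def username(email):
    """Return the part of an email address before the first '@'."""
    return email.split("@", 1)[0]


# Two made-up addresses for different people on the same provider.
email_l = "john.smith@gmail.com"
email_r = "jane.doe@gmail.com"

full_score = similarity(email_l, email_r)
username_score = similarity(username(email_l), username(email_r))

# The shared "@gmail.com" suffix pushes the full-email score up even though
# the usernames have little in common.
print(f"full email: {full_score:.2f}, username only: {username_score:.2f}")
```

Any fuzzy measure applied to the whole address is likely to show a similar effect when two records share a common domain, which is one argument for comparing usernames separately.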
Test: test_2_rounds_1k_duckdb. Percentage change: -25.9%
Test: test_2_rounds_1k_sqlite. Percentage change: -21.7%
…l-services/splink into email_comparrison_template
Also, there has been a slight restructure of the backend-specific comparison folders, so I have updated this branch from master to line up with it.
…l-services/splink into email_comparrison_template
…nishing as tests need to be run via the AP.
I looked into Athena linking capability. As JW is not supported there, I've decided to leave it out of this PR. We can reconsider if it gets raised as an issue in future.
…ross multiple tests. Initial commit as not yet working for spark.
…irectory structure merged in from master in the prior commit. Created a pytest fixture called test_gamma_assert to prevent repetition of asserting the gamma lookup specified and the resultant comparison levels are the same. Updated the email test to reflect this.
…d in the function documentation that were corrected.
…l-services/splink into email_comparrison_template
Thanks @sama-ds for your work on this 👍 I think it is good to be merged, but I have made some changes so have a look through to double-check you are happy before hitting go 😊
Added email comparison to comparison template libraries, as well as necessary changes in duckdb and spark comparison template libraries, and added rudimentary test for this feature.
STILL TO DO: