Releases: moj-analytical-services/splink
v3.8.0
What's Changed
- Make the example notebooks run faster by @RobinL in #1160
- Add tags by @RossKen in #1165
- Benchmark timeseries commit workflow to run only in upstream repo by @ADBond in #1152
- Create `_register_input_tables` method in our main linker class by @ThomasHepworth in #1172
- Documentation examples by @RossKen in #1159
- Fix autoblack checkout step by @ADBond in #1169
- Add emojis rather than bullets by @RobinL in #1180
- Add option to pass seed into `estimate_u_using_random_sampling` by @RossKen in #1161
- Adjust the outputs of `truth_space_table_from_labels_with_predictions_sqls` to be lowercase by @ThomasHepworth in #1183
- Improve Logging by @NickCrews in #1084
- Improve logging by @ThomasHepworth in #1186
- Add docs for Feature Engineering by @RossKen in #1178
- Add UDFs dev guide by @RossKen in #1182
- [BUG] Fix source dataset issue when running link jobs by @ThomasHepworth in #1193
- 1107 add jaro similarity by @RossKen in #1167
- migrate ComparisonProperties by @ThomasHepworth in #1195
- revert to old comparison script structure by @RossKen in #1197
- 1030 option for auto typecasting datediff by @aliceoleary0 in #1162
- Athena updates by @ThomasHepworth in #1187
- Release 3.8.0 by @RossKen in #1201
New Contributors
- @aliceoleary0 made their first contribution in #1162
Full Changelog: v3.7.3...v3.8.0
v3.7.3
What's Changed
- Linting update by @ADBond in #1131
- Fix autoblack workflow for forks by @ADBond in #1133
- remove invalid comma by @wilko77 in #1143
- Improve readme what does splink do by @RobinL in #1129
- Improve copy writing on readme by @RobinL in #1144
- Improve readme images for clarity by @RobinL in #1145
- Attempt to make examples notebooks action faster by @RobinL in #1147
- ComparisonLevel composition v2 by @NickCrews in #1114
- Update run_demos_examples.yml by @RobinL in #1154
- 1151 fix term frequencies for cols reversed by @afua-moj in #1156
- Add previously breaking tf case to tests by @RossKen in #1157
- Version 3.7.3 by @RossKen in #1158
Full Changelog: v3.7.2...v3.7.3
v3.7.2
v3.7.1
What's Changed
- Fix a couple of typos by @ADBond in #1123
- Fix athena linker invalid reference by @davidschrooten in #1135
- Fix clustering in issue 1136 by @RobinL in #1137
New Contributors
- @davidschrooten made their first contribution in #1135
Full Changelog: v3.7.0...v3.7.1
v3.7.0
What's Changed
- Adjust caching for our concat tables by @ThomasHepworth in #1013
- _initialise_df_concat optionally returns list by @RobinL in #1023
- Df concat and df concat with tf return SplinkDataframe or None by @RobinL in #1033
- [Docs] Add a dev guide for creating new ComparisonLevels and Comparisons to Splink libraries by @ADBond in #1041
- correct module name duckb_base -> duckdb_base by @ADBond in #1046
- Some cache tests by @ADBond in #1050
- Improving the cache and make cache invalidation easier and more robust by @RobinL in #987
- Bump version to 3.7.0 by @RobinL in #1056
- Release 3.7.0 as dev version by @RobinL in #1057
- Adds the ability to read directly from a settings filepath by @ThomasHepworth in #1062
- Add code to produce tf cols from concat_with_tf by @ThomasHepworth in #1065
- Use Ruff as a linter by @NickCrews in #1004
- cast all value values to varchar by @ThomasHepworth in #1049
- Automatically add tables of comparison (level)s compatible with each dialect to docs by @ADBond in #1035
- add the ability to pass pandas df into the `SparkLinker` by @ThomasHepworth in #1068
- Ruff by @ThomasHepworth in #1070
- Replace `flake8` with `ruff` as our main linter by @ThomasHepworth in #1071
- Loosen dependency ranges by @NickCrews in #1080
- add new award by @RossKen in #1081
- Merge load settings methods by @ThomasHepworth in #1078
- Fix docs workflow by @ADBond in #1073
- Add docs for testing and creating a venv by @ThomasHepworth in #1086
- Added ability to profile nested lists by @zslade in #1074
- Workflow test multiple python versions by @ADBond in #1090
- WIP: Update new_library_comparisons_and_levels.md by @RossKen in #1082
- Added error message to catch pandas null casting issue when read into duckdb by @zslade in #1098
- add a bash script for linting by @ThomasHepworth in #1100
- parametrize datediff tests to clean them up by @ThomasHepworth in #1101
- update with parametrize to test more file loading options by @ThomasHepworth in #1105
- Improve citation by @RobinL in #1108
- Simplify specific implementations of SplinkDataframe by @RobinL in #1116
- Create Distance in KM Comparison library function by @RossKen in #1117
- Rename target_rows argument to max_pairs in estimate_u_using_random_sampling() by @NickCrews in #1087
- Create wrapper function for date comparisons by @RossKen in #1094
- Rename target rows as max_pairs by @RossKen in #1119
- Small Fixes by @NickCrews in #1115
- Fix benchmark comment action to work better with forks by @ADBond in #1122
- Release 3.7.0 proper by @ADBond in #1126
Full Changelog: v3.6.0...v3.7.0
v3.7.0.dev01
What's Changed
- Adjust caching for our concat tables by @ThomasHepworth in #1013
- _initialise_df_concat optionally returns list by @RobinL in #1023
- Df concat and df concat with tf return SplinkDataframe or None by @RobinL in #1033
- [Docs] Add a dev guide for creating new ComparisonLevels and Comparisons to Splink libraries by @ADBond in #1041
- correct module name duckb_base -> duckdb_base by @ADBond in #1046
- Some cache tests by @ADBond in #1050
- Improving the cache and make cache invalidation easier and more robust by @RobinL in #987
- Bump version to 3.7.0 by @RobinL in #1056
- Release 3.7.0 as dev version by @RobinL in #1057
Full Changelog: v3.6.0...v3.7.0.dev01
v3.6.0
What's Changed
- Conda install in readme by @ADBond in #1002
- fix: Safeguard against rounding/overflow errors in great_circle_distance_km_sql() by @NickCrews in #1006
- Create FEATURE_REQUEST.md by @OlivierBinette in #1010
- Drop python 3.6 support by @NickCrews in #1003
- Pin Black to v22 temporariliy, since v23 was removing lines by @RobinL in #1015
- Fixing tooltip for count and sum_matches in comparison viewer dashboard tooltip by @James-Osmond in #1016
- feat: Support sqlglot versions >=5.1.0 by @NickCrews in #1018
- Bump version to 3.6.0 ready for release by @RobinL in #1020
New Contributors
- @OlivierBinette made their first contribution in #1010
Full Changelog: v3.5.4...v3.6.0
v3.5.4
What's Changed
- threshold_actual parameter in precision-recall charts by @James-Osmond in #968
- upating enable_splink method to take spark as input parameter by @robertwhiffin in #989
- James osmond pr chart threshold by @RossKen in #971
- Udf register fix by @robertwhiffin in #993
- Update spark udf jars by @ThomasHepworth in #998
Full Changelog: v3.5.3...v3.5.4
v3.5.3
What's Changed
- Lint SparkLinker by @RobinL in #955
- Fix __splink_df_concat_with_tf_left name by @RobinL in #963
- Cumulative BRs - minor tweaks to fix documentation and rr error by @ThomasHepworth in #974
- Fix distance function comparison typo by @ADBond in #980
- `add_l_or_r_to_identifier` now has case for type exp.Lambda by @James-Osmond in #979
- Edit unlinkables table name when logged in db by @ThomasHepworth in #978
- Add `datediff` comparison levels by @ThomasHepworth in #972
- docs updates by @mamonu in #976
- Update `find_matches_to_new_records` to automatically generate our tf tables by @ThomasHepworth in #983
Full Changelog: v3.5.2...v3.5.3