ENH: for training set preparation add option to drop same names witho… #7

mbaak · 2023-12-22T11:25:36Z

Adds an option to prepare_name_pairs_pd(), used for training set creation, that removes all pairs of identical names that are not considered a match. Such pairs occur often in real data, e.g. franchises that are independent but share the same name. The effect is genuine, but it conflicts with the intuitive notion that identical names should be related. For example, you may want to set this option to true for a model without rank features, which evaluates string similarity.
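To make the filtering concrete, here is a minimal pandas sketch of the idea. It is not the code from this PR: the function name `drop_same_name_no_match` and the column names `name`, `gt_name`, and `is_match` are hypothetical, chosen only for illustration.

```python
import pandas as pd


def drop_same_name_no_match(pairs: pd.DataFrame) -> pd.DataFrame:
    """Drop candidate pairs whose two names are identical but are not a match.

    Column names ("name", "gt_name", "is_match") are assumptions for this sketch.
    """
    same_name = pairs["name"] == pairs["gt_name"]
    no_match = ~pairs["is_match"].astype(bool)
    # Keep every row except those that are both identical in name and a non-match.
    return pairs.loc[~(same_name & no_match)].reset_index(drop=True)


# Toy candidate pairs: the second row is an independent franchise with the same name.
pairs = pd.DataFrame(
    {
        "name": ["Pizza Place", "Pizza Place", "Acme Ltd", "Acme Ltd"],
        "gt_name": ["Pizza Place", "Pizza Place", "Acme Limited", "Zenith BV"],
        "is_match": [True, False, True, False],
    }
)
filtered = drop_same_name_no_match(pairs)
print(filtered)
```

Only the identical-name non-match is removed; non-matching pairs with different names stay in the training set as negative examples.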


mbaak requested a review from sbrugman February 16, 2024 19:25

sbrugman approved these changes Feb 16, 2024


sbrugman merged commit 5d7d8f3 into main Feb 16, 2024
4 checks passed

sbrugman deleted the drop_samename_nomatch_option branch February 16, 2024 19:27
