ENH: for training set preparation add option to drop same names witho… #7

Merged
merged 1 commit into from
Feb 16, 2024

Conversation

mbaak
Contributor

@mbaak mbaak commented Dec 22, 2023

For training set creation, prepare_name_pairs_pd() now has an option to remove all pairs of identical names that are not considered a match. Such pairs occur often in real data, e.g. franchises that are independent but share the same name. It is a true effect in the data, but it conflicts with our intuitive notion that identical names should be related. You may want to set this option to true for, e.g., a model without rank features, which evaluates string similarity.
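To illustrate the filtering rule described above, here is a minimal sketch in plain Python. It is not the code from this PR: the `NamePair` type, the `is_match` label, the `drop_samename_nomatch` helper and its `enabled` flag are all hypothetical names chosen for this example, and the real prepare_name_pairs_pd() works on pandas DataFrames with its own column names and option name.

```python
from typing import Iterable, List, NamedTuple


class NamePair(NamedTuple):
    """Hypothetical candidate pair; not the library's actual schema."""
    name: str
    candidate_name: str
    is_match: bool  # True if both names refer to the same entity


def drop_samename_nomatch(pairs: Iterable[NamePair], enabled: bool = True) -> List[NamePair]:
    """Drop pairs whose names are identical but that are labeled as a non-match.

    With enabled=False the pairs are returned unchanged.
    """
    if not enabled:
        return list(pairs)
    return [
        p for p in pairs
        if not (p.name == p.candidate_name and not p.is_match)
    ]


pairs = [
    NamePair("Pizza Palace", "Pizza Palace", True),   # same name, same entity
    NamePair("Pizza Palace", "Pizza Palace", False),  # same name, independent franchise
    NamePair("Pizza Palace", "Pizza Place", False),   # different name, no match
]

filtered = drop_samename_nomatch(pairs)
for p in filtered:
    print(p)
```

Only the second pair is removed: identical names that match are kept as positives, and non-matching pairs with different names are kept as ordinary negatives.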

…ut match

@mbaak mbaak requested a review from sbrugman February 16, 2024 19:25
@sbrugman sbrugman merged commit 5d7d8f3 into main Feb 16, 2024
4 checks passed
@sbrugman sbrugman deleted the drop_samename_nomatch_option branch February 16, 2024 19:27
2 participants