Incorrect score for similarity=True #3

Open
bemgreem opened this issue Apr 13, 2021 · 0 comments

Great package, but I just noticed a bug with the score in certain situations. If I run
damerauLevenshtein('some string', 'another one but longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 0.03636... but if I run
damerauLevenshtein('some string', 'another one but longer and longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 1.0 implying the two strings are identical.

From what I could see, it looks like the issue stems from the line of code
maxDist = min(len1, len2) * min(replaceWeight, deleteWeight + insertWeight) + (max(len1, len2) - min(len1, len2)) * min(deleteWeight, insertWeight)
which is (assuming I've understood your code) supposed to calculate the maximum distance as the cost of swapping out letters in the shorter word plus the cost of adding/removing any excess letters.

But for my example strings, I believe it should use the insertWeight at the end rather than min(deleteWeight, insertWeight) - there's no way to get from string1 to string2 by deletion, it definitely needs insertion. So I think basically the min() needs to be replaced with an if that checks whether insertions or deletions will be required to get from string1 to string2.
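
To make the arithmetic concrete, here is a minimal sketch of the two bounds for the example strings above. The function names `max_dist_reported` and `max_dist_proposed` are hypothetical and not part of the package; the first copies the formula quoted above, and the second is only my guess at what the fix could look like, assuming the excess letters must be inserted when string2 is longer and deleted otherwise.

```python
def max_dist_reported(len1, len2, delete_w, insert_w, replace_w):
    """Upper bound as quoted in this issue (hypothetical wrapper name)."""
    shorter, longer = min(len1, len2), max(len1, len2)
    per_char = min(replace_w, delete_w + insert_w)
    return shorter * per_char + (longer - shorter) * min(delete_w, insert_w)


def max_dist_proposed(len1, len2, delete_w, insert_w, replace_w):
    """Suggested variant (an assumption, not the package's code):
    pick the weight for the excess letters from the direction of the edit."""
    shorter, longer = min(len1, len2), max(len1, len2)
    per_char = min(replace_w, delete_w + insert_w)
    # string1 -> string2 needs insertions if string2 is longer, deletions otherwise
    excess_w = insert_w if len2 > len1 else delete_w
    return shorter * per_char + (longer - shorter) * excess_w


s1 = 'some string'
s2 = 'another one but longer'
s3 = 'another one but longer and longer'
weights = dict(delete_w=1, insert_w=3, replace_w=6)

print(max_dist_reported(len(s1), len(s2), **weights))  # 55
print(max_dist_reported(len(s1), len(s3), **weights))  # 66
print(max_dist_proposed(len(s1), len(s2), **weights))  # 77
print(max_dist_proposed(len(s1), len(s3), **weights))  # 110
```

For the second pair, string2 is 22 characters longer than string1, so at least 22 insertions at weight 3 are needed, i.e. a cost of at least 66. That already meets the reported bound of 66, so the bound cannot be a true maximum there.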

I'm running Python 3.7.3 and fastDamerauLevenshtein v1.0.7.
