You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Great package but I just noticed a bug with the the score in certain situations. If I run damerauLevenshtein('some string', 'another one but longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 0.03636... but if I run damerauLevenshtein('some string', 'another one but longer and longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 1.0 implying the two strings are identical.
From what I could see, it looks like the issue stems from the line of code maxDist = min(len1, len2) * min(replaceWeight, deleteWeight + insertWeight) + (max(len1, len2) - min(len1, len2)) * min(deleteWeight, insertWeight)
which is (assuming I've understood your code) supposed to calculate the maximum distance as the cost of swapping out letters in the shorter word + the cost of adding/removing any excess letters
But for my example strings, I believe it should use the insertWeight at the end rather than min(deleteWeight, insertWeight) - there's no way to get from string1 to string2 by deletion, it definitely needs insertion. So I think basically the min() needs to be replaced with an if that checks whether insertions or deletions will be required to get from string1 to string2.
I'm running python 3.7.3 and fastDamerauLevenshtein v1.0.7
The text was updated successfully, but these errors were encountered:
Great package but I just noticed a bug with the the score in certain situations. If I run
damerauLevenshtein('some string', 'another one but longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 0.03636... but if I run
damerauLevenshtein('some string', 'another one but longer and longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 1.0 implying the two strings are identical.
From what I could see, it looks like the issue stems from the line of code
maxDist = min(len1, len2) * min(replaceWeight, deleteWeight + insertWeight) + (max(len1, len2) - min(len1, len2)) * min(deleteWeight, insertWeight)
which is (assuming I've understood your code) supposed to calculate the maximum distance as the cost of swapping out letters in the shorter word + the cost of adding/removing any excess letters
But for my example strings, I believe it should use the insertWeight at the end rather than min(deleteWeight, insertWeight) - there's no way to get from string1 to string2 by deletion, it definitely needs insertion. So I think basically the min() needs to be replaced with an if that checks whether insertions or deletions will be required to get from string1 to string2.
I'm running python 3.7.3 and fastDamerauLevenshtein v1.0.7
The text was updated successfully, but these errors were encountered: