This repository has been archived by the owner on Aug 26, 2024. It is now read-only.
Hi, I have a test case that I would like to ask about. It appears that two versions of what I believe to be logically equivalent code are returning different results.
First, this is the test data that I am using for both cases. I have a unicode string, string, and a list of unicode strings, collection. Note that string is equal to collection[1].
For my first test, I will use the extractOne method to try to find the closest match in the list.
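# -*- coding: utf-8 -*-
from __future__ import print_function  # print() then behaves the same on Python 2.7 and 3.x
from fuzzywuzzy import process

# string and collection are the test data described above.
closest_match, ratio = process.extractOne(
    string,
    collection
)
print(closest_match, ratio)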
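This returns the string from collection[0] with a 0 as the matching ratio. This does not appear to be correct because collection[1] is an exact match to the input.
For my second test, I use fuzz and a for loop to find the closest match.
This returns the correct matching statement and a match ratio of 100. This seems like the correct behavior.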
So, am I misunderstanding the purpose of the process.extract methods, or is there an intentional design difference between how these two methods select a result?
Notes:
This behavior occurs consistently in Python 2.7, 3.4, and 3.5.
The issue is that process.extract scores with fuzz.WRatio by default. WRatio in turn runs utils.full_process on both the query and the items in collection, which turns your unicode string into an empty string. By design, comparisons involving an empty string return 0.
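To make the mechanism concrete, here is a minimal sketch of that preprocessing step using only the standard library. full_process_sketch and score_sketch are simplified stand-ins written for illustration, not fuzzywuzzy's actual code. The sketch assumes that WRatio calls full_process with ASCII forcing enabled, and the sample strings are hypothetical rather than the test data from the question.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch; names ending in _sketch are hypothetical stand-ins.
import re
from difflib import SequenceMatcher


def full_process_sketch(s, force_ascii=False):
    """Simplified stand-in for utils.full_process (illustration only)."""
    if force_ascii:
        # Drop every character outside the ASCII range.
        s = "".join(ch for ch in s if ord(ch) < 128)
    # Replace anything that is not a word character with a space.
    s = re.sub(r"\W", " ", s)
    return s.lower().strip()


def score_sketch(query, choice):
    """Score 0 if either processed string is empty, else a 0-100 ratio."""
    q = full_process_sketch(query, force_ascii=True)
    c = full_process_sketch(choice, force_ascii=True)
    if not q or not c:
        return 0
    return int(round(100 * SequenceMatcher(None, q, c).ratio()))


# Hypothetical data: the query is an exact copy of the second choice.
string = "日本語"
collection = ["abc 日本", "日本語"]

for item in collection:
    # Every score is 0 because the processed query is the empty string.
    print(repr(full_process_sketch(item, force_ascii=True)), score_sketch(string, item))
```

Under these assumptions the exact match cannot win: the query is empty after processing, so every comparison scores 0 and the first item is returned by default.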
Interestingly, the first item in your list is the only item that full_process does not turn into the empty string. This leads to a surprising result if you use a different scorer.
This occurs because process.extract still runs full_process on the choices but does not run it on the query (I think this is a bug and will submit it shortly).
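The effect of processing only one side can be sketched with the standard library. Everything here is a hypothetical stand-in: processor_sketch loosely imitates a processor such as full_process, ratio_sketch imitates a plain similarity scorer, and the strings are invented for the example.

```python
import re
from difflib import SequenceMatcher


def processor_sketch(s):
    """Stand-in processor: non-word characters to spaces, lowercase, strip."""
    return re.sub(r"\W", " ", s).lower().strip()


def ratio_sketch(a, b):
    """Stand-in similarity score in the range 0-100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def best_match(query, choices, process_query):
    """Always process the choices; process the query only if asked to."""
    q = processor_sketch(query) if process_query else query
    scored = [(choice, ratio_sketch(q, processor_sketch(choice))) for choice in choices]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical data: the query is an exact copy of the second choice.
hypothetical_query = "Hello, World!"
hypothetical_choices = ["Goodbye", "Hello, World!"]

# Choices processed, query left raw: the exact match no longer scores 100.
asymmetric = best_match(hypothetical_query, hypothetical_choices, process_query=False)
# Both sides processed the same way: the exact match scores 100 again.
symmetric = best_match(hypothetical_query, hypothetical_choices, process_query=True)
print(asymmetric)
print(symmetric)
```

The point of the sketch is only that query and choices need to pass through the same preprocessing for an exact match to be recognized as one.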
If you do want to use process.extract, the way to do it is to bypass full_process. Luckily, you can do this.