This repository has been archived by the owner on Aug 26, 2024. It is now read-only.
I am trying to use string fuzzy-matching with both R and Python. I am actually using two packages:
stringdist from R
fuzzywuzzy from Python
When I try amatch("PARI", c("HELLO", "WORLD"), maxDist = 2) in R, I get NA as a result, which is intuitive. But when I try the same thing in Python with process.extract("PARI", ["HELLO", "WORLD"], limit = 2), I get [('world', 22), ('HELLO', 0)]
How can I get the same result as in R?
Thanks in advance
There are a couple of important differences between the two packages:
In FuzzyWuzzy, limit specifies how many elements you want extract to return; it is not a distance threshold. extract does not provide an argument equivalent to maxDist. For that purpose you would have to use extractBests with the score_cutoff argument.
Stringdist appears to use an edit distance, where lower values mean more similar strings, while FuzzyWuzzy only provides normalized string metrics (0-100), where higher values mean more similar strings. So you would have to use e.g. score_cutoff=90. You can choose the string metric with the scorer argument.
FuzzyWuzzy preprocesses strings by default in the extract function (lowercasing and replacing non-alphanumeric characters). You can disable this with processor=None.
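To make the second difference concrete, here is a minimal, library-free sketch of what a maximum-distance cutoff does. The helper names levenshtein and amatch_like are hypothetical and belong to neither package. The sketch uses plain Levenshtein distance, which is an assumption: stringdist's default method may differ (for example in how it counts transpositions), and the returned index is 0-based rather than R's 1-based index.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def amatch_like(query: str, choices: Sequence[str], max_dist: int) -> Optional[int]:
    """Hypothetical helper: index of the closest choice within max_dist, else None."""
    best_index, best_dist = None, max_dist + 1
    for index, choice in enumerate(choices):
        dist = levenshtein(query, choice)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


print(amatch_like("PARI", ["HELLO", "WORLD"], max_dist=2))  # no choice is within 2 edits
print(amatch_like("PARI", ["HELLO", "PARIS"], max_dist=2))  # "PARIS" is 1 edit away
```

With a cutoff of this kind, a query that is far from every candidate yields no match at all, which is the behaviour the question expects from R. A normalized 0-100 score with limit only ranks the candidates and always returns something.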
As an alternative, you could use RapidFuzz, which supports edit distances and accepts a score_cutoff parameter directly in its extract function.