Add string:jaro_distance/2
#7863
Previously, if a case failed and expected a non-string, non-tuple value, the 'io:format/2' call in the 'test_1/5' helper would raise an error rather than print the failure, because of the '~ts' control sequence. For example, an incorrect case in the `length` group written like this would error instead of reporting the mismatch: ?TEST("abc", [], 4)
To fix this, we swap the clauses so that '~ts' is used for binaries and lists and '~w' for everything else.
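As a rough illustration of the clause ordering described above, here is a minimal sketch. The module `fmt_sketch` and the function `format_value/1` are hypothetical names chosen for this example; they are not the actual 'test_1/5' code from the test suite.

```erlang
-module(fmt_sketch).
-export([format_value/1]).

%% Hypothetical helper, not the suite's real code.
%% Binaries and lists go through ~ts, which expects string-like data.
format_value(Value) when is_binary(Value); is_list(Value) ->
    lists:flatten(io_lib:format("~ts", [Value]));
%% Everything else (integers, tuples, floats, ...) goes through ~w,
%% which prints any term with standard Erlang syntax.
format_value(Value) ->
    lists:flatten(io_lib:format("~w", [Value])).
```

With this ordering, `fmt_sketch:format_value(4)` returns `"4"` instead of raising `badarg`. A list that is not valid character data would still fail under '~ts', so this sketch only covers the case the commit message describes.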
@the-mikedavis Please consider having this functionality as
@okeuday I don't like that; they have very different properties and return values (I have implemented most of them). What would you use the Levenshtein algorithm for when Jaro is 10 times faster?
@dgud Applications of the Levenshtein algorithm are described here, and most usages are likely uncommon in Erlang, but a shell feature that could provide help based on module or function spelling mistakes could be an Erlang use case. If the strings are relatively short, the latency may be justified by getting a better result. I am not attempting to advocate for a slower shell. It just seems best to care about more than one string distance algorithm.
> a shell feature that could provide help based on module or function spelling mistakes could be an Erlang use-case.
Which is one reason why we want to add Jaro, and why we considered the other ones.
@the-mikedavis We have another variant of the code coming. We have concluded that the Elixir variant calculates the transpositions "wrongly", or at least differently from the de facto standard, for example:
The original paper also seems to agree more with the Rosetta algorithms than with Elixir's, though the original code
But I'll keep this PR open for inspiration, i.e. test changes and docs.
Thanks for the work.
@garazdawi mentioned this might be a nice addition to `string` (erlang/rebar3#2844 (comment)). It's a translation from Elixir's `String.jaro_distance/2`, adapted to allow `unicode:chardata()` rather than only binaries.
@garazdawi also mentioned someone from the Erlang/OTP team was interested in working on this, so I am submitting what I have now. Please feel free to take it over or supersede this PR if you'd like!
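For readers unfamiliar with the metric, the following is a minimal sketch of the textbook Jaro similarity over `unicode:chardata()`. It is not the code from this PR, nor the variant the OTP team mentions. The module `jaro_sketch` and all function names in it are hypothetical, and working on grapheme clusters via `string:to_graphemes/1` is an assumption made here. Implementations also differ on whether the halved transposition count is rounded down; this sketch does not round.

```erlang
-module(jaro_sketch).
-export([similarity/2]).

%% Returns a float in [0.0, 1.0]; 1.0 means the inputs are identical.
similarity(A, B) ->
    S1 = string:to_graphemes(A),
    S2 = string:to_graphemes(B),
    jaro(S1, length(S1), S2, length(S2)).

jaro(_, 0, _, 0) -> 1.0;
jaro(_, 0, _, _) -> 0.0;
jaro(_, _, _, 0) -> 0.0;
jaro(S1, Len1, S2, Len2) ->
    %% Characters only match if their positions differ by at most Window.
    Window = max(0, max(Len1, Len2) div 2 - 1),
    T2 = list_to_tuple(S2),
    {Matched1, Used} = match(S1, 1, T2, Len2, Window, [], []),
    case length(Matched1) of
        0 ->
            0.0;
        M ->
            %% Matched characters of S2, in S2 order.
            Matched2 = [element(J, T2) || J <- lists:sort(Used)],
            Mismatches = length([x || {C1, C2} <- lists:zip(Matched1, Matched2),
                                      C1 =/= C2]),
            T = Mismatches / 2,
            (M / Len1 + M / Len2 + (M - T) / M) / 3
    end.

%% Walk S1 left to right; each character takes the first unused,
%% equal character of S2 inside the window.
match([], _I, _T2, _Len2, _W, Acc, Used) ->
    {lists:reverse(Acc), Used};
match([C | Rest], I, T2, Len2, W, Acc, Used) ->
    case find(C, max(1, I - W), min(Len2, I + W), T2, Used) of
        none -> match(Rest, I + 1, T2, Len2, W, Acc, Used);
        J -> match(Rest, I + 1, T2, Len2, W, [C | Acc], [J | Used])
    end.

find(_C, J, Hi, _T2, _Used) when J > Hi ->
    none;
find(C, J, Hi, T2, Used) ->
    case element(J, T2) =:= C andalso not lists:member(J, Used) of
        true -> J;
        false -> find(C, J + 1, Hi, T2, Used)
    end.
```

For the classic pair `"MARTHA"` and `"MARHTA"`, all six characters match and one transposition is counted, so `jaro_sketch:similarity("MARTHA", "MARHTA")` evaluates to roughly 0.944. Binaries and mixed chardata such as `<<"MARTHA">>` are accepted as well, since `string:to_graphemes/1` handles them.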