A trigram and similarity refinement for Ruby's String class
The #trigrams method returns an array of trigrams for the string.
The #similarity method compares the trigrams of the receiver with those of the string passed in as an argument and returns a Float between 0.0 and 1.0 that quantifies how similar the two strings are. Pass true as the case_insensitive parameter for a case-insensitive comparison; by default the comparison is case sensitive.
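To make the description above concrete, here is a minimal sketch of how such a refinement could be written. It is not this project's source. The module names TrigramSketch and SketchDemo are hypothetical, the padding (two leading spaces and one trailing space per word) follows the documented pg_trgm behaviour, splitting words on non-alphanumeric characters is an assumption, and the scoring (shared trigrams divided by all distinct trigrams) is inferred from the example values shown later in this README.

```ruby
# Hypothetical sketch, not the actual StringSimilarityExtensions source.
module TrigramSketch
  refine String do
    # Distinct trigrams of every alphanumeric word, each word padded with
    # two leading spaces and one trailing space (pg_trgm-style padding).
    def trigrams
      scan(/[[:alnum:]]+/).flat_map { |word|
        "  #{word} ".each_char.each_cons(3).map(&:join)
      }.uniq
    end

    # Assumed scoring: shared trigrams / all distinct trigrams.
    def similarity(other, case_insensitive = false)
      a, b = case_insensitive ? [downcase, other.downcase] : [self, other]
      own = a.trigrams
      theirs = b.trigrams
      union = (own | theirs).size
      return 0.0 if union.zero? # two strings without any word characters

      (own & theirs).size.to_f / union
    end
  end
end

# The refinement is only visible where `using` is active, so the demo
# wraps the calls in a module, in the same way as the Test module below.
module SketchDemo
  using TrigramSketch

  def self.trigrams(string)
    string.trigrams
  end

  def self.similarity(first, second, case_insensitive = false)
    first.similarity(second, case_insensitive)
  end
end

p SketchDemo.trigrams("a")
puts SketchDemo.similarity("celebrities", "Celebrity")
puts SketchDemo.similarity("celebrities", "Celebrity", true)
```

Under these assumptions, "celebrities" yields 12 trigrams and "celebrity" yields 10, of which 8 are shared, giving 8 / 14. With a capital "C" only 5 are shared, giving 5 / 17. Both ratios agree with the values in the session output further down.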
A Ruby refinement is a safer alternative to monkey-patching, particularly when you are modifying the behaviour of “someone else's” class: a core Ruby class, a Rails class, or a gem class.
See the Ruby 2.4 refinements documentation.
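The scoping that makes a refinement safer than a monkey patch can be illustrated with a small, self-contained sketch. Every name in it (Shout, WithRefinement, WithoutRefinement, shout) is hypothetical and unrelated to this project; the point is only that the added method exists where `using` has been called and nowhere else.

```ruby
# Hypothetical refinement that adds String#shout.
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

# The refinement is activated here, so String#shout can be called.
module WithRefinement
  using Shout

  def self.call(string)
    string.shout
  end
end

# No `using` here: String is untouched and the call fails.
module WithoutRefinement
  def self.call(string)
    string.shout
  rescue NoMethodError
    :no_such_method
  end
end

p WithRefinement.call("hi")
p WithoutRefinement.call("hi")
```

A monkey patch would instead reopen String globally, so every caller in the process, including other gems, would see the new method.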
Once the refinement is activated with using inside a class or module, you can call these methods directly on any string within that scope.
With:
module Test
  using StringSimilarityExtensions

  def self.trigrams(string)
    string.trigrams
  end

  def self.similarity(string1, string2, case_insensitive = false)
    string1.similarity(string2, case_insensitive)
  end
end
Test.trigrams("a")
Test.similarity("celebrities", "Celebrity")
Test.similarity("celebrities", "celebrity")
Test.similarity("celebrities", "Celebrity", true)
Test.similarity("celebrities", "celebrity", true)
Then:
2.4.4 :001 > module Test
2.4.4 :002?> using StringSimilarityExtensions
2.4.4 :003?> def self.trigrams(string)
2.4.4 :004?> string.trigrams
2.4.4 :005?> end
2.4.4 :006?> def self.similarity(string1, string2, case_insensitive = false)
2.4.4 :007?> string1.similarity(string2, case_insensitive)
2.4.4 :008?> end
2.4.4 :009?> end
=> :similarity
2.4.4 :010 >
2.4.4 :011 > Test.trigrams("a")
=> ["  a", " a "]
2.4.4 :012 >
2.4.4 :013 > Test.similarity("celebrities", "Celebrity")
=> 0.29411764705882354
2.4.4 :014 > Test.similarity("celebrities", "celebrity")
=> 0.5714285714285714
2.4.4 :015 >
2.4.4 :016 > Test.similarity("celebrities", "Celebrity", true)
=> 0.5714285714285714
2.4.4 :017 > Test.similarity("celebrities", "celebrity", true)
=> 0.5714285714285714
2.4.4 :018 >
The trigram implementation is intended to reproduce the trigrams generated by the PostgreSQL pg_trgm extension.