Our Daily Method #15: Kernel#levenshtein
Geplaatst door Michiel de Mare ma, 25 feb 2008 08:00:00 GMT
Not in any way invented by us, but still extremely useful and not a part of the standard library: the Levenshtein Distance
What does it calculate? The number of characters to add, delete or switch to turn the first string into the second. Ideal for detecting typos.
# example:
levenshtein('levenstine', 'levenshtein') # => 3
Or how about instead of a LIKE-clause? Not quite as fast (cough) but much more accurate.
# In the old days:
User.find(:all,
:conditions => ['name like ?', params[:name] + '%']])
# Now:
User.find(:all).
sort_by {|u| levenshtein(params[:name], u.name)}.
first
def levenshtein(s1, s2)
d = {}
(0..s1.size).each { |row| d[[row, 0]] = row }
(0..s2.size).each { |col| d[[0, col]] = col }
(1..s1.size).each do |i|
(1..s2.size).each do |j|
i1,j1 = i-1,j-1
cost = s1[i1] != s2[j1] ? 1 : 0
d[[i, j]] = [d[[i1, j1]] + cost,
d[[i1, j]] + 1,
d[[i, j1]] + 1].min
if i1*j1 > 0 and s1[i1] == s2[j-2] and s1[i-2] == s2[j1]
d[[i, j]] = [d[[i,j]] , d[[i-2, j-2]] + cost].min
end
end
end
d[[s1.size, s2.size]]
end