# Edit distance

### From HaskellWiki

The edit distance is the minimum number of (delete/insert/modify) operations required to edit one given string into another given string. The functions given on this page calculate the Levenshtein distance of two strings.

The edit distance of two strings can be found using a dynamic programming algorithm. The time complexity of this algorithm is O(nm), where n and m are the lengths of the input strings.

The function which follows uses an array to store the table used by the algorithm:

editDistance :: Eq a => [a] -> [a] -> Int editDistance xs ys = table ! (m,n) where (m,n) = (length xs, length ys) x = array (1,m) (zip [1..] xs) y = array (1,n) (zip [1..] ys) table :: Array (Int,Int) Int table = array bnds [(ij, dist ij) | ij <- range bnds] bnds = ((0,0),(m,n)) dist (0,j) = j dist (i,0) = i dist (i,j) = minimum [table ! (i-1,j) + 1, table ! (i,j-1) + 1, if x ! i == y ! j then table ! (i-1,j-1) else 1 + table ! (i-1,j-1)]

Lloyd Allison's paper, Lazy Dynamic-Programming can be Eager, describes a more efficient method for computing the edit distance.

The following Haskell function computes the edit distance in O(length a * (1 + dist a b)) time complexity. It is a translation of the function presented in Allison's paper, which is written in lazy ML.

dist :: Eq a => [a] -> [a] -> Int dist a b = last (if lab == 0 then mainDiag else if lab > 0 then lowers !! (lab - 1) else{- < 0 -} uppers !! (-1 - lab)) where mainDiag = oneDiag a b (head uppers) (-1 : head lowers) uppers = eachDiag a b (mainDiag : uppers) -- upper diagonals lowers = eachDiag b a (mainDiag : lowers) -- lower diagonals eachDiag a [] diags = [] eachDiag a (bch:bs) (lastDiag:diags) = oneDiag a bs nextDiag lastDiag : eachDiag a bs diags where nextDiag = head (tail diags) oneDiag a b diagAbove diagBelow = thisdiag where doDiag [] b nw n w = [] doDiag a [] nw n w = [] doDiag (ach:as) (bch:bs) nw n w = me : (doDiag as bs me (tail n) (tail w)) where me = if ach == bch then nw else 1 + min3 (head w) nw (head n) firstelt = 1 + head diagBelow thisdiag = firstelt : doDiag a b firstelt diagAbove (tail diagBelow) lab = length a - length b min3 x y z = if x < y then x else min y z