# Talk:Toy compression implementations

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Much kudos for fixing the underflow error. The new LZW implementation is much smaller, but... how in the name of God does it actually work? o_O MathematicalOrchid 11:36, 9 March 2007 (UTC)

To understand it I rewrote it a bit:

```encode_LZW :: (Eq t) => [t] -> [t] -> [Int]
encode_LZW alphabet = work (map (:[]) alphabet) where
work table []  = []
work table lst = index : work table' rest
where (tok, rest) = last . takeWhile ((`elem` table) . fst) . tail \$ zip (inits lst) (tails lst)
index = fromJust (elemIndex tok table)
table' = table ++ [tok']
tok' = tok ++ [head rest]
```

The idea of the the table, which is the 1st argument to 'work', is that some prefix of the input is already in the table.

(encode_LZW chars) uses 'chars' to make the initial table for the 'work' function by turning the list of characters into a list of length 1 strings.

The `where (tok,rst)` definition can be read right to left:

• The `zip (inits lst) (tails lst)` computes every possible way to split `lst` input into a prefix and suffix, in increasing length of prefix.
• The `tail` function just drops the head because it doesn't want to consider the length 0 prefix
• `takeWhile` applies the predicate `(`elem` table)` to the prefix. This will always succeed on the length 1 prefix, and may find longer prefixes in the table.
• The `last` function take the last prefix in the table, which will always be the longest such prefix
• `tok` is this prefix, and `rest` is the remaining suffix to process.

Wow... a most ingenious (and inefficient) approach! Well, now it makes sense anyway. MathematicalOrchid 13:34, 9 March 2007 (UTC)