# Difference between revisions of "Talk:Toy compression implementations"

From HaskellWiki

(My mind is blown...) |
(I'm impressed...) |
||

(One intermediate revision by one other user not shown) | |||

Line 1: | Line 1: | ||

Much kudos for fixing the underflow error. The new LZW implementation is much smaller, but... how in the name of God does it actually work? o_O [[User:MathematicalOrchid|MathematicalOrchid]] 11:36, 9 March 2007 (UTC) |
Much kudos for fixing the underflow error. The new LZW implementation is much smaller, but... how in the name of God does it actually work? o_O [[User:MathematicalOrchid|MathematicalOrchid]] 11:36, 9 March 2007 (UTC) |
||

+ | |||

+ | To understand it I rewrote it a bit: |
||

+ | |||

+ | <haskell> |
||

+ | encode_LZW :: (Eq t) => [t] -> [t] -> [Int] |
||

+ | encode_LZW alphabet = work (map (:[]) alphabet) where |
||

+ | work table [] = [] |
||

+ | work table lst = index : work table' rest |
||

+ | where (tok, rest) = last . takeWhile ((`elem` table) . fst) . tail $ zip (inits lst) (tails lst) |
||

+ | index = fromJust (elemIndex tok table) |
||

+ | table' = table ++ [tok'] |
||

+ | tok' = tok ++ [head rest] |
||

+ | </haskell> |
||

+ | |||

+ | The idea of the the table, which is the 1st argument to 'work', is that some prefix of the input is already in the table. |
||

+ | |||

+ | (encode_LZW chars) uses 'chars' to make the initial table for the 'work' function by turning the list of characters into a list of length 1 strings. |
||

+ | |||

+ | The <hask>where (tok,rst)</hask> definition can be read right to left: |
||

+ | * The <hask>zip (inits lst) (tails lst)</hask> computes every possible way to split <hask>lst</hask> input into a prefix and suffix, in increasing length of prefix. |
||

+ | * The <hask>tail</hask> function just drops the head because it doesn't want to consider the length 0 prefix |
||

+ | * <hask>takeWhile</hask> applies the predicate <hask>(`elem` table)</hask> to the prefix. This will always succeed on the length 1 prefix, and may find longer prefixes in the table. |
||

+ | * The <hask>last</hask> function take the last prefix in the table, which will always be the longest such prefix |
||

+ | * <hask>tok</hask> is this prefix, and <hask>rest</hask> is the remaining suffix to process. |
||

+ | |||

+ | |||

+ | Wow... a most ingenious (and inefficient) approach! Well, now it makes sense anyway. [[User:MathematicalOrchid|MathematicalOrchid]] 13:34, 9 March 2007 (UTC) |

## Latest revision as of 13:34, 9 March 2007

Much kudos for fixing the underflow error. The new LZW implementation is much smaller, but... how in the name of God does it actually work? o_O MathematicalOrchid 11:36, 9 March 2007 (UTC)

To understand it I rewrote it a bit:

```
encode_LZW :: (Eq t) => [t] -> [t] -> [Int]
encode_LZW alphabet = work (map (:[]) alphabet) where
work table [] = []
work table lst = index : work table' rest
where (tok, rest) = last . takeWhile ((`elem` table) . fst) . tail $ zip (inits lst) (tails lst)
index = fromJust (elemIndex tok table)
table' = table ++ [tok']
tok' = tok ++ [head rest]
```

The idea of the the table, which is the 1st argument to 'work', is that some prefix of the input is already in the table.

(encode_LZW chars) uses 'chars' to make the initial table for the 'work' function by turning the list of characters into a list of length 1 strings.

The `where (tok,rst)`

definition can be read right to left:

- The
`zip (inits lst) (tails lst)`

computes every possible way to split`lst`

input into a prefix and suffix, in increasing length of prefix. - The
`tail`

function just drops the head because it doesn't want to consider the length 0 prefix -
`takeWhile`

applies the predicate`(`elem` table)`

to the prefix. This will always succeed on the length 1 prefix, and may find longer prefixes in the table. - The
`last`

function take the last prefix in the table, which will always be the longest such prefix -
`tok`

is this prefix, and`rest`

is the remaining suffix to process.

Wow... a most ingenious (and inefficient) approach! Well, now it makes sense anyway. MathematicalOrchid 13:34, 9 March 2007 (UTC)