Shootout/Fannkuch

From HaskellWiki

This is an entry for the Computer Language Benchmarks Game benchmark fannkuch-redux (the benchmark might have changed since the text and code below were written)

WARNING: The permutations (at least the printed ones) must be in the order shown below.

Each program should

  • "Take a permutation of {1,...,n}, for example: {4,2,1,5,3}.
  • Take the first element, here 4, and reverse the order of the first 4 elements: {5,1,2,4,3}.
  • Repeat this until the first element is a 1, so flipping won't change anything more: {3,4,2,1,5}, {2,4,3,1,5}, {4,2,3,1,5}, {1,3,2,4,5}.
  • Count the number of flips, here 5.
  • Do this for all n! permutations, and record the maximum number of flips needed for any permutation.
  • Write the first 30 permutations and the number of flips.

The conjecture is that this maximum count is approximated by n*log(n) when n goes to infinity.

FANNKUCH is an abbreviation for the German word Pfannkuchen, or pancakes, in analogy to flipping pancakes."

Correct output N = 7 is:

1234567
2134567
2314567
3214567
3124567
1324567
2341567
3241567
3421567
4321567
4231567
2431567
3412567
4312567
4132567
1432567
1342567
3142567
4123567
1423567
1243567
2143567
2413567
4213567
2345167
3245167
3425167
4325167
4235167
2435167
Pfannkuchen(7) = 16

The fannkuch benchmark is defined in Performing Lisp Analysis of the FANNKUCH Benchmark, Kenneth R. Anderson and Duane Rettig (27KB PDF)

New Code[edit]

These are being scavenged from the discussion on the Haskell-café mailing list.

Performance is summarised (on OpenBSD/x86 1.6Ghz Pentium M):

Author Time in seconds (N=10)
Original entry: > 2 minutes
Sebastian Sylvan: 15.4
Kimberley Burchett: 7.2
Cale Gibbard: 5.8
Bertram Felgenhauer: 4.7
Clean imperative 2.1
Fastest pure 1.9
Fastest impure 1.4
C gcc 1.35
C gcc -O2 0.5

Proposals[edit]

Port to bytestrings

Fastest pure version[edit]

Bertram's with a couple of refactors, a small let binding by Don, a new permutation by Matthias Neubauer, and an ugly flop by Josh and another by David Place. Compile with -O2 -optc-O3.

{-# OPTIONS -O2 -optc-O3 #-}
import System
import Data.List

rotate 2 (x1:x2:xs) = x2:x1:xs
rotate 3 (x1:x2:x3:xs) = x2:x3:x1:xs
rotate 4 (x1:x2:x3:x4:xs) = x2:x3:x4:x1:xs
rotate 5 (x1:x2:x3:x4:x5:xs) = x2:x3:x4:x5:x1:xs
rotate 6 (x1:x2:x3:x4:x5:x6:xs) = x2:x3:x4:x5:x6:x1:xs
rotate 7 (x1:x2:x3:x4:x5:x6:x7:xs) = x2:x3:x4:x5:x6:x7:x1:xs
rotate 8 (x1:x2:x3:x4:x5:x6:x7:x8:xs) = x2:x3:x4:x5:x6:x7:x8:x1:xs
rotate 9 (x1:x2:x3:x4:x5:x6:x7:x8:x9:xs) = x2:x3:x4:x5:x6:x7:x8:x9:x1:xs
rotate 10 (x1:x2:x3:x4:x5:x6:x7:x8:x9:x10:xs) = x2:x3:x4:x5:x6:x7:x8:x9:x10:x1:xs
rotate n (x:xs) = rotate' n xs
    where rotate' 1 xs     = x:xs
          rotate' n (x:xs) = x:rotate' (n-1) xs

permutations l = foldr permutations' [l] [2..length l]
    where permutations' n = foldr (takeIter n (rotate n)) []

takeIter 0 f x rest = rest
takeIter n f x rest = x : takeIter (n-1::Int) f (f x) rest

flop :: Int -> [Int] -> (Int, [Int])
flop 2  (x2:xs) = (x2, 2:xs)
flop 3  (x2:x3:xs) = (x3, x2:3:xs)
flop 4  (x2:x3:x4:xs) = (x4, x3:x2:4:xs)
flop 5  (x2:x3:x4:x5:xs) = (x5, x4:x3:x2:5:xs)
flop 6  (x2:x3:x4:x5:x6:xs) = (x6, x5:x4:x3:x2:6:xs)
flop 7  (x2:x3:x4:x5:x6:x7:xs) = (x7, x6:x5:x4:x3:x2:7:xs)
flop 8  (x2:x3:x4:x5:x6:x7:x8:xs) = (x8, x7:x6:x5:x4:x3:x2:8:xs)
flop 9  (x2:x3:x4:x5:x6:x7:x8:x9:xs) = (x9, x8:x7:x6:x5:x4:x3:x2:9:xs)
flop 10 (x2:x3:x4:x5:x6:x7:x8:x9:x10:xs) = (x10,x9:x8:x7:x6:x5:x4:x3:x2:10:xs)
flop n xs = rs
  where (rs, ys)          = flop' n xs ys
        flop' 2 (x:xs) ys = ((x, ys), n:xs)
        flop' n (x:xs) ys = flop' (n-1) xs (x:ys)

steps :: Int -> [Int] -> Int
steps n    (a:as) = steps' n (a,as)

steps' n (1,_)  = n
steps' n (t,ts) = steps' (n+1) (flop t ts)

main = do n <- getArgs >>= return . read . head
          let p = permutations [1..n]
          mapM_ (putStrLn . concatMap show) $ take 30 p
          putStr $ "Pfannkuchen(" ++ show n ++ ") = "
          print $ foldl' (flip (max . steps 0)) 0 p

Fastest impure[edit]

Another translation of the C version, using unboxed math, and no IORefs. Much faster again. Compile with -O2 -optc-O3 -fglasgow-exts. Probably a little more could be squeezed.

--
-- translation of the C version to Haskell by Don Stewart
--

import Control.Monad
import Foreign
import System
import GHC.Base
import GHC.Ptr
import GHC.IOBase

main = do 
    n <- getArgs >>= return . read . head
    k <- if n < 1 then return (0::Int) else fannkuch n
    putStrLn $ "Pfannkuchen(" ++ show n ++ ") = " ++ show (k - 1)

fannkuch n@(I# n#) = do
    perm            <- mallocArray n       :: IO (Ptr Int)
    (Ptr c#)        <- mallocArray n       :: IO (Ptr Int)
    perm1@(Ptr p1#) <- newArray [0 .. n-1] :: IO (Ptr Int)
    (Ptr rP)        <- newArray [n]        :: IO (Ptr Int)
    (Ptr flipsMaxP) <- newArray [0]        :: IO (Ptr Int)

    let go didpr = do
            didpr' <- if didpr < (30 :: Int)
                      then ppr 0 n perm1 >> putStr "\n" >> return (didpr + 1)
                      else return didpr

            IO $ \s -> 
                case readIntOffAddr# rP 0# s of
                    (# s, r# #) -> case setcount c# r# s of
                        (# s, _ #) -> case writeIntOffAddr# rP 0# 1# s of
                            s -> (# s, () #)

            t <- IO $ \s -> 
                case readIntOffAddr# p1# 0# s of
                    (# s, p1 #) -> case readIntOffAddr# p1# (n# -# 1#) s of
                        (# s, pn #) -> (# s, not (p1 ==# 0# || pn ==# (n# -# 1#)) #)
            
            when t $ exchange n perm perm1 flipsMaxP

            fm   <- IO $ \s -> case readIntOffAddr# flipsMaxP 0# s of 
                                (# s, x #) -> (# s, I# x #)
            done <- IO $ \s -> rot rP n# p1# c# s
            if done then return fm else go didpr'
    go 0

------------------------------------------------------------------------

exchange n p@(Ptr a) p1@(Ptr b) fm = do
    copyArray (p `advancePtr` 1) (p1 `advancePtr` 1) (n-1) 
    IO $ \s -> 
        case readIntOffAddr# b 0# s  of { (# s, k #) -> 
        case doswap k a 0# s         of { (# s, f #) -> 
        case readIntOffAddr# fm 0# s of { (# s, m #) -> 
        if m <# f then case writeIntOffAddr# fm 0# f s of s -> (# s, () #)
                  else (# s, () #)
        } } }
{-# INLINE exchange #-}

doswap k a f s = 
    case swap 1# (k -# 1#) a s    of { (# s, _ #) -> 
    case readIntOffAddr# a k s    of { (# s, j #) -> 
    case writeIntOffAddr# a k k s of { s          -> 
    if k /=# 0# then doswap j a (f +# 1#) s else (# s, (f +# 1#) #)
    } } }
{-# INLINE doswap #-}

swap i j a s =
    if i <# j then case readIntOffAddr#  a i s   of { (# s, x #) -> 
                   case readIntOffAddr#  a j s   of { (# s, y #) -> 
                   case writeIntOffAddr# a j x s of { s          -> 
                   case writeIntOffAddr# a i y s of { s          -> 
                   swap (i +# 1#) (j -# 1#) a s 
                   } } } }
              else (# s, () #)
{-# INLINE swap #-}

loop r i a s = 
    if i <# r then case readIntOffAddr# a (i +# 1#) s of
                    (# s, x #) -> case writeIntOffAddr# a i x s of
                        s -> loop r (i +# 1#) a s
              else (# s, () #)
{-# INLINE loop #-}

setcount p r s =
    if r ==# 1# then (# s, () #)
                else case writeIntOffAddr# p (r -# 1#) r s of
                        s -> setcount p (r -# 1#) s
{-# INLINE setcount #-}

rot rP n a cp s =
  case readIntOffAddr# rP 0# s of { (# s, r #) -> 
  if r ==# n then (# s, True #) 
             else case readIntOffAddr# a 0# s             of { (# s, p0 #) -> 
                  case loop r 0# a s                      of { (# s, _ #)  -> 
                  case writeIntOffAddr# a r p0 s          of { s           -> 
                  case readIntOffAddr# cp r s             of { (# s, cr #) -> 
                  case writeIntOffAddr# cp r (cr -# 1#) s of { s           -> 
                  if cr -# 1# ># 0# then (# s, False #) 
                                    else case inc s of s -> rot rP n a cp s
  } } } } } }
  where inc s = case readIntOffAddr# rP 0# s of 
                  (# s, x #) -> writeIntOffAddr# rP 0# (x +# 1#) s
{-# INLINE rot #-}

ppr i n p = when (i < n) $ do
    putStr . show . (+1) =<< peek (p `advancePtr` i)
    ppr (i+1) n p

Cleaner impure version[edit]

Here's a translation of the fast C version. It's unoptimised so far, but already runs much faster than our best `pure' version. Use -O2 -optc-O3. It's not pretty (or easy to reason about -- how do the C programmers do it?), but it works :)

--
-- translation of the C version to Haskell by Don Stewart
--

import Control.Monad
import Data.Array.Base
import Data.Array.IO
import Data.IORef
import System

main = do 
    n <- getArgs >>= return . read . head
    k <- if n < 1 then return (0::Int) else fannkuch n
    putStrLn $ "Pfannkuchen(" ++ show n ++ ") = " ++ show (k - 1)

fannkuch n = do
    perm   <- newArray_ (0,n-1) :: IO (IOUArray Int Int)
    perm1  <- newArray_ (0,n-1) :: IO (IOUArray Int Int)
    count  <- newArray_ (0,n-1) :: IO (IOUArray Int Int)
    sequence_ [ set perm1 n n | n <- [0 .. n-1] ]

    rP        <- newIORef n
    flipsMaxP <- newIORef 0

    let go didpr = do
            didpr' <- if didpr < (30 :: Int)
                      then ppr 0 n perm1 >> putStr "\n" >> return (didpr + 1)
                      else return didpr

            readIORef rP >>= setcount count >>= writeIORef rP

            p1 <- perm1 !. 0
            pn <- perm1 !. (n-1)
            when (not $ p1 == 0 || pn == n-1) $ exchange n perm perm1 flipsMaxP

            fm   <- readIORef flipsMaxP
            done <- rotate rP n perm1 count
            if done then return fm else go didpr'
    go 0

rotate rP n p1 c = do
    r <- readIORef rP
    if r == n then return True else do -- rotate down perm[0..r] by one
        p0 <- p1 !. 0
        loop r 0
        set p1 r p0
        cr <- c !. r
        set c r (cr - 1)
        if cr - 1 > 0 then return False else inc rP >> rotate rP n p1 c
    where
        loop r i = when (i < r) $ p1 !. (i+1) >>= set p1 i >> loop r (i+1)

exchange n p p1 fm = do
    setperm 1 n p p1
    k <- p1 !. 0
    f <- doswap k p 0
    m <- readIORef fm
    when (m < f) $ writeIORef fm f

doswap k p f = do
    swap 1 (k-1) p
    j <- p !. k
    set p k k
    if k /= 0 then doswap j p (f+1) else return (f+1)

swap i j p = when (i < j) $ do
    xch p i j
    swap (i+1) (j-1) p

xch p i j = do
    x <- p !. i
    y <- p !. j
    set p i y
    set p j x

setperm i n p p1 = when (i < n) $ do
    p1 !. i >>= set p i
    setperm (i+1) n p p1

setcount _ 1 = return 1
setcount p r = set p (r-1) r >> setcount p (r-1)

ppr i n p = when (i < n) $ do
    putStr . show . (+1) =<< p !. i
    ppr (i+1) n p

p !. i    = unsafeRead p i
set p i j = unsafeWrite p i j

inc p = readIORef p >>= writeIORef p . (+1)

Josh Goldfoot[edit]

My ugly "flops" function cut the time for the proposed entry from 9.246 to 6.401 seconds on my machine (N=10). It's as above, but:

flop :: Int -> [Int] -> [Int]
flop 2 (x1:x2:xs) = x2:x1:xs
flop 3 (x1:x2:x3:xs) = x3:x2:x1:xs
flop 4 (x1:x2:x3:x4:xs) = x4:x3:x2:x1:xs
flop 5 (x1:x2:x3:x4:x5:xs) = x5:x4:x3:x2:x1:xs
flop 6 (x1:x2:x3:x4:x5:x6:xs) = x6:x5:x4:x3:x2:x1:xs
flop 7 (x1:x2:x3:x4:x5:x6:x7:xs) = x7:x6:x5:x4:x3:x2:x1:xs
flop 8 (x1:x2:x3:x4:x5:x6:x7:x8:xs) = x8:x7:x6:x5:x4:x3:x2:x1:xs
flop 9 (x1:x2:x3:x4:x5:x6:x7:x8:x9:xs) = x9:x8:x7:x6:x5:x4:x3:x2:x1:xs
flop 10 (x1:x2:x3:x4:x5:x6:x7:x8:x9:x10:xs) = x10:x9:x8:x7:x6:x5:x4:x3:x2:x1:xs
flop n xs = rs
  where (rs, ys)       = fl n xs ys
        fl 0 xs     ys = (ys, xs)
        fl n (x:xs) ys = fl (n-1) xs (x:ys)

Perhaps using Template Haskell could make this code look less ugly. But hard-coding in this way significantly speeds up a frequently used function.

This really does speed things up nicely. -- Don

Ian Lynagh provides this TH version. Note that this is quite probably not as clear as the explicit patterns.

import Language.Haskell.TH

$(do let xName n = mkName ('x':show n)
         consPat p ps = infixP (varP p) (mkName ":") ps
         xPats n = foldr consPat (varP $ mkName "xs") (map xName [1..n])
         consExp e es = infixE (Just $ varE e) (conE $ mkName ":") (Just es)
         xExps n = foldr consExp (varE $ mkName "xs") (map xName [n, n-1..1])
         mkClause n = clause [litP (integerL n), xPats n] (normalB $ xExps n) []
     d <- funD (mkName "flop") (map mkClause [2..10])
     runIO $ putStrLn $ pprint d
     return [d]
 )

Sebastian Sylvan[edit]

I contributed what I think is a "neat and elegant" solution which emphasizes clarity over speed (but is still pretty fast). The inlinings here really helped a lot (something like 2x improvement). It's been submitted (and accepted) in the shootout already as an example of an idiomatic "elegant" approach and is currently the fastest Haskell entry (note that they have changed the benchmark to use N=10). I think that if we want anything which is to compete with the imperative languages we need to use imperative style code (in-place reversions etc.). It's probably a good idea to have an "idiomatic" version and a "fast" version.

Note the permutations generator (a rewritten version of Bertram's) which on my system performed slightly better than Bertrams and is also a lot clearer (IMHO). It basically does the same thing but with less "magic" syntax :-) I should clarify that Bertram's version is certainly faster altogether (my version is all about elegance and clarity), but I didn't experience any downside to rewriting the permutation generator in a clearer way (in fact, i got a slight speedup).

import System
import Data.List(foldl')
import GHC.Base

{-# INLINE rotate #-}
rotate n (x:xs) = let (a,b) = splitAt (n-1) xs in a ++ x : b

{-# INLINE perms #-}
perms l = foldr perm' [l] [2..length l]
    where perm' n ls = concat [take n (iterate (rotate n) l) | l <- ls]

{-# INLINE flop #-}
flop (1:_) = 0
flop xs = 1 + flop (rev xs)

{-# INLINE rev #-}
rev (x:xs) = reverse a ++ x : b
      where (a,b) = splitAt (x-1) xs

fannuch xs = foldl' max 0 $ map flop xs

main = do [n] <- getArgs
          let xs = perms [1..read n]
          putStr $ unlines $ map (concatMap show) $ take 30 xs
          putStrLn $ "Pfannkuchen(" ++ n ++ ") = " ++ show (fannuch xs)

In retrospect I should probably submit a new version which doesn't import GHC.Base (why did I import that?) and uses "where" in rotate as well to make it look the same as e.g. rev.

Bertram Felgenhauer[edit]

combining a few ideas from the list, and with 'correct' permutation order:

import System (getArgs)
import Data.List (foldl', tails)

rotate n (x:xs) = rot' n xs where
    rot' 1 xs     = x:xs
    rot' n (x:xs) = x:rot' (n-1) xs

permutations l = foldr perm' [l] [2..length l] where
    perm' n l = l >>= take n . iterate (rotate n)

flop :: Int -> [Int] -> [Int]
flop n xs = rs
  where (rs, ys) = fl n xs ys
        fl 0 xs     ys = (ys, xs)
        fl n (x:xs) ys = fl (n-1) xs (x:ys)

steps :: Int -> [Int] -> Int
steps n (1:_)    = n
steps n ts@(t:_) = (steps $! (n+1)) (flop t ts)

main = do
    args <- getArgs
    let arg = if null args then 7 else read $ head args
    mapM_ (putStrLn . concatMap show) $ take 30 $ permutations [1..arg]
    putStr $ "Pfannkuchen(" ++ show arg ++ ") = "
    print $ foldl' (flip (max . steps 0)) 0 $ permutations [1..arg]

Kimberley Burchett[edit]

Updated with Bertram's permutations function to get the correct order. Seems about 1.5x faster than Bertram's.

import System(getArgs)
import Data.Int
import Data.List(foldl')

main = do   [n] <- getArgs
            let p = permutations [1..(read n)]
            mapM_ (putStrLn . concatMap show) $ take 30 p
            putStr $ "Pfannkuchen(" ++ n ++ ") = "
            print $ findmax p

findmax :: [[Int]] -> Int
findmax xss = foldl' max 0 maxes
    where maxes = map countFlops xss

countFlops :: [Int] -> Int
countFlops (1:xs) = 0
countFlops list@(x:xs) = 1 + (countFlops (flop x list []))

flop :: Int -> [Int] -> [Int] -> [Int]
flop 0 xs ys = ys ++ xs
flop n (x:xs) ys = flop (n-1) xs (x:ys)

-- rotate initial n elements of the list left by one place
rotate n (x:xs) = rot' n xs where
    rot' 1 xs     = x:xs
    rot' n (x:xs) = x:rot' (n-1) xs

permutations l = foldr perm' [l] [2..length l] where
    perm' n l = l >>= take n . iterate (rotate n)


Cale Gibbard[edit]

This does not use the right permutation order

import Data.Word
import Data.Array.Unboxed
import System.Environment
import Data.Ord (comparing)

type Perm = Word8 -> Word8

main = do let ns = "9" -- [ns] <- getArgs
          let n = read ns
              ps = perms n
              p = maximum $ map (flops n . perm) ps
          mapM_ (putStrLn . (>>= show)) (take 30 ps)
          putStrLn ("Pfannkuchen(" ++ ns ++ ") = " ++ (show p))


       -- NB. element subtree siblings! This is an n-ary tree
data Tree a = Node a (Tree a) (Tree a) | Empty

flop n f = fs `seq` \x -> fs ! x
    where fs :: UArray Word8 Word8
          fs = array (1,n) [(k, f' k) | k <- [1..n]] 
          f' x | x <= n    = f (n-x+1)
               | otherwise = f x
            where n = f 1
            
flops n = length . (takeWhile ((/= 1) . ($ 1))) . (iterate (flop n))

showPerm n f = [1..n] >>= show . f

perm :: [Word8] -> (Word8 -> Word8)
perm [] n = n
perm (x:xs) 1 = x
perm (x:xs) n = perm xs (n-1)

paths depth t =  -- paths from the root of t to given depth
 let across d ancestors  Empty       rest = rest
     across d ancestors (Node e l r) rest =
        down d (e:ancestors) l (across d ancestors r rest)
       
     down d ancestors t rest =
        if d >= depth then ancestors:rest
        else across (d+1) ancestors t rest
 in across 1 [] t []

build n = 
 let
   t = toplevel n  

   toplevel m
     | m < 1     = Empty
     | otherwise = Node m (f n m t) (toplevel (m-1))

   f col banned  Empty                = Empty 
   f col banned (Node a subtree sibs) =
    | banned == a = others
    | otherwise   = Node a (f (col-1) banned subtree) others
    where others = f col banned sibs
 in t

perms n = paths n (build n)

Iavor Diatchki[edit]

This does not use the right permutation order

import System(getArgs)

flop xs@(x:_) = reverse ys ++ zs where (ys,zs) = splitAt x xs
flops xs      = takeWhile ((1 /=) . head) (iterate flop xs)

perms xs      = foldr (concatMap . ins) [[]] xs

ins x []      = [[x]]
ins x (y:ys)  = (x:y:ys) : map (y:) (ins x ys)

pfannkuchen x = maximum (map (length . flops) (perms [1..x]))

main          = do let a = "9" -- a:_ <- getArgs
                   let n = read a :: Int
                   mapM_ print (take 30 (perms [1..n]))
                   print (pfannkuchen n)

Josh Goldfoot[edit]

I was able to significantly speed up the code by replacing the flip function with a function that relies entirely on pattern matching (no splitAts or reverses). It looks ugly, though:

mangle list@(1:xs) = list
mangle (2:x2:xs) = x2:2:xs
mangle (3:x2:x3:xs) = x3:x2:3:xs
... and so on.

Jan-Willem Maessen[edit]

I was surprised to learn that indexed insertion:

permutations (x:xs) =
    [insertAt n x perms | perms <- permutations xs,
                          n <- [0..length xs] ]

insertAt :: Int -> a -> [a] -> [a]
insertAt 0 y xs = y:xs
insertAt n y (x:xs) = x:(insertAt (n-1) y xs)

was faster than the usual version of permutation based on "inserts":

permutations (x:xs) =
    [insertAt n x perms | perms <- permutations xs,
                          n <- [0..length xs] ]
insertAt 0 y xs = y:xs
insertAt n y (x:xs) = x:(insertAt (n-1) y xs)

However, try these on for size. The non-strict "flop", which traverses its input exactly once, is the most surprising and made by far the biggest difference:

findmax :: [[Int]] -> Int
findmax xss = fm xss 0
  where fm []     mx = mx
        fm (p:ps) mx = fm ps $! (countFlops p `max` mx)

countFlops :: [Int] -> Int
countFlops as = cf as 0
  where cf    (1:_) flops = flops
        cf xs@(x:_) flops = cf (flop x xs) $! (flops+1)

flop :: Int -> [Int] -> [Int]
flop n xs = rs
  where (rs,ys) = fl n xs ys
        fl 0 xs     ys = (ys, xs)
        fl n (x:xs) ys = fl (n-1) xs (x:ys)

Old Code[edit]

This is the *slowest* entry for this benchmark, 800x slower than C, 500x slower than OCaml.

{-  The Computer Language Shootout
    http://shootout.alioth.debian.org/
    compile with:  ghc -O2 -o fannkuch fannkuch.hs
    contributed by Josh Goldfoot, "fixing" the version by Greg Buchholz
     permutations function translated from the C version by Heiner Marxen -}

import System(getArgs)
import Data.Int

main = do   [n] <- getArgs
            let p = permutations [1..(read n)]
            mapM_ putStrLn $ map (concatMap show) $ take 30 (permutations [1..(read n)])
            putStr $ "Pfannkuchen(" ++ n ++ ") = "
            print $ findmax 0 p

findmax :: Int8 -> [[Int8]] -> Int8
findmax soFar [] = soFar
findmax soFar (x:xs) =
   max (flop 0 x) (findmax soFar xs)

flop :: Int8 -> [Int8] -> Int8
flop acc (1:xs) = acc
flop acc list@(x:xs) = flop (acc+1) mangle
    where   mangle = (reverse front) ++ back
            (front,back) = splitAt (fromIntegral x) list

permutations :: [Int8] -> [[Int8]]
permutations arry =
   arry : (permuteloop arry [1..n] 1)
   where n = fromIntegral (length arry)

permuteloop :: [a] -> [Int8] -> Int8 -> [[a]]
permuteloop arry count r
   | r == ((fromIntegral . length) arry) = []
   | count' !! (fromIntegral r) > 0 = arry' : (permuteloop arry' (reload r count') 1)
   | otherwise = permuteloop arry' count' (r+1)
   where count' = (take (fromIntegral r) count) ++ ((count !! (fromIntegral r)) - 1):(drop (fromIntegral r+1) count)
         arry' = rotate (fromIntegral r) arry

rotate :: Int8 -> [a] -> [a]
rotate r (x:xs) =
   begin ++ x:end
   where (begin, end) = splitAt (fromIntegral r) xs

reload :: Int8 -> [Int8] -> [Int8]
reload r count
   | r == 1 = count
   | otherwise = [1..r] ++ (drop (fromIntegral r) count)