Difference between revisions of "Runtime compilation"
(Cleanup) 

(3 intermediate revisions by 2 users not shown)  
Line 1:  Line 1:  
−  '''Note: This article was written by Andrew Bromage and originally appeared at http://haskell.org/wikisnapshot/RunTimeCompilation.html''' 

+  Many algorithms require a preprocessing step which builds some data structure for later use in the algorithm proper. Consider making this preprocessing step build a Haskell function instead. In other words, use [[functions not data structures]]. 

−  Many algorithms require a preprocessing step which builds some data structure for later use in the algorithm proper. Consider making this preprocessing step build a Haskell function instead. In other words, use FunctionsNotDataStructures. 

+  Consider, for example, [[KnuthMorrisPratt substring search]]ing. In a conventional language, the approach would be to compile the string to be searched for into an array of overlaps (this is the preprocessing step) which the search algorithm then uses to actually perform the match. One benefit is that if you need to search for the same substring multiple times, you can share the preprocessing step. 

−  Consider, for example, KnuthMorrisPratt? substring searching. In a conventional language, the approach would be to compile the string to be searched for into an array of overlaps (this is the preprocessing step) which the search algorithm then uses to actually perform the match. One benefit is that if you need to search for the same substring multiple times, you can share the preprocessing step. 

+  However, consider how you'd implement substring searching if speed were crucial and the string were fixed at compile time. Suppose, for example, you wanted to search for "aab". You might write something like this (note that this uses the [[Not just Maybe]] idiom): 

−  However, consider how you'd implement substring searching if speed were crucial and the string were fixed at compile time. Suppose, for example, you wanted to search for "aab". You might write something like this (note that this uses the NotJustMaybe idiom): 

+  <haskell> 

+  search :: (Monad m) => String > m (String, String) 

+  search cs 

+  = search_aab [] cs 

+  where 

+  search_fail = fail "can't find aab" 

−  > search :: (Monad m) => String > m (String, String) 

+  search_aab prev [] = search_fail 

−  > search cs 

+  search_aab prev (c@'a':cs) 

−  > = search_aab [] cs 

+  = search_ab (c:prev) cs 

−  > where 

+  search_aab prev (c:cs) 

−  > search_fail = fail "can't find aab" 

+  = search_aab (c:prev) cs 

−  > 

+  
−  +  search_ab prev [] = search_fail 

−  +  search_ab prev (c@'a':cs) 

−  +  = search_b (c:prev) cs 

−  +  search_ab prev (c:cs) 

−  +  = search_aab prev (c:cs) 

−  +  
−  +  search_b prev [] = search_fail 

−  +  search_b prev (c@'b':cs) 

−  +  = return (reverse (c:prev), cs) 

−  +  search_b prev (c:cs) 

−  +  = search_ab prev (c:cs)  Note special case here 

−  +  </haskell> 

−  > search_b prev [] = search_fail 

−  > search_b prev (c@'b':cs) 

−  > = return (reverse (c:prev), cs) 

−  > search_b prev (c:cs) 

−  > = search_ab prev (c:cs)  Note special case here 

It's not too hard to think of how to write a preprocessor to do this, since the translation is mechanical. However, if you're clever, you can do this compilation at run time. 
It's not too hard to think of how to write a preprocessor to do this, since the translation is mechanical. However, if you're clever, you can do this compilation at run time. 

Line 30:  Line 35:  
This Haskell function builds the KMP overlap table: 
This Haskell function builds the KMP overlap table: 

−  [TODO: Clean the code up a bit.] 
+  [''TODO:'' Clean the code up a bit.] 
−  > overlap :: (Eq a) => [a] > [Int] 

+  <haskell> 

−  > overlap str 

+  overlap :: (Eq a) => [a] > [Int] 

−  +  overlap str 

−  +  = overlap' [0] str 

−  +  where 

−  +  overlap' prev [] 

−  +  = reverse prev 

−  +  overlap' prev (x:xs) 

−  +  = let get_o o 

−  +   o <= 1  str !! (o2) == x = o 

−  +   otherwise = get_o (1 + prev !! (length prev  o + 1)) 

+  in overlap' (get_o (head prev + 1):prev) xs 

+  </haskell> 

−  Then using this, we build up a Haskell function using 
+  Then using this, we build up a Haskell function using [[continuation passing style]] to handle the failure states: 
−  > matchKMP :: (Monad m, Eq a) => [a] > ([a] > m ([a],[a])) 

+  <haskell> 

−  > matchKMP [] 

+  matchKMP :: (Monad m, Eq a) => [a] > ([a] > m ([a],[a])) 

−  > = error "Can't match empty list" 

+  matchKMP [] 

−  > matchKMP xs 

+  = error "Can't match empty list" 

−  > = matchfunc [] 

+  matchKMP xs 

−  > where 

+  = matchfunc [] 

−  > matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs)) 

+  where 

−  > dofail = \ps xs > case xs of 

+  matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs)) 

−  > [] > fail "can't match" 

+  dofail = \ps xs > case xs of 

−  > (y:ys) > matchfunc (y:ps) ys 

+  [] > fail "can't match" 

−  > 

+  (y:ys) > matchfunc (y:ps) ys 

−  > type PartialMatchFunc? m a = [a] > [a] > m ([a], [a]) 

+  
−  > 

+  type PartialMatchFunc m a = [a] > [a] > m ([a], [a]) 

−  > makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc? m a] > [(a, Int)] 

+  
−  +  makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] > [(a, Int)] 

−  +  > PartialMatchFunc m a 

−  +  makeMatchFunc prev [] 

−  +  = \ps xs > return (reverse ps, xs) 

−  +  makeMatchFunc prev ((x,failstate):ms) 

−  +  = thisf 

−  +  where 

−  +  mf = makeMatchFunc (thisf:prev) ms 

−  +  failcont = prev !! (length prev  failstate  1) 

−  +  thisf = \ps xs > case xs of 

−  +  [] > fail "can't match" 

−  +  (y:ys) > if (x == y) then mf (y:ps) ys 

+  else failcont ps xs 

+  </haskell> 

We can now either use it as a standard substring match function: 
We can now either use it as a standard substring match function: 

−  > matchKMP "aab" "babaaba" 

+  <haskell> 

+  matchKMP "aab" "babaaba" 

+  </haskell> 

or we can use currying to share the compiled function: 
or we can use currying to share the compiled function: 

−  > match_aab :: (Monad m) => String > m (String, String) 

+  <haskell> 

−  > match_aab = matchKMP "aab" 

+  match_aab :: (Monad m) => String > m (String, String) 

+  match_aab = matchKMP "aab" 

+  </haskell> 

−  This latter example only compiles the KMP machine once and, thanks to 
+  This latter example only compiles the KMP machine once and, thanks to [[lazy evaluation]], it's only compiled if it's ever used. 
−  Getting recursion into your generated code can be hard 
+  Getting recursion into your generated code can be hard. 
+  The above example was relatively simple, but in the general case you may need something more sophisticated. 

−  Caveat: This technique is considered by many to be a hack at best, and a kludge at worst. Things can get very fiddly and extremely hard to debug. RunTimeCompilation can sometimes win you significant efficiency gains, but can often win you almost nothing (at the cost of the your increased stress and reduced productivity). It's almost always worth implementing your algorithm the naive way first and only then if it's found to be the bottleneck, handcompiling a few examples first to see if that speeds things up sufficiently. 

+  See [[tying the knot]] for some ideas. 

+  
+  Caveat: This technique is considered by many to be a hack at best, and a kludge at worst. Things can get very fiddly and extremely hard to debug. Runtime compilation can sometimes win you significant efficiency gains, but can often win you almost nothing at the cost of the your increased stress and reduced productivity. 

+  
+  It's almost always worth implementing your algorithm the naive way first and only then if it's found to be the bottleneck, handcompiling a few examples first to see if that speeds things up sufficiently. 

You have been warned. 
You have been warned. 

−   AndrewBromage? 

+  == Filesorting example == 

+  '''Note: The following article is migrated from the old wiki.''' 

+  
+  Consider the following problem: we have a list of files, each described by the following structure: 

+  
+  <haskell> 

+  import Data.List 

+  import System.Time 

+  
+  data FileInfo = 

+  FileInfo { fiPathname :: FilePath  directory, such as "C:/WINDOWS" 

+  , fiBasename :: FilePath  basename, such as "COMMAND" 

+  , fiExtension :: FilePath  extension, such as "COM" 

+  , fiSize :: Integer  size of file 

+  , fiTime :: ClockTime  file creation time 

+  , fiGroup :: Int  number of group which includes this file 

+  } 

+  </haskell> 

+  We need to sort this list according to hierarchy of criteria, given by list of chars. For example, "ebs" means "sort by extension, then by basename, then by size". With a 6 criteria to compare files, we have 6!=720 possible sort orders. So, if we want to do fast sorting in a userselectable order, we need to generate sorting function on the fly: 

+  <haskell> 

+  sortFileList :: String > [FileInfo] > [FileInfo] 

+  sortFileList orderString = sortOn (key_func orderString) 

+  
+   Sort list by function result 

+  sortOn f = sortBy (map2cmp f) 

+  
+   Converts "key_func" to "compare_func" 

+  map2cmp f x y = (f x) `compare` (f y) 

+  </haskell> 

+  
+  The key function here is key_func This function converts list of sort criteria to function, which will convert ?FileInfo to it's "sorting key": 

+  <haskell> 

+    Map `orderString` to function returning ordering key 

+  key_func :: String > (FileInfo > [SortOrder]) 

+  key_func orderString = 

+  map_functions [sort_on c  c < orderString] 

+  where sort_on 'p' = (OrderFilePath .fiPathname) 

+  sort_on 'b' = (OrderFilePath .fiBasename) 

+  sort_on 'e' = (OrderFilePath .fiExtension) 

+  sort_on 's' = (OrderFileSize .fiSize) 

+  sort_on 't' = (OrderFileTime .fiTime) 

+  sort_on 'g' = (OrderGroup .fiGroup) 

+  
+  data SortOrder = OrderFilePath FilePath 

+   OrderFileSize Integer 

+   OrderFileTime ClockTime 

+   OrderGroup Int deriving (Eq, Ord) 

+  
+   Map on functions instead of its' arguments! 

+  map_functions :: [a>b] > a > [b] 

+  map_functions [] x = [] 

+  map_functions (f:fs) x = f x : map_functions fs x 

+  </haskell> 

+  Now we can sort filelist in any order. But this solution is not optimal: each sorting key is created as a list of values while better solution will be using a tuple of appropriate size. For example, adding the following line before general definition of sortFileList speed up by 25% processing of this common case: 

+  <haskell> 

+  sortFileList "eb" = sortOn (\fi > (fiExtension fi, fiBasename fi)) 

+  </haskell> 

+  TODO: Add story about compiling regular expressions thorough internal structure down to (?FileInfo>Bool) predicates 

+  
+   BulatZiganshin (bulat_z##mail.ru) 

+  
+  == 

+  
+  [[Category:Idioms]] 
Latest revision as of 02:03, 25 August 2008
Many algorithms require a preprocessing step which builds some data structure for later use in the algorithm proper. Consider making this preprocessing step build a Haskell function instead. In other words, use functions not data structures.
Consider, for example, KnuthMorrisPratt substring searching. In a conventional language, the approach would be to compile the string to be searched for into an array of overlaps (this is the preprocessing step) which the search algorithm then uses to actually perform the match. One benefit is that if you need to search for the same substring multiple times, you can share the preprocessing step.
However, consider how you'd implement substring searching if speed were crucial and the string were fixed at compile time. Suppose, for example, you wanted to search for "aab". You might write something like this (note that this uses the Not just Maybe idiom):
search :: (Monad m) => String > m (String, String)
search cs
= search_aab [] cs
where
search_fail = fail "can't find aab"
search_aab prev [] = search_fail
search_aab prev (c@'a':cs)
= search_ab (c:prev) cs
search_aab prev (c:cs)
= search_aab (c:prev) cs
search_ab prev [] = search_fail
search_ab prev (c@'a':cs)
= search_b (c:prev) cs
search_ab prev (c:cs)
= search_aab prev (c:cs)
search_b prev [] = search_fail
search_b prev (c@'b':cs)
= return (reverse (c:prev), cs)
search_b prev (c:cs)
= search_ab prev (c:cs)  Note special case here
It's not too hard to think of how to write a preprocessor to do this, since the translation is mechanical. However, if you're clever, you can do this compilation at run time.
This Haskell function builds the KMP overlap table:
[TODO: Clean the code up a bit.]
overlap :: (Eq a) => [a] > [Int]
overlap str
= overlap' [0] str
where
overlap' prev []
= reverse prev
overlap' prev (x:xs)
= let get_o o
 o <= 1  str !! (o2) == x = o
 otherwise = get_o (1 + prev !! (length prev  o + 1))
in overlap' (get_o (head prev + 1):prev) xs
Then using this, we build up a Haskell function using continuation passing style to handle the failure states:
matchKMP :: (Monad m, Eq a) => [a] > ([a] > m ([a],[a]))
matchKMP []
= error "Can't match empty list"
matchKMP xs
= matchfunc []
where
matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs))
dofail = \ps xs > case xs of
[] > fail "can't match"
(y:ys) > matchfunc (y:ps) ys
type PartialMatchFunc m a = [a] > [a] > m ([a], [a])
makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] > [(a, Int)]
> PartialMatchFunc m a
makeMatchFunc prev []
= \ps xs > return (reverse ps, xs)
makeMatchFunc prev ((x,failstate):ms)
= thisf
where
mf = makeMatchFunc (thisf:prev) ms
failcont = prev !! (length prev  failstate  1)
thisf = \ps xs > case xs of
[] > fail "can't match"
(y:ys) > if (x == y) then mf (y:ps) ys
else failcont ps xs
We can now either use it as a standard substring match function:
matchKMP "aab" "babaaba"
or we can use currying to share the compiled function:
match_aab :: (Monad m) => String > m (String, String)
match_aab = matchKMP "aab"
This latter example only compiles the KMP machine once and, thanks to lazy evaluation, it's only compiled if it's ever used.
Getting recursion into your generated code can be hard. The above example was relatively simple, but in the general case you may need something more sophisticated.
See tying the knot for some ideas.
Caveat: This technique is considered by many to be a hack at best, and a kludge at worst. Things can get very fiddly and extremely hard to debug. Runtime compilation can sometimes win you significant efficiency gains, but can often win you almost nothing at the cost of the your increased stress and reduced productivity.
It's almost always worth implementing your algorithm the naive way first and only then if it's found to be the bottleneck, handcompiling a few examples first to see if that speeds things up sufficiently.
You have been warned.
Filesorting example
Note: The following article is migrated from the old wiki.
Consider the following problem: we have a list of files, each described by the following structure:
import Data.List
import System.Time
data FileInfo =
FileInfo { fiPathname :: FilePath  directory, such as "C:/WINDOWS"
, fiBasename :: FilePath  basename, such as "COMMAND"
, fiExtension :: FilePath  extension, such as "COM"
, fiSize :: Integer  size of file
, fiTime :: ClockTime  file creation time
, fiGroup :: Int  number of group which includes this file
}
We need to sort this list according to hierarchy of criteria, given by list of chars. For example, "ebs" means "sort by extension, then by basename, then by size". With a 6 criteria to compare files, we have 6!=720 possible sort orders. So, if we want to do fast sorting in a userselectable order, we need to generate sorting function on the fly:
sortFileList :: String > [FileInfo] > [FileInfo]
sortFileList orderString = sortOn (key_func orderString)
 Sort list by function result
sortOn f = sortBy (map2cmp f)
 Converts "key_func" to "compare_func"
map2cmp f x y = (f x) `compare` (f y)
The key function here is key_func This function converts list of sort criteria to function, which will convert ?FileInfo to it's "sorting key":
  Map `orderString` to function returning ordering key
key_func :: String > (FileInfo > [SortOrder])
key_func orderString =
map_functions [sort_on c  c < orderString]
where sort_on 'p' = (OrderFilePath .fiPathname)
sort_on 'b' = (OrderFilePath .fiBasename)
sort_on 'e' = (OrderFilePath .fiExtension)
sort_on 's' = (OrderFileSize .fiSize)
sort_on 't' = (OrderFileTime .fiTime)
sort_on 'g' = (OrderGroup .fiGroup)
data SortOrder = OrderFilePath FilePath
 OrderFileSize Integer
 OrderFileTime ClockTime
 OrderGroup Int deriving (Eq, Ord)
 Map on functions instead of its' arguments!
map_functions :: [a>b] > a > [b]
map_functions [] x = []
map_functions (f:fs) x = f x : map_functions fs x
Now we can sort filelist in any order. But this solution is not optimal: each sorting key is created as a list of values while better solution will be using a tuple of appropriate size. For example, adding the following line before general definition of sortFileList speed up by 25% processing of this common case:
sortFileList "eb" = sortOn (\fi > (fiExtension fi, fiBasename fi))
TODO: Add story about compiling regular expressions thorough internal structure down to (?FileInfo>Bool) predicates
 BulatZiganshin (bulat_z##mail.ru)
==