Difference between revisions of "Data.List.Split"

From HaskellWiki
Jump to: navigation, search
(fix splitEquidistant, rename to more common name 'chunk')
Line 14: Line 14:
 
** keep the separators as their own separate pieces of the result list
 
** keep the separators as their own separate pieces of the result list
 
** what to do with separators at the beginning/end? create a blank split before/after, or not?
 
** what to do with separators at the beginning/end? create a blank split before/after, or not?
  +
  +
An important caveat: we should strive to keep things flexible yet SIMPLE. The more complicated things get, the closer this gets to just being a general parsing or regex library. So the right balance needs to be struck.
   
 
Add your implementations below! Once we converge on something good we can upload it to hackage.
 
Add your implementations below! Once we converge on something good we can upload it to hackage.
Line 69: Line 71:
 
where
 
where
 
(y1, y2) = splitAt n xs
 
(y1, y2) = splitAt n xs
  +
  +
</haskell>
  +
  +
== A combinator approach? ==
  +
  +
Here are some initial thoughts on a combinator approach. The trick is to find nice implementations of the declarations below. Please add your own thoughts, other combinators, etc.
  +
  +
<haskell>
  +
  +
data Splitter a
  +
  +
split :: Splitter a -> [a] -> [[a]]
  +
  +
onElts :: [a] -> Splitter a -- split on any of these elements
  +
onSublist :: [a] -> Splitter a -- split on this exact subsequence
  +
whenElt :: (a -> Bool) -> Splitter a
  +
keepingDelims :: Splitter a -> Splitter a
  +
collapsingNulls :: Splitter a -> Splitter a
  +
-- other basic combinators?
  +
  +
-- now you can write things like
  +
--
  +
-- split (collapsingNulls $ onElts " ,") "abc,def , gh"
  +
--
  +
-- which should evaluate to ["abc", "def", "gh"].
  +
  +
-- some convenience functions can be provided, such as...
  +
  +
splitOn = split . onElts
  +
splitWhen = split . whenElt
   
 
</haskell>
 
</haskell>

Revision as of 21:31, 13 December 2008

A theoretical module which contains implementations/combinators for implementing every possible method of list-splitting known to man. This way no one has to argue about what the correct interface for split is, we can just have them all.

Some possible ways to split a list, to get your creative juices flowing:

  • what to split on?
    • single-element separator
    • sublist separator
    • use a list of possible separators instead of just one
    • use a predicate on elements or sublists instead of giving explicit separators
    • use approximate matching?
    • chunks of fixed length
  • how to split?
    • discard the separators
    • keep the separators with the preceding or following splits
    • keep the separators as their own separate pieces of the result list
    • what to do with separators at the beginning/end? create a blank split before/after, or not?

An important caveat: we should strive to keep things flexible yet SIMPLE. The more complicated things get, the closer this gets to just being a general parsing or regex library. So the right balance needs to be struck.

Add your implementations below! Once we converge on something good we can upload it to hackage.

{-# LANGUAGE ViewPatterns #-}

import Data.List (unfoldr)


-- intercalate :: [a] -> [[a]] -> [a]
-- intercalate x [a,b,c,x,y,z] = [a,x,b,x,c,x,x,y,x,z,x]

-- unintercalate :: [a] -> [a] -> [[a]]
-- unintercalate x [a,x,b,x,c,x,x,y,x,z,x] = [a,b,c,[],y,z]

-- unintercalate is the "inverse" of intercalate

match [] string = Just string
match (_:_) [] = Nothing
match (p:ps) (q:qs) | p == q = match ps qs
match (_:_)  (_:_)  | otherwise = Nothing

chopWith delimiter (match delimiter -> Just tail) = return ([], tail)
chopWith delimiter (c:cs) = chopWith delimiter cs >>= \(head, tail) ->
                              return (c:head, tail)
chopWith delimiter [] = Nothing
-- note: chopWith could be make 'more efficient' i.e. remove the >>=\-> bit
--       by adding an accumulator


unintercalate delimiter = unfoldr (chopWith delimiter)

-- > unintercalate "x" "axbxcxxyxzx"
-- ["a","b","c","","y","z"]

splitOn :: (a -> Bool) -> [a] -> [[a]]
splitOn _ [] = []
splitOn f l@(x:xs)
  | f x = splitOn f xs
  | otherwise = let (h,t) = break f l in h:(splitOn f t)

-- take the element who make predict true as delimiter
-- > splitOn even [1,3,5,6,7,3,3,2,1,1,1]
-- [[1,3,5],[7,3,3],[1,1,1]]

-- | like String split, except for any element that obeys Eq
splitEq :: Eq a => a -> [a] -> [[a]]
splitEq e = splitOn (==e)

-- | split at regular intervals
chunk :: Int -> [a] -> [[a]]
chunk _ [] = [[]]
chunk n xs = y1 : chunk n y2
  where
    (y1, y2) = splitAt n xs

A combinator approach?

Here are some initial thoughts on a combinator approach. The trick is to find nice implementations of the declarations below. Please add your own thoughts, other combinators, etc.

data Splitter a

split :: Splitter a -> [a] -> [[a]]

onElts :: [a] -> Splitter a     -- split on any of these elements
onSublist :: [a] -> Splitter a  -- split on this exact subsequence
whenElt :: (a -> Bool) -> Splitter a
keepingDelims :: Splitter a -> Splitter a
collapsingNulls :: Splitter a -> Splitter a
-- other basic combinators?

-- now you can write things like
--
--   split (collapsingNulls $ onElts " ,") "abc,def   , gh"
--
-- which should evaluate to ["abc", "def", "gh"]. 

-- some convenience functions can be provided, such as...

splitOn   = split . onElts
splitWhen = split . whenElt