List function suggestions
Revision as of 11:09, 28 December 2006
Let's fix this
We need these useful functions in Data.List; I'll call them 'split' (and variants) and 'replace'. They are easy to implement, but everyone keeps reinventing them. The goals are clarity and uniformity (everyone uses and recognizes the same names) and portability (I no longer have to reimplement them or copy around that one UsefulMissingFunctions.hs file).
Use this page to record consensus as it is reached on the Talk page. (Sign your posts with four tildes to add your name and a timestamp automatically.) Diverging opinions are welcome! Note: many good points (and diverging opinions!) are covered in the mailing lists, but if we included every case raised there, split* would have nine variants. I'm working on organizing all of this into something meaningful.
Summary
Hacking up your own custom split (or a tokens/splitOnGlue) must be one of the most common questions from beginners on the IRC channel.
Anyone remember what the result of the "let's get split into the base library" movement's work was?
ISTR there wasn't a consensus, so nothing happened. Which is silly, really - I agree we should definitely have a Data.List.split.
A thread from July 2006:
http://www.haskell.org/pipermail/haskell-cafe/2006-July/thread.html#16559
A thread from July 2004:
http://www.haskell.org/pipermail/libraries/2004-July/thread.html#2342
Goal
The goal is to reach some kind of reasonable consensus, specifically on naming and semantics, even if that means we need pairs of functions to satisfy various usage and algebraic needs. Failing to accommodate every possible use of these functions should not be a sufficient reason to abandon the whole project.
Note: I (Jared Updike) am working with the belief that efficiency should not be a valid argument to bar these otherwise universally useful functions from the libraries; regexes are overkill for 'split' and 'replace' for common simple situations. Let's assume people will know (or learn) when they need heavier machinery (regexes, FPS/ByteString) and will use it when efficiency is important. We can try to facilitate this by reusing any names from FastPackedString and/or ByteString, etc.
The Data.List functions
split (working name)
We need a few of these:
split :: Eq a => a -> [a] -> [[a]]
splitWith :: (a -> Bool) -> [a] -> [[a]]
tokens :: (a -> Bool) -> [a] -> [[a]]
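These should keep empty fields, so that split satisfies:
join [sep] . split sep = id
See below for 'join'. Its separator is a list, so the single separator element is wrapped as [sep].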
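A minimal sketch of split and splitWith follows. It assumes that empty fields are kept (so "a,,b" yields an empty middle field), since that is what the law above requires. The page does not say how tokens should differ from splitWith, so it is left out. The check uses Data.List.intercalate, which later versions of base provide, in place of join; roundTrips is a hypothetical helper name.

```haskell
import Data.List (intercalate)

-- Sketch only: split the list at every element satisfying p.
-- Empty fields are kept, so adjacent separators produce [].
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith p xs = case break p xs of
  (field, [])       -> [field]                  -- no separator left
  (field, _ : rest) -> field : splitWith p rest -- drop the separator, continue

-- Split on a single separator element.
split :: Eq a => a -> [a] -> [[a]]
split sep = splitWith (== sep)

-- The round-trip law, with intercalate standing in for join:
-- intercalate [sep] (split sep xs) == xs for every finite xs.
roundTrips :: Eq a => a -> [a] -> Bool
roundTrips sep xs = intercalate [sep] (split sep xs) == xs

-- split ',' "a,b,,c" == ["a","b","","c"]
-- split ',' ""       == [""]
```

Keeping empty fields is what makes the law hold: dropping them would make "a,b" and "a,,b" split to the same result.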
We also want variants that use the above split but filter out empty elements (and therefore do not preserve the property above). These are easy to define:
split' :: Eq a => a -> [a] -> [[a]]
splitWith' :: (a -> Bool) -> [a] -> [[a]]
tokens' :: (a -> Bool) -> [a] -> [[a]]
For example, split' is simply:
split' sep = filter (not . null) . split sep
Usage would be:
tokensws = tokens' (`elem` " \f\v\t\n\r\b")
tokensws "Hello there\n \n Haskellers! " ==
["Hello", "there", "Haskellers!"]
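The filtered variants and the tokensws example can be sketched as below. The keep-empty-fields splitWith is repeated so the block stands alone, and tokens' is assumed to behave exactly like splitWith', since the page does not define how tokens differs.

```haskell
-- Keep-empty-fields split (assumed definition, repeated for self-containment).
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith p xs = case break p xs of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitWith p rest

-- Primed variants: the same splits with empty fields removed.
splitWith' :: (a -> Bool) -> [a] -> [[a]]
splitWith' p = filter (not . null) . splitWith p

split' :: Eq a => a -> [a] -> [[a]]
split' sep = splitWith' (== sep)

-- Assumption: tokens' is the same as splitWith'.
tokens' :: (a -> Bool) -> [a] -> [[a]]
tokens' = splitWith'

-- The whitespace tokenizer from the usage example above.
tokensws :: String -> [String]
tokensws = tokens' (`elem` " \f\v\t\n\r\b")

-- split' ',' ",a,,b," == ["a","b"]
```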
Would a nonnulls = filter (not . null) function be a better alternative to defining separate filtered splits? Nmessenger 11:09, 28 December 2006 (UTC)
TODO: add version like python with multi-element separator
TODO: give code, copy-paste from threads mentioned above
TODO: list names and reasons for/against
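For the multi-element separator TODO, here is a minimal sketch under assumed Python-like semantics. Separators are matched left to right without overlap, empty fields are kept, and an empty separator is an error (Python's str.split raises ValueError in that case). The name splitOnList is hypothetical.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical name. Splits on each non-overlapping occurrence of a
-- non-empty separator, scanning left to right.
splitOnList :: Eq a => [a] -> [a] -> [[a]]
splitOnList [] _ = error "splitOnList: empty separator"
splitOnList sep xs = go xs
  where
    go ys = case breakOn ys of
      (field, Nothing)   -> [field]
      (field, Just rest) -> field : go rest
    -- Text before the first separator and, if one was found, the text after it.
    breakOn [] = ([], Nothing)
    breakOn ys@(y : ys')
      | sep `isPrefixOf` ys = ([], Just (drop (length sep) ys))
      | otherwise           = let (field, rest) = breakOn ys' in (y : field, rest)

-- splitOnList ", " "a, b, , c" == ["a","b","","c"]
-- splitOnList "aa" "aaa"      == ["","a"]
```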
replace (working name)
replace :: Eq a => [a] -> [a] -> [a] -> [a]
Like Python's str.replace:
replace "the" "a" "the quick brown fox jumped over the lazy black dog"
===>
"a quick brown fox jumped over a lazy black dog"
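A minimal sketch of replace, assuming it should follow Python's str.replace. Occurrences are found left to right without overlapping, and an empty search list inserts the replacement between every element and at both ends. Comparing elements requires the Eq constraint.

```haskell
import Data.List (isPrefixOf)

-- Sketch only: replace every non-overlapping occurrence of old with new.
replace :: Eq a => [a] -> [a] -> [a] -> [a]
replace [] new xs = new ++ concatMap (\x -> x : new) xs  -- empty pattern, as in Python
replace old new xs = go xs
  where
    go [] = []
    go ys@(y : ys')
      | old `isPrefixOf` ys = new ++ go (drop (length old) ys) -- skip the match
      | otherwise           = y : go ys'

-- replace "aa" "b" "aaa" == "ba"
-- replace "" "-" "ab"    == "-a-b-"
```

Like Python's version, it matches anywhere, including inside words: replace "the" "a" "other" gives "oar".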
TODO: give code, copy-paste from threads mentioned above
TODO: list names and reasons for/against
join (working name)
import Data.List (intersperse)
join :: [a] -> [[a]] -> [a]
join sep = concat . intersperse sep
TODO: copy-paste things from threads mentioned above
TODO: list names and reasons for/against
Function Behavior Changes
It is dangerous to change the behavior of Prelude functions; however, unlines currently adds an extra, unnecessary newline. For input that does not already end in a newline, unlines . lines is effectively (++"\n") rather than id.
I propose this definition instead:
import Data.List (intersperse)
unlines = concat . intersperse "\n"
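A sketch of the proposed definition, using the hypothetical name unlines' so it can be loaded alongside the Prelude's unlines. The comments record its behavior on a few inputs. Because lines drops a final newline, lines "a\nb" and lines "a\nb\n" are equal, so no definition of unlines can turn both back into their originals.

```haskell
import Data.List (intersperse)

-- Proposed unlines: newlines go between lines, not after each one.
unlines' :: [String] -> String
unlines' = concat . intersperse "\n"

-- Prelude:  unlines  (lines "a\nb")   == "a\nb\n"  (extra newline)
-- Proposed: unlines' (lines "a\nb")   == "a\nb"    (round-trips)
--           unlines' (lines "a\nb\n") == "a\nb"    (final newline lost)
```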
Also, lines and words would effectively be defined in terms of split.
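lines and words can be written in terms of split roughly as follows. This is a sketch: linesViaSplit and wordsViaSplit are hypothetical names, and the keep-empty-fields split is an assumed definition repeated so the block stands alone. The one subtlety is that lines treats a newline as a line terminator rather than a separator, so a trailing empty field has to be dropped.

```haskell
import Data.Char (isSpace)

-- Keep-empty-fields split (assumed definition).
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith p xs = case break p xs of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitWith p rest

split :: Eq a => a -> [a] -> [[a]]
split sep = splitWith (== sep)

-- A final '\n' does not start an extra empty line, and "" has no lines.
-- split never returns [], so last and init are safe here.
linesViaSplit :: String -> [String]
linesViaSplit s = case split '\n' s of
  fields | last fields == "" -> init fields
         | otherwise         -> fields

-- words splits on runs of whitespace, i.e. drops the empty fields.
wordsViaSplit :: String -> [String]
wordsViaSplit = filter (not . null) . splitWith isSpace
```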