Difference between revisions of "Performance/Strings"
DonStewart (talk | contribs) (Clean up) |
DonStewart (talk | contribs) (FPS -> Data.ByteString) |
||
Line 7: | Line 7: | ||
* The standard Data.PackedString type |
* The standard Data.PackedString type |
||
− | * One of the newer packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html |
+ | * One of the newer packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html Data.ByteString] |
* Unboxed arrays of Word8 or Char |
* Unboxed arrays of Word8 or Char |
||
* Ptrs to foreign malloced Word8 buffers |
* Ptrs to foreign malloced Word8 buffers |
||
Line 14: | Line 14: | ||
Char types, in that they provide the usual list-like operations. |
Char types, in that they provide the usual list-like operations. |
||
− | Some interesting results for |
+ | Some interesting results for Data.ByteString are documented |
[http://www.cse.unsw.edu.au/~dons/code/fps/README here]. In particular, |
[http://www.cse.unsw.edu.au/~dons/code/fps/README here]. In particular, |
||
it compares FPS against the existing PackedString and [Char] functions, |
it compares FPS against the existing PackedString and [Char] functions, |
||
− | and is used successfully with 1 |
+ | and is used successfully with 1 terabyte strings. |
==Example== |
==Example== |
Revision as of 12:16, 20 May 2006
Haskell Performance Resource
Constructs: Techniques: |
Strings
Sometimes the cost of representing strings as lists of Char can be too much. In this case, you can instead use packed strings. There are a number of options:
- The standard Data.PackedString type
- One of the newer packed string libraries, for example Data.ByteString
- Unboxed arrays of Word8 or Char
- Ptrs to foreign malloced Word8 buffers
The packed string libraries have the benefit over arrays of Word8 or Char types, in that they provide the usual list-like operations.
Some interesting results for Data.ByteString are documented here. In particular, it compares FPS against the existing PackedString and [Char] functions, and is used successfully with 1 terabyte strings.
Example
Pete Chown asked the question:
I want to read a text file. As an example, let's use /usr/share/dict/words and try to print out the last line of the file.
The python version completes in around 0.05s.
Attempt 1 : [Char]
import System.IO
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines
Run in hugs, this program took several seconds to complete. Problem: interpreted (solution, use a Haskell compiler). Compiled, the program completes in a fairly quick 0.2s. Still, we can do better.
Attempt 2 : Packed Strings
Using fast, packed strings, we get:
import qualified Data.FastPackedString as P
import IO
main = P.readFile "/usr/share/dict/words" >>= P.putStr . last . P.lines
Runs in 0.063s
Attempt 3 : No Lists
Avoid splitting the file into lists at all, and just keep a single buffer (as a C programmer would perhaps do):
import qualified Data.ByteString as P
import Maybe
import IO
main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'
Runs in 0.013s
Related work
An extended tutorial on using PackedStrings/ByteStrings for high performance string manipulating code is here.