Performance/Strings: Difference between revisions
DonStewart (talk | contribs) (Take string/packed string problem from haskell-cafe@ as example) |
m (Minor formatting changes) |
||
(10 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
{{Performance infobox}} | {{Performance infobox}} | ||
[[Category:Performance|Strings]] | |||
==Strings== | ==Strings== | ||
Sometimes the cost of representing strings as lists of | Sometimes the cost of representing strings as lists of <tt>Chars</tt> can be | ||
too much. In this case, you can instead use packed strings. There are a | too much. In this case, you can instead use packed strings. There are a | ||
number of options: | number of options: | ||
* One of the packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html Data.ByteString] | |||
* One of the | * Unboxed arrays of <tt>Word8</tt> or <tt>Char</tt> | ||
* Unboxed arrays of Word8 or Char | * Ptrs to foreign malloced <tt>Word8</tt> buffers | ||
* Ptrs to foreign malloced Word8 buffers | |||
The packed string libraries have the benefit over arrays of Word8 or | The packed string libraries have the benefit over arrays of <tt>Word8</tt> or | ||
Char types, in that they provide the usual list-like operations. | <tt>Char</tt> types, in that they provide the usual list-like operations. | ||
Some interesting results for <tt>Data.ByteString</tt> are documented | |||
[http://www.cse.unsw.edu.au/~dons/code/fps/README here]. In particular, | |||
it compares FPS against the existing <tt>PackedString</tt> and <tt>[Char]</tt> functions, | |||
and is used successfully with 1 terabyte strings. | |||
==Example== | ==Example== | ||
Line 18: | Line 23: | ||
Pete Chown asked the question: | Pete Chown asked the question: | ||
<blockquote> | |||
I want to read a text file. As an example, let's use <tt>/usr/share/dict/words</tt> and try to print out the last line of the file. | |||
</blockquote> | |||
The python version completes in around 0.05s. | The python version completes in around 0.05s. | ||
=== Attempt 1 : [Char] === | === Attempt 1 : <tt>[Char]</tt> === | ||
<haskell> | <haskell> | ||
Line 34: | Line 40: | ||
completes in a fairly quick 0.2s. Still, we can do better. | completes in a fairly quick 0.2s. Still, we can do better. | ||
=== Attempt 2 : | === Attempt 2 : <tt>Data.ByteString</tt> === | ||
Using [http://www.cse.unsw.edu.au/~dons/fps.html | Using [http://www.cse.unsw.edu.au/~dons/fps.html ByteString], we get: | ||
<haskell> | <haskell> | ||
import qualified Data. | import qualified Data.ByteString as B | ||
import IO | import IO | ||
main = | main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines | ||
</haskell> | </haskell> | ||
Line 52: | Line 58: | ||
<haskell> | <haskell> | ||
import qualified Data. | import qualified Data.ByteString as P | ||
import Maybe | |||
import IO | import IO | ||
main = | main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n' | ||
</haskell> | </haskell> | ||
Runs in 0.013s | Runs in 0.013s | ||
=== Related work === | |||
An extended tutorial on using <tt>PackedStrings</tt>/<tt>ByteStrings</tt> for high performance string manipulating code is [[Wc|here]]. | |||
[http://groups.google.com/group/fa.haskell/browse_thread/thread/4133fa71ce97eb0e/fef34d1c3943bbe0#fef34d1c3943bbe0 A discussion] of the fastest way to parse a file of numbers, comparing various approaches using <tt>ByteStrings</tt>. |
Latest revision as of 22:15, 18 April 2021
Haskell Performance Resource
Constructs: Techniques: |
Strings
Sometimes the cost of representing strings as lists of Chars can be too much. In this case, you can instead use packed strings. There are a number of options:
- One of the packed string libraries, for example Data.ByteString
- Unboxed arrays of Word8 or Char
- Ptrs to foreign malloced Word8 buffers
The packed string libraries have the benefit over arrays of Word8 or Char types, in that they provide the usual list-like operations.
Some interesting results for Data.ByteString are documented here. In particular, it compares FPS against the existing PackedString and [Char] functions, and is used successfully with 1 terabyte strings.
Example
Pete Chown asked the question:
I want to read a text file. As an example, let's use /usr/share/dict/words and try to print out the last line of the file.
The python version completes in around 0.05s.
Attempt 1 : [Char]
import System.IO
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines
Run in hugs, this program took several seconds to complete. Problem: interpreted (solution, use a Haskell compiler). Compiled, the program completes in a fairly quick 0.2s. Still, we can do better.
Attempt 2 : Data.ByteString
Using ByteString, we get:
import qualified Data.ByteString as B
import IO
main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines
Runs in 0.063s
Attempt 3 : No Lists
Avoid splitting the file into lists at all, and just keep a single buffer (as a C programmer would perhaps do):
import qualified Data.ByteString as P
import Maybe
import IO
main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'
Runs in 0.013s
Related work
An extended tutorial on using PackedStrings/ByteStrings for high performance string manipulating code is here.
A discussion of the fastest way to parse a file of numbers, comparing various approaches using ByteStrings.