Difference between revisions of "Performance/Strings"

From HaskellWiki
Jump to navigation Jump to search
(Use fps if your strings are 1G or bigger)
m (Minor formatting changes)
 
(9 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
{{Performance infobox}}
 
{{Performance infobox}}
  +
[[Category:Performance|Strings]]
 
==Strings==
 
==Strings==
   
Sometimes the cost of representing strings as lists of ''Char'' can be
+
Sometimes the cost of representing strings as lists of <tt>Chars</tt> can be
 
too much. In this case, you can instead use packed strings. There are a
 
too much. In this case, you can instead use packed strings. There are a
 
number of options:
 
number of options:
   
 
* One of the packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html Data.ByteString]
* The standard Data.PackedString type
 
 
* Unboxed arrays of <tt>Word8</tt> or <tt>Char</tt>
* One of the newer packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html FastPackedString]
 
 
* Ptrs to foreign malloced <tt>Word8</tt> buffers
* Unboxed arrays of Word8 or Char
 
* Ptrs to foreign malloced Word8 buffers
 
   
The packed string libraries have the benefit over arrays of Word8 or
+
The packed string libraries have the benefit over arrays of <tt>Word8</tt> or
Char types, in that they provide the usual list-like operations.
+
<tt>Char</tt> types, in that they provide the usual list-like operations.
   
Some interesting results for FastPackedString are documented
+
Some interesting results for <tt>Data.ByteString</tt> are documented
 
[http://www.cse.unsw.edu.au/~dons/code/fps/README here]. In particular,
 
[http://www.cse.unsw.edu.au/~dons/code/fps/README here]. In particular,
it compares FPS against the existing PackedString and [Char] functions,
+
it compares FPS against the existing <tt>PackedString</tt> and <tt>[Char]</tt> functions,
and is used successfully with 1 gigabyte strings.
+
and is used successfully with 1 terabyte strings.
   
 
==Example==
 
==Example==
Line 23: Line 23:
 
Pete Chown asked the question:
 
Pete Chown asked the question:
   
  +
<blockquote>
I want to read a text file. As an example, let's use
 
/usr/share/dict/words and try to print out the last line of the file.
+
I want to read a text file. As an example, let's use <tt>/usr/share/dict/words</tt> and try to print out the last line of the file.
  +
</blockquote>
   
 
The python version completes in around 0.05s.
 
The python version completes in around 0.05s.
   
=== Attempt 1 : [Char] ===
+
=== Attempt 1 : <tt>[Char]</tt> ===
   
 
<haskell>
 
<haskell>
Line 39: Line 40:
 
completes in a fairly quick 0.2s. Still, we can do better.
 
completes in a fairly quick 0.2s. Still, we can do better.
   
=== Attempt 2 : Packed Strings ===
+
=== Attempt 2 : <tt>Data.ByteString</tt> ===
   
Using [http://www.cse.unsw.edu.au/~dons/fps.html fast, packed strings], we get:
+
Using [http://www.cse.unsw.edu.au/~dons/fps.html ByteString], we get:
   
 
<haskell>
 
<haskell>
import qualified Data.FastPackedString as P
+
import qualified Data.ByteString as B
 
import IO
 
import IO
main = P.readFile "/usr/share/dict/words" >>= P.hPut stdout . last . P.lines
+
main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines
 
</haskell>
 
</haskell>
   
Line 57: Line 58:
   
 
<haskell>
 
<haskell>
import qualified Data.FastPackedString as P
+
import qualified Data.ByteString as P
  +
import Maybe
 
import IO
 
import IO
   
main = do P.readFile "/usr/share/dict/words" >>= P.hPut stdout . snd . P.spanEnd (/='\n') . P.init
+
main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'
putChar '\n'
 
 
</haskell>
 
</haskell>
   
 
Runs in 0.013s
 
Runs in 0.013s
  +
  +
=== Related work ===
  +
  +
An extended tutorial on using <tt>PackedStrings</tt>/<tt>ByteStrings</tt> for high performance string manipulating code is [[Wc|here]].
  +
  +
[http://groups.google.com/group/fa.haskell/browse_thread/thread/4133fa71ce97eb0e/fef34d1c3943bbe0#fef34d1c3943bbe0 A discussion] of the fastest way to parse a file of numbers, comparing various approaches using <tt>ByteStrings</tt>.

Latest revision as of 22:15, 18 April 2021

Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Strings

Sometimes the cost of representing strings as lists of Chars can be too much. In this case, you can instead use packed strings. There are a number of options:

  • One of the packed string libraries, for example Data.ByteString
  • Unboxed arrays of Word8 or Char
  • Ptrs to foreign malloced Word8 buffers

The packed string libraries have the benefit over arrays of Word8 or Char types, in that they provide the usual list-like operations.

Some interesting results for Data.ByteString are documented here. In particular, it compares FPS against the existing PackedString and [Char] functions, and is used successfully with 1 terabyte strings.

Example

Pete Chown asked the question:

I want to read a text file. As an example, let's use /usr/share/dict/words and try to print out the last line of the file.

The python version completes in around 0.05s.

Attempt 1 : [Char]

import System.IO                                                                                  
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines

Run in hugs, this program took several seconds to complete. Problem: interpreted (solution, use a Haskell compiler). Compiled, the program completes in a fairly quick 0.2s. Still, we can do better.

Attempt 2 : Data.ByteString

Using ByteString, we get:

import qualified Data.ByteString as B                                                     
import IO                                                                                       
main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines

Runs in 0.063s

Attempt 3 : No Lists

Avoid splitting the file into lists at all, and just keep a single buffer (as a C programmer would perhaps do):

import qualified Data.ByteString as P
import Maybe
import IO

main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'

Runs in 0.013s

Related work

An extended tutorial on using PackedStrings/ByteStrings for high performance string manipulating code is here.

A discussion of the fastest way to parse a file of numbers, comparing various approaches using ByteStrings.