Difference between revisions of "Performance/Strings"

Latest revision as of 22:15, 18 April 2021

Haskell Performance Resource Constructs: Data Types - Functions Overloading - FFI - Arrays Strings - Integers - I/O Floating point - Concurrency Modules - Monads Techniques: Strictness - Laziness Avoiding space leaks Accumulating parameter Implementation-Specific: GHC - nhc98 - Hugs Yhc - JHC

Strings

Sometimes the cost of representing strings as lists of Chars can be too much. In this case, you can instead use packed strings. There are a number of options:

One of the packed string libraries, for example Data.ByteString
Unboxed arrays of Word8 or Char
Ptrs to foreign malloced Word8 buffers

The packed string libraries have the benefit over arrays of Word8 or Char types, in that they provide the usual list-like operations.

Some interesting results for Data.ByteString are documented here. In particular, it compares FPS against the existing PackedString and [Char] functions, and is used successfully with 1 terabyte strings.

Example

Pete Chown asked the question:

I want to read a text file. As an example, let's use /usr/share/dict/words and try to print out the last line of the file.

The python version completes in around 0.05s.

Attempt 1 : `[Char]`

import System.IO                                                                                  
main = readFile "/usr/share/dict/words" >>= putStrLn.last.lines

Run in hugs, this program took several seconds to complete. Problem: interpreted (solution, use a Haskell compiler). Compiled, the program completes in a fairly quick 0.2s. Still, we can do better.

Attempt 2 : `Data.ByteString`

Using ByteString, we get:

import qualified Data.ByteString as B                                                     
import IO                                                                                       
main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines

Runs in 0.063s

Attempt 3 : No Lists

Avoid splitting the file into lists at all, and just keep a single buffer (as a C programmer would perhaps do):

import qualified Data.ByteString as P
import Maybe
import IO

main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'

Runs in 0.013s

Related work

An extended tutorial on using PackedStrings/ByteStrings for high performance string manipulating code is here.

A discussion of the fastest way to parse a file of numbers, comparing various approaches using ByteStrings.

@@ Line 1: / Line 1: @@
 {{Performance infobox}}
+[[Category:Performance|Strings]]
 ==Strings==
-Sometimes the cost of representing strings as lists of ''Char'' can be
+Sometimes the cost of representing strings as lists of <tt>Chars</tt> can be
 too much. In this case, you can instead use packed strings. There are a
 number of options:
+* One of the packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html Data.ByteString]
-* The standard Data.PackedString type
+* Unboxed arrays of <tt>Word8</tt> or <tt>Char</tt>
-* One of the newer packed string libraries, for example [http://www.cse.unsw.edu.au/~dons/fps.html FastPackedString]
+* Ptrs to foreign malloced <tt>Word8</tt> buffers
-* Unboxed arrays of Word8 or Char
-* Ptrs to foreign malloced Word8 buffers
-The packed string libraries have the benefit over arrays of Word8 or
+The packed string libraries have the benefit over arrays of <tt>Word8</tt> or
-Char types, in that they provide the usual list-like operations.
+<tt>Char</tt> types, in that they provide the usual list-like operations.
-Some interesting results for FastPackedString are documented
+Some interesting results for <tt>Data.ByteString</tt> are documented
 [http://www.cse.unsw.edu.au/~dons/code/fps/README here]. In particular,
-it compares FPS against the existing PackedString and [Char] functions,
+it compares FPS against the existing <tt>PackedString</tt> and <tt>[Char]</tt> functions,
-and is used successfully with 1 gigabyte strings.
+and is used successfully with 1 terabyte strings.
 ==Example==
@@ Line 23: / Line 23: @@
 Pete Chown asked the question:
+<blockquote>
- I want to read a text file.  As an example, let's use
- /usr/share/dict/words and try to print out the last line of the file.
+I want to read a text file.  As an example, let's use <tt>/usr/share/dict/words</tt> and try to print out the last line of the file.
+</blockquote>
 The python version completes in around 0.05s.
-=== Attempt 1 : [Char] ===
+=== Attempt 1 : <tt>[Char]</tt> ===
 <haskell>
@@ Line 39: / Line 40: @@
 completes in a fairly quick 0.2s. Still, we can do better.
-=== Attempt 2 : Packed Strings ===
+=== Attempt 2 : <tt>Data.ByteString</tt> ===
-Using [http://www.cse.unsw.edu.au/~dons/fps.html fast, packed strings], we get:
+Using [http://www.cse.unsw.edu.au/~dons/fps.html ByteString], we get:
 <haskell>
-import qualified Data.FastPackedString as P
+import qualified Data.ByteString as B
 import IO
-main = P.readFile "/usr/share/dict/words" >>= P.hPut stdout . last . P.lines
+main = B.readFile "/usr/share/dict/words" >>= B.putStr . last . B.lines
 </haskell>
@@ Line 57: / Line 58: @@
 <haskell>
-import qualified Data.FastPackedString as P
+import qualified Data.ByteString as P
+import Maybe
 import IO
-main = do P.readFile "/usr/share/dict/words" >>= P.hPut stdout . snd . P.spanEnd (/='\n') . P.init
+main = P.readFile "/usr/share/dict/words" >>= P.putStrLn . snd . fromJust . P.breakLast '\n'
-          putChar '\n'
 </haskell>
 Runs in 0.013s
+=== Related work ===
+An extended tutorial on using <tt>PackedStrings</tt>/<tt>ByteStrings</tt> for high performance string manipulating code is [[Wc|here]].
+[http://groups.google.com/group/fa.haskell/browse_thread/thread/4133fa71ce97eb0e/fef34d1c3943bbe0#fef34d1c3943bbe0 A discussion] of the fastest way to parse a file of numbers, comparing various approaches using <tt>ByteStrings</tt>.

Difference between revisions of "Performance/Strings"

Latest revision as of 22:15, 18 April 2021

Contents

Strings

Example

Attempt 1 : `[Char]`

Attempt 2 : `Data.ByteString`

Attempt 3 : No Lists

Related work

Navigation menu

Search

Difference between revisions of "Performance/Strings"

Latest revision as of 22:15, 18 April 2021

Strings

Example

Attempt 1 : [Char]

Attempt 2 : Data.ByteString

Attempt 3 : No Lists

Related work

Navigation menu

Search

Attempt 1 : `[Char]`

Attempt 2 : `Data.ByteString`