From HaskellWiki
Revision as of 00:07, 21 June 2007 by Mrd (talk | contribs) (more discussion of I/O)

Jump to: navigation, search

SPOJ is Sphere Online Judge, a automated programming contest website with lots of problems and solutions ranked.

Solution to TEST

TEST description: the input consists of one number per line, the program should echo each number until 42 is reached, at which point the program should exit.

main = mapM_ putStrLn . takeWhile (/="42") . lines =<< getContents

Solution to INTEST

INTEST description: the first line of input contains two numbers: n and k. The input then consists of n lines of numbers. The output of the program should be the count of the numbers which are divisible by k.

Prior to the installation of GHC 6.6.1, it was quite difficult, if not impossible, to pass this demonstration. This solution shows a simple, reasonably efficient manner of using the new ByteString library to achieve acceptable times.

import Data.List (unfoldr)
import qualified Data.ByteString.Char8 as SS

divisibleBy :: Int -> Int -> Bool
a `divisibleBy` n = a `rem` n == 0

readInt1 :: SS.ByteString -> Maybe (Int, SS.ByteString)
readInt1 cs = do
  (n, cs') <- SS.readInt cs
  return (n, SS.tail cs')

main = do
  cs <- SS.getContents  -- This is the only line of I/O
  let n:k:ns = unfoldr readInt1 cs
      count  = length $ filter (`divisibleBy` k) ns
  print count

Techniques for dealing with problems efficiently

This section accumulates some earned wisdom about writing Haskell programs which overcome some of the technical obstacles in SPOJ. These are not spoilers, hopefully, but useful tips in general about auxiliary issues.


SPOJ has finally installed the GHC version 6.6.1. This makes the new, alternative I/O module available: Data.ByteString. Hopefully, it will be possible to find solutions for many more problems -- which could not be solved efficiently in the past because of mundane issues like I/O.

Brief Details

ByteStrings come in two varieties: Strict (default), and Lazy. Strict ByteStrings, in brief, are implemented by a foreign pointer to a block of memory which is filled by low level operations. The ByteString data type points to this memory region and also contains two Ints which track relative position and length. This means that many operations on ByteStrings can re-use memory and merely manipulate the two Ints.

Data.ByteString.Lazy is a layer on top which provides the ByteString API, but now the data is only read in chunks of Strict ByteStrings when necessary.

Don Stewart and others have put a great deal of excellent work into the library to ensure that the high-level interface will be optimized into efficient code. However, it may behoove you to inspect the source code for yourself, which is quite accessible and available in the GHC repository under libraries/base/Data/.

For SPOJ purposes, the most common operation is reading some kind of list of Ints. The code for INTEST above demonstrates one simple way to take advantage of ByteString. Using a similar technique may suffice for many problems. However, there are more advanced possibilities which could offer improvements in some cases.

The module is normally imported qualified because it shares many names with standard Prelude functions. I use the prefix SS in my examples, for Strict byteString.

Span and Break

The span and break functions available from ByteString can be used to parse the occasional non-numeric value in input. For example, the following code skips a line:

import qualified Data.ByteString.Char8 as SS
-- ...
skipLine cs = SS.span (== '\n') cs'
  where (_, cs') = SS.break (== '\n') cs

The nice part about span and break is that when it is found in the form span (== c) for all characters c, it is optimized down to a simple memchr.

Lazy ByteStrings

In general, I find that the Strict ByteStrings are more than adequate, and sometimes more complicated usage does not result in better performance. There are cases where Lazy ByteStrings do seem handle better. You can use Lazy ByteStrings as a drop-in replacement for Strict, mostly. But that does not generally confer any advantage. The best part about Lazy ByteStrings is the function called toChunks. This function provides an interface to the lower-level portions which actually operate by reading chunks of data into Strict ByteStrings acting as buffers. The chunks are kept in a lazy list, so each chunk is only read on-demand. However, you are getting the data as efficiently as the library deems possible.

Don Stewart Discusses Chunked ByteString Input



Garbage Collection