Haskell Quiz/IP to Country/Solution Dolio

From HaskellWiki
Jump to navigation Jump to search

Searching a big CSV file seemed like an ideal use of the famed ByteString library, so I hacked up a quick solution. It uses lazy chunked input for hopefully cache-efficient processing, but deals with the chunks in terms of their strict byte string implementations to also avoid as much overhead as possible (not a particularly hard scheme to set up once you've seen it).

It's fast. Looking up the IP on the quiz page takes roughly 0.04 seconds (as opposed to 0.30 on the reference implementation), and about half a second on an IP that isn't in the database (forcing the entire file to be processed), which seems not too shabby. However, no specs were given for the machine the reference implementation was run on, so the comparisons above are rather worthless. :)

This just processes the raw file downloaded from the website linked in the quiz, and processes it linearly. One could probably devise an optimized version of the database, or a more efficient searching scheme and gain performance, but the naive solution is still plenty fast.

{-# LANGUAGE PatternGuards #-}

module Main(main) where

import Data.Maybe
import System.Environment

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Process a file by line. For each line in the file denoted by
-- the FilePath, the function is called. If the result of the
-- computation is True, processing is cut off early.
-- This uses lazy chunked reading, but operates on the chunks
-- one by one for (hopefully) maximum speed.
processFile :: FilePath -> (B.ByteString -> IO Bool) -> IO ()
processFile path op = proc . L.toChunks =<< L.readFile path
 proc []     = return ()
 proc [c]    = proc' (B.lines c) >> return ()
 proc (c:cc:cs) = do b <- proc' (B.lines c')
                     if b then return () else proc cs'
  where (c', t) = B.breakEnd (=='\n') c
        cs' = B.append t cc : cs
 proc' []     = return False
 proc' (x:xs) = do b <- op x
                   if b then return True else proc' xs

-- Given an ip, represented as a 4-tuple, and a line expected to come
-- from the ip database, determines whether the ip matches. If it does,
-- the corresponding country is printed, and an exit is signaled.
ipSearch :: (Int, Int, Int, Int) -> B.ByteString -> IO Bool
ipSearch (a,b,c,d) s
    | Just (from, to, country) <- parse s,
      from <= ip,
      ip <= to                             = B.putStrLn country >> return True
    | otherwise                            = return False
 ip = d + 256*c + 256*256*b + 256*256*256*a

parse s = case B.split ',' s of
               [f,t,_,_,_,_,c] -> do (from,_) <- B.readInt (B.tail f)
                                     (to,  _) <- B.readInt (B.tail t)
                                     return (from, to, B.tail (B.init c))
               _               -> Nothing

main = do (ips:_) <- getArgs
          processFile "IpToCountry.csv" (ipSearch $ ipParse ips)
 ipParse = convert . B.split '.' . B.pack
 convert [a,b,c,d] = fromJust $ do (a',_) <- B.readInt a
                                   (b',_) <- B.readInt b
                                   (c',_) <- B.readInt c
                                   (d',_) <- B.readInt d
                                   return (a',b',c',d')