Difference between revisions of "Avoiding IO"
m (Parts of "fine print" transferred from "How to get rid of IO") |
m (Haskell's non-strict semantics mentioned; various other changes) |
||
Line 1: | Line 1: | ||
− | Haskell requires an explicit type for operations involving input and output. |
+ | Haskell requires an explicit type for operations involving input and output. This way it makes a problem explicit, that exists in every language: input and output definitions can have so many effects, that the type signature says more or less that almost everything must be expected. It is hard to test them, because they can in principle depend on every state of the real world. |
+ | |||
− | This way it makes a problem explicit, that exists in every language: |
||
⚫ | |||
− | input and output definitions can have so many effects, that the type signature says more or less that almost everything must be expected. |
||
+ | |||
− | It is hard to test them, because they can in principle depend on every state of the real world. |
||
− | Thus in order to maintain modularity you should avoid I/O wherever possible. |
||
⚫ | |||
== Lazy definition of structured data == |
== Lazy definition of structured data == |
||
− | You can avoid a series of output functions |
+ | You can avoid a series of output functions by constructing a complex data structure with non-I/O code and output it with one output definition. |
− | by constructing a complex data structure with non-I/O code |
||
− | and output it with one output definition. |
||
Instead of |
Instead of |
||
Line 44: | Line 40: | ||
=== Writer === |
=== Writer === |
||
− | If the only reason that you need I/O is to output information (e.g. logging, collecting statistics), a Writer monad might do the job. |
+ | If the only reason that you need I/O is to output information (e.g. logging, collecting statistics), a Writer monad might do the job. This technique works just fine with lazy construction, especially if the lazy object that you need to create is a [[Monoid]]. |
− | This technique works just fine with lazy construction, especially if the lazy object that you need to create is a [[Monoid]]. |
||
An inefficient example of logging: |
An inefficient example of logging: |
||
Line 63: | Line 58: | ||
=== State === |
=== State === |
||
− | If you want to maintain a running state, it is tempting to use <code>IORef</code>. |
+ | If you want to maintain a running state, it is tempting to use <code>IORef</code>. But this is not necessary, since there is the comfortable <code>State</code> monad and its transformer counterpart. |
− | But this is not necessary, since there is the comfortable <code>State</code> monad and its transformer counterpart. |
||
+ | Another example is random number generation. In cases where no real random numbers are required, but only arbitrary numbers, you do not need access to the outside world. You can simply use a pseudo random number generator with an explicit state. This state can be hidden in a State monad. |
||
− | Another example is random number generation. |
||
− | In cases where no real random numbers are required, but only arbitrary numbers, |
||
− | you do not need access to the outside world. |
||
− | You can simply use a pseudo random number generator with an explicit state. |
||
− | This state can be hidden in a State monad. |
||
Example: A definition which computes a random value with respect to a custom distribution (<code>distInv</code> is the inverse of the distribution function) can be defined using I/O |
Example: A definition which computes a random value with respect to a custom distribution (<code>distInv</code> is the inverse of the distribution function) can be defined using I/O |
||
Line 95: | Line 85: | ||
=== ST === |
=== ST === |
||
− | In some cases a state monad is simply not efficient enough. |
+ | In some cases a state monad is simply not efficient enough. Let's say the state is an array and the update operations are modification of single array elements. For this kind of application the [[Monad/ST|State Thread monad]] <code>ST</code> was invented. |
+ | |||
− | Say the state is an array and the update operations are modification of single array elements. |
||
+ | It provides <code>STRef</code> as replacement for <code>IORef</code>, <code>STArray</code> as replacement for <code>IOArray</code>, <code>STUArray</code> as replacement for <code>IOUArray</code>, and you can define new operations in <code>ST</code>. You can escape from <code>ST</code> to non-monadic code in a safe, and in many cases efficient, way e.g. by using <code>runST</code>. |
||
− | For this kind of application the [[Monad/ST|State Thread monad]] <code>ST</code> was invented. |
||
− | It provides <code>STRef</code> as replacement for <code>IORef</code>, |
||
− | <code>STArray</code> as replacement for <code>IOArray</code>, |
||
− | <code>STUArray</code> as replacement for <code>IOUArray</code>, |
||
− | and you can define new operations in <code>ST</code>, but then you need to resort to unsafe operations by using the <code>unsafeIOtoST</code> operation. |
||
− | You can escape from <code>ST</code> to non-monadic code in a safe, and in many cases efficient, way. |
||
== Applicative functor style == |
== Applicative functor style == |
||
Line 137: | Line 122: | ||
== Custom monad-based type class == |
== Custom monad-based type class == |
||
− | If you only use a small set of I/O operations in otherwise non-I/O code |
+ | If you only use a small set of I/O operations in otherwise non-I/O code you may define a custom monad-based type class which implements just these operations. You can then implement them based on I/O for the application and without I/O for the test suite. |
− | you may define a custom monad-based type class which implements just these operations. |
||
− | You can then implement them based on I/O for the application and without I/O for the test suite. |
||
As an example consider the operation |
As an example consider the operation |
||
Line 147: | Line 130: | ||
</haskell> |
</haskell> |
||
− | which converts an English phrase to the currently configured user language of the system. |
+ | which converts an English phrase to the currently configured user language of the system. You can abstract the <code>IO</code> type away using |
− | You can abstract the <code>IO</code> type away using |
||
<haskell> |
<haskell> |
||
Line 161: | Line 143: | ||
</haskell> |
</haskell> |
||
− | where the first instance can be used for the application and the second one for "dry" tests. |
+ | where the first instance can be used for the application and the second one for "dry" tests. For more sophisticated tests, you may load a dictionary into a <code>Map</code> and use this for translation. |
− | For more sophisticated tests, you may load a dictionary into a <code>Map</code> and use this for translation. |
||
<haskell> |
<haskell> |
||
Line 222: | Line 203: | ||
== The last resort == |
== The last resort == |
||
− | The method of last resort is <code>unsafePerformIO</code>. If you're writing something as complex as e.g. |
+ | The method of last resort is <code>unsafePerformIO</code>. If you're writing something as complex as e.g. an [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.155.967&rep=rep1&type=pdf operating system] or a [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.9123&rep=rep1&type=pdf webserver], <b>sometimes</b> it's necessary. But for any other task, your safest option is to avoid using it altogether. |
− | So if you absolutely have to use <code>unsafePerformIO</code>, think very carefully about how to minimise its use and how you can encapsulate it in a library with a well-chosen interface. |
+ | So if you absolutely have to use <code>unsafePerformIO</code>, think very carefully about how to minimise its use and how you can encapsulate it in a library with a well-chosen interface. Because Haskell's non-strict semantics makes the direct use of observable effects thoroughly impractical, you must only use it with <code>IO</code> actions that behave like pure Haskell functions. For example, it must not be used to hide file-access operations, whereas careful memory manipulation (as used in the <code>Data.ByteString</code> module) may be safe. But always remember: your Haskell implementation <i>cannot</i> check if you're using it correctly - that is your responsibility. |
=== Using the FFI instead === |
=== Using the FFI instead === |
Revision as of 06:45, 6 April 2022
Haskell requires an explicit type for operations involving input and output. This way it makes a problem explicit, that exists in every language: input and output definitions can have so many effects, that the type signature says more or less that almost everything must be expected. It is hard to test them, because they can in principle depend on every state of the real world.
Thus in order to maintain modularity you should avoid I/O wherever possible. It is too tempting to unsafely disguise the use of I/O with unsafePerformIO
, so here instead are some clean techniques to avoid I/O.
Lazy definition of structured data
You can avoid a series of output functions by constructing a complex data structure with non-I/O code and output it with one output definition.
Instead of
-- import Control.Monad (replicateM_)
replicateM_ 10 (putStr "foo")
you can also create the complete string and output it with one call of putStr
:
putStr (concat $ replicate 10 "foo")
Similarly,
do
h <- openFile "foo" WriteMode
replicateM_ 10 (hPutStr h "bar")
hClose h
can be shortened to
writeFile "foo" (concat $ replicate 10 "bar")
which also ensures proper closing of the handle h
in case of failure.
Since you have now an expression for the complete result as string, you have a simple object that can be re-used in other contexts. For example, you can also easily compute the length of the written string using length
without bothering the file system, again.
Use simpler monadic types
It may be possible to use a simpler, more specific type than IO
for certain tasks.
Writer
If the only reason that you need I/O is to output information (e.g. logging, collecting statistics), a Writer monad might do the job. This technique works just fine with lazy construction, especially if the lazy object that you need to create is a Monoid.
An inefficient example of logging:
logText :: (MonadWriter String m) => String -> m ()
logText text = tell (text ++ "\n")
do
logText "Before operation A"
opA
logText "After operation A"
(This is "inefficient", because String
means [Char]
, tell
"writes" to the "end" of the log using mappend
, and code
for lists (i.e. (++)
) is O(n), where n is the length of the left-hand list (i.e. the log). In other words, the bigger the log gets, the slower logging becomes. To avoid this, you should generally use a type that has O(1) mappend
, such as Data.Sequence
, and fold
the complete log (using Foldable) afterwards if you need to.)
State
If you want to maintain a running state, it is tempting to use IORef
. But this is not necessary, since there is the comfortable State
monad and its transformer counterpart.
Another example is random number generation. In cases where no real random numbers are required, but only arbitrary numbers, you do not need access to the outside world. You can simply use a pseudo random number generator with an explicit state. This state can be hidden in a State monad.
Example: A definition which computes a random value with respect to a custom distribution (distInv
is the inverse of the distribution function) can be defined using I/O
randomDist :: (Random a, Num a) => (a -> a) -> IO a
randomDist distInv = liftM distInv (randomRIO (0,1))
but there is no need to do so.
You don't need the state of the whole world just for remembering the state of a random number generator, instead you can use something similar to this:
randomDist :: (RandomGen g, Random a, Num a) => (a -> a) -> State g a
randomDist distInv = liftM distInv (State (randomR (0,1)))
You can get actual values by running the State
as follows:
evalState (randomDist distInv) (mkStdGen an_arbitrary_seed)
ST
In some cases a state monad is simply not efficient enough. Let's say the state is an array and the update operations are modification of single array elements. For this kind of application the State Thread monad ST
was invented.
It provides STRef
as replacement for IORef
, STArray
as replacement for IOArray
, STUArray
as replacement for IOUArray
, and you can define new operations in ST
. You can escape from ST
to non-monadic code in a safe, and in many cases efficient, way e.g. by using runST
.
Applicative functor style
Say you have written the operation
translate :: String -> IO String
translate word =
do dict <- readDictionary "english-german.dict"
return (Map.findWithDefault word word dict)
You can only use this operation within the I/O monad, and it is not very efficient either, since for every translation the dictionary must be read from disk. You can rewrite this operation in a way that it generates a non-monadic function that can be used anywhere.
makeTranslator :: IO (String -> String)
makeTranslator =
do dict <- readDictionary "english-german.dict"
return (\word -> Map.findWithDefault word word dict)
main :: IO ()
main =
do translate <- makeTranslator
putStr (unlines (map translate ["foo", "bar"]))
I call this Applicative Functor style because you can use the application operator from Control.Applicative
:
makeTranslator <*> getLine
Custom monad-based type class
If you only use a small set of I/O operations in otherwise non-I/O code you may define a custom monad-based type class which implements just these operations. You can then implement them based on I/O for the application and without I/O for the test suite.
As an example consider the operation
localeTextIO :: String -> IO String
which converts an English phrase to the currently configured user language of the system. You can abstract the IO
type away using
class Monad m => Locale m where
localeText :: String -> m String
instance Locale IO where
localeText = localeTextIO
instance Locale Identity where
localeText = Identity
where the first instance can be used for the application and the second one for "dry" tests. For more sophisticated tests, you may load a dictionary into a Map
and use this for translation.
newtype Interpreter a = Interpreter (Reader (Map String String) a)
instance Locale Interpreter where
localeText text = Interpreter $ fmap (Map.findWithDefault text text) ask
Pseudodata: structured data with embedded effects
If the set of required effects is small, F. Warren Burton's pseudodata technique can be used to access them via abstract values in a larger structured value e.g. a (theoretically) infinite tree. Pure selector functions can then be used to:
- access new parts of the structured value;
- retrieve the abstract, effect-bearing values from individual parts.
Using Burton's terminology, the effects only occur when the abstract values are initially used by special functions. Once used, the abstract values remain constant - reusing them has no further effect. Burton briefly explains how using the abstract values in this way ensures referential transparency is preserved, even in the presence of the effects they carry.
For Haskell, external effects typically implies the monadic IO
type, so there will usually be at least one I/O operation to build the initial structured value, and the abstract values therein. An example is the simple unique-value supply from State in Haskell by John Launchbury and Simon Peyton Jones - while the also-derided unsafeInterleaveIO
is involved:
- its use is confined to the I/O operation defining the structured value:
newUniqueSupply
; - the presence of other I/O actions (e.g. the
atomicModifyIORef
call) is still reflected in the type of the defining operation:IO UniqueSupply
.
From pages 39-40 of Launchbury and Peyton-Jones's paper, using more-contemporary syntax:
-- unique-supply interface
--
newUniqueSupply :: IO UniqueSupply
splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
getUnique :: UniqueSupply -> Unique
data UniqueSupply = US Unique UniqueSupply UniqueSupply
-- ...and implementation
--
type Unique = Int
newUniqueSupply = do uvar <- newIORef 0
let incr :: Int -> (Int, Unique)
incr u = (u+1, u)
next :: IO Unique
next = unsafeInterleaveIO $
atomicModifyIORef uvar incr
supply :: IO UniqueSupply
supply = unsafeInterleaveIO $
liftM3 US next supply supply
supply
splitUniqueSupply (US _ s1 s2) = (s1, s2)
getUnique (US u _ _) = u
- With due care, this particular example can even be given an alternate interface in the style of
ST
: see Luke Palmer's IO-free splittable supply.
For more information on, and other examples of Burton's technique, see Nondeterminism with Referential Transparency in Functional Programming Languages.
The last resort
The method of last resort is unsafePerformIO
. If you're writing something as complex as e.g. an operating system or a webserver, sometimes it's necessary. But for any other task, your safest option is to avoid using it altogether.
So if you absolutely have to use unsafePerformIO
, think very carefully about how to minimise its use and how you can encapsulate it in a library with a well-chosen interface. Because Haskell's non-strict semantics makes the direct use of observable effects thoroughly impractical, you must only use it with IO
actions that behave like pure Haskell functions. For example, it must not be used to hide file-access operations, whereas careful memory manipulation (as used in the Data.ByteString
module) may be safe. But always remember: your Haskell implementation cannot check if you're using it correctly - that is your responsibility.
Using the FFI instead
Depending on the code being called, changing the type used in a FFI declaration can be another option:
foreign import ccall "noObservableEffects" shouldBeSafe :: Int -> Int
Just remember: this should only be used for foreign code which behaves like pure Haskell functions. It merely avoids needing to use unsafePerformIO
:
foreign import ccall "noObservableEffects" shouldBeSafeButinIO :: Int -> IO Int
shouldBeSafe :: Int -> Int
shouldBeSafe n = unsafePerformIO (shouldBeSafeButinIO n)
...it will not make the foreign call "safe".