IO at work
How do I do I/O in Haskell?
Because of the referential transparency of Haskell, you can't do I/O in Haskell in a way like in impure functional or imperative programming languages.
One example is this: Assume there were a function
readfirstline :: String {- file name -} -> String {- first line of file -}
Calling readfirstline "/foo/bar"
could yield different results at different times while executing the program - for example:
- someone could change the file between the calls;
- it's a special file which seems to change anyway (e.g. terminal devices, named pipes);
- the file no longer exists (e.g. was deleted, network connection lost).
As the Haskell compiler assumes referential transparency (for constant f
and g
, f g
always yields the same result, for some built-in sense of equality), it could transform the program, calling the function more or less often, or earlier or later. Surely, you don't want that for I/O effects.
So, there must be another solution.
The one chosen for Haskell is to keep the I/O of an action (IO
computation) outside of Haskell's expression evaluation by moving it into the surrounding runtime system. The program (the function main
in the module Main) is then one large action - a value which merely describes what the program should do. The runtime evaluates main
and executes the its constituent actions, evaluating functions or other lazy expressions when needed.
This is where the IO
types come into play.
A value of type IO a
describes a computation, potentially with I/O effects, that yields a result of type a
when it's really performed. IO
is an abstract datatype, so you can't directly manipulate its internal structure.
The main
function must have an IO
type. The type parameter (named a
above) is irrelevant since Haskell 98, but was ()
(the trivial type) for earlier versions.
Instead, there are some primitives (functions with no IO
parameters, but with IO
result), that generate IO
values (computations), and there are combinators (functions having one or more IO
values as parameters). The combinators usually have an IO
result. In standard Haskell, there's no combinator having IO
parameters, but no IO
result, except trivial ones (functions which just ignore their IO
parameters).
Now, let's see some examples.
- Printing out a simple message:
putStr :: String -> IO ()
putStr
someString denotes the computation of printing the given string on the program's standard output, and returning a trivial value()
.
- So, one of the simplest programs is:
main = putStr "Hello world\n"
- Other primitive actions are:
getLine :: IO String return :: a -> IO a
getLine
is the computation that, when executed, reads a line from standard input and returns it, with the newline character stripped.
return someValue
is the computation that has no side effects, but just returnssomeValue
, when executed. That is used when you want to combine primitive side-effecting I/O actions with pure computations, processing and/or combining the results of the side-effecting actions into other results. You'll see more aboutreturn
in the examples further down.
- So, in Haskell, the program
main = getLine
- will read a line and discard it (as the result of the
main
action, if there's a non-trivial one, will be silently discarded).
Now, with only those measures, no really interesting programs can be written. You usually must combine more than one primitive action into a whole, doing what you want to do.
And, there they are, some combinators:
(>>) :: IO a -> IO b -> IO b
The computation computation1 >> computation2
will, when executed, first execute computation1
, discard the result, then execute computation2
, and return the result of executing computation2.
And, if you want to carry on results of earlier computations:
(>>=) :: IO a -> (a -> IO b) -> IO b
The computation computation1 >>= (\result1 -> computation2)
will, when executed, execute computation1
, pass the result to the lambda on the right of (>>=)
. That lambda application will yield a computation by pure expression evaluation. That resulting computation will be executed then.
Some examples:
main = putStr "Hello " >> putStr "world\n"
- The same as above, but in two different
putStr
actions.
main = getLine >>= (\inputStr -> putStr ("You typed " ++ inputStr ++ "\n"))
- In fact, you can omit the parentheses, and you can format it differently:
main = getLine >>= \inputStr -> putStr ("You typed " ++ inputStr ++ "\n")
- This reads a line, then echoes it with some text around it.
Now, where does the do
-notation come into play? It's merely syntactic sugar for expressions combined with (>>)
and (>>=)
.
The last two examples read this in the do
-notation:
main = do putStr "Hello " putStr "world\n"
main = do inputStr <- getLine putStr ("You typed " ++ inputStr ++ "\n")
Now, at last, back to the opener: the readfirstline
example. That's also an example for the use of return
.
In fact, that function is easy to define:
readfirstline :: String -> IO String
readfirstline filename =
readFile filename >>= \fileContent ->
return (head (lines fileContent))
Try to understand that for yourself! Because readFile
reads the file lazily, only the first line of the file will be read at all (or some more, because of buffering) rather than the whole file.
The above code would be (in do
-notation):
readfirstline filename = do
fileContent <- readFile filename
return (head(lines fileContent))
What's that IO
type anyway?
Abstract:
The IO
type serves as a tag for operations (actions) that interact with the outside world. The IO
type is abstract: no constructors are visible to the user.
The Haskell 2010 Report (page 95 of 329).
If Haskell's FFI allowed it, IO
could be as simple as:
data IO a
instance Monad IO where
return = unitIO
(>>=) = bindIO
foreign import ccall "primUnitIO" unitIO :: a -> IO a
foreign import ccall "primBindIO" bindIO :: IO a -> (a -> IO b) -> IO b
⋮
Obviously this isn't how GHC defines IO
: that's because prior to Haskell 2010 there was more than one Haskell implementation! That in turn meant there were different definitions of IO
, depending on each implementation (IO inside lists a few of them, if you're interested). The majority of then (including GHC's)
require IO
to be abstract for two reasons:
- to work correctly in Haskell (therefore no access to e.g. Clean's uniqueness types),
- and to hide "boring details" e.g. all those extra state parameters and results GHC's definition relies on.