
== Dark side of IO monad ==
=== unsafePerformIO ===

Programmers coming from an imperative language background often look for a way to execute IO actions inside a pure procedure. But what does this mean?
Imagine that you're trying to write a procedure that reads the contents of a file with a given name, and you try to write it as a pure (non-IO) function:

<haskell>
readContents :: FilePath -> String
</haskell>

Defining readContents as a pure function will certainly simplify the code that uses it. But it will also create problems for the compiler:

# This call is not inserted in a sequence of "world transformations", so the compiler doesn't know at what exact moment you want to execute this action. For example, if the file has one kind of contents at the beginning of the program and another at the end - which contents do you want to see?  You have no idea when (or even if) this function is going to get invoked, because Haskell sees this function as pure and feels free to reorder the execution of any or all pure functions as needed.
# Attempts to read the contents of files with the same name can be factored (''i.e.'' reduced to a single call) despite the fact that the file (or the current directory) can be changed between calls. Again, Haskell considers all non-IO functions to be pure and feels free to omit multiple calls with the same parameters.

So, implementing pure functions that interact with the Real World is
considered to be Bad Behavior. Good boys and girls never do it ;)


Nevertheless, there are (semi-official) ways to use IO actions inside
of pure functions. As you should remember, this is normally prevented by
requiring the RealWorld "baton" in order to call an IO action. Pure functions don't have the baton, but there is a special "magic" procedure that produces this baton from nowhere, uses it to call an IO action and then throws the resulting "world" away!  It's a little low-level magic :)  This very special (and dangerous) procedure is:

<haskell>
unsafePerformIO :: IO a -> a
</haskell>

Let's look at its (possible) definition:

<haskell>
unsafePerformIO :: (RealWorld -> (a, RealWorld)) -> a
unsafePerformIO action = let (a, world1) = action createNewWorld
                         in a
</haskell>

where 'createNewWorld' is an internal function producing a new value of
the RealWorld type.

Using unsafePerformIO, you can easily write pure functions that do
I/O inside. But don't do this without a real need, and remember:
the compiler doesn't know that you are cheating; it still
considers each non-IO function to be a pure one. Therefore, all the usual
optimization rules can (and will!) be applied to its execution. So
you must ensure that:

# The result of each call depends only on its arguments.
# You don't rely on side-effects of this function, which may not be executed if its results are not needed.
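
Here is a minimal sketch of a function that plays by both rules. The name 'sumViaIORef' is made up for this illustration. It uses a mutable variable internally, but the variable never escapes the call, so the result depends only on the argument and nobody can observe the side effects:

<haskell>
import Data.Foldable (for_)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- | Sums a list using a private mutable accumulator.
-- ('sumViaIORef' is a hypothetical name used only for this sketch.)
sumViaIORef :: [Int] -> Int
sumViaIORef xs = unsafePerformIO $ do
  acc <- newIORef 0                       -- private: never leaves this call
  for_ xs $ \x -> modifyIORef' acc (+ x)  -- the only effects touch 'acc'
  readIORef acc
</haskell>

From the outside it behaves like any other pure function: 'sumViaIORef [1..10]' is 55 no matter when, how often, or whether the compiler decides to evaluate it.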


Let's investigate this problem more deeply. Function evaluation in Haskell
is driven by need: the language computes only the values that are really required to calculate the final result. But what does this mean with respect to the 'main' function?  To calculate the "final world" value, you need to perform all the intermediate IO actions that are included in the 'main' chain. By using 'unsafePerformIO' we call IO actions outside of this chain.  What guarantee do we have that they will be run at all? None. They will be run only if running them is required to compute the overall function result (which in turn should be required to perform some action in the 'main' chain). This is an example of Haskell's evaluation-by-need strategy. Now you should clearly see the difference:

- An IO action inside an IO procedure is guaranteed to execute as long as
it is (directly or indirectly) inside the 'main' chain - even when its result isn't used (because the implicit "world" value it returns ''will'' be used). You directly specify the order of the action's execution inside the IO procedure. Data dependencies are simulated via the implicit "world" values that are passed from each IO action to the next.

- An IO action inside 'unsafePerformIO' will be performed only if
the result of this operation is really used. The evaluation order is not
guaranteed and you should not rely on it (except when you're sure about
whatever data dependencies may exist).
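
To make the difference visible, here is a small experiment. It is a sketch that deliberately breaks rule 2 above, and the names 'tracedDouble' and 'demandDemo' are invented for it. Each call bumps a counter as a side effect, so we can count how many of the hidden IO actions really ran:

<haskell>
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- | Doubles its argument and, as a hidden side effect, bumps a counter.
-- (Hypothetical helper: this is exactly the kind of code you should NOT write.)
tracedDouble :: IORef Int -> Int -> Int
tracedDouble counter x = unsafePerformIO $ do
  modifyIORef' counter (+ 1)
  return (x * 2)

-- | Writes two calls, demands only one, and reports (result, number of runs).
demandDemo :: IO (Int, Int)
demandDemo = do
  counter <- newIORef 0
  let _unused = tracedDouble counter 1   -- never demanded, so never run
      used    = tracedDouble counter 2
  result <- evaluate used                -- demand the second call only
  runs <- readIORef counter
  return (result, runs)                  -- (4, 1)
</haskell>

Two calls are written down, but the counter ends up at 1: the action behind '_unused' is never performed because nobody needs its result.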


I should also say that inside an 'unsafePerformIO' call you can organize
a small internal chain of IO actions with the help of the same binding
operators and/or 'do' syntactic sugar we've seen above.  For example, here's a particularly convoluted way to compute the integer that comes after zero:

<haskell>
import Data.IORef (newIORef, modifyIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

one :: Int
one = unsafePerformIO $ do var <- newIORef 0
                           modifyIORef var (+1)
                           readIORef var
</haskell>

and in this case ALL the operations in this chain will be performed as
long as the result of the 'unsafePerformIO' call is needed. To ensure this,
the actual 'unsafePerformIO' implementation evaluates the "world" returned
by the 'action':

<haskell>
unsafePerformIO action = let (a,world1) = action createNewWorld
                         in (world1 `seq` a)
</haskell>

(The 'seq' operation evaluates its first argument, to weak head normal
form, before returning the value of the second one.)
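
If 'seq' is new to you, the following sketch shows the effect in isolation. All three names are invented for the illustration. An argument that is merely ignored is never evaluated, while an argument passed through 'seq' is:

<haskell>
import Control.Exception (ErrorCall, evaluate, try)

-- | 'const' ignores its second argument, so the error is never triggered.
ignoresArgument :: Int
ignoresArgument = const 1 (error "never evaluated" :: Int)

-- | 'seq' evaluates its first argument before returning the second.
forcesArgument :: Int
forcesArgument = (error "evaluated by seq" :: Int) `seq` 1

-- | True if evaluating the value triggers a call to 'error'.
raisesError :: Int -> IO Bool
raisesError x = do
  r <- try (evaluate x) :: IO (Either ErrorCall Int)
  return (either (const True) (const False) r)
</haskell>

Here 'raisesError ignoresArgument' yields False and 'raisesError forcesArgument' yields True. In the same way, forcing 'world1' makes sure the whole internal chain of actions has really been run before 'a' is handed back.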


=== inlinePerformIO ===

inlinePerformIO has the same definition as unsafePerformIO, but with the addition of an INLINE pragma:
<haskell>
-- | Just like unsafePerformIO, but we inline it. Big performance gains as
-- it exposes lots of things to further inlining
{-# INLINE inlinePerformIO #-}
inlinePerformIO action = let (a, world1) = action createNewWorld
                         in (world1 `seq` a)
</haskell>

Semantically inlinePerformIO = unsafePerformIO,
inasmuch as either of those has any semantics at all.

The difference of course is that inlinePerformIO is even less safe than
unsafePerformIO. While ghc will try not to duplicate or common up
different uses of unsafePerformIO, we aggressively inline
inlinePerformIO. So you can really only use it where the IO content is
really properly pure, like reading from an immutable memory buffer (as
in the case of ByteStrings). However things like allocating new buffers
should not be done inside inlinePerformIO since that can easily be
floated out and performed just once for the whole program, so you end up
with many things sharing the same buffer, which would be bad.

So the rule of thumb is that IO things wrapped in unsafePerformIO have
to be externally pure while with inlinePerformIO it has to be really
really pure or it'll all go horribly wrong.

That said, here's some really hairy code. This should frighten any pure
functional programmer...

<haskell>
write :: Int -> (Ptr Word8 -> IO ()) -> Put ()
write !n body = Put $ \c buf@(Buffer fp o u l) ->
  if n <= l
    then write' c fp o u l
    else write' (flushOld c n fp o u) (newBuffer c n) 0 0 0

  where {-# NOINLINE write' #-}
        write' c !fp !o !u !l =
          -- warning: this is a tad hardcore
          inlinePerformIO
            (withForeignPtr fp
              (\p -> body $! (p `plusPtr` (o+u))))
          `seq` c () (Buffer fp o (u+n) (l-n))
</haskell>

it's used like:
<haskell>
word8 w = write 1 (\p -> poke p w)
</haskell>

This does not adhere to my rule of thumb above. Don't ask exactly why we
claim it's safe :-) (and if anyone really wants to know, ask Ross
Paterson who did it first in the Builder monoid)

=== unsafeInterleaveIO ===

But there is an even stranger operation called 'unsafeInterleaveIO' that
gets the "official baton", makes its own pirate copy, and then runs
an "illegal" relay-race in parallel with the main one! I can't talk further
about its behavior without causing grief and indignation, so it's no surprise
that this operation is widely used in countries that are hotbeds of software piracy such as Russia and China! ;)  Don't even ask me - I won't say anything more about this dirty trick I use all the time ;)

One can use unsafePerformIO (not unsafeInterleaveIO) to perform I/O
operations not in a predefined order but on demand. For example, the
following code:

<haskell>
do let c = unsafePerformIO getChar
   do_proc c
</haskell>

will perform the getChar I/O call only when the value of c is really
required by the code, i.e. this call will be performed lazily, like any
usual Haskell computation.

Now imagine the following code:

<haskell>
do let s = [unsafePerformIO getChar, unsafePerformIO getChar, unsafePerformIO getChar]
   do_proc s
</haskell>

The three chars inside this list will be computed on demand too, and this
means that their values will depend on the order in which they are consumed.
This is not what we usually need :)


unsafeInterleaveIO solves this problem: it performs I/O only on
demand, but lets you define the exact ''internal'' execution order for the
parts of your data structure. This is why I wrote that unsafeInterleaveIO
makes an illegal copy of the baton :)

First, unsafeInterleaveIO takes an (IO a) action as a parameter and returns
another (IO a) action; binding its result gives you a value of type 'a':

<haskell>
do str <- unsafeInterleaveIO myGetContents
   do_proc str
</haskell>

Second, unsafeInterleaveIO doesn't perform any action immediately; it
only creates a box of type 'a' which, when its value is requested, will
perform the action specified as a parameter.

Third, this action by itself may compute the whole value immediately
or... use unsafeInterleaveIO again to defer calculation of some
sub-components:

<haskell>
import System.IO.Unsafe (unsafeInterleaveIO)

myGetContents :: IO String
myGetContents = do
   c <- getChar
   s <- unsafeInterleaveIO myGetContents
   return (c:s)
</haskell>

This code will be executed only at the moment when the value of str is
really demanded. At that moment, getChar will be performed (with its
result assigned to c) and one more lazy IO box will be created, for s.
This box again contains a link to the myGetContents call.

Then a list cell is returned that contains the one char read and a link to
the myGetContents call as a way to compute the rest of the list. Only at the
moment when the next value in the list is required will this operation be
performed again.

As a final result, it is impossible to read the second char in the list
before the first one, yet the reading as a whole remains lazy. Bingo!
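
You can watch the same mechanism at work without touching stdin. The sketch below uses invented names ('lazyCounter', 'demoLazyCounter') and replaces getChar with a step that reads and bumps a counter. This way we can check both the order of the elements and how many deferred actions were really performed:

<haskell>
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- | An infinite lazy list; each cell is produced by one deferred IO step.
-- (Hypothetical stand-in for myGetContents, with a counter instead of getChar.)
lazyCounter :: IORef Int -> IO [Int]
lazyCounter ref = unsafeInterleaveIO $ do
  n <- readIORef ref
  writeIORef ref (n + 1)
  rest <- lazyCounter ref          -- deferred again: nothing runs here yet
  return (n : rest)

-- | Demands three cells and reports them with the number of steps performed.
demoLazyCounter :: IO ([Int], Int)
demoLazyCounter = do
  ref <- newIORef 0
  xs <- lazyCounter ref
  let firstThree = take 3 xs
  _ <- evaluate (sum firstThree)   -- demand exactly three cells
  steps <- readIORef ref
  return (firstThree, steps)       -- ([0,1,2], 3)
</haskell>

The list is infinite, yet only three steps are performed, and they happen strictly in list order because each cell can only be reached through the previous one.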


PS: Of course, actual code should include EOF checking. Also note that
you can read many chars/records at each call:

<haskell>
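import Control.Monad (replicateM)
import System.IO.Unsafe (unsafeInterleaveIO)

myGetContents :: IO String
myGetContents = do
   c <- replicateM 512 getChar   -- read a 512-char chunk; still no EOF check
   s <- unsafeInterleaveIO myGetContents
   return (c++s)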
</haskell>
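
As for the EOF checking mentioned in the PS, here is one possible way to add it. This is a sketch, not the way the standard library does it. The names 'lazyUntil', 'readCharOrEOF', 'safeGetContents' and 'listSource' are all made up for this example. The idea is to read from a "source" action that returns Nothing when the input is exhausted:

<haskell>
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO (isEOF)
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Lazily collects items from a source until it reports Nothing.
lazyUntil :: IO (Maybe a) -> IO [a]
lazyUntil next = unsafeInterleaveIO $ do
  item <- next
  case item of
    Nothing -> return []             -- end of input: close the list
    Just x  -> do
      rest <- lazyUntil next         -- deferred until the tail is demanded
      return (x : rest)

-- | One character from stdin, or Nothing at end of input.
readCharOrEOF :: IO (Maybe Char)
readCharOrEOF = do
  eof <- isEOF
  if eof then return Nothing else Just <$> getChar

-- | myGetContents with the EOF check added.
safeGetContents :: IO String
safeGetContents = lazyUntil readCharOrEOF

-- | In-memory stand-in for an input stream, handy for experiments.
listSource :: [a] -> IO (IO (Maybe a))
listSource items = do
  ref <- newIORef items
  return $ do
    remaining <- readIORef ref
    case remaining of
      []         -> return Nothing
      (x : rest) -> writeIORef ref rest >> return (Just x)
</haskell>

For example, 'listSource "abc" >>= lazyUntil' gives back the string "abc", with each character fetched only when the list is consumed that far.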