== Dark side of IO monad ==

=== unsafePerformIO ===

Programmers coming from an imperative language background often look for a way to execute IO actions inside a pure procedure. But what does this mean? Imagine that you're trying to write a procedure that reads the contents of a file with a given name, and you try to write it as a pure (non-IO) function:

<haskell>
readContents :: FilePath -> String
</haskell>

Defining readContents as a pure function will certainly simplify the code that uses it. But it will also create problems for the compiler:

# This call is not inserted in a sequence of "world transformations", so the compiler doesn't know at what exact moment you want to execute this action. For example, if the file has one kind of contents at the beginning of the program and another at the end, which contents do you want to see? You have no idea when (or even if) this function is going to get invoked, because Haskell sees this function as pure and feels free to reorder the execution of any or all pure functions as needed.
# Attempts to read the contents of files with the same name can be factored (''i.e.'' reduced to a single call) despite the fact that the file (or the current directory) can be changed between calls. Again, Haskell considers all non-IO functions to be pure and feels free to omit multiple calls with the same parameters.

So, implementing pure functions that interact with the Real World is considered to be Bad Behavior. Good boys and girls never do it ;)

Nevertheless, there are (semi-official) ways to use IO actions inside of pure functions. As you should remember, this is prohibited by requiring the RealWorld "baton" in order to call an IO action. Pure functions don't have the baton, but there is a special "magic" procedure that produces this baton from nowhere, uses it to call an IO action and then throws the resulting "world" away!
It's a little low-level magic :) This very special (and dangerous) procedure is:

<haskell>
unsafePerformIO :: IO a -> a
</haskell>

Let's look at its (possible) definition:

<haskell>
-- A conceptual model, not GHC's real code: here IO a is viewed as
-- a function RealWorld -> (a, RealWorld)
unsafePerformIO :: (RealWorld -> (a, RealWorld)) -> a
unsafePerformIO action = let (a, world1) = action createNewWorld
                         in a
</haskell>

where 'createNewWorld' is an internal function producing a new value of the RealWorld type.

Using unsafePerformIO, you can easily write pure functions that do I/O inside. But don't do this without a real need, and remember to follow this rule: the compiler doesn't know that you are cheating; it still considers each non-IO function to be a pure one. Therefore, all the usual optimization rules can (and will!) be applied to its execution. So you must ensure that:

# The result of each call depends only on its arguments.
# You don't rely on side effects of this function, which may not be executed if its results are not needed.

Let's investigate this problem more deeply. Function evaluation in Haskell is determined by a value's necessity: the language computes only the values that are really required to calculate the final result. But what does this mean with respect to the 'main' function? To calculate the value of the final "world", you need to perform all the intermediate IO actions that are included in the 'main' chain. By using 'unsafePerformIO' we call IO actions outside of this chain. What guarantee do we have that they will be run at all? None. The only time they will be run is if running them is required to compute the overall function result (which in turn should be required to perform some action in the 'main' chain). This is an example of Haskell's evaluation-by-need strategy. Now you should clearly see the difference:

* An IO action inside an IO procedure is guaranteed to execute as long as it is (directly or indirectly) inside the 'main' chain, even when its result isn't used (because the implicit "world" value it returns ''will'' be used). You directly specify the order of the action's execution inside the IO procedure. Data dependencies are simulated via the implicit "world" values that are passed from each IO action to the next.
* An IO action inside 'unsafePerformIO' will be performed only if the result of this operation is really used. The evaluation order is not guaranteed and you should not rely on it (except when you're sure about whatever data dependencies may exist).

I should also say that inside an 'unsafePerformIO' call you can organize a small internal chain of IO actions with the help of the same binding operators and/or 'do' syntactic sugar we've seen above. For example, here's a particularly convoluted way to compute the integer that comes after zero:

<haskell>
import Data.IORef (newIORef, modifyIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

one :: Int
one = unsafePerformIO $ do
        var <- newIORef 0
        modifyIORef var (+1)
        readIORef var
</haskell>

and in this case ALL the operations in this chain will be performed as long as the result of the 'unsafePerformIO' call is needed. To ensure this, the actual 'unsafePerformIO' implementation evaluates the "world" returned by the 'action':

<haskell>
unsafePerformIO action = let (a, world1) = action createNewWorld
                         in (world1 `seq` a)
</haskell>

(The 'seq' operation strictly evaluates its first argument before returning the value of the second one.)

=== inlinePerformIO ===

inlinePerformIO has the same definition as unsafePerformIO but with the addition of an INLINE pragma:

<haskell>
-- | Just like unsafePerformIO, but we inline it. Big performance gains as
-- it exposes lots of things to further inlining
{-# INLINE inlinePerformIO #-}
inlinePerformIO action = let (a, world1) = action createNewWorld
                         in (world1 `seq` a)
</haskell>

Semantically, inlinePerformIO = unsafePerformIO, insofar as either of them has any semantics at all. The difference, of course, is that inlinePerformIO is even less safe than unsafePerformIO. While GHC will try not to duplicate or common up different uses of unsafePerformIO, we aggressively inline inlinePerformIO.
So you can really only use it where the IO content is truly pure, like reading from an immutable memory buffer (as in the case of ByteStrings). However, things like allocating new buffers should not be done inside inlinePerformIO, since that can easily be floated out and performed just once for the whole program, so you end up with many things sharing the same buffer, which would be bad.

So the rule of thumb is that IO things wrapped in unsafePerformIO have to be externally pure, while with inlinePerformIO it has to be really, really pure or it'll all go horribly wrong.

That said, here's some really hairy code. This should frighten any pure functional programmer...

<haskell>
write :: Int -> (Ptr Word8 -> IO ()) -> Put ()
write !n body = Put $ \c buf@(Buffer fp o u l) ->
  if n <= l
    then write' c fp o u l
    else write' (flushOld c n fp o u) (newBuffer c n) 0 0 0

  where
    {-# NOINLINE write' #-}
    write' c !fp !o !u !l =
      -- warning: this is a tad hardcore
      inlinePerformIO
        (withForeignPtr fp (\p -> body $! (p `plusPtr` (o+u))))
      `seq` c () (Buffer fp o (u+n) (l-n))
</haskell>

It's used like:

<haskell>
word8 w = write 1 (\p -> poke p w)
</haskell>

This does not adhere to my rule of thumb above. Don't ask exactly why we claim it's safe :-) (and if anyone really wants to know, ask Ross Paterson, who did it first in the Builder monoid).

=== unsafeInterleaveIO ===

But there is an even stranger operation called 'unsafeInterleaveIO' that gets the "official baton", makes its own pirate copy, and then runs an "illegal" relay race in parallel with the main one! I can't talk further about its behavior without causing grief and indignation, so it's no surprise that this operation is widely used in countries that are hotbeds of software piracy such as Russia and China! ;) Don't even ask me; I won't say anything more about this dirty trick I use all the time ;)

One can use unsafePerformIO (not unsafeInterleaveIO) to perform I/O operations not in a predefined order but on demand.
For example, the following code:

<haskell>
do let c = unsafePerformIO getChar
   do_proc c
</haskell>

will perform the getChar I/O call only when the value of c is really required by the code, i.e. this call will be performed lazily, like any usual Haskell computation.

Now imagine the following code:

<haskell>
do let s = [unsafePerformIO getChar, unsafePerformIO getChar, unsafePerformIO getChar]
   do_proc s
</haskell>

The three chars inside this list will be computed on demand too, and this means that their values will depend on the order in which they are consumed. That is not what we usually want :)

unsafeInterleaveIO solves this problem: it performs I/O only on demand but allows you to define the exact ''internal'' execution order for parts of your data structure. This is why I wrote that unsafeInterleaveIO makes an illegal copy of the baton :)

First, unsafeInterleaveIO takes an (IO a) action as a parameter and returns another action of type (IO a); binding its result inside a 'do' block gives you a value of type 'a':

<haskell>
do str <- unsafeInterleaveIO myGetContents
   do_proc str
</haskell>

Second, unsafeInterleaveIO doesn't perform the action immediately; it only creates a box of type 'a' which, when its value is requested, will perform the action passed as the parameter.

Third, this action by itself may compute the whole value immediately or... use unsafeInterleaveIO again to defer the calculation of some sub-components:

<haskell>
import System.IO.Unsafe (unsafeInterleaveIO)

myGetContents :: IO String
myGetContents = do
   c <- getChar
   s <- unsafeInterleaveIO myGetContents
   return (c:s)
</haskell>

This code will be executed only at the moment when the value of str is really demanded. At that moment, getChar will be performed (with its result assigned to c) and one more lazy IO box will be created, for s. This box again contains a link to the myGetContents call. Then a list cell is returned that contains the char just read and a link to the myGetContents call as a way to compute the rest of the list. Only when the next value in the list is required will this operation be performed again.

As a final result, the second char in the list can never be read before the first one, yet the reading as a whole is still lazy. Bingo!
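
The deferral and the fixed internal order can be observed without typing at a terminal. The sketch below assumes GHC's base library and swaps getChar for a hypothetical `tick` action that bumps an IORef counter, so every "read" gets a sequence number; `lazyTicks` has the same shape as myGetContents. The names `tick`, `lazyTicks` and `demo` are illustrative only, not part of any library.

<haskell>
import Control.Exception (evaluate)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for getChar: increments the counter and
-- returns the new value, so every "read" gets a sequence number.
tick :: IORef Int -> IO Int
tick ref = atomicModifyIORef' ref (\k -> (k + 1, k + 1))

-- Same shape as myGetContents: perform one read now and
-- defer the rest of the list with unsafeInterleaveIO.
lazyTicks :: IORef Int -> IO [Int]
lazyTicks ref = do
  x  <- tick ref
  xs <- unsafeInterleaveIO (lazyTicks ref)
  return (x : xs)

-- Returns (reads before any demand, third element, reads after demand).
demo :: IO (Int, Int, Int)
demo = do
  ref    <- newIORef 0
  xs     <- unsafeInterleaveIO (lazyTicks ref)  -- nothing is read yet
  before <- readIORef ref
  third  <- evaluate (xs !! 2)  -- forces cells 1, 2, 3, in that order
  after  <- readIORef ref
  return (before, third, after)
</haskell>

Loading this into GHCi and running demo should give (0,3,3): no read happens until the list is demanded, and asking for the third element performs exactly three reads, in list order, so the third element always carries sequence number 3.
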
PS: Of course, actual code should include EOF checking. Also note that you can read many chars/records at each call:

<haskell>
import Control.Monad (replicateM)
import System.IO.Unsafe (unsafeInterleaveIO)

myGetContents :: IO String
myGetContents = do
   c <- replicateM 512 getChar
   s <- unsafeInterleaveIO myGetContents
   return (c++s)
</haskell>
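
The EOF check mentioned in the PS could look roughly like the sketch below. It is an illustration under assumptions, not code from this page: it generalises myGetContents to any Handle (the hypothetical `lazyHGetContents`), tests for end of input with `hIsEOF` from System.IO before each `hGetChar`, and adds a hypothetical helper `readFileLazily` so the behaviour can be tried on a file.

<haskell>
import System.IO (Handle, IOMode (ReadMode), hGetChar, hIsEOF, openFile, stdin)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Like myGetContents, but for any Handle and with an EOF check:
-- at end of input the list is terminated instead of calling
-- hGetChar, which would throw an EOF exception.
lazyHGetContents :: Handle -> IO String
lazyHGetContents h = do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      c <- hGetChar h
      s <- unsafeInterleaveIO (lazyHGetContents h)  -- defer the rest
      return (c : s)

-- The stdin version, matching myGetContents in the text.
myGetContents :: IO String
myGetContents = lazyHGetContents stdin

-- Hypothetical helper: opens a file and returns its contents lazily.
-- Nothing is read until the string is demanded; the handle is never
-- closed here, so real code would have to decide when to do that.
readFileLazily :: FilePath -> IO String
readFileLazily path = do
  h <- openFile path ReadMode
  unsafeInterleaveIO (lazyHGetContents h)
</haskell>

Reading one char at a time keeps the EOF check simple; a chunked variant like the replicateM 512 version above would have to check for EOF inside each chunk, since hGetChar fails at the end of input.
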