Opting for oracles

Oracles, defined

An oracle is a value that can be viewed as having the ability to predict what another value will be. The predicted value is usually the result of a computation involving information which cannot be represented as regular values in the computation. The oracle, by seeming to contain the prediction, preserves the referential transparency of the language being used to define the computation.

This makes the use of oracles highly relevant to languages intended to preserve referential transparency, such as Haskell.

Oracles in use

The early use of oracles in programming languages focussed on coping with the nondeterminism arising from the occurrence of events outside the language - one classic example being concurrency. For a language to support it, it must provide some means for allowing multiple computations to each progress at their own rate, depending on the current availability of suitable resources, which can either be:

inside the language e.g. definitions which the computations share, or

outside of it e.g. hardware devices for storage, audiovisual or networking.

Clearly the language cannot predict e.g. when users of a computer system will be active, so concurrency is generally nondeterministic. For information on how oracles have helped to support various forms of concurrency, see Concurrency with oracles.

(Of course, the use of oracles goes beyond programming languages e.g. Jennifer Hackett and Graham Hutton use them to alleviate some of the tedium associated with the classic state-centric semantics used to examine the operational behaviour of lazy programs - see Call-by-Need Is Clairvoyant Call-by-Value.)

From oracles to pseudodata

In his paper Nondeterminism with Referential Transparency in Functional Programming Languages, F. Warren Burton illustrates how oracles can be repurposed to make use of other outside information - starting with decisions for supporting nondeterministic choice:

 -- section 2
data Tree a = Node { contents :: a,
                     left     :: Tree a,
                     right    :: Tree a }

 -- section 3
data Decision  -- abstract, builtin
choice :: Decision -> a -> a -> a

he then shows how the concept of oracles can be expanded to access the current time, using the example of timestamps:

 -- section 7
data Timestamp  -- abstract, possibly builtin
stamp :: !Timestamp -> Timestamp
compare :: Timestamp -> Timestamp -> Integer -> Bool

all while preserving referential transparency. He also hints as to how the current size of available storage can also be made available. For more, see Burton-style nondeterminism.

A simpler interface

Since its advent (sometimes as a result of being inspired by it, or similar entities), an alternate interface has appeared for working with Burton's pseudodata:

In Simon Peyton Jones's book The implementation of functional programming languages (section 9.6 on page 188 of 458), Peter Hancock provides a simple interface for generating new type varibles (of type tvname) for a type checker, using the type name_supply:

|| page 188 of 458 next_name :: name_supply -> tvname deplete :: name_supply -> name_supply split :: name_supply -> (name_supply, name_supply)

In his paper Efficient and Portable Combined Random Number Generators, Pierre L'Ecuyer suggests the disjoint splitting of random numbers into independent subsequences as needed. Burton and Rex L. Page follow this up in Distributed Random Number Generation - from page 9 of 10:

data Random = Seed Int

seed     :: Int -> Random
split    :: Random -> (Random, Random)
generate :: Random -> [Float]

(An assessment of the applicability of Burton's technique for simplifying the the provision of random numbers is also given).

Lennart Augustsson, Mikael Rittri and Dan Synek use Burton's technique to reimplement Hancock's name_supply in their functional pearl On generating unique names, making it practical for regular use. An implementation can be found in Simon Peyton Jones and Jon Launchbury's State in Haskell - using using more-contemporary syntax:

 -- page 39
newUniqueSupply   :: IO UniqueSupply
splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
getUnique         :: UniqueSupply -> Unique

data UniqueSupply =  US Unique UniqueSupply UniqueSupply

-- page 40
type Unique       =  Int

newUniqueSupply   =  do uvar <- newIORef 0
                        let incr   :: Int -> (Int, Unique)
                            incr u =  (u+1, u) 

                            next   :: IO Unique
                            next   =  unsafeInterleaveIO $
                                      atomicModifyIORef uvar incr

                            supply :: IO UniqueSupply
                            supply =  unsafeInterleaveIO $
                                      liftM3 US next supply supply

                        supply

splitUniqueSupply (US _ s1 s2) =  (s1, s2)
getUnique (US u _ _) =  u

The crucial point here is that US - the single data constructor for UniqueSupply - can now be kept private. The use of trees has been reduced to an implementation detail, oblivious to the users of UniqueSupply (and Unique) values.

A simpler implementation

Augustsson, Rittri and Synek provide other possible implementations of Hancock's supply in their paper - of particular interest is the monousal one: to preserve referential transparency, each UniqueSupply should only be used once (if at all). This makes for a compact implementation, albeit an irregular one:

abstype uniquesupply with new_uniquesupply :: uniquesupply split_uniquesupply :: uniquesupply -> (uniquesupply, uniquesupply) get_unique :: uniquesupply -> unique uniquesupply ::= US new_uniquesupply = US split_uniquesupply US = (US, US) get_unique s = gensym(s) unique == int || Not a regular definition! gensym :: * -> unique

as shown by the presence of gensym. It should be obvious that a corresponding implementation in regular Haskell isn't possible - non-standard, possibly implementation-specific extensions are required. To provide an implementation here would be more distracting than useful.

A matter of nomenclature

As mentioned earlier, L'Ecuyer suggests the splitting of random numbers be disjoint. However, as this can complicate matters, most RNGs (including Burton and Page's) opt for simpler forms of splitting where the duplication of random numbers is statistically unlikely. It only requires a little imagination to realise the consequences of using a similarly-relaxed approach for supplying fresh identifiers!

To avoid having to repeatedly specify it, an alternate terminology is needed - one which clearly indicates that for some types of pseudodata, the disjointedness of its splitting is mandatory, instead of just being very convenient.