Difference between revisions of "Opting for oracles"

From HaskellWiki
Jump to navigation Jump to search
(Initial contents, with some crossover)
 
m (Extra reference added)
(18 intermediate revisions by the same user not shown)
Line 1: Line 1:
  +
<div style="border-left:1px solid lightgray; padding: 1em" alt="blockquote">
  +
LML had a fully fledged implementation of oracles running of a multi-processor (a Sequent Symmetry) back in ca 1989. The description in [https://cth.altocumulus.org/~hallgren/Thesis/toc.html the Fudgets thesis] refers to this implementation. It was fairly pleasant to work with and quite practical.
  +
  +
<tt>[http://lambda-the-ultimate.org/node/1665#comment-2033 Lennart Augustsson.]</tt>
  +
</div>
  +
<sup> </sup>
  +
__TOC__
  +
 
== Oracles, defined ==
 
== Oracles, defined ==
   
An ''oracle'' is a value that can be viewed as having the ability to predict what another value will be. The predicted value is usually the result of a computation involving information which cannot be represented as regular values in the computation. The oracle, by seeming to contain the prediction, preserves the [[Referential transparency|referential transparency]] of the language being used to define the computation.
+
An ''oracle'' is a value that can be viewed as having the ability to predict what another value will be. The predicted value is usually the result of a computation involving information which cannot be represented as regular values by the language defining the computation. The oracle, by seeming to contain the prediction, limits the effect on the semantics of the language being used.
   
 
This makes the use of oracles highly relevant to languages intended to preserve referential transparency, such as Haskell.
 
This makes the use of oracles highly relevant to languages intended to preserve referential transparency, such as Haskell.
Line 7: Line 15:
 
== Oracles in use ==
 
== Oracles in use ==
   
The early use of oracles in programming languages focussed on coping with the <span class="plainlinks">[https://wiki.haskell.org/Category:Nondeterminism nondeterminism]<span> <!-- [[Category:Nondeterminism|nondeterminism]] --> arising from the occurrence of events outside the language - one classic example being ''concurrency''. For a language to support it, it must provide computations some means of allowing multiple computations to each progress at their own rate, depending on the current availability of suitable resources, which can either be:
+
The early use of oracles in programming languages focused on coping with the <span class="plainlinks">[https://wiki.haskell.org/Category:Nondeterminism nondeterminism]<span> <!-- [[Category:Nondeterminism|nondeterminism]] --> arising from the occurrence of events outside the language - one classic example being ''concurrency''. For a language to support it, it must provide computations some means of allowing multiple sub-computations to each progress at their own rate, depending on the current availability of suitable resources, which can either be:
   
 
* ''inside'' the language e.g. definitions which the computations share, or
 
* ''inside'' the language e.g. definitions which the computations share, or
Line 15: Line 23:
 
Clearly the language cannot predict e.g. when users of a computer system will be active, so concurrency is generally nondeterministic. For information on how oracles have helped to support various forms of concurrency, see [[Concurrency with oracles]].
 
Clearly the language cannot predict e.g. when users of a computer system will be active, so concurrency is generally nondeterministic. For information on how oracles have helped to support various forms of concurrency, see [[Concurrency with oracles]].
   
  +
The use of oracles goes beyond programming languages:
(Of course, the use of oracles goes beyond programming languages e.g. Jennifer Hackett and Graham Hutton use them to alleviate some of the tedium associated with the classic state-centric semantics used to examine the operational behaviour of lazy programs - see [https://www.cs.nott.ac.uk/~pszgmh/clairvoyant.pdf Call-by-Need Is Clairvoyant Call-by-Value].)
 
  +
  +
* Jennifer Hackett and Graham Hutton use them as the basis for an alternate semantics (rather than the more common stateful effects) in [https://www.cs.nott.ac.uk/~pszgmh/clairvoyant.pdf Call-by-Need Is Clairvoyant Call-by-Value].
  +
  +
* A brief mention appears in Conor McBride's representation of general recursion for use with total functional languages in [https://strathprints.strath.ac.uk/60166/1/McBride_LNCS2015_Turing_completeness_totally_free.pdf Turing-Completeness Totally Free].
  +
  +
* They have also been used in [https://www.cs.kent.ac.uk/people/staff/sao/documents/esop16.pdf functional big-step semantics] for supporting languages with I/O, in addition to nondeterminism.
   
 
== From oracles to ''pseudodata'' ==
 
== From oracles to ''pseudodata'' ==
   
In his paper [https://academic.oup.com/comjnl/article-pdf/31/3/243/1157325/310243.pdf Nondeterminism with Referential Transparency in Functional Programming Languages], F. Warren Burton illustrates how oracles can be repurposed to make use of other outside information - starting with <tt>decisions</tt> for supporting nondeterministic choice:
+
In his paper [https://academic.oup.com/comjnl/article-pdf/31/3/243/1157325/310243.pdf Nondeterminism with Referential Transparency in Functional Programming Languages], F. Warren Burton illustrates how oracle streams can be repurposed to convey other types of outside information, based on the classic example of [[Burton-style nondeterminism|nondeterministic choice]] - from that, Burton shows how this ''pseudodata'' can provide <tt>timestamps</tt> based on the current time:
   
 
<haskell>
 
<haskell>
Line 27: Line 41:
 
right :: Tree a }
 
right :: Tree a }
   
-- section 3
 
data Decision -- abstract, builtin
 
choice :: Decision -> a -> a -> a
 
</haskell>
 
 
he then shows how the concept of oracles can be expanded to access the current time, using the example of <tt>timestamps</tt>:
 
 
<haskell>
 
 
-- section 7
 
-- section 7
 
data Timestamp -- abstract, possibly builtin
 
data Timestamp -- abstract, possibly builtin
Line 41: Line 47:
 
</haskell>
 
</haskell>
   
all while preserving referential transparency. He also hints as to how the current size of available storage can also be made available. For more, see [[Burton-style nondeterminism]].
+
He also hints as to how the current size of available storage can be made available.
   
  +
Practical examples of ''pseudodata'' include:
== A simpler interface ==
 
   
  +
* '''clocks''' - in his thesis [https://core.ac.uk/download/9835633.pdf Functional Real-Time Programming: The Language Ruth And Its Semantics], Dave Harrison uses ''pseudodata'' to provide processes in a program with the current time.
Since its advent (sometimes as a result of being inspired by it, or similar entities), an alternate interface has appeared for working with Burton's ''pseudodata'':
 
 
* In Simon Peyton Jones's book [[Books|The implementation of functional programming languages]] (section 9.6 on page 188 of 458), Peter Hancock provides a simple interface for generating new type varibles (of type <tt>tvname</tt>) for a type checker, using the type <tt>name_supply</tt>:
 
   
  +
* '''supplies''' - In their functional pearl [https://www.cambridge.org/core/services/aop-cambridge-core/content/view/763DE73EB4761FDF681A613BE0E98443/S0956796800000988a.pdf/functional_pearl_on_generating_unique_names.pdf On generating unique names], Lennart Augustsson, Mikael Rittri and Dan Synek show how ''pseudodata'' can make Peter Hancock's <tt>name_supply</tt> viable for regular use. A small implementation by Iavor Diatchki [https://hackage.haskell.org/package/value-supply is available] in [https://hackage.haskell.org Hackage].
<tt>
 
::|| page 188 of 458
 
::next_name :: name_supply -> tvname
 
::deplete :: name_supply -> name_supply
 
::split :: name_supply -> (name_supply, name_supply)
 
</tt>
 
   
  +
=== Extra parameters and arguments ===
* In his paper [https://www.iro.umontreal.ca/~lecuyer/myftp/papers/cacm88.pdf Efficient and Portable Combined Random Number Generators], Pierre L'Ecuyer suggests the ''disjoint'' splitting of random numbers into independent subsequences as needed. Burton and Rex L. Page follow this up in [https://www.cs.ou.edu/~rlpage/burtonpagerngjfp.pdf Distributed Random Number Generation] - from page 9 of 10:
 
   
  +
While they recognise its preservation of referential transparency, in their paper [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.1260&rep=rep1&type=pdf Programming Reactive Systems in Haskell], Sigbjørn Finne and Simon Peyton Jones also raise one possible annoyance with Burton's approach - the need to manually disperse ''pseudodata'' as additional arguments or parameters within programs.
:<haskell>
 
data Random = Seed Int
 
   
  +
As it happened, the first hints of a solution was already present when Burton's paper was first published, and now forms part of the standard <code>Prelude</code> for Haskell. Using the [https://downloads.haskell.org/~ghc/7.8.4/docs/html/users_guide/bang-patterns.html bang-patterns] extension:
seed :: Int -> Random
 
split :: Random -> (Random, Random)
 
generate :: Random -> [Float]
 
</haskell>
 
   
  +
<haskell>
:(An assessment of the applicability of Burton's technique for simplifying the the provision of random numbers is also given).
 
  +
-- for GHC 8.4.3
  +
{-# LANGUAGE BangPatterns, FlexibleInstances, OverlappingInstances #-}
   
  +
instance Monad ((->) (Tree a)) where
* Lennart Augustsson, Mikael Rittri and Dan Synek use Burton's technique to reimplement Hancock's <tt>name_supply</tt> in their functional pearl [[Research papers/Functional pearls|On generating unique names]], making it practical for regular use. An implementation can be found in Simon Peyton Jones and Jon Launchbury's [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.3656&rep=rep1&type=pdf State in Haskell] - using using more-contemporary syntax:
 
  +
return x = \(!_) -> x
 
  +
m >>= k = \t -> case m (left t) of !x -> k x (right t)
:<haskell>
 
-- page 39
 
newUniqueSupply :: IO UniqueSupply
 
splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
 
getUnique :: UniqueSupply -> Unique
 
 
data UniqueSupply = US Unique UniqueSupply UniqueSupply
 
 
-- page 40
 
type Unique = Int
 
 
newUniqueSupply = do uvar <- newIORef 0
 
let incr :: Int -> (Int, Unique)
 
incr u = (u+1, u)
 
 
next :: IO Unique
 
next = unsafeInterleaveIO $
 
atomicModifyIORef uvar incr
 
 
supply :: IO UniqueSupply
 
supply = unsafeInterleaveIO $
 
liftM3 US next supply supply
 
 
supply
 
 
splitUniqueSupply (US _ s1 s2) = (s1, s2)
 
getUnique (US u _ _) = u
 
 
</haskell>
 
</haskell>
   
  +
making use of the fact that the partially-applied function type <code>forall a . (->) a</code> is monadic.
:The crucial point here is that <code>US</code> - the single data constructor for <code>UniqueSupply</code> - can now be kept private. The use of trees has been reduced to an implementation detail, oblivious to the users of <code>UniqueSupply</code> (and <code>Unique</code>) values.
 
 
== A simpler implementation ==
 
 
Augustsson, Rittri and Synek provide other possible implementations of Hancock's supply in their paper - of particular interest is the ''monousal'' one: to preserve referential transparency, each <code>UniqueSupply</code> should only be used ''once'' (if at all). This makes for a compact implementation, albeit an irregular one:
 
 
<tt>
 
:abstype uniquesupply
 
:with
 
::new_uniquesupply :: uniquesupply
 
::split_uniquesupply :: uniquesupply -> (uniquesupply, uniquesupply)
 
::get_unique :: uniquesupply -> unique
 
 
:uniquesupply ::= US
 
 
:new_uniquesupply = US
 
:split_uniquesupply US = (US, US)
 
:get_unique s = gensym(s)
 
 
:unique == int
 
 
:|| Not a regular definition!
 
:gensym :: * -> unique
 
</tt>
 
 
as shown by the presence of <tt>gensym</tt>. It should be obvious that a corresponding implementation in regular Haskell isn't possible - non-standard, possibly implementation-specific extensions are required.
 
To provide an implementation here would be more distracting than useful.
 
 
== A matter of nomenclature ==
 
 
As mentioned earlier, L'Ecuyer suggests the splitting of random numbers be ''disjoint''. However, as this can complicate matters, most RNGs (including Burton and Page's) opt for simpler forms of splitting where the duplication of random numbers is statistically unlikely. It only requires a little imagination to realise the consequences of using a similarly-relaxed approach for supplying fresh identifiers!
 
 
To avoid having to repeatedly specify it, an alternate terminology is needed - one which clearly indicates that for some types of pseudodata, the ''disjointedness'' of its splitting is '''mandatory''', instead of just being very convenient.
 
   
 
[[Category:Code]]
 
[[Category:Code]]

Revision as of 02:43, 29 July 2022

LML had a fully fledged implementation of oracles running of a multi-processor (a Sequent Symmetry) back in ca 1989. The description in the Fudgets thesis refers to this implementation. It was fairly pleasant to work with and quite practical.

Lennart Augustsson.

Oracles, defined

An oracle is a value that can be viewed as having the ability to predict what another value will be. The predicted value is usually the result of a computation involving information which cannot be represented as regular values by the language defining the computation. The oracle, by seeming to contain the prediction, limits the effect on the semantics of the language being used.

This makes the use of oracles highly relevant to languages intended to preserve referential transparency, such as Haskell.

Oracles in use

The early use of oracles in programming languages focused on coping with the nondeterminism arising from the occurrence of events outside the language - one classic example being concurrency. For a language to support it, it must provide computations some means of allowing multiple sub-computations to each progress at their own rate, depending on the current availability of suitable resources, which can either be:

  • inside the language e.g. definitions which the computations share, or
  • outside of it e.g. hardware devices for storage, audiovisual or networking.

Clearly the language cannot predict e.g. when users of a computer system will be active, so concurrency is generally nondeterministic. For information on how oracles have helped to support various forms of concurrency, see Concurrency with oracles.

The use of oracles goes beyond programming languages:

From oracles to pseudodata

In his paper Nondeterminism with Referential Transparency in Functional Programming Languages, F. Warren Burton illustrates how oracle streams can be repurposed to convey other types of outside information, based on the classic example of nondeterministic choice - from that, Burton shows how this pseudodata can provide timestamps based on the current time:

 -- section 2
data Tree a = Node { contents :: a,
                     left     :: Tree a,
                     right    :: Tree a }

 -- section 7
data Timestamp  -- abstract, possibly builtin
stamp :: !Timestamp -> Timestamp
compare :: Timestamp -> Timestamp -> Integer -> Bool

He also hints as to how the current size of available storage can be made available.

Practical examples of pseudodata include:

  • supplies - In their functional pearl On generating unique names, Lennart Augustsson, Mikael Rittri and Dan Synek show how pseudodata can make Peter Hancock's name_supply viable for regular use. A small implementation by Iavor Diatchki is available in Hackage.

Extra parameters and arguments

While they recognise its preservation of referential transparency, in their paper Programming Reactive Systems in Haskell, Sigbjørn Finne and Simon Peyton Jones also raise one possible annoyance with Burton's approach - the need to manually disperse pseudodata as additional arguments or parameters within programs.

As it happened, the first hints of a solution was already present when Burton's paper was first published, and now forms part of the standard Prelude for Haskell. Using the bang-patterns extension:

 -- for GHC 8.4.3
{-# LANGUAGE BangPatterns, FlexibleInstances, OverlappingInstances #-}

instance Monad ((->) (Tree a)) where
    return x = \(!_) -> x
    m >>= k  = \t -> case m (left t) of !x -> k x (right t)

making use of the fact that the partially-applied function type forall a . (->) a is monadic.