Opting for oracles: Difference between revisions

From HaskellWiki
m (Final two sections reorganised)
(Redundant sections replaced with "oracle" reference; other minor changes)
 
(15 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Oracles, defined ==


An ''oracle'' is a value that can be viewed as having the ability to predict what another value will be. The predicted value is usually the result of a computation involving information which cannot be represented as regular values by the language defining the computation. The oracle, by seeming to contain the prediction, preserves the [[Referential transparency|referential transparency]] of the language being used.
== An adapted example ==


This makes the use of oracles highly relevant to languages intended to preserve referential transparency, such as Haskell.
Based on page 42 of 76 in [http://www.people.cs.uchicago.edu/~soare/Turing/shagrir.pdf Turing-Post Relativized Computability and Interactive Computing]:


== Oracles in use ==
<blockquote>
<b>The Limit Computable or Approximation Model</b>


The early use of oracles in programming languages focused on coping with the <span class="plainlinks">[https://wiki.haskell.org/Category:Nondeterminism nondeterminism]<span> <!-- [[Category:Nondeterminism|nondeterminism]] --> arising from the occurrence of events outside the language - one classic example being ''concurrency''. For a language to support it, it must provide computations some means of allowing multiple sub-computations to each progress at their own rate, depending on the current availability of suitable resources, which can either be:
There exists a sequence of Turing programs <math>\{P_t:t \in T\}</math> so that
<math>P_t</math> computes function <math>g_t</math> at time <math>{t \in T}</math>. There is not
necessarily any connection between different programs and computing may have to
start all over again with a new program as the time changes from <math>t</math> to <math>t+1</math>.


* ''inside'' the language e.g. definitions which the computations share, or
Suppose a meteorologist receives data every second <math>t \in T</math> from weather
stations scattered across the country. The configuration at the meteorologist's
desk may be described using the Shoenfield Limit Lemma by a computable function
where <math>g_t</math> is the computable characteristic function of
<math>B_t</math>, the configuration of the meteorological computation at the end
of time <math>t</math>. The computable function <math>g_t</math> gives an algorithm to
compute the condition <math>B_t</math> at time <math>t</math> but it gives no relationship
between <math>B_t</math> and <math>B_{t+1}</math>. It will not be possible for the
meteorologist to run, let alone write a new program every second. How will the
meteorologist write a program to uniformly compute the index <math>g_t</math>
for <math>{t \in T}</math>?


* ''outside'' of it e.g. hardware devices for storage, audiovisual or networking.
<b>The Online Model With an Oracle Machine</b>


Clearly the language cannot predict e.g. when users of a computer system will be active, so concurrency is generally nondeterministic. For information on how oracles have helped to support various forms of concurrency, see [[Concurrency with oracles]].
By the Shoenfield Limit Lemma there is a computably enumerable set <math>{A}</math> (or even
a <math>\Delta_2^0</math> set) and oracle machine <math>\Phi_e</math> such that
<math>B = \Phi_e^A</math>. Now the meteorologist can program the
algorithm <math>\Phi_e</math> into a computer once and for all at the start of the
day. Every second <math>{t \in T}</math> the meteorologist receives from the weather stations
the latest readings <math>A_t</math> which enter directly into that computer by
an network connection. The meteorologist does not (and cannot) change the
program <math>\Phi_e</math> every second. The algorithm simply receives the
“oracle” information <math>A_t</math> from the weather-station network as it is continually
updated, and computes the approximation <math>B_t(x) = \Phi_e^{A_t}(x)</math>.
The meteorologist's program then produces the next scheduled weather forecast for the day from the
algorithm's result. It is difficult to see how this meteorologist could have
carried out that activity using a batch processing, automatic machine
model, instead of an online model.
</blockquote>


(Of course, the use of oracles goes beyond programming languages e.g. Jennifer Hackett and Graham Hutton use them to alleviate some of the tedium associated with the classic state-centric semantics used to examine the operational behaviour of lazy programs - see [https://www.cs.nott.ac.uk/~pszgmh/clairvoyant.pdf Call-by-Need Is Clairvoyant Call-by-Value].)
...the [[oracle]] is a function of time <math>t \in T</math> with result
<math>A_t</math> that provides input to the meteorologist's program.


== From oracles to ''pseudodata'' ==
=== The problem of choice ===


In his paper [https://academic.oup.com/comjnl/article-pdf/31/3/243/1157325/310243.pdf Nondeterminism with Referential Transparency in Functional Programming Languages], F. Warren Burton illustrates how oracles can be repurposed to make use of other types of outside information, based on the classic example of [[Burton-style nondeterminism|nondeterministic choice]]: from that, Burton shows how to provide the current time, using the example of <tt>timestamps</tt>:
But for the program's output - the forecasts - an element of choice is often
involved:


<haskell>
* the choice of time to send the forecasts to media outlets broadcasting to listeners or viewers - forecasts may need to be sent more frequently to regions experiencing more severe weather conditions.
-- section 2
data Tree a = Node { contents :: a,
                    left    :: Tree a,
                    right    :: Tree a }


-- section 7
* the choice of device - the meteorologist may need a hard copy of the latest forecast (or some part thereof) for research purposes.
data Timestamp  -- abstract, possibly builtin
stamp :: !Timestamp -> Timestamp
compare :: Timestamp -> Timestamp -> Integer -> Bool
</haskell>


He also hints as to how the current size of available storage can also be made available.
Choices can also affect the program itself - the meteorologist may for
some reason decide to intervene while the program is running (perhaps
restarting it to provide more accurate forecasts after software upgrades).


=== Extra parameters and arguments ===
In contrast, output is a simple matter for a function - if there is any,
it all goes to its caller! Therefore interactive choice requires something
more than functions alone; it usually appearing in the form of an abstract
type:


While they acknowledge Burton's technique does maintain referential transparency, in their paper [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.695&rep=rep1&type=pdf On the Expressiveness of Purely Functional I/O Systems] Paul Hudak and Raman S. Sundaresh also raise one possible annoyance - the need to manually disperse subtrees as additional arguments or parameters within programs.
* the various auxiliary types used in dialogues[https://hackage.haskell.org/package/dialogue <span></span>] - text and binary files, channels, <i>etc</i>.  


As it happened, the first hints of a solution was already present when Burton's paper was first published, and now forms part of the standard <code>Prelude</code> for Haskell. Using the [https://downloads.haskell.org/~ghc/7.8.4/docs/html/users_guide/bang-patterns.html bang-patterns] extension:
* the type of the final result returned by a nested sequence of CPS calls[https://dl.acm.org/doi/pdf/10.1145/165180.165199 <span></span>];


<haskell>
* the type of the state passed through all the calls which require it[https://www.cambridge.org/core/services/aop-cambridge-core/content/view/2EFAEBBE3A19EA03A8D6D75A5348E194/S0956796800001258a.pdf/the-ins-and-outs-of-clean-io.pdf <span></span>];
-- for GHC 8.4.3
{-# LANGUAGE BangPatterns, FlexibleInstances, OverlappingInstances #-}


instance Monad ((->) (Tree a)) where
* the monadic, comonadic or arrow types of online actions[https://web.archive.org/web/20051212060309/http://www.cse.ogi.edu/PacSoft/publications/phaseiiiq10papers/codata.pdf <span></span>][https://hackage.haskell.org/package/io-machine <span></span>];
    return x = \(!_) -> x
    m >>= k  = \t -> case m (left t) of !x -> k x (right t)
</haskell>


making use of the fact that the partially-applied function type <code>forall a . (->) a</code> is monadic.
Another option is to use an structured value of unbounded size:


[[Category:Code]]
* lazy lists, for incremental streaming[https://arxiv.org/pdf/1907.11354.pdf <span></span>];
[[Category:Nondeterminism]]
 
* lazy trees, for providing subtrees to various parts of the program[https://dspace.stir.ac.uk/bitstream/1893/12116/1/Harrison%20-%20thesis.pdf <span></span>].
 
 
[[Category:Mathematics]]

Latest revision as of 08:44, 20 December 2024

An adapted example

Based on page 42 of 76 in Turing-Post Relativized Computability and Interactive Computing:

The Limit Computable or Approximation Model

There exists a sequence of Turing programs {Pt:tT} so that Pt computes function gt at time tT. There is not necessarily any connection between different programs and computing may have to start all over again with a new program as the time changes from t to t+1.

Suppose a meteorologist receives data every second tT from weather stations scattered across the country. The configuration at the meteorologist's desk may be described using the Shoenfield Limit Lemma by a computable function where gt is the computable characteristic function of Bt, the configuration of the meteorological computation at the end of time t. The computable function gt gives an algorithm to compute the condition Bt at time t but it gives no relationship between Bt and Bt+1. It will not be possible for the meteorologist to run, let alone write a new program every second. How will the meteorologist write a program to uniformly compute the index gt for tT?

The Online Model With an Oracle Machine

By the Shoenfield Limit Lemma there is a computably enumerable set A (or even a Δ20 set) and oracle machine Φe such that B=ΦeA. Now the meteorologist can program the algorithm Φe into a computer once and for all at the start of the day. Every second tT the meteorologist receives from the weather stations the latest readings At which enter directly into that computer by an network connection. The meteorologist does not (and cannot) change the program Φe every second. The algorithm simply receives the “oracle” information At from the weather-station network as it is continually updated, and computes the approximation Bt(x)=ΦeAt(x). The meteorologist's program then produces the next scheduled weather forecast for the day from the algorithm's result. It is difficult to see how this meteorologist could have carried out that activity using a batch processing, automatic machine model, instead of an online model.

...the oracle is a function of time tT with result At that provides input to the meteorologist's program.

The problem of choice

But for the program's output - the forecasts - an element of choice is often involved:

  • the choice of time to send the forecasts to media outlets broadcasting to listeners or viewers - forecasts may need to be sent more frequently to regions experiencing more severe weather conditions.
  • the choice of device - the meteorologist may need a hard copy of the latest forecast (or some part thereof) for research purposes.

Choices can also affect the program itself - the meteorologist may for some reason decide to intervene while the program is running (perhaps restarting it to provide more accurate forecasts after software upgrades).

In contrast, output is a simple matter for a function - if there is any, it all goes to its caller! Therefore interactive choice requires something more than functions alone; it usually appearing in the form of an abstract type:

  • the various auxiliary types used in dialogues - text and binary files, channels, etc.
  • the type of the final result returned by a nested sequence of CPS calls;
  • the type of the state passed through all the calls which require it;
  • the monadic, comonadic or arrow types of online actions;

Another option is to use an structured value of unbounded size:

  • lazy lists, for incremental streaming;
  • lazy trees, for providing subtrees to various parts of the program.