pseudo-data
In Nondeterminism with Referential Transparency in Functional Programming Languages,
F. Warren Burton introduces the term pseudo-data
to describe an
alternative to the options for allowing nondeterminism that were available at
the time:
Existing constructs for nondeterminism are not functional and are not compatible with the mathematical foundations of functional programming, which require that the value of a function be uniquely determined by the values of its arguments.
We propose to solve this problem by placing the nondeterminism in pseudo-data. A program is passed an infinite tree of two-valued
decisions
, along with its input. Thesedecisions
may be fixed at run time, thereby permitting nondeterminism. Once fixed, adecision
remains unchanged, so equivalent expressions must always have the same value.
one which can be reused for other purposes:
The approach generalizes so that a program can make use of other run-time information such as the current time of the current amount of available storage.
Examples of use
Based on Burton's original example, pseudo-data clocks (trees of times) are used in the semantics of Ruth, an early non-strict realtime functional language. More recently, pseudo-data has been used to safely provide nondeterminism in Single-Assignment C.
Further developments
Since its advent, pseudo-data (or aspects thereof) have appeared, or can be recognised in other contexts:
- In Simon Peyton Jones's The implementation of functional programming languages (section 9.6 on page 188 of 458), Peter Hancock provides a simpler interface for generating new type variables (of type tvname) for a type checker, using the type name_supply:
- next_name :: name_supply -> tvname
- deplete :: name_supply -> name_supply
- split :: name_supply -> (name_supply, name_supply)
- The crucial point here is the absence of trees - they have been reduced to an implementation detail, oblivious to the users of name_supply values.
- In Efficient and Portable Combined Random Number Generators, Pierre L'Ecuyer suggests the disjoint splitting of random numbers into independent subsequences as needed.
- As previously specified, if pseudo-data is used then it remains constant - reusing it doesn't change its value. Lennart Augustsson, Mikael Rittri and Dan Synek take this to an extreme in their functional pearl On generating unique names with their concept implementation for a single-use variant of Hancock's unique-name supply, where each one can only be used once, if at all - from page 4 of 7:
module OneTimeSupplies( Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply) where gensym :: a —> Int —— implemented in assembler data Name = MkName Int deriving (Eq) data NameSupply = MkNameSupply initialNameSupply = MkNameSupply getNameDeplete s = (MkName(gensym(s)), MkNameSupply) splitNameSupply MkNameSupply = (MkNameSupply, MkNameSupply)
- In contrast to the
HideGensym
implementation found on the same page, this monousal strategy completely obviates the need for trees (or other intermediary structured values such as streams).
- Nobuo Yamashita uses a type reminiscent of pseudo-data in his alternative to
IO a
in the oi package: see the Data.OI.Internal module for the details.
A notable difference
While references to pseudo-data frequently mention them:
In [F. W. Burton and Lennart Augustsson's work], the potentially non-referentially transparent features of
amb
and Henderson'smerge
are avoided through the introduction of a new data structure, oracles.
- Programming Reactive Systems in Haskell, Sigbjørn Finne and Simon Peyton Jones (page 11 of 16).
[Burton] proposed to supply each program with an extra argument: an infinite binary tree of values. Whenever the program needs to make a non-deterministic choice, the binary tree is consulted (as a kind of oracle).
- On the Expressiveness of Purely Functional I/O Systems, Paul Hudak and Raman S. Sundaresh (page 15 of 29).
One important alternative approach to the modelling of non-determinism is to make explicit the 'implicit time determinism' by the introduction of timestamps and clock oracles to supply information about the current time. This is discussed by Burton, and is built on by Harrison.
- Functional Programming and Operating Systems, Simon B. Jones and A. F. Sinclair (page 10 of 13).
...oracles, as elaborated by Post and others, are not directly mentioned in either article by Burton or Harrison.
Until it can be formally proven that pseudo-data and oracles are the same concept, the cautious assumption must be that they are not. Likewise for attempting to use pseudo-data with oracle machines.
A simpler choice
An alternative is the choice machine and its external operator - in this computational model, pseudo-data simply conveys the operator's choices (or results thereof) to where they're needed in the program:
[...] for certain applications like updating independent pixels of the framebuffer, printing indexed data or generating debugging output, the order in which the different tasks are performed is not of importance as long as the order of each single task stays intact. Thus, introducing a certain amount of non-determinism, i.e., allowing different I/O tasks to be performed out of order, may not be harmful but considered beneficial.
- Controlling Chaos: On Safe Side-Effects in Data-Parallel Operations, Stephan Herhut, Sven-Bodo Scholz and Clemens Grelck (page 2 of 9).