pseudo-data

In Nondeterminism with Referential Transparency in Functional Programming Languages, F. Warren Burton introduces the term pseudo-data to describe an alternative to the options for allowing nondeterminism that were available at the time:

Existing constructs for nondeterminism are not functional and are not compatible with the mathematical foundations of functional programming, which require that the value of a function be uniquely determined by the values of its arguments.
We propose to solve this problem by placing the nondeterminism in pseudo-data. A program is passed an infinite tree of two-valued decisions, along with its input. These decisions may be fixed at run time, thereby permitting nondeterminism. Once fixed, a decision remains unchanged, so equivalent expressions must always have the same value.

one which can be reused for other purposes:

The approach generalizes so that a program can make use of other run-time information such as the current time of the current amount of available storage.

Examples of use

Based on Burton's original example, pseudo-data clocks (trees of times) are used in the semantics of Ruth, an early non-strict realtime functional language. More recently, pseudo-data has been used to safely provide nondeterminism in Single-Assignment C.

Further developments

Since its advent, pseudo-data (or aspects thereof) have appeared, or can be recognised in other contexts:

In Simon Peyton Jones's The implementation of functional programming languages (section 9.6 on page 188 of 458), Peter Hancock provides a simpler interface for generating new type variables (of type tvname) for a type checker, using the type name_supply:

next_name :: name_supply -> tvname deplete :: name_supply -> name_supply split :: name_supply -> (name_supply, name_supply)

The crucial point here is the absence of trees - they have been reduced to an implementation detail, oblivious to the users of name_supply values.

In Efficient and Portable Combined Random Number Generators, Pierre L'Ecuyer suggests the disjoint splitting of random numbers into independent subsequences as needed.

As previously specified, if pseudo-data is used then it remains constant - reusing it doesn't change its value. Lennart Augustsson, Mikael Rittri and Dan Synek take this to an extreme in their functional pearl On generating unique names with their concept implementation for a single-use variant of Hancock's unique-name supply, where each one can only be used once, if at all - from page 4 of 7:

module OneTimeSupplies(
   Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply)
where
   gensym :: a —> Int  —— implemented in assembler
   data Name = MkName Int deriving (Eq)
   data NameSupply = MkNameSupply
   initialNameSupply = MkNameSupply
   getNameDeplete s = (MkName(gensym(s)), MkNameSupply)
   splitNameSupply MkNameSupply = (MkNameSupply, MkNameSupply)

In contrast to the HideGensym implementation found on the same page, this monousal strategy completely obviates the need for trees (or other intermediary structured values such as streams).

Nobuo Yamashita uses a type reminiscent of pseudo-data in his alternative to IO a in the oi package: see the Data.OI.Internal module for the details.

A notable difference

While references to pseudo-data frequently mention them:

In [F. W. Burton and Lennart Augustsson's work], the potentially non-referentially transparent features of amb and Henderson's merge are avoided through the introduction of a new data structure, oracles.

Programming Reactive Systems in Haskell, Sigbjørn Finne and Simon Peyton Jones (page 11 of 16).

[Burton] proposed to supply each program with an extra argument: an infinite binary tree of values. Whenever the program needs to make a non-deterministic choice, the binary tree is consulted (as a kind of oracle).

On the Expressiveness of Purely Functional I/O Systems, Paul Hudak and Raman S. Sundaresh (page 15 of 29).

One important alternative approach to the modelling of non-determinism is to make explicit the 'implicit time determinism' by the introduction of timestamps and clock oracles to supply information about the current time. This is discussed by Burton, and is built on by Harrison.

Functional Programming and Operating Systems, Simon B. Jones and A. F. Sinclair (page 10 of 13).

...oracles, as elaborated by Post and others, are not directly mentioned in either article by Burton or Harrison.

Until it can be formally proven that pseudo-data and oracles are the same concept, the cautious assumption must be that they are not. Likewise for attempting to use pseudo-data with oracle machines.

A simpler choice

An alternative is the choice machine and its external operator - in this computational model, pseudo-data simply conveys the operator's choices (or results thereof) to where they're needed in the program:

[...] for certain applications like updating independent pixels of the framebuffer, printing indexed data or generating debugging output, the order in which the different tasks are performed is not of importance as long as the order of each single task stays intact. Thus, introducing a certain amount of non-determinism, i.e., allowing different I/O tasks to be performed out of order, may not be harmful but considered beneficial.

Controlling Chaos: On Safe Side-Effects in Data-Parallel Operations, Stephan Herhut, Sven-Bodo Scholz and Clemens Grelck (page 2 of 9).