An expression language is said to have [[non-strict semantics]] if expressions can have a value even if some of their subexpressions do not. Haskell is one of the few modern languages to have non-strict semantics by default: nearly every other language has [[strict semantics]], in which if any subexpression fails to have a value, the whole expression fails with it.

This is one of the most important features in Haskell: it is what allows programs to work with conceptually infinite data structures, and it is why people say that Haskell lets you write your own control structures. It's also one of the motivations behind Haskell being a [[pure]] language (though there are several other good ones).

== What? ==

Any sufficiently capable programming language is non-total, which is to say you can write expressions that do not produce a value: common examples are an exception thrown, an infinite loop, or unproductive recursion, e.g. the following definition in Haskell:

<haskell>
noreturn :: Integer -> Integer
noreturn x = negate (noreturn x)
</haskell>

or the following Python function:

<pre>
def noreturn(x):
    while True:
        x = -x

    return x  # not reached
</pre>

both fail to produce a value when executed. We say that <tt>noreturn x</tt> is undefined, and write <tt>noreturn x = [[Bottom|⊥]]</tt>.

In Python, the following expression, which checks whether <tt>2</tt> is in some list:

<pre>
2 in [2,4,noreturn(5)]
</pre>

also fails to have a value, because in order to construct the list, the interpreter tries to work out <tt>noreturn(5)</tt>, which of course doesn't return a value. This is called '''innermost-first''' evaluation: in order to call a function with some arguments, you first have to calculate what all the arguments are, starting from the innermost function call and working outwards. The result is that Python is ''strict'', in the sense that calling any function with an undefined argument produces an undefined value, i.e. <tt>f(⊥) = ⊥</tt>. If your language uses innermost-first evaluation, it correspondingly must have strict semantics.

In Haskell, an analogous expression:

<haskell>
elem 2 [2, 4, noreturn 5]
</haskell>

in fact has the value <tt>True</tt>. The program does not have to compute <tt>noreturn 5</tt> because it is irrelevant to the overall value of the computation: only the values that are necessary to the result need be computed. This is called '''outermost-first''' evaluation because you first look at the outermost function call, <tt>elem</tt>, to see if it needs to use its arguments, and only if it does do you look at what those arguments are. This means that you can write a function that doesn't look at its argument, so it will return a value even if the argument is <tt>⊥</tt>. Such functions are ''not strict'', i.e. they satisfy <tt>f(⊥) ≠ ⊥</tt>. Practically, this means that Haskell functions need not completely compute their arguments before using them, which is why e.g. <tt>take 3 [1..]</tt> can produce <tt>[1,2,3]</tt> even though it is given a conceptually infinite list.
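For instance, <tt>(||)</tt> never needs its second argument when the first is <tt>True</tt>, so in GHCi:

<haskell>
Prelude> True || undefined
True
</haskell>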

Note that ''semantics'' is, pedantically speaking, just about which expressions have a value, and what that value is, not how you figure it out. Outermost-first evaluation is not the only way to have non-strict semantics: a speculative evaluation strategy that evaluates arguments in parallel with the function, in case they are needed later, could also be non-strict, as long as whenever the speculative evaluation failed, the evaluation of the function continued.

Note also that according to the <tt>f(⊥) ≠ ⊥</tt> definition, lots of seemingly non-strict functions are actually strict, e.g. <tt>elem 2</tt> from the above example! However, strictness can be more nuanced than that: e.g. we can say that <tt>elem 2</tt> is non-strict in the part of the list following the first 2, so it will return a non-⊥ answer to <tt>elem 2 (2 : ⊥)</tt>. We might say that a function is "strict in the spine of the list" (like <tt>length</tt>, which ignores ⊥ in the elements of the list) or "strict in the second component of the tuple" (like... <tt>snd</tt>, I guess).
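These distinctions are easy to check in GHCi, using <tt>undefined</tt> for ⊥:

<haskell>
Prelude> length [undefined, undefined]  -- strict in the spine, not the elements
2
Prelude> elem 2 (2 : undefined)         -- needs the list only up to the first 2
True
Prelude> snd (undefined, "ok")          -- never looks at the first component
"ok"
</haskell>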

== Why? ==

To correct a common misconception about non-strict semantics, it is not a performance feature. Non-strict semantics allows your language to evaluate only the things it needs to, but if you write your programs carefully, you'll only compute what is absolutely necessary ''anyway'', so the extra time your program spends working out what should and shouldn't be evaluated is time wasted. For this reason, a very well-optimised strict program will frequently outperform even the fastest non-strict program.

However, the real and major advantage that non-strictness gives you over strict languages is that you get to write cleaner and more composable code. In particular, you can separate ''production'' and ''consumption'' of data: don't know how many prime numbers you're going to need? Just make <hask>primes</hask> a list of ''all'' prime numbers, and then which ones actually get ''generated'' depends on how you use them in the rest of your code. By contrast, writing code in a strict language that constructs a data structure in response to demand usually requires careful intertwining or explicit management of control flow between producer and consumer.
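A minimal sketch of such a <hask>primes</hask> list (simple trial division, chosen for clarity rather than speed):

<haskell>
-- All the primes, as one infinite list.
primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

-- Consumers decide how much of it is ever generated:
firstTen :: [Integer]
firstTen = take 10 primes   -- [2,3,5,7,11,13,17,19,23,29]
</haskell>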

Consider the following Haskell function definition:

<haskell>
any :: (a -> Bool) -> [a] -> Bool
any p = or . map p
</haskell>

Here, <tt>map p</tt> replaces each element of the list with a boolean value representing whether or not that element satisfies <tt>p</tt>, then <tt>or</tt> checks if any of the booleans were <tt>True</tt>. Overall, then, <tt>any p xs</tt> tells you whether or not <tt>p x</tt> is <tt>True</tt> for any <tt>x</tt> in <tt>xs</tt>.

Naively, it seems like this would be inefficient: first <tt>map</tt> processes the whole list, and then <tt>or</tt> finds any <tt>True</tt>s – but if the very first item of the list satisfies <tt>p</tt>, then you really didn't need to map over all the others.

But in a non-strict context, even if both <tt>or</tt> and <tt>map</tt> are written completely naïvely, when <tt>or</tt> gets to the first <tt>True</tt> it stops asking for any more booleans, so <tt>map</tt> doesn't need to produce any more of them, and none of the rest of the list is visited.
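To see it concretely (a sketch reusing <hask>noreturn</hask> from the first section):

<haskell>
-- `or` stops at the first True, so the undefined element is never examined:
shortCircuits :: Bool
shortCircuits = any even [1, 3, 4, noreturn 5]   -- True
</haskell>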

=== But that's so weird! ===

Not really! In non-strict languages evaluation is typically driven by need, whereas in strict languages it is driven by function application. But functions are already for abstraction, so they end up serving a sort of dual purpose; meanwhile, ordinary values can't really be used for abstraction unless you know you're going to use their value at least once. If you don't, you have to wrap your value in a function that takes no arguments, or, in type systems where that doesn't make sense as a concept, a function that takes a single, boring argument that it then ignores. You then have to duplicate the work if you want to use the value twice, or else write some sort of caching, probably using mutable variables. On top of all that, function application isn't even the only method of driving evaluation, because you also need if-statements, loops, and other control structures that have to be baked right into the fabric of the language.
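In Haskell, by contrast, an expensive value is just an ordinary value: it is computed at most once, when first demanded, with no manual caching. A small sketch (<hask>Debug.Trace.trace</hask> is used only to make the evaluation observable):

<haskell>
import Debug.Trace (trace)

-- An "expensive" value; trace prints its message when the value is evaluated.
expensive :: Integer
expensive = trace "computing!" (sum [1 .. 1000000])

main :: IO ()
main = do
  let x = expensive
  print (x + x)   -- "computing!" is printed only once: the result is shared
</haskell>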

In a strict language, to get the short-circuiting behaviour of <tt>any</tt> described in the previous section, you'd have little choice but to write out the whole recursion explicitly:

<haskell>
any p []     = False
any p (x:xs)
  | p x       = True
  | otherwise = any p xs
</haskell>

since in strict languages only built-in control structures can decide whether some bit of code gets executed or not; ordinary functions like <tt>or</tt> can't. You essentially duplicate the code of <tt>map</tt> (iterating over the list and applying a function) and of <tt>or</tt> (folding the list with a binary operation).

Meanwhile, in Haskell, functions are precisely for abstraction with parameters, and for abstraction without parameters, ordinary values suffice, whether you end up using them or not. All code, inside or outside functions, gets run when you need it and doesn't when you don't. You can easily write control structures as ordinary code:

<haskell>
ifThenElse :: Bool -> a -> a -> a
ifThenElse True  x _ = x
ifThenElse False _ y = y
</haskell>

and this allows all sorts of interesting patterns to be abstracted in an incredibly lightweight fashion. Labelled for-loops are a ''library'' in Haskell, rather than requiring special syntax and language support.
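In the same spirit, a loop can be an ordinary function you define yourself (an illustrative sketch, not any particular library):

<haskell>
-- Repeatedly apply a step until the condition holds:
-- a user-defined control structure, not built-in syntax.
iterateUntil :: (a -> Bool) -> (a -> a) -> a -> a
iterateUntil done step x
  | done x    = x
  | otherwise = iterateUntil done step (step x)

-- e.g. iterateUntil (> 100) (* 2) 1  ==>  128
</haskell>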

== How do I stop it? ==

As mentioned above, non-strictness can hurt performance, e.g. if a result is definitely going to be needed later, you might as well evaluate it now, to avoid having to hold on to all the data that goes into it. Fortunately, the Haskell designers were aware of these problems and introduced a loophole or two so that we could force our programs to be strict when necessary: see [[Performance/Strictness]] and [[seq]].
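For instance, a strict accumulator written with <hask>seq</hask> (a minimal sketch):

<haskell>
-- Sum with a strict accumulator: seq forces each partial sum as it is made,
-- instead of building up a long chain of unevaluated thunks.
sumStrict :: [Integer] -> Integer
sumStrict = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = acc + x
                    in acc' `seq` go acc' xs
</haskell>

[[Category:Glossary]]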