Latest revision as of 06:34, 2 April 2009

Object Equality for Lazy Languages

Why we need this

By Ahn, Ki Yung

Lazy languages like Haskell have a good excuse for not having object equality (or physical equality): the equality test is a hack for forcing evaluation. We can make use of object equality in Haskell if we can provide a resonable alternative for forcing evaluation; it would be even better if we can automatially decide when to evaluate. Therefore, we suggest we should search for a evaluation strategy that will dismiss the use of equality test as forcing evaluation.

Using equality test for forcing evaluation is against the philosophy of Haskell. Haskell has only one equality operator unlike Unlike Scheme or ML, which is to be simple and clean. But we have assigned strange operational semantics on equality. This is also problematic for partial evaluation; we cannot replace x==x with True.

The functional programming style of persistent data structures tend to generate shared objects frequently. Functional languages lacking object equality cannot have the full-benefit of sharing since they cannot optimize equality with object equality. Functional languages that cannot use object equality are giving up the optimization opportunity on the best-fitting domain. It is a common idiom to check object equality before structural equality. Object equality implies structural equality, unless we are dealing with non-deterministic objects. (There are rare exceptions such as OCaml nan, which is in my opinion a design flaw)

Therefore, our goal is to develop an evaluation strategy that intelligently forces evaluation when it is sure that forcing evaluation will not take too long and save space as well. For this we would need both static analysis and evaluation strategy at runtime. When we rely less on operational semantics of equality, we can have better optimization opportunities such as using object equality in equality tests.

Problems in optimizing equality

It is a common idiom to benefit from physical equality in C++.

A common idiom of C++ copy ctor, assignment and equality:

class C
{
...
   C(const C& c)
   {
       if (*this == &c) return; // the same object!! nothing to do
       // do the job if not the same object
   }

   C& operator = (const C& c)
   {
      if (*this == &c) return; // the same object!! nothing to do
      // do the job if not the same object
   }

   bool operator == (const C& c)
   {
      if (*this == &c) return true; // the same object!! nothing to do
      // do the job if not the same object
   }
...
};

However, haskell being a lazy langauge it has concerns that strict languages would not have. Let === be the physical equality. Even though === implies == except for bottom, there are some situations that we cannot by default optimize equality with physical equality. It is because haskell programmer use reflexive equality test as a common idom for deep evaluation.

Analyzing haskell insertion sort algorithm

These examples came up doing one of the homework in CS584 Algorithms (2006 Spring) class at Portland State University, lectured by Mark P. Jones. I choose haskell to implement sorting but memory leaked I added the counter. Thanks to Mark P. Jones for mentioning that equalities are used for forcing evaluation.

Insertion sort

    inssrt [] = []
    inssrt (x:xs) = ins x $ inssrt xs
      where ins z [] = [z]
            ins z l@(y:ys) | z <= y    = z : l
                           | otherwise = y : ins z ys

We test for the worst case when the length of the input is 1000 and it works fine.

Main> inssrt [1000,999..1]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
... ... ,988,989,990,991,992,993,994,995,996,997,998,999,1000]

Insertion sort with counter that leaks memory

    inssort p@(_,[]) = p
    inssort (n,x:xs) = ins x $ inssort (n,xs)
      where ins z (n,[]) = (n,[z])
            ins z (n,l@(y:ys))
              | z <= y    = (n+1, z:l)
              | otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)

We test for the worst case when the length of the input is 1000 and it crashes because of memory leak due to constructing lengthy expression $1+1+1+\cdots $.

Main> inssort (0,[1000,999..1])
(Segmentation fault

Insertion sort with counter patched by obscure equality

    inssort' p@(_,[]) = p
    inssort' (n,x:xs) = ins x $ inssort' (n,xs)
      where ins z (n,[]) = (n,[z])
            ins z (n,l@(y:ys))
                | n==n && z <= y    = (n+1, z:l)
                | n==n && otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)

We test for the worst case when the length of the input is 1000 and it works because of the equality check forces evaluation of the counter value.

Main> inssort' (0,[1000,999..1])
(499500,[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,
... ... ,986,987,988,989,990,991,992,993,994,995,996,997,998,999,1000])

Implementing physical equality in Haskell

I tried Hugs Version: 20050308 and GHC version 6.4.2 but only GHC worked. In Hugs, StablePtr a is not an instance of Eq.

implementation:

import Foreign
import System.IO.Unsafe

x === y = unsafePerformIO $
  do
    px <- newStablePtr x
    py <- newStablePtr y
    let ret = px == py
    freeStablePtr px
    freeStablePtr py
    return ret

usage:

data S a = S a

s = S (S [1..])
s2 = S (S [1..])
s3 = s

   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.4.2, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base-1.0 ... linking ... done.
Compiling Main             ( Tst.hs, interpreted )
Ok, modules loaded: Main.
*Main> s === s
True
*Main> s === s2
False
*Main> s === s3
True

This implementation works and concurrncy or garbage collection does not affect the result. But it is very slow since newStablePtr, freeStablePtr operations are expensive because it changes the state of the garbage collector. Cheap operation that just peeks the address is needed for the implementation. I think the best way is to provide a primitive operator for this.

Caveat

It should be noted that an equality operator like (===) above breaks referential transparency, so adding it to Haskell would ruin many of the nice properties that Haskell has and would make many program transformations invalid.

Related Papers and Websites

Observable Sharing http://citeseer.ist.psu.edu/claessen99observable.html
Hash Consing http://citeseer.ist.psu.edu/goubault94implementing.html

@@ Line 1: / Line 1: @@
+= Object Equality for Lazy Languages =
-__TOC__
 == Why we need this ==
+By Ahn, Ki Yung
-<pre>
-\paragraph{ }
-\title{Object Equality and Evaluation Strategy of Lazy Lagnuages}
-\author{Ahn, Ki Yung}
 Lazy languages like Haskell have a good excuse for
@@ Line 17: / Line 14: @@
 test as forcing evaluation.
+Using equality test for forcing evaluation is against the philosophy of Haskell.
-\paragraph{ }
+Haskell has only one equality operator unlike Unlike Scheme or ML,
-Using equality test for forcing evaluation is agints design purpose of Haskell.
-Hakell has only one equality operator unlike Unlike Scheme or ML,
 which is to be simple and clean. But we have assigned strange operational
 semantics on equality. This is also problematic for partial evaluation;
-we cannot replace \verb|x==x| to \verb|True|.
+we cannot replace ''x==x'' with ''True''.
+The functional programming style of persistent data structures tend to
-\paragraph{ }
-Functional programming style of persistant data structure tend to
 generate shared objects frequently. Functional languages lacking
-object equality cannot have full-benefit of sharing
+object equality cannot have the full-benefit of sharing
-since they cannont optimize equality with object equality.
+since they cannot optimize equality with object equality.
-Functional languages that cannot utilize object equality is
+Functional languages that cannot use object equality are
 giving up the optimization opportunity on the best-fitting domain.
 It is a common idiom to check object equality before structural equality.
 Object equality implies structural equality,
 unless we are dealing with non-deterministic objects.
-\footnote{There are rare exceptions such as OCaml \texttt{nan}, which is in my opinion a design flaw.}
+(There are rare exceptions such as OCaml ''nan'', which is in my opinion a design flaw)
+Therefore, our goal is to develop an evaluation strategy that intelligently
-\paragraph{ }
-Therefore, our goal is to develope a evaluation strategy that intelligently
 forces evaluation when it is sure that forcing evaluation will not take too
 long and save space as well. For this we would need both static analysis and
@@ Line 44: / Line 38: @@
 we can have better optimization opportunities
 such as using object equality in equality tests.
-</pre>
 == Problems in optimizing equality ==
-It is a common idiom to benifit from physical equalitin in C++.
+It is a common idiom to benefit from physical equality in C++.
+A common idiom of C++ copy ctor, assignment and equality:
-Common idom of C++ copy ctor, assignment and equality:
 <pre>
 class C
@@ Line 75: / Line 69: @@
 };
 </pre>
 However, haskell being a lazy langauge it has concerns that strict languages would not have.
@@ Line 84: / Line 77: @@
 It is because haskell programmer use reflexive equality test as
 a common idom for deep evaluation.
 === Analyzing haskell insertion sort algorithm ===
 These examples came up doing  one of the homework
 in CS584 Algorithms (2006 Spring) class at Portland State University,
@@ Line 94: / Line 87: @@
 equalities are used for forcing evaluation.
-\paragraph{Insertion sort}
+==== Insertion sort ====
 <haskell>
     inssrt [] = []
@@ Line 103: / Line 97: @@
 </haskell>
 We test for the worst case when the length of the input is 1000 and it works fine.
 <pre>
 Main> inssrt [1000,999..1]
@@ Line 109: / Line 104: @@
 </pre>
-\paragraph{Insertion sort with counter that leaks memory}
+==== Insertion sort with counter that leaks memory ====
 <haskell>
     inssort p@(_,[]) = p
@@ Line 118: / Line 114: @@
               | otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)
 </haskell>
 We test for the worst case when the length of the input is 1000 and it crashes
-because of memory leak due to constructing lengthy expression $1+1+1+\cdots $.
+because of [[memory leak]] due to constructing lengthy expression $1+1+1+\cdots $.
 <pre>
 Main> inssort (0,[1000,999..1])
@@ Line 125: / Line 122: @@
 </pre>
-\paragraph{Insertion sort with counter patched by obscure equality}
+==== Insertion sort with counter patched by obscure equality ====
 <haskell>
     inssort' p@(_,[]) = p
@@ Line 134: / Line 132: @@
                 | n==n && otherwise = (n', y:ys') where (n',ys') = ins z (n+1,ys)
 </haskell>
 We test for the worst case when the length of the input is 1000 and it works
 because of the equality check forces evaluation of the counter value.
@@ Line 142: / Line 141: @@
 </pre>
+== Implementing physical equality in Haskell ==
-== Implementation ==
 I tried Hugs Version: 20050308 and GHC version 6.4.2 but only GHC worked.
@@ Line 168: / Line 167: @@
 s = S (S [1..])
 s2 = S (S [1..])
+s3 = s
 </haskell>
 <pre>
@@ Line 183: / Line 183: @@
 *Main> s === s2
 False
+*Main> s === s3
+True
 </pre>
+This implementation works and concurrncy or garbage collection does not affect the result.
+But it is very slow since newStablePtr, freeStablePtr operations are expensive because it changes the state of the [[garbage collector]].
+Cheap operation that just peeks the address is needed for the implementation.
+I think the best way is to provide a primitive operator for this.
+== Caveat ==
+It should be noted that an equality operator like (===) above breaks referential transparency, so adding it to Haskell would ruin many of the nice properties that Haskell has and would make many program transformations invalid.
 == Related Papers and Websites ==
- * Observable Sharing http://citeseer.ist.psu.edu/claessen99observable.html
+* Observable Sharing http://citeseer.ist.psu.edu/claessen99observable.html
- * Hash Consing http://citeseer.ist.psu.edu/goubault94implementing.html
+* Hash Consing http://citeseer.ist.psu.edu/goubault94implementing.html

Difference between revisions of "Physical equality"