Difference between revisions of "Applications and libraries/Generic programming/Smash"

Latest revision as of 00:59, 30 July 2007

Approach: Smash your boilerplate

Required features/Portability

The type-level typecase/TypeEq requires overlapping and undecidable instances. The use of these extensions is limited to the library only. The end user code needs no such extensions.
No need for Typeable, nor higher-rank types

Expressibility

Can do both producer and consumer functions: e.g., gsize (consumer), replacing all Floats with Double (producer: the return type is determined by the original term's type/structure). The return type is computed by the typechecker rather than has to be specified by the user. Pure producers (e.g., gminimum) are supported too.
Local redefinitions are possible, see Discussion.

Subset of data types covered

Whatever LDat instances are declared
All primitive and algebraic types are supported. Function spaces. No Dynamic types. Polymorphic types can be handled, via typeclass proxies.

Usage

Library Writer: writes the core library (e.g., STApply class) and declares the LDat class. This is done once and for all.
Power User: (new datatype or a new traversal strategy): defines LDat instances
User: (new generic function): defines a new generic function by combining, in a HList, functions processing values of specific data types, with an appropriate generic function (cf. `everywhere') that does generic traversal. Polymorphism (when specific functions are polymorphic) requires special handling (see below).
End User: apply the generic function like an ordinary function

Error Messages

Can be hard to grasp??

Amount of work per data type (Boilerplate)

instances of LDat, which are similar to Functor and could be automatically derived from the definition of the data type.

Extensibility

Full

Reasoning

TODO ??? what does that mean?

Performance considerations

Although that is hidden from the end user, the library heavily relies on the typeclasses. Thus the efficiency of the typeclass implementation has direct impact.
Choosing the processor for each subterm is done at the compile time. There is no passing and interpreting types at run time (except when a compiler uses that to implement typeclasses).

Helpful extra features

TypeEq and TypeCast as built-ins would have helped.
Deriving for LDat (the same problem as with various SYB approaches)

Discussion

Our generic functions are composed of specific and generic parts. A specific part tells what to do if the input term happens to be of a particular type. The generic part tells us what to do with all other terms. The specific part is just an HList of ordinary functions, with different argument types. If the type of the input term matches the argument type of one of the functions in the list, we apply that function. If no specific function applies, we do the generic action, implicit in the traversal strategy (for example, apply the generic function to the subterms and reduce the results).

SYB1, when traversing terms and invoking user functions for subterms of a particular type, relies on a run-time typecase. The latter requires run-time type representation, which is provided by the typeclass Typeable. The typeclass implements the method `cast' for a safe cast from the value of `generic type' to the value of the specific type.

We observe that the typecase, at the type level, has always existed in Haskell (although that fact was not perhaps realized until HList). That typecase, which does not need any `cast' operation, is the type equality predicate TypeEq. The type-level typecase is the essence of the present approach. Since the handler for each subterm is chosen statically, there is no need for run-time type representation.

The set of traversal strategies is extensible. To the pre-defined gmap, reduction, and two-terms-in-lockstep strategies, the programmer may at any time add a new class of traversals. The message `Smash-along' shows one such extension, lazy gmap, and its application to mapping of terms that do not exist.

We should note the duality of `stapply' with respect to typeclasses. Let us consider

class C a where fn :: a -> Int
instance C Bool where fn x = if x then 10 else 20
instance C Char where fn x = fromEnum x

The typeclass C declares the function fn overloaded over the argument types. When typechecking an application "fn x", the compiler selects the instance of "fn" by matching the type of "x" against the types of C's instances. The same functionality can be implemented with `stapply':

data ResFailure -- an abstract data type with no non-bottom values
                -- and no defined operations. An attempt to actually
                -- use it will trigger a type error

fn' x = 
    stapply ((\ (x::Bool) -> if x then (10::Int) else 20) :+:
             (\ (x::Char) -> fromEnum x) :+:
	     HNil) 
            x
	    (undefined::ResFailure)

testtyc_fn1 = fn' True
testtyc_fn2 = fn' 'x'
-- testtyc_fn3 = show $ fn' () -- type error, can't show ResFailure

(The function `stapply' has the `default' clause, for all other data types. One can do the same with the typeclasses, with the help of overlapping instances). With `stapply', the set of concrete processors is given explicitly as a list rather than implicitly as a set of all typeclass instances in effect. The implementation of `stapply' reifies typechecker's instance selection algorithm. Since the concrete processors are given to stapply in a list in a certain order, stapply easily deals with `overlapping' instances: the first matching processing function is chosen.

The stapply approach is comparable to vlookup in LIGD/Dynamics.lhs. However, in our case the `overriding', the dispatch, is decided statically.

The overloaded function fn differs from fn' in that the set of fn instances is open (and can be extended at any time by defining a new instance of the class C). In contrast, all instances of fn' are explicitly enumerated and their set is closed. This is however an artifact of the particular definition fn'. We could just as well write

fn1_insts = (SCons (\ (x::Bool) -> if x then 10 else 20)
                (SCons (\ (x::Char) -> fromEnum x)
		  SNil))
fn1' x = stapply fn1_insts x (undefined::ResFailure)
-- in another module, importing fn1_insts
fn1' x = stapply (SCons (\ () -> 100) fn1_insts) x (undefined::ResFailure)

Since fn1_insts is a list, we can inspect the instances, rearrange and remove them. These operations are not available with the typeclass instances. Thus stapply approach is more flexible and powerful. Finally, the set of stapply instances can be made open, as described in [http://www.haskell.org/pipermail/haskell/2006-October/018684.html Infinite, open, statically constrained HLists]

The present approach shares with SYB the requirement that specific processing functions (the ones that handle terms of particular types) must be monomorphic. That requirement can be relaxed, as described in [http://pobox.com/~oleg/ftp/Haskell/poly2.txt Type-class overloaded functions: second-order typeclass programming with backtracking]

@@ Line 17: / Line 17: @@
 determined by the original term's type/structure). The return type
 is computed by the typechecker rather than has to be specified by the
+user. Pure producers (e.g., gminimum) are supported too.
-user.
-<li>Dat/TDat duality
 <li>Local redefinitions are possible, see Discussion.
 </ul>
@@ Line 25: / Line 24: @@
 <ul>
-<li>Whatever TDat instances are declared
+<li>Whatever LDat instances are declared
 <li>All primitive and algebraic types are supported. Function
 spaces. No Dynamic types. Polymorphic types can be handled, via
@@ Line 35: / Line 34: @@
 <li>''Library Writer'':
 writes the core library (e.g., STApply class) and declares
-Dat and TDat classes. This is done once and for all.
+the LDat class. This is done once and for all.
-<li>''Power User'': (new datatype): defines Dat and TDat instances
+<li>''Power User'': (new datatype or a new traversal strategy): defines
+LDat instances
 <li>''User'': (new generic function): defines a new generic function
 by combining, in a HList, functions processing values of specific
@@ Line 54: / Line 54: @@
 <ul>
-<li>instances of Dat and TDat. Dat is especially simple (essentially
+<li>instances of LDat, which are similar to Functor and could be
+automatically derived from the definition of the data type.
-like deriving a Functor).
-So far, there is no satisfactory way to merge Dat and TDat (of course,
-one can always jam them together. Alas, the result is not
-<em>less</em> than the sum of the parts.
 </ul>
@@ Line 83: / Line 80: @@
 <ul>
 <li>TypeEq and TypeCast as built-ins would have helped.
-<li>Deriving for Dat and TDat (the same problem as with various SYB
+<li>Deriving for LDat (the same problem as with various SYB
 approaches)
 </ul>
@@ Line 113: / Line 110: @@
 chosen statically, there is no need for run-time type representation.
+The set of traversal strategies is extensible.  To the
+pre-defined gmap, reduction, and two-terms-in-lockstep strategies, the
+programmer may at any time add a new class of traversals. The
+message `Smash-along' shows one such extension, lazy gmap, and its
+application to
+mapping of terms that do not exist.
-We should note the duality of `sapply' with respect to
+We should note the duality of `stapply' with respect to
 typeclasses. Let us consider
 <haskell>
@@ Line 125: / Line 128: @@
 the instance of "fn" by matching the type of "x" against the types of
 C's instances. The same functionality can be implemented with
-`sapply':
+`stapply':
 <haskell>
+data ResFailure -- an abstract data type with no non-bottom values
-fn' x = sapply (SCons (\ (x::Bool) -> if x then 10 else 20)
-                 (SCons (\ (x::Char) -> fromEnum x)
+                -- and no defined operations. An attempt to actually
+                -- use it will trigger a type error
-		  SNil)) ((error "no match")::Int) x
+fn' x =
+    stapply ((\ (x::Bool) -> if x then (10::Int) else 20) :+:
+             (\ (x::Char) -> fromEnum x) :+:
+	     HNil)
+            x
+	    (undefined::ResFailure)
+testtyc_fn1 = fn' True
+testtyc_fn2 = fn' 'x'
+-- testtyc_fn3 = show $ fn' () -- type error, can't show ResFailure
 </haskell>
-(The function `sapply' has the `default' clause, for all other data
+(The function `stapply' has the `default' clause, for all other data
 types. One can do the same with the typeclasses, with the help of
 overlapping instances).
-With `sapply', the set of concrete processors is given explicitly as a
+With `stapply', the set of concrete processors is given explicitly as a
 list rather than implicitly as a set of all typeclass instances in
-effect. The implementation of `sapply' reifies typechecker's instance
+effect. The implementation of `stapply' reifies typechecker's instance
-selection algorithm. Since the concrete processors are given to sapply
+selection algorithm. Since the concrete processors are given to stapply
-in a list in a certain order, sapply easily deals with `overlapping'
+in a list in a certain order, stapply easily deals with `overlapping'
 instances: the first matching processing function is chosen.
-The sapply approach is comparable to vlookup in
+The stapply approach is comparable to vlookup in
 LIGD/Dynamics.lhs. However, in our case the `overriding', the
 dispatch, is decided statically.
@@ Line 154: / Line 168: @@
                 (SCons (\ (x::Char) -> fromEnum x)
 		  SNil))
-fn1' x = sapply fn1_insts ((error "no match")::Int) x
+fn1' x = stapply fn1_insts x (undefined::ResFailure)
 -- in another module, importing fn1_insts
-fn1' x = sapply (SCons (\ () -> 100) fn1_insts) ((error "no match")::Int) x
+fn1' x = stapply (SCons (\ () -> 100) fn1_insts) x (undefined::ResFailure)
 </haskell>
 Since fn1_insts is a list, we can inspect the instances, rearrange
 and <em>remove</em> them. These operations are not available with the typeclass
-instances. Thus sapply approach is more flexible and
+instances. Thus stapply approach is more flexible and
-powerful. Finally, the set of sapply instances can be made open, as
+powerful. Finally, the set of stapply instances can be made open, as
 described in
 [http://www.haskell.org/pipermail/haskell/2006-October/018684.html