Difference between revisions of "Applications and libraries/Generic programming/Smash"
(Initial submission) |
(Updates for Smash-along version.) |
||
Line 17: | Line 17: | ||
determined by the original term's type/structure). The return type |
determined by the original term's type/structure). The return type |
||
is computed by the typechecker rather than has to be specified by the |
is computed by the typechecker rather than has to be specified by the |
||
+ | user. Pure producers (e.g., gminimum) are supported too. |
||
− | user. |
||
− | <li>Dat/TDat duality |
||
<li>Local redefinitions are possible, see Discussion. |
<li>Local redefinitions are possible, see Discussion. |
||
</ul> |
</ul> |
||
Line 25: | Line 24: | ||
<ul> |
<ul> |
||
− | <li>Whatever |
+ | <li>Whatever LDat instances are declared |
<li>All primitive and algebraic types are supported. Function |
<li>All primitive and algebraic types are supported. Function |
||
spaces. No Dynamic types. Polymorphic types can be handled, via |
spaces. No Dynamic types. Polymorphic types can be handled, via |
||
Line 35: | Line 34: | ||
<li>''Library Writer'': |
<li>''Library Writer'': |
||
writes the core library (e.g., STApply class) and declares |
writes the core library (e.g., STApply class) and declares |
||
− | + | the LDat class. This is done once and for all. |
|
− | <li>''Power User'': (new datatype |
+ | <li>''Power User'': (new datatype or a new traversal strategy): defines |
+ | LDat instances |
||
<li>''User'': (new generic function): defines a new generic function |
<li>''User'': (new generic function): defines a new generic function |
||
by combining, in a HList, functions processing values of specific |
by combining, in a HList, functions processing values of specific |
||
Line 54: | Line 54: | ||
<ul> |
<ul> |
||
− | <li>instances of |
+ | <li>instances of LDat, which are similar to Functor and could be |
+ | automatically derived from the definition of the data type. |
||
− | like deriving a Functor). |
||
− | So far, there is no satisfactory way to merge Dat and TDat (of course, |
||
− | one can always jam them together. Alas, the result is not |
||
− | <em>less</em> than the sum of the parts. |
||
</ul> |
</ul> |
||
Line 83: | Line 80: | ||
<ul> |
<ul> |
||
<li>TypeEq and TypeCast as built-ins would have helped. |
<li>TypeEq and TypeCast as built-ins would have helped. |
||
− | <li>Deriving for |
+ | <li>Deriving for LDat (the same problem as with various SYB |
approaches) |
approaches) |
||
</ul> |
</ul> |
||
Line 113: | Line 110: | ||
chosen statically, there is no need for run-time type representation. |
chosen statically, there is no need for run-time type representation. |
||
+ | The set of traversal strategies is extensible. To the |
||
+ | pre-defined gmap, reduction, and two-terms-in-lockstep strategies, the |
||
+ | programmer may at any time add a new class of traversals. The |
||
+ | message `Smash-along' shows one such extension, lazy gmap, and its |
||
+ | application to |
||
+ | mapping of terms that do not exist. |
||
− | We should note the duality of ` |
+ | We should note the duality of `stapply' with respect to |
typeclasses. Let us consider |
typeclasses. Let us consider |
||
<haskell> |
<haskell> |
||
Line 125: | Line 128: | ||
the instance of "fn" by matching the type of "x" against the types of |
the instance of "fn" by matching the type of "x" against the types of |
||
C's instances. The same functionality can be implemented with |
C's instances. The same functionality can be implemented with |
||
− | ` |
+ | `stapply': |
<haskell> |
<haskell> |
||
+ | data ResFailure -- an abstract data type with no non-bottom values |
||
⚫ | |||
− | + | -- and no defined operations. An attempt to actually |
|
+ | -- use it will trigger a type error |
||
− | SNil)) ((error "no match")::Int) x |
||
+ | |||
+ | fn' x = |
||
⚫ | |||
+ | (\ (x::Char) -> fromEnum x) :+: |
||
+ | HNil) |
||
+ | x |
||
+ | (undefined::ResFailure) |
||
+ | |||
+ | testtyc_fn1 = fn' True |
||
+ | testtyc_fn2 = fn' 'x' |
||
+ | -- testtyc_fn3 = show $ fn' () -- type error, can't show ResFailure |
||
</haskell> |
</haskell> |
||
− | (The function ` |
+ | (The function `stapply' has the `default' clause, for all other data |
types. One can do the same with the typeclasses, with the help of |
types. One can do the same with the typeclasses, with the help of |
||
overlapping instances). |
overlapping instances). |
||
− | With ` |
+ | With `stapply', the set of concrete processors is given explicitly as a |
list rather than implicitly as a set of all typeclass instances in |
list rather than implicitly as a set of all typeclass instances in |
||
− | effect. The implementation of ` |
+ | effect. The implementation of `stapply' reifies typechecker's instance |
− | selection algorithm. Since the concrete processors are given to |
+ | selection algorithm. Since the concrete processors are given to stapply |
− | in a list in a certain order, |
+ | in a list in a certain order, stapply easily deals with `overlapping' |
instances: the first matching processing function is chosen. |
instances: the first matching processing function is chosen. |
||
− | The |
+ | The stapply approach is comparable to vlookup in |
LIGD/Dynamics.lhs. However, in our case the `overriding', the |
LIGD/Dynamics.lhs. However, in our case the `overriding', the |
||
dispatch, is decided statically. |
dispatch, is decided statically. |
||
Line 154: | Line 168: | ||
(SCons (\ (x::Char) -> fromEnum x) |
(SCons (\ (x::Char) -> fromEnum x) |
||
SNil)) |
SNil)) |
||
− | fn1' x = |
+ | fn1' x = stapply fn1_insts x (undefined::ResFailure) |
-- in another module, importing fn1_insts |
-- in another module, importing fn1_insts |
||
− | fn1' x = |
+ | fn1' x = stapply (SCons (\ () -> 100) fn1_insts) x (undefined::ResFailure) |
</haskell> |
</haskell> |
||
Since fn1_insts is a list, we can inspect the instances, rearrange |
Since fn1_insts is a list, we can inspect the instances, rearrange |
||
and <em>remove</em> them. These operations are not available with the typeclass |
and <em>remove</em> them. These operations are not available with the typeclass |
||
− | instances. Thus |
+ | instances. Thus stapply approach is more flexible and |
− | powerful. Finally, the set of |
+ | powerful. Finally, the set of stapply instances can be made open, as |
described in |
described in |
||
[http://www.haskell.org/pipermail/haskell/2006-October/018684.html |
[http://www.haskell.org/pipermail/haskell/2006-October/018684.html |
Latest revision as of 00:59, 30 July 2007
Approach: Smash your boilerplate
Required features/Portability
- The type-level typecase/TypeEq requires overlapping and undecidable instances. The use of these extensions is limited to the library only. The end user code needs no such extensions.
- No need for Typeable, nor higher-rank types
Expressibility
- Can do both producer and consumer functions: e.g., gsize (consumer), replacing all Floats with Double (producer: the return type is determined by the original term's type/structure). The return type is computed by the typechecker rather than has to be specified by the user. Pure producers (e.g., gminimum) are supported too.
- Local redefinitions are possible, see Discussion.
Subset of data types covered
- Whatever LDat instances are declared
- All primitive and algebraic types are supported. Function spaces. No Dynamic types. Polymorphic types can be handled, via typeclass proxies.
Usage
- Library Writer: writes the core library (e.g., STApply class) and declares the LDat class. This is done once and for all.
- Power User: (new datatype or a new traversal strategy): defines LDat instances
- User: (new generic function): defines a new generic function by combining, in a HList, functions processing values of specific data types, with an appropriate generic function (cf. `everywhere') that does generic traversal. Polymorphism (when specific functions are polymorphic) requires special handling (see below).
- End User: apply the generic function like an ordinary function
Error Messages
- Can be hard to grasp??
Amount of work per data type (Boilerplate)
- instances of LDat, which are similar to Functor and could be automatically derived from the definition of the data type.
Extensibility
- Full
Reasoning
TODO ??? what does that mean?
Performance considerations
- Although that is hidden from the end user, the library heavily relies on the typeclasses. Thus the efficiency of the typeclass implementation has direct impact.
- Choosing the processor for each subterm is done at the compile time. There is no passing and interpreting types at run time (except when a compiler uses that to implement typeclasses).
Helpful extra features
- TypeEq and TypeCast as built-ins would have helped.
- Deriving for LDat (the same problem as with various SYB approaches)
Discussion
Our generic functions are composed of specific and generic parts. A specific part tells what to do if the input term happens to be of a particular type. The generic part tells us what to do with all other terms. The specific part is just an HList of ordinary functions, with different argument types. If the type of the input term matches the argument type of one of the functions in the list, we apply that function. If no specific function applies, we do the generic action, implicit in the traversal strategy (for example, apply the generic function to the subterms and reduce the results).
SYB1, when traversing terms and invoking user functions for subterms of a particular type, relies on a run-time typecase. The latter requires run-time type representation, which is provided by the typeclass Typeable. The typeclass implements the method `cast' for a safe cast from the value of `generic type' to the value of the specific type.
We observe that the typecase, at the type level, has always existed in Haskell (although that fact was not perhaps realized until HList). That typecase, which does not need any `cast' operation, is the type equality predicate TypeEq. The type-level typecase is the essence of the present approach. Since the handler for each subterm is chosen statically, there is no need for run-time type representation.
The set of traversal strategies is extensible. To the pre-defined gmap, reduction, and two-terms-in-lockstep strategies, the programmer may at any time add a new class of traversals. The message `Smash-along' shows one such extension, lazy gmap, and its application to mapping of terms that do not exist.
We should note the duality of `stapply' with respect to typeclasses. Let us consider
class C a where fn :: a -> Int
instance C Bool where fn x = if x then 10 else 20
instance C Char where fn x = fromEnum x
The typeclass C declares the function fn overloaded over the argument types. When typechecking an application "fn x", the compiler selects the instance of "fn" by matching the type of "x" against the types of C's instances. The same functionality can be implemented with `stapply':
data ResFailure -- an abstract data type with no non-bottom values
-- and no defined operations. An attempt to actually
-- use it will trigger a type error
fn' x =
stapply ((\ (x::Bool) -> if x then (10::Int) else 20) :+:
(\ (x::Char) -> fromEnum x) :+:
HNil)
x
(undefined::ResFailure)
testtyc_fn1 = fn' True
testtyc_fn2 = fn' 'x'
-- testtyc_fn3 = show $ fn' () -- type error, can't show ResFailure
(The function `stapply' has the `default' clause, for all other data types. One can do the same with the typeclasses, with the help of overlapping instances). With `stapply', the set of concrete processors is given explicitly as a list rather than implicitly as a set of all typeclass instances in effect. The implementation of `stapply' reifies typechecker's instance selection algorithm. Since the concrete processors are given to stapply in a list in a certain order, stapply easily deals with `overlapping' instances: the first matching processing function is chosen.
The stapply approach is comparable to vlookup in LIGD/Dynamics.lhs. However, in our case the `overriding', the dispatch, is decided statically.
The overloaded function fn differs from fn' in that the set of fn instances is open (and can be extended at any time by defining a new instance of the class C). In contrast, all instances of fn' are explicitly enumerated and their set is closed. This is however an artifact of the particular definition fn'. We could just as well write
fn1_insts = (SCons (\ (x::Bool) -> if x then 10 else 20)
(SCons (\ (x::Char) -> fromEnum x)
SNil))
fn1' x = stapply fn1_insts x (undefined::ResFailure)
-- in another module, importing fn1_insts
fn1' x = stapply (SCons (\ () -> 100) fn1_insts) x (undefined::ResFailure)
Since fn1_insts is a list, we can inspect the instances, rearrange and remove them. These operations are not available with the typeclass instances. Thus stapply approach is more flexible and powerful. Finally, the set of stapply instances can be made open, as described in [http://www.haskell.org/pipermail/haskell/2006-October/018684.html Infinite, open, statically constrained HLists]
The present approach shares with SYB the requirement that specific
processing functions (the ones that handle terms of particular types)
must be monomorphic. That requirement can be relaxed, as described in
[http://pobox.com/~oleg/ftp/Haskell/poly2.txt
Type-class overloaded functions: second-order typeclass programming
with backtracking]