HXT (https://wiki.haskell.org/index.php?title=HXT&diff=33886), revision of 2010-03-01T04:32:34Z<p>Joe: fixed typo</p>
<hr />
<div>[[Category:Web]]<br />
[[Category:XML]]<br />
[[Category:Tools]]<br />
[[Category:Tutorials]]<br />
<br />
== A gentle introduction to the Haskell XML Toolbox ==<br />
<br />
The [http://www.fh-wedel.de/~si/HXmlToolbox/index.html Haskell XML Toolbox (HXT)] is a collection of tools for processing XML with Haskell. Its core component is a domain-specific language consisting of a set of combinators for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace-aware XML parser that supports almost all of the XML 1.0 standard. Extensions are a validator for RelaxNG and an XPath evaluator.<br />
<br />
__TOC__<br />
<br />
== Background ==<br />
<br />
The Haskell XML Toolbox is based on the ideas of [http://www.cs.york.ac.uk/fp/HaXml/ HaXml] and [http://www.flightlab.com/~joe/hxml/ HXML] but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CDATA parts and processing instructions. This data model makes it possible to use tree transformation functions as a uniform design for all XML processing steps: parsing, DTD processing, entity processing, validation, namespace propagation, content processing and output.<br />
<br />
== Resources ==<br />
<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/index.html HXT Home] :<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/hxt-8.2.0.tar.gz hxt-8.2.0.tar.gz] : latest release<br />
; [http://darcs.fh-wedel.de/hxt/ darcs2.fh-wedel.de/hxt] : darcs repository with head revision of HXT<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc_arrow/ Arrow API] : Haddock documentation of head revision with links to source files<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc/ Complete API] : Haddock documentation with arrows and old API based on filters<br />
<br />
== The basic concepts ==<br />
<br />
=== The basic data structures ===<br />
<br />
Processing XML is a task of processing tree structures. This can be done in Haskell in a very elegant way by defining an appropriate tree data type, a Haskell DOM (document object model) structure. The tree structure in HXT is a rose tree with a special XNode data type for storing the XML node information.<br />
<br />
The generally useful tree structure (NTree) is separated from the node type (XNode). This allows for reusing the tree structure and the tree traversal and manipulation functions in other applications.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a] -- rose tree<br />
<br />
data XNode = XText String -- plain text node<br />
| ...<br />
| XTag QName XmlTrees -- element name and list of attributes<br />
| XAttr QName -- attribute name<br />
| ...<br />
<br />
type QName = ... -- qualified name<br />
<br />
type XmlTree = NTree XNode<br />
<br />
type XmlTrees = [XmlTree]<br />
</haskell><br />
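<br />
As an illustration of this separation, the following sketch builds by hand the tree for a <tt>hello</tt> element containing the text <tt>world</tt>. It is not HXT's real definition: <hask>QName</hask> is replaced by a plain <hask>String</hask>, most <hask>XNode</hask> constructors are left out, and <hask>helloTree</hask> and <hask>countNodes</hask> are names made up for this example.<br />
<br />
```haskell
-- Simplified stand-ins for the HXT types.
-- Assumption: QName is modelled as a plain String here.
data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

data XNode
  = XText String            -- plain text node
  | XTag String [XmlTree]   -- element name and list of attributes
  | XAttr String            -- attribute name
  deriving (Show, Eq)

type XmlTree = NTree XNode

-- An element node "hello" with a single text child "world".
helloTree :: XmlTree
helloTree = NTree (XTag "hello" []) [NTree (XText "world") []]

-- A traversal written against NTree only, so it works for any node type.
countNodes :: NTree a -> Int
countNodes (NTree _ cs) = 1 + sum (map countNodes cs)
```
<br />
Because <hask>countNodes</hask> never looks at the node value, it can be reused for trees that have nothing to do with XML, which is the point of keeping <hask>NTree</hask> and <hask>XNode</hask> apart.<br />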
<br />
=== The concept of filters ===<br />
<br />
Selecting, transforming and generating trees often requires routines that compute not just a single result tree but a (possibly empty) list of (sub-)trees. This leads to the idea of XML filters, as in HaXml. Filters are functions that take an XML tree as input and compute a list of result trees.<br />
<br />
<haskell><br />
type XmlFilter = XmlTree -> [XmlTree]<br />
</haskell><br />
<br />
More generally we can define a filter as<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
</haskell><br />
<br />
We will do this abstraction later, when introducing arrows. Many of the functions in the following motivating examples can be generalised this way. But for getting the idea, the <hask>XmlFilter</hask> is sufficient.<br />
<br />
The filter functions are used so frequently that the idea of defining a domain-specific language with filters as the basic processing units comes up. In such a DSL the basic filters are predicates, selectors, constructors and transformers, all working on the HXT DOM tree structure. A DSL also needs an appropriate set of combinators for building more complex functions from simpler ones. Filter composition, the counterpart of (.) for functions, is one of the most frequently used combinators. There are also more complex filters for traversing a whole tree and for selecting or transforming several nodes. We will see a few first examples in the following part.<br />
<br />
The first task is to build filters from pure functions, to define a lift operator. Pure functions are lifted to filters in the following way:<br />
<br />
Predicates are lifted by mapping False to the empty list and True to the single element list, containing the input tree.<br />
<br />
<haskell><br />
p :: XmlTree -> Bool -- pure function<br />
p t = ...<br />
<br />
pf :: XmlTree -> [XmlTree] -- or XmlFilter<br />
pf t<br />
| p t = [t]<br />
| otherwise = []<br />
</haskell><br />
<br />
The combinator for this type of lifting is called <hask>isA</hask>. It works on any type and is defined as<br />
<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
| p x = [x]<br />
| otherwise = []<br />
</haskell><br />
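<br />
Since <hask>isA</hask> works on any type, it can be tried out without any XML at all. The following sketch applies it to integers; <hask>evens</hask> is just an illustrative name.<br />
<br />
```haskell
-- Lift a predicate to a filter: keep the input or drop it.
isA :: (a -> Bool) -> (a -> [a])
isA p x
  | p x       = [x]
  | otherwise = []

-- Each input is mapped to a list with zero or one element.
evens :: [[Int]]
evens = map (isA even) [1, 2, 3, 4]   -- [[], [2], [], [4]]
```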
<br />
A predicate for filtering text nodes looks like this<br />
<br />
<haskell><br />
isXText :: XmlFilter -- XmlTree -> [XmlTree]<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _ = []<br />
</haskell><br />
<br />
Transformers -- functions that map a tree into another tree -- are lifted in a trivial way:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t = expr(t)<br />
<br />
ff :: XmlTree -> [XmlTree]<br />
ff t = [expr(t)]<br />
</haskell><br />
<br />
This lifting function is called <hask>arr</hask>. It comes from the <hask>Control.Arrow</hask> module of the base library shipped with GHC.<br />
<br />
Partial functions, functions that can't always compute a result, are usually lifted to totally defined filters:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t<br />
| p t = expr(t)<br />
| otherwise = error "f not defined"<br />
<br />
ff :: XmlFilter<br />
ff t<br />
| p t = [expr(t)]<br />
| otherwise = []<br />
</haskell><br />
<br />
This is a rather comfortable situation: with these filters we don't have to deal with illegal argument errors. Illegal arguments are just mapped to the empty list.<br />
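<br />
A concrete instance of this pattern, sketched on a generic rose tree, is selecting the first child of a node. The names <hask>firstChildUnsafe</hask> and <hask>firstChild</hask> are invented for this illustration and are not part of HXT.<br />
<br />
```haskell
data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

-- Partial function: fails with a runtime error for a leaf.
firstChildUnsafe :: NTree a -> NTree a
firstChildUnsafe (NTree _ (c : _)) = c
firstChildUnsafe _                 = error "firstChildUnsafe: no children"

-- Total filter: a leaf simply yields no result.
firstChild :: NTree a -> [NTree a]
firstChild (NTree _ (c : _)) = [c]
firstChild _                 = []
```
<br />
A caller of <hask>firstChild</hask> never needs to check for leaves beforehand; composing it with further filters just produces an empty result.<br />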
<br />
When processing trees, it is often the case that no result, exactly one result, or more than one result is possible. Functions that return a set of results are often, a bit imprecisely, called ''nondeterministic'' functions. These functions, e.g. selecting all children of a node or all grandchildren, are exactly our filters. In this context lists, rather than sets of values, are the appropriate result type, because the ordering in XML is important and duplicates are possible.<br />
<br />
Working with filters is rather similar to working with binary relations, and working with relations is rather natural and comfortable, as database people know very well.<br />
<br />
Two first examples for working with ''nondeterministic'' functions are selecting the children and the grandchildren of an XmlTree which can be implemented by<br />
<br />
<haskell><br />
getChildren :: XmlFilter<br />
getChildren (NTree n cs)<br />
= cs<br />
<br />
getGrandChildren :: XmlFilter<br />
getGrandChildren (NTree n cs)<br />
= concat [ getChildren c | c <- cs ]<br />
</haskell><br />
<br />
=== Filter combinators ===<br />
<br />
Composition of filters (like function composition) is the most important combinator. We will use the infix operator <hask>(>>>)</hask> for filter composition and reverse the arguments, so we can read composition sequences from left to right, like with pipes in Unix. Composition is defined as follows:<br />
<br />
<haskell><br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
</haskell><br />
<br />
This definition corresponds 1-1 to the composition of binary relations. With the help of the <hask>(>>>)</hask> operator the definition of <hask>getGrandChildren</hask> becomes rather simple:<br />
<br />
<haskell><br />
getGrandChildren :: XmlFilter<br />
getGrandChildren = getChildren >>> getChildren<br />
</haskell><br />
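<br />
To see the composition at work, here is a small self-contained sketch on a rose tree with string labels instead of XML nodes. The generalised <hask>Filter a b</hask> type is used so that it runs without HXT; <hask>sample</hask>, <hask>label</hask> and <hask>grandChildLabels</hask> are names chosen for this example only.<br />
<br />
```haskell
data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

type Filter a b = a -> [b]

getChildren :: Filter (NTree a) (NTree a)
getChildren (NTree _ cs) = cs

-- Left-to-right composition of filters, as defined in the text.
(>>>) :: Filter a b -> Filter b c -> Filter a c
(f >>> g) t = concat [ g t' | t' <- f t ]

label :: NTree a -> a
label (NTree n _) = n

-- Sample tree:  r -> [ a -> [a1, a2], b -> [b1] ]
sample :: NTree String
sample =
  NTree "r"
    [ NTree "a" [NTree "a1" [], NTree "a2" []]
    , NTree "b" [NTree "b1" []]
    ]

-- Grandchildren are returned in document order.
grandChildLabels :: [String]
grandChildLabels = map label ((getChildren >>> getChildren) sample)
-- ["a1", "a2", "b1"]
```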
<br />
Selecting all text nodes of the children of an element can also be formulated very easily with the help of <hask>(>>>)</hask><br />
<br />
<haskell><br />
getTextChildren :: XmlFilter<br />
getTextChildren = getChildren >>> isXText<br />
</haskell><br />
<br />
When used to combine predicate filters, the <hask>(>>>)</hask> serves as a logical "and" operator or, from the relational view, as an intersection operator: <hask>isA p1 >>> isA p2</hask> selects all values for which p1 and p2 both hold.<br />
<br />
The dual operator to <hask>(>>>)</hask> is the logical or (thinking in sets: the union operator). For this we define a sum operator <hask>(<+>)</hask>. The sum of two filters is defined as follows:<br />
<br />
<haskell><br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f <+> g) t = f t ++ g t<br />
</haskell><br />
<br />
Example: <hask>isA p1 <+> isA p2</hask> is the logical or for filters.<br />
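<br />
One consequence of defining the sum by list concatenation is worth spelling out: a value for which both predicates hold is returned twice, so the analogy with set union is only approximate. A minimal sketch on integers, with <hask>evenOrBig</hask> as a made-up example filter:<br />
<br />
```haskell
isA :: (a -> Bool) -> (a -> [a])
isA p x = [x | p x]

-- Sum of two filters: results of f followed by results of g.
(<+>) :: (a -> [b]) -> (a -> [b]) -> (a -> [b])
(f <+> g) t = f t ++ g t

evenOrBig :: Int -> [Int]
evenOrBig = isA even <+> isA (> 10)

-- evenOrBig 3  == []
-- evenOrBig 4  == [4]
-- evenOrBig 11 == [11]
-- evenOrBig 12 == [12, 12]   (both predicates hold, so 12 appears twice)
```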
<br />
Combining elementary filters with (>>>) and (<+>) leads to more complex functionality. For example, selecting all text nodes within two levels of depth (in left to right order) can be formulated with:<br />
<br />
<haskell><br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
</haskell><br />
<br />
'''Exercise:''' Are these filters equivalent or what's the difference between the two filters?<br />
<br />
<haskell><br />
getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )<br />
</haskell><br />
<br />
Of course we need choice combinators. The first idea is an if-then-else filter, <br />
built up from three simpler filters. But often it's easier and more elegant to work with simpler binary combinators for choice. So we will introduce the simpler ones first.<br />
<br />
One of these choice combinators is called <hask>orElse</hask> and is defined as<br />
follows:<br />
<br />
<haskell><br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
| null res1 = g t<br />
| otherwise = res1<br />
where<br />
res1 = f t<br />
</haskell><br />
<br />
The meaning is the following: If f computes a non-empty list as result, f succeeds and this list is the result, else g is applied to the input and this yields the result. There are two other simple choice combinators usually written in infix notation, <hask> g `guards` f</hask> and <hask>f `when` g</hask>:<br />
<br />
<haskell><br />
guards :: XmlFilter -> XmlFilter -> XmlFilter<br />
guards g f t<br />
| null (g t) = []<br />
| otherwise = f t<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
| null (g t) = [t]<br />
| otherwise = f t<br />
</haskell><br />
<br />
These choice operators become useful when transforming and manipulating trees.<br />
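<br />
The difference between the three choice combinators is easiest to see on a toy example. The following sketch generalises the definitions above from <hask>XmlFilter</hask> to an arbitrary type and applies them to integers; <hask>double</hask>, <hask>doubleEvens</hask>, <hask>onlyDoubledEvens</hask> and <hask>evenOrDoubled</hask> are illustrative names.<br />
<br />
```haskell
type Filter a = a -> [a]

isA :: (a -> Bool) -> Filter a
isA p x = [x | p x]

orElse :: Filter a -> Filter a -> Filter a
orElse f g t
  | null res1 = g t
  | otherwise = res1
  where
    res1 = f t

guards :: Filter a -> Filter a -> Filter a
guards g f t
  | null (g t) = []
  | otherwise  = f t

when :: Filter a -> Filter a -> Filter a
when f g t
  | null (g t) = [t]
  | otherwise  = f t

double :: Filter Int
double x = [2 * x]

-- Double even numbers, pass odd numbers through unchanged.
doubleEvens :: Filter Int
doubleEvens = double `when` isA even

-- Double even numbers, drop odd numbers.
onlyDoubledEvens :: Filter Int
onlyDoubledEvens = isA even `guards` double

-- Keep even numbers; replace an odd number by its double.
evenOrDoubled :: Filter Int
evenOrDoubled = isA even `orElse` double

-- concatMap doubleEvens      [1, 2, 3, 4] == [1, 4, 3, 8]
-- concatMap onlyDoubledEvens [1, 2, 3, 4] == [4, 8]
-- concatMap evenOrDoubled    [1, 2, 3, 4] == [2, 2, 6, 4]
```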
<br />
=== Tree traversal filter ===<br />
<br />
A very basic operation on tree structures is the traversal of all nodes and the selection and/or transformation of nodes. These traversal filters serve as control structures for processing whole trees. They correspond to the map and fold combinators for lists.<br />
<br />
The simplest traversal filter does a top down search for all nodes with a given property. This filter, called <hask>deep</hask>, is defined as follows:<br />
<br />
<haskell><br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
</haskell><br />
<br />
When <hask>deep</hask> is applied to a predicate filter, a top down search is done and all subtrees satisfying the predicate are collected. Because of the use of <hask>orElse</hask>, the descent into a subtree stops as soon as its root satisfies the predicate.<br />
<br />
'''Example:''' Selecting all plain text nodes of a document can be formulated with:<br />
<br />
<haskell><br />
deep isXText<br />
</haskell><br />
<br />
'''Example:''' Selecting all "top level" tables in an HTML document looks like<br />
this:<br />
<br />
<haskell><br />
deep (isElem >>> hasName "table")<br />
</haskell><br />
<br />
A variant of <hask>deep</hask>, called <hask>multi</hask>, performs a complete search, where the tree traversal does not stop when a node is found.<br />
<br />
<haskell><br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
</haskell><br />
<br />
'''Example:''' To select all tables in an HTML document, even nested ones, <hask>multi</hask> has to be used instead of <hask>deep</hask>:<br />
<br />
<hask>multi (isElem >>> hasName "table")</hask><br />
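<br />
The different behaviour of the two traversals can be checked on a small model. The sketch below uses a rose tree whose nodes are just element names, so <hask>isTable</hask> stands in for <hask>isElem >>> hasName "table"</hask>; the tree <hask>page</hask> and the helper names are assumptions made for this example.<br />
<br />
```haskell
data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

type Filter a = a -> [a]

isA :: (a -> Bool) -> Filter a
isA p x = [x | p x]

getChildren :: Filter (NTree a)
getChildren (NTree _ cs) = cs

(>>>) :: Filter a -> Filter a -> Filter a
(f >>> g) t = concat [ g t' | t' <- f t ]

(<+>) :: Filter a -> Filter a -> Filter a
(f <+> g) t = f t ++ g t

orElse :: Filter a -> Filter a -> Filter a
orElse f g t = let r = f t in if null r then g t else r

deep, multi :: Filter (NTree a) -> Filter (NTree a)
deep  f = f `orElse` (getChildren >>> deep f)
multi f = f <+> (getChildren >>> multi f)

label :: NTree a -> a
label (NTree n _) = n

isTable :: Filter (NTree String)
isTable = isA ((== "table") . label)

-- html -> [ table -> [ tr -> [ table ] ], p ]
page :: NTree String
page =
  NTree "html"
    [ NTree "table" [NTree "tr" [NTree "table" []]]
    , NTree "p" []
    ]

topLevelTables, allTables :: Int
topLevelTables = length (deep  isTable page)   -- 1: the nested table is not visited
allTables      = length (multi isTable page)   -- 2: outer and nested table
```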
<br />
=== Arrows ===<br />
<br />
We've already seen that filters of type <hask>a -> [b]</hask> are a very<br />
powerful and sometimes more elegant way to process XML than pure<br />
functions. This is the good news. The bad news is that filters are not<br />
general enough. Of course we sometimes want to do some I/O and we want<br />
to stay on the filter level. So we need something like<br />
<br />
<haskell><br />
type XmlIOFilter = XmlTree -> IO [XmlTree]<br />
</haskell><br />
<br />
for working in the IO monad.<br />
<br />
Sometimes it's appropriate to thread some state through the computation<br />
like in state monads. This leads to a type like<br />
<br />
<haskell><br />
type XmlStateFilter state = state -> XmlTree -> (state, [XmlTree])<br />
</haskell><br />
<br />
And in real world applications we need both extensions at the same<br />
time. Of course I/O is necessary, but usually there are also some<br />
global options and variables for controlling the computations. In HXT,<br />
for instance, there are variables for controlling trace output, options<br />
for setting the default encoding scheme for input data and a base URI<br />
for accessing documents that are addressed by relative URIs in the<br />
content or in a DTD part. So we need something like<br />
<br />
<haskell><br />
type XmlIOStateFilter state = state -> XmlTree -> IO (state, [XmlTree])<br />
</haskell><br />
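<br />
To get a feeling for what threading a state through filters means, here is a sketch of composition for the pure state variant, generalised from <hask>XmlTree</hask> to arbitrary types. The type synonym <hask>StateFilter</hask> and the names <hask>thenS</hask>, <hask>liftS</hask>, <hask>countS</hask> and <hask>divisorsCounted</hask> are hypothetical and only serve this illustration; HXT itself wraps such functions in newtypes.<br />
<br />
```haskell
type StateFilter s a b = s -> a -> (s, [b])

-- Run f, then run g on every result of f, passing the state along
-- from left to right.
thenS :: StateFilter s a b -> StateFilter s b c -> StateFilter s a c
thenS f g s0 x = foldl step (s1, []) ys
  where
    (s1, ys) = f s0 x
    step (s, acc) y =
      let (s', zs) = g s y
      in  (s', acc ++ zs)

-- Lift a stateless filter; the state is passed through untouched.
liftS :: (a -> [b]) -> StateFilter s a b
liftS f s x = (s, f x)

-- Count every value that flows through this point.
countS :: StateFilter Int a a
countS n x = (n + 1, [x])

-- Produce the divisors of a number and count them on the way.
divisorsCounted :: StateFilter Int Int Int
divisorsCounted = liftS divisors `thenS` countS
  where
    divisors n = [d | d <- [1 .. n], n `mod` d == 0]

-- divisorsCounted 0 12 == (6, [1, 2, 3, 4, 6, 12])
```
<br />
Writing this plumbing once per filter variant is exactly the duplication that the following paragraphs set out to avoid.<br />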
<br />
We want to work with all four filter variants, and in the future<br />
perhaps with even more general filters, but of course not with four<br />
sets of filter names, e.g. <hask>deep, deepST, deepIO, deepIOST</hask>.<br />
<br />
This is the point where <hask>newtype</hask>s and <hask>class</hask>es<br />
come in. Classes are needed for overloading names and<br />
<hask>newtype</hask>s are needed to declare instances. Furthermore, the<br />
restriction to <hask>XmlTree</hask> as argument and result type is<br />
not necessary and hinders reuse in many cases.<br />
<br />
The filters discussed above have all the features of an arrow. Arrows were<br />
introduced for generalising the concept of functions and function<br />
combination to more general kinds of computation than pure functions.<br />
<br />
A basic set of combinators for arrows is defined in the classes in the<br />
<hask>Control.Arrow</hask> module, containing the above mentioned <hask>(>>>), (<+>), arr</hask>.<br />
<br />
In HXT the additional classes for filters working with lists as result type are<br />
defined in <hask>Control.Arrow.ArrowList</hask>. The choice operators are<br />
in <hask>Control.Arrow.ArrowIf</hask>, tree filters like <hask>getChildren, deep, multi, ...</hask> are in<br />
<hask>Control.Arrow.ArrowTree</hask>, and the elementary XML specific<br />
filters are in <hask>Text.XML.HXT.Arrow.XmlArrow</hask>.<br />
<br />
In HXT there are four types that are instances of these classes:<br />
pure list arrows, list arrows with a state, list arrows with IO<br />
and list arrows with a state and IO.<br />
<br />
<haskell><br />
newtype LA a b = LA { runLA :: (a -> [b]) }<br />
<br />
newtype SLA s a b = SLA { runSLA :: (s -> a -> (s, [b])) }<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: (a -> IO [b]) }<br />
<br />
newtype IOSLA s a b = IOSLA { runIOSLA :: (s -> a -> IO (s, [b])) }<br />
</haskell><br />
<br />
The first one and the last one are those used most frequently in the<br />
toolbox, and of course there are lifting functions for converting<br />
the simpler arrows into the more general ones.<br />
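<br />
How a newtype turns the plain filter functions into an arrow can be sketched with the classes from <hask>Control.Category</hask> and <hask>Control.Arrow</hask> in base. The instance bodies below are derived from the filter definitions given earlier on this page; they are an assumption about how such instances can look, not a copy of HXT's source, and <hask>example</hask> is a made-up name.<br />
<br />
```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow

-- Pure list arrow, as in the text.
newtype LA a b = LA { runLA :: a -> [b] }

-- Composition is the filter composition shown for (>>>).
instance Category LA where
  id          = LA (\x -> [x])
  LA g . LA f = LA (\x -> concatMap g (f x))

-- arr lifts a pure function to a filter with exactly one result.
instance Arrow LA where
  arr f        = LA (\x -> [f x])
  first (LA f) = LA (\(x, y) -> [ (x', y) | x' <- f x ])

-- The always failing filter and the sum of two filters.
instance ArrowZero LA where
  zeroArrow = LA (const [])

instance ArrowPlus LA where
  LA f <+> LA g = LA (\x -> f x ++ g x)

-- (>>>), (<+>) and arr now come from Control.Arrow.
example :: [Int]
example = runLA (arr (+ 1) >>> (arr (* 2) <+> arr negate)) 3
-- [8, -4]
```
<br />
The same combinator names work unchanged for any other type that is made an instance of these classes, which is what removes the need for variants like <hask>deepST</hask> or <hask>deepIO</hask>.<br />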
<br />
Don't worry about all these conceptual details. Let's have a look into some<br />
''Hello world'' examples.<br />
<br />
== Getting started: Hello world examples ==<br />
<br />
=== copyXML ===<br />
<br />
The first complete example is a program for<br />
copying an XML document<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
import System.Environment<br />
<br />
main :: IO ()<br />
main<br />
= do<br />
[src, dst] <- getArgs<br />
runX ( readDocument [(a_validate, v_0)] src<br />
>>><br />
writeDocument [] dst<br />
)<br />
return ()<br />
</haskell><br />
<br />
The interesting part of this example is<br />
the call of <hask>runX</hask>. <hask>runX</hask> executes an<br />
arrow. This arrow is one of the more powerful list arrows with IO and<br />
an HXT system state.<br />
<br />
The arrow itself is a composition of <hask>readDocument</hask> and<br />
<hask>writeDocument</hask>.<br />
<hask>readDocument</hask> is an arrow for reading, DTD processing and<br />
validation of documents. Its behaviour can be controlled by a list of<br />
options. Here we turn off the validation step. The <hask>src</hask>, a file<br />
name or a URI, is read and parsed and a document tree is built. This<br />
tree is ''piped'' into the output arrow, which is also<br />
controlled by a set of options. Here all the defaults are used.<br />
<hask>writeDocument</hask> converts the tree into a string and writes<br />
it to the <hask>dst</hask>.<br />
<br />
We've omitted here the boring stuff of option parsing and error<br />
handling.<br />
<br />
Compilation and a test run look like this:<br />
<br />
<pre><br />
hobel > ghc -o copyXml -package hxt CopyXML.hs<br />
hobel > cat hello.xml<br />
<hello>world</hello><br />
hobel > copyXml hello.xml -<br />
<?xml version="1.0" encoding="UTF-8"?><br />
<hello>world</hello><br />
hobel ><br />
</pre><br />
<br />
The mini XML document in file <tt>hello.xml</tt> is read and<br />
a document tree is built. Then this tree is converted into a string<br />
and written to standard output (filename: <tt>-</tt>). It is decorated<br />
with an XML declaration containing the version and the output<br />
encoding.<br />
<br />
For processing HTML documents there is an HTML parser, which tries to<br />
parse and interpret almost anything as HTML. The HTML parser can be<br />
selected by calling<br />
<br />
<hask>readDocument [(a_parse_html, v_1), ...]</hask><br />
<br />
with the appropriate option.<br />
<br />
=== Pattern for a main program ===<br />
<br />
A more realistic pattern for a simple Unix filter like program has<br />
the following structure:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
<br />
import System.IO<br />
import System.Environment<br />
import System.Console.GetOpt<br />
import System.Exit<br />
<br />
main :: IO ()<br />
main<br />
= do<br />
argv <- getArgs<br />
(al, src, dst) <- cmdlineOpts argv<br />
[rc] <- runX (application al src dst)<br />
if rc >= c_err<br />
then exitWith (ExitFailure (0-1))<br />
else exitWith ExitSuccess<br />
<br />
-- | the dummy for the boring stuff of option evaluation,<br />
-- usually done with 'System.Console.GetOpt'<br />
<br />
cmdlineOpts :: [String] -> IO (Attributes, String, String)<br />
cmdlineOpts argv<br />
= return ([(a_validate, v_0)], argv!!0, argv!!1)<br />
<br />
-- | the main arrow<br />
<br />
application :: Attributes -> String -> String -> IOSArrow b Int<br />
application al src dst<br />
= readDocument al src<br />
>>><br />
processChildren (processDocumentRootElement `when` isElem) -- (1)<br />
>>><br />
writeDocument al dst<br />
>>><br />
getErrStatus<br />
<br />
<br />
-- | the dummy for the real processing: the identity filter<br />
<br />
processDocumentRootElement :: IOSArrow XmlTree XmlTree<br />
processDocumentRootElement<br />
= this -- substitute this by the real application<br />
</haskell><br />
<br />
This program has the same functionality as our first example,<br />
but it separates the arrow from the boring option evaluation and<br />
return code computation.<br />
<br />
The interesting line is (1).<br />
<hask>readDocument</hask> generates a tree structure with a so-called extra<br />
root node. This root node is a node above the XML document root<br />
element. It is necessary<br />
because of possible other nodes on the same tree level as the XML<br />
root, for instance comments, processing instructions or whitespace.<br />
<br />
Furthermore the artificial root node serves for storing meta<br />
information about the document in the attribute list, like the<br />
document name, the encoding scheme, the HTTP transfer headers and<br />
other information.<br />
<br />
To process the real XML root element, we have to take the children of<br />
the root node, select the XML root element and process it, but<br />
leave all other children unchanged. This is done with<br />
<hask>processChildren</hask> and the <hask>when</hask> choice<br />
operator. <hask>processChildren</hask> applies a filter elementwise to<br />
all children of a node. All results from processing the list of children form<br />
the children of the result node.<br />
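<br />
Read as a plain filter, this description of <hask>processChildren</hask> can be sketched in a few lines. The model below is an assumption for illustration: it uses a tiny node type <hask>Node</hask> with elements and comments instead of <hask>XNode</hask>, and <hask>rename</hask> stands in for the real processing of the root element.<br />
<br />
```haskell
data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

type Filter a = a -> [a]

-- Apply f to every child and rebuild the node from all results.
processChildren :: Filter (NTree a) -> Filter (NTree a)
processChildren f (NTree n cs) = [NTree n (concatMap f cs)]

when :: Filter a -> Filter a -> Filter a
when f g t
  | null (g t) = [t]
  | otherwise  = f t

-- Hypothetical node type: elements and comments only.
data Node = Elem String | Comment String
  deriving (Show, Eq)

isElem :: Filter (NTree Node)
isElem t@(NTree (Elem _) _) = [t]
isElem _                    = []

-- The "real processing" of this example: rename an element.
rename :: String -> Filter (NTree Node)
rename new (NTree (Elem _) cs) = [NTree (Elem new) cs]
rename _   t                   = [t]

-- Extra root node with a comment and the document root element below it.
doc :: NTree Node
doc = NTree (Elem "/") [NTree (Comment "c") [], NTree (Elem "hello") []]

processed :: [NTree Node]
processed = processChildren (rename "bye" `when` isElem) doc
-- [NTree (Elem "/") [NTree (Comment "c") [], NTree (Elem "bye") []]]
```
<br />
The comment survives untouched because <hask>when</hask> returns the input unchanged whenever <hask>isElem</hask> fails.<br />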
<br />
The structure of the internal document tree can be made visible<br />
e.g. by adding the option pair <hask>(a_show_tree, v_1)</hask> to the<br />
<hask>writeDocument</hask> arrow. This will emit the tree in a readable<br />
text representation instead of the real document.<br />
<br />
In the next section we will give examples for the<br />
<hask>processDocumentRootElement</hask> arrow.<br />
<br />
== Selection examples ==<br />
<br />
=== Selecting text from an HTML document ===<br />
<br />
Selecting all the plain text of an XML/HTML document<br />
can be formulated with<br />
<br />
<haskell><br />
selectAllText :: ArrowXml a => a XmlTree XmlTree<br />
selectAllText<br />
= deep isXText<br />
</haskell><br />
<br />
<hask>deep</hask> traverses the whole tree, stops the traversal when<br />
a node is a text node (<hask>isXText</hask>) and returns all the text nodes.<br />
There are two other traversal operators, <hask>deepest</hask> and <hask>multi</hask>.<br />
In this case, where the selected nodes are all leaves, these would give the same result.<br />
<br />
=== Selecting text and ALT attribute values ===<br />
<br />
Let's take a slightly more complex task: we want to select all text, but also the values of the <tt>alt</tt> attributes<br />
of image tags.<br />
<br />
<haskell><br />
selectAllTextAndAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndAltValues<br />
= deep<br />
( isXText -- (1)<br />
<+><br />
( isElem >>> hasName "img" -- (2)<br />
>>><br />
getAttrValue "alt" -- (3)<br />
>>><br />
mkText -- (4)<br />
)<br />
)<br />
</haskell><br />
<br />
The whole tree is searched for text nodes (1) and for image elements (2). From the image elements<br />
the alt attribute values are selected as plain text (3), and this text is transformed into a text node (4).<br />
<br />
=== Selecting text and ALT attribute values (2) ===<br />
<br />
Let's refine the above filter one step further. The text from the alt attributes shall be marked in the output<br />
by surrounding double square brackets. Empty alt values shall be ignored.<br />
<br />
<haskell><br />
selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndRealAltValues<br />
= deep<br />
( isXText<br />
<+><br />
( isElem >>> hasName "img"<br />
>>><br />
getAttrValue "alt"<br />
>>><br />
isA significant -- (1)<br />
>>><br />
arr addBrackets -- (2)<br />
>>><br />
mkText<br />
)<br />
)<br />
where<br />
significant :: String -> Bool<br />
significant = not . all (`elem` " \n\r\t")<br />
<br />
addBrackets :: String -> String<br />
addBrackets s<br />
= " [[ " ++ s ++ " ]] "<br />
</haskell><br />
<br />
This example shows two combinators for building arrows from pure functions.<br />
The first one, <hask>isA</hask>, removes all empty or whitespace values from alt attributes (1);<br />
the other, <hask>arr</hask>, lifts the editing function to the arrow level (2).<br />
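<br />
The string part of this pipeline can be tried in isolation. The sketch below models the two lifting combinators for plain list filters, so it runs without HXT; <hask>arrL</hask> plays the role of <hask>arr</hask>, and <hask>markAlt</hask> is a name invented for the example.<br />
<br />
```haskell
isA :: (a -> Bool) -> (a -> [a])
isA p x = [x | p x]

-- Lifting a total function to a filter (the role of arr for list filters).
arrL :: (a -> b) -> (a -> [b])
arrL f x = [f x]

(>>>) :: (a -> [b]) -> (b -> [c]) -> (a -> [c])
(f >>> g) x = concatMap g (f x)

significant :: String -> Bool
significant = not . all (`elem` " \n\r\t")

addBrackets :: String -> String
addBrackets s = " [[ " ++ s ++ " ]] "

-- Drop empty or whitespace-only values, bracket the rest.
markAlt :: String -> [String]
markAlt = isA significant >>> arrL addBrackets

-- markAlt "logo" == [" [[ logo ]] "]
-- markAlt " \n"  == []
-- markAlt ""     == []
```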
<br />
== Document construction examples ==<br />
<br />
=== The ''Hello World'' document ===<br />
<br />
The first document, of course, is a ''Hello World'' document:<br />
<br />
<haskell><br />
helloWorld :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld<br />
= mkelem "html" [] -- (1)<br />
[ mkelem "head" []<br />
[ mkelem "title" []<br />
[ txt "Hello World" ] -- (2)<br />
]<br />
, mkelem "body"<br />
[ sattr "class" "haskell" ] -- (3)<br />
[ mkelem "h1" []<br />
[ txt "Hello World" ] -- (4)<br />
]<br />
]<br />
</haskell><br />
<br />
The main arrows for document construction are <hask>mkelem</hask><br />
and its variants (<hask>selem, aelem, eelem</hask>) for element creation, <hask>attr</hask> and <hask>sattr</hask> for attributes and <hask>mkText</hask><br />
and <hask>txt</hask> for text nodes. <hask>mkelem</hask> takes three arguments: the element name (or tag name), a list of arrows for the construction of attributes, not empty in (3), and a list of arrows for the contents. Text content is generated in (2) and (4).<br />
<br />
To write this document to a file use the following arrow<br />
<br />
<haskell><br />
root [] [helloWorld] -- (1)<br />
>>><br />
writeDocument [(a_indent, v_1)] "hello.xml" -- (2)<br />
</haskell><br />
<br />
When this arrow is executed, the <hask>helloWorld</hask><br />
document is wrapped into a so-called root node (1). This complete<br />
document is written to "hello.xml" (2).<br />
<hask>writeDocument</hask> and its variants always expect<br />
a whole document tree with such a root node. Before writing, the document is<br />
indented (<hask>(a_indent, v_1)</hask>) by inserting extra whitespace<br />
text nodes, and an XML declaration with version and encoding is added. Without the indent option, the whole document would appear on a single line. With it, the output looks like this:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<html><br />
<head><br />
<title>Hello World</title><br />
</head><br />
<body class="haskell"><br />
<h1>Hello World</h1><br />
</body><br />
</html><br />
</pre><br />
<br />
The code can be shortened a bit by using some of the<br />
convenience functions:<br />
<br />
<haskell><br />
helloWorld2 :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld2<br />
= selem "html"<br />
[ selem "head"<br />
[ selem "title"<br />
[ txt "Hello World" ]<br />
]<br />
, mkelem "body"<br />
[ sattr "class" "haskell" ]<br />
[ selem "h1"<br />
[ txt "Hello World" ]<br />
]<br />
]<br />
</haskell><br />
<br />
In the above two examples the arrow input is totally ignored, because<br />
of the use of the constant arrow <hask>txt "..."</hask>.<br />
<br />
=== A page about all images within a HTML page ===<br />
<br />
A slightly more interesting task is the construction of a page<br />
containing a table of all images within a page, including image URLs, geometry and ALT attributes.<br />
<br />
The program for this has a frame similar to the <hask>helloWorld</hask> program,<br />
but the rows of the table must be filled in from the input document.<br />
In the first step we will generate a table with a single column containing<br />
the URL of the image.<br />
<br />
<haskell><br />
imageTable :: ArrowXml a => a XmlTree XmlTree<br />
imageTable<br />
= selem "html"<br />
[ selem "head"<br />
[ selem "title"<br />
[ txt "Images in Page" ]<br />
]<br />
, selem "body"<br />
[ selem "h1"<br />
[ txt "Images in Page" ]<br />
, selem "table"<br />
[ collectImages -- (1)<br />
>>><br />
genTableRows -- (2)<br />
]<br />
]<br />
]<br />
where<br />
collectImages -- (1)<br />
= deep ( isElem<br />
>>><br />
hasName "img"<br />
)<br />
genTableRows -- (2)<br />
= selem "tr"<br />
[ selem "td"<br />
[ getAttrValue "src" >>> mkText ]<br />
]<br />
</haskell><br />
<br />
With (1) the image elements are collected, and with (2)<br />
the HTML code for an image element is built.<br />
<br />
Applied to <tt>http://www.haskell.org/</tt> we get the following result<br />
(at the time of writing this page):<br />
<br />
<pre><br />
<html><br />
<head><br />
<title>Images in Page</title><br />
</head><br />
<body><br />
<h1>Images in Page</h1><br />
<table><br />
<tr><br />
<td>/haskellwiki_logo.png</td><br />
</tr><br />
<tr><br />
<td>/sitewiki/images/1/10/Haskelllogo-small.jpg</td><br />
</tr><br />
<tr><br />
<td>/haskellwiki_logo_small.png</td><br />
</tr><br />
</table><br />
</body><br />
</html><br />
</pre><br />
<br />
When generating HTML, often there are constant parts within the page,<br />
in the example e.g. the page header. It's possible to write these<br />
parts as a string containing plain HTML and then read this with<br />
a simple XML contents parser called <hask>xread</hask>.<br />
<br />
The example above could then be rewritten as<br />
<br />
<haskell><br />
imageTable<br />
= selem "html"<br />
[ pageHeader<br />
, ...<br />
]<br />
where<br />
pageHeader<br />
= constA "<head><title>Images in Page</title></head>"<br />
>>><br />
xread<br />
...<br />
</haskell><br />
<br />
<hask>xread</hask> is a very primitive arrow. It does not run in the<br />
IO monad, so it can be used in any context, but therefore the error handling<br />
is very limited. <hask>xread</hask> parses an XML element content.<br />
<br />
=== A page about all images within a HTML page: 1. Refinement ===<br />
<br />
The next refinement step is the extension of the table such that<br />
it contains four columns: one for the image itself, one for the URL,<br />
one for the geometry and one for the ALT text. The extended <hask>genTableRows</hask><br />
has the following form:<br />
<br />
<haskell><br />
genTableRows<br />
= selem "tr"<br />
[ selem "td" -- (1)<br />
[ this -- (1.1)<br />
]<br />
, selem "td" -- (2)<br />
[ getAttrValue "src"<br />
>>><br />
mkText<br />
>>><br />
mkelem "a" -- (2.1)<br />
[ attr "href" this ]<br />
[ this ]<br />
]<br />
, selem "td" -- (3)<br />
[ ( getAttrValue "width"<br />
&&& -- (3.1)<br />
getAttrValue "height"<br />
)<br />
>>><br />
arr2 geometry -- (3.2)<br />
>>><br />
mkText<br />
]<br />
, selem "td" -- (4)<br />
[ getAttrValue "alt"<br />
>>><br />
mkText<br />
]<br />
]<br />
where<br />
geometry :: String -> String -> String<br />
geometry "" ""<br />
= ""<br />
geometry w h<br />
= w ++ "x" ++ h<br />
</haskell><br />
<br />
In (1) the identity arrow <hask>this</hask> is used for<br />
inserting the whole image element (<hask>this</hask> value) into the first column.<br />
(2) is the column from the previous example, but the URL has been made active<br />
by embedding it in an A-element (2.1). In (3) there are two<br />
new combinators. <hask>(&&&)</hask> (3.1) applies two<br />
arrows to the same input and combines the results into a pair. <hask>arr2</hask> (3.2)<br />
works like <hask>arr</hask>, but it lifts a binary function into an arrow<br />
accepting a pair of values. <hask>arr2 f</hask> is a shortcut for<br />
<hask>arr (uncurry f)</hask>. So width and height are combined into an X11-like<br />
geometry spec. (4) adds the ALT text.<br />
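<br />
The interplay of <hask>(&&&)</hask> and <hask>arr2</hask> in column (3) can be sketched with plain list filters. In this model an image is reduced to its attribute list, and <hask>attrValue</hask> is a stand-in for <hask>getAttrValue</hask> under the assumption, suggested by the <hask>geometry "" ""</hask> case, that a missing attribute yields the empty string. The names <hask>Attrs</hask>, <hask>attrValue</hask> and <hask>imgGeometry</hask> are made up for the example.<br />
<br />
```haskell
type Attrs = [(String, String)]

-- Assumption: a missing attribute gives "" instead of failing.
attrValue :: String -> Attrs -> [String]
attrValue name attrs = [maybe "" id (lookup name attrs)]

-- Apply two filters to the same input and pair up their results.
(&&&) :: (a -> [b]) -> (a -> [c]) -> (a -> [(b, c)])
(f &&& g) x = [ (y, z) | y <- f x, z <- g x ]

(>>>) :: (a -> [b]) -> (b -> [c]) -> (a -> [c])
(f >>> g) x = concatMap g (f x)

-- Lift a binary function to a filter on pairs: arr (uncurry f).
arr2 :: (b -> c -> d) -> ((b, c) -> [d])
arr2 f p = [uncurry f p]

geometry :: String -> String -> String
geometry "" "" = ""
geometry w  h  = w ++ "x" ++ h

imgGeometry :: Attrs -> [String]
imgGeometry = (attrValue "width" &&& attrValue "height") >>> arr2 geometry

-- imgGeometry [("width", "88"), ("height", "31")] == ["88x31"]
-- imgGeometry [("src", "logo.png")]               == [""]
```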
<br />
=== A page about all images within a HTML page: 2. Refinement ===<br />
<br />
The generated HTML page is not yet very useful, because it usually<br />
contains relative HREFs to the images, so the links do not work.<br />
We have to transform the SRC attribute values into absolute URLs.<br />
This can be done with the following code:<br />
<br />
<haskell><br />
imageTable2 :: IOStateArrow s XmlTree XmlTree<br />
imageTable2<br />
    = ...<br />
      ...<br />
      , selem "table"<br />
        [ collectImages<br />
          >>><br />
          mkAbsImageRef -- (1)<br />
          >>><br />
          genTableRows<br />
        ]<br />
      ...<br />
<br />
mkAbsImageRef :: IOStateArrow s XmlTree XmlTree -- (1)<br />
mkAbsImageRef<br />
    = processAttrl ( mkAbsRef -- (2)<br />
                     `when`<br />
                     hasName "src" -- (3)<br />
                   )<br />
    where<br />
    mkAbsRef -- (4)<br />
        = replaceChildren<br />
          ( xshow getChildren -- (5)<br />
            >>><br />
            ( mkAbsURI `orElse` this ) -- (6)<br />
            >>><br />
            mkText -- (7)<br />
          )<br />
</haskell><br />
<br />
The <hask>imageTable2</hask> is extended by an arrow <hask>mkAbsImageRef</hask><br />
(1). This arrow uses the global system state of HXT, in which the base URL<br />
of a document is stored. For editing the SRC attribute value, the attribute list<br />
of the image elements is processed with <hask>processAttrl</hask> (2).<br />
With <hask>`when` hasName "src"</hask> only SRC attributes are manipulated (3). The real work is done in (4): the URL, a text node, is selected with <hask>getChildren</hask> and converted into a string with <hask>xshow</hask> (5), then the URL is transformed into an absolute URL<br />
with <hask>mkAbsURI</hask> (6). This arrow may fail, e.g. in case of illegal<br />
URLs. In this case the URL remains unchanged (<hask>`orElse` this</hask>).<br />
The resulting String value is converted into a text node forming the new<br />
attribute value node (7).<br />
<br />
Because of the use of the global HXT state in <hask>mkAbsURI</hask>,<br />
<hask>mkAbsRef</hask>, <hask>mkAbsImageRef</hask> and <hask>imageTable2</hask> need to have the more specialized signature <hask>IOStateArrow s XmlTree XmlTree</hask>.<br />
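<br />
The fallback in (6) can be modelled with plain list-returning functions, a simplified view of HXT's list arrows in which an empty result list means failure.<br />
Everything in this sketch is made up for illustration: <hask>orElse'</hask>, <hask>this'</hask>, the toy <hask>absolute</hask> function and its fixed base URL.<br />
The real <hask>mkAbsURI</hask> reads the base URL from the HXT state and performs proper URI resolution.<br />
<br />
<haskell><br />
import Data.List (isPrefixOf)<br />
<br />
-- Simplified model of a list arrow: an empty result means failure.<br />
type LA a b = a -> [b]<br />
<br />
-- Model of orElse: use the first arrow unless it fails.<br />
orElse' :: LA a b -> LA a b -> LA a b<br />
orElse' f g x = case f x of<br />
                  [] -> g x<br />
                  ys -> ys<br />
<br />
-- Model of the identity arrow this.<br />
this' :: LA a a<br />
this' x = [x]<br />
<br />
-- Toy replacement for mkAbsURI: fails on URLs containing a space,<br />
-- keeps absolute URLs and prefixes relative ones with a fixed base.<br />
absolute :: LA String String<br />
absolute url<br />
    | ' ' `elem` url             = []<br />
    | "http://" `isPrefixOf` url = [url]<br />
    | otherwise                  = ["http://example.org/" ++ url]<br />
<br />
mkAbsRef' :: LA String String<br />
mkAbsRef' = absolute `orElse'` this'<br />
</haskell><br />
<br />
Here <hask>mkAbsRef' "img/a.png"</hask> yields <hask>["http://example.org/img/a.png"]</hask>, while <hask>mkAbsRef' "bad url"</hask> falls back to <hask>["bad url"]</hask>.<br />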
<br />
== Transformation examples ==<br />
<br />
=== Decorating external references of an HTML document ===<br />
<br />
In the following examples, we want to decorate the external references<br />
in an HTML page with a small icon, as is done in many wikis.<br />
For this task the document tree has to be traversed; all parts<br />
except the interesting A-elements remain unchanged. At the end of the list of children of an A-element we add an image element.<br />
<br />
Here is the first version:<br />
<br />
<haskell><br />
addRefIcon :: ArrowXml a => a XmlTree XmlTree<br />
addRefIcon<br />
    = processTopDown -- (1)<br />
      ( addImg -- (2)<br />
        `when`<br />
        isExternalRef -- (3)<br />
      )<br />
    where<br />
    isExternalRef -- (4)<br />
        = isElem<br />
          >>><br />
          hasName "a"<br />
          >>><br />
          hasAttr "href"<br />
          >>><br />
          getAttrValue "href"<br />
          >>><br />
          isA isExtRef<br />
        where<br />
        isExtRef -- (4.1)<br />
            = isPrefixOf "http:" -- or something more precise<br />
<br />
    addImg<br />
        = replaceChildren -- (5)<br />
          ( getChildren -- (6)<br />
            <+><br />
            imgElement -- (7)<br />
          )<br />
<br />
    imgElement<br />
        = mkelem "img" -- (8)<br />
          [ sattr "src" "/icons/ref.png" -- (9)<br />
          , sattr "alt" "external ref"<br />
          ] [] -- (10)<br />
</haskell><br />
<br />
The traversal is done with <hask>processTopDown</hask> (1).<br />
This arrow applies an arrow to all nodes of the whole document tree.<br />
The transformation arrow applies the <hask>addImg</hask> (2) to<br />
all A-elements that are external references (3), (4). The test (4.1) for external URLs<br />
is somewhat simplified.<br />
<hask>addImg</hask> manipulates all children (5) of the A-elements by<br />
selecting the current children (6) and adding an image element (7).<br />
The image element is constructed with <hask>mkelem</hask> (8). This takes<br />
an element name, a list of arrows for computing the attributes and a<br />
list of arrows for computing the contents. The content of the image element is<br />
empty (10). The attributes are constructed with <hask>sattr</hask> (9).<br />
<hask>sattr</hask> ignores the arrow input and builds an attribute from<br />
the name-value pair of arguments.<br />
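<br />
How the traversal, the guard and the edit cooperate can be seen in a small model that does not depend on HXT.<br />
This is a sketch and not HXT's implementation: the <hask>Node</hask> type and all primed names are invented here, and the edit always returns exactly one node,<br />
whereas HXT arrows may return any number of results.<br />
<br />
<haskell><br />
import Data.List (isPrefixOf)<br />
<br />
-- Hypothetical, much simplified document tree.<br />
data Node = Elem String [(String, String)] [Node]<br />
          | Text String<br />
          deriving (Eq, Show)<br />
<br />
-- Model of processTopDown: edit the node first, then the children<br />
-- of the edited node.<br />
processTopDown' :: (Node -> Node) -> Node -> Node<br />
processTopDown' f n =<br />
    case f n of<br />
      Elem name attrs cs -> Elem name attrs (map (processTopDown' f) cs)<br />
      t                  -> t<br />
<br />
-- Model of when: apply the edit only where the predicate holds.<br />
when' :: (Node -> Node) -> (Node -> Bool) -> Node -> Node<br />
when' f p n = if p n then f n else n<br />
<br />
isExternalRef :: Node -> Bool<br />
isExternalRef (Elem "a" attrs _) =<br />
    maybe False ("http:" `isPrefixOf`) (lookup "href" attrs)<br />
isExternalRef _ = False<br />
<br />
-- Append the icon to the existing children.<br />
addImg :: Node -> Node<br />
addImg (Elem name attrs cs) = Elem name attrs (cs ++ [img])<br />
  where<br />
    img = Elem "img" [("src", "/icons/ref.png"), ("alt", "external ref")] []<br />
addImg n = n<br />
<br />
addRefIcon' :: Node -> Node<br />
addRefIcon' = processTopDown' (addImg `when'` isExternalRef)<br />
</haskell><br />
<br />
Applied to a paragraph with one external and one local link, only the external A-element gets the extra IMG child.<br />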
<br />
=== Transform external references into absolute references ===<br />
<br />
In the following example we will develop a program for<br />
editing an HTML page such that all references to external documents<br />
(images, hypertext refs, style refs, ...) become absolute references.<br />
We will see some new, but very useful combinators in the solution.<br />
<br />
The task seems to be rather trivial. In a tree traversal<br />
all references are edited with respect to the document base.<br />
But in HTML there is a BASE element, allowed in the content of HEAD,<br />
with an HREF attribute, which defines the document base. Again this<br />
href can be a relative URL.<br />
<br />
We start the development with the editing arrow. This gets<br />
the real document base as argument.<br />
<br />
<haskell><br />
mkAbsHRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsHRefs base<br />
    = processTopDown editHRef -- (1)<br />
    where<br />
    editHRef<br />
        = processAttrl -- (3)<br />
          ( changeAttrValue (absHRef base) -- (5)<br />
            `when`<br />
            hasName "href" -- (4)<br />
          )<br />
          `when`<br />
          ( isElem >>> hasName "a" ) -- (2)<br />
        where<br />
<br />
        absHRef :: String -> String -> String -- (5)<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
The tree is traversed (1) and for every A element (2) the attribute<br />
list is processed (3). All HREF attribute values (4) are manipulated<br />
by <hask>changeAttrValue</hask>, called with a string function (5).<br />
<hask>expandURIString</hask> is a pure function defined in HXT for computing<br />
an absolute URI.<br />
In this first step we only edit A-HREF attribute values. We will refine this<br />
later.<br />
<br />
The second step is the complete computation of the base URL.<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( isElem >>> hasName "html" -- (0)<br />
            >>><br />
            getChildren -- (1)<br />
            >>><br />
            isElem >>> hasName "head" -- (2)<br />
            >>><br />
            getChildren -- (3)<br />
            >>><br />
            isElem >>> hasName "base" -- (4)<br />
            >>><br />
            getAttrValue "href" -- (5)<br />
          )<br />
          &&&<br />
          getBaseURI -- (6)<br />
        )<br />
        >>> expandURI -- (7)<br />
      )<br />
      `orElse` getBaseURI -- (8)<br />
</haskell><br />
<br />
Input to this arrow is the HTML element. (0) to (5) is the arrow for selecting<br />
the BASE element's HREF value; parallel to this the system base URL is read<br />
with <hask>getBaseURI</hask> (6), as in the examples above. The resulting<br />
pair of strings is piped into <hask>expandURI</hask> (7), the arrow version of<br />
<hask>expandURIString</hask>. This arrow ((0) to (7)) fails in the absence<br />
of a BASE element. In this case we take the plain document base (8).<br />
The selection of the BASE element is not yet very handy. We will define<br />
a more general and elegant function later, allowing an element path as selection argument.<br />
<br />
In the third step, we will combine the two arrows. For this we will use<br />
a new combinator <hask>($<)</hask>. The need for this new combinator<br />
is the following: We need the arrow input (the document) two times,<br />
once for computing the document base and once for editing the<br />
whole document, and of course we want to compute the extra string parameter<br />
for editing with the arrow defined above.<br />
<br />
The combined arrow, our main arrow, looks like this<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = mkAbsHRefs $< computeBaseRef -- (1)<br />
</haskell><br />
<br />
In (1) the arrow input is first piped into <hask>computeBaseRef</hask>;<br />
the result is used in <hask>mkAbsHRefs</hask> as the extra string parameter<br />
when processing the document. Internally the <hask>($<)</hask> combinator<br />
is defined by the basic combinators <hask>(&&&), (>>>)</hask> and <hask>app</hask>, but in slightly more complex computations<br />
this pattern occurs rather frequently, so <hask>($<)</hask> becomes very useful.<br />
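<br />
One way to build such a combinator from <hask>(&&&)</hask>, <hask>(>>>)</hask> and <hask>app</hask> is sketched below for any <hask>ArrowApply</hask> instance, plain functions included.<br />
The name <hask>applyWith</hask> and the toy <hask>Doc</hask> type are assumptions made for this illustration; HXT's own definition of <hask>($<)</hask> may be written differently.<br />
<br />
<haskell><br />
import Control.Arrow (ArrowApply, app, arr, (&&&), (>>>))<br />
<br />
-- Sketch of ($<): run g on the input, use its result to build the<br />
-- arrow f c, and apply that arrow to the very same input.<br />
applyWith :: ArrowApply a => (c -> a b d) -> a b c -> a b d<br />
applyWith f g = (g &&& arr id) >>> arr (\(c, b) -> (f c, b)) >>> app<br />
<br />
-- Hypothetical document: a base URL and a list of links.<br />
type Doc = (String, [String])<br />
<br />
computeBase :: Doc -> String<br />
computeBase = fst<br />
<br />
absLinks :: String -> Doc -> [String]<br />
absLinks base (_, links) = map (base ++) links<br />
<br />
-- The document is used twice: once to find the base, once to be edited.<br />
toAbsLinks :: Doc -> [String]<br />
toAbsLinks = absLinks `applyWith` computeBase<br />
</haskell><br />
<br />
For example, <hask>toAbsLinks ("http://example.org/", ["a.html", "b.png"])</hask> gives <hask>["http://example.org/a.html", "http://example.org/b.png"]</hask>.<br />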
<br />
Programming with arrows is one style of point-free programming. Point-free<br />
programming often becomes unwieldy when values are used more than once.<br />
One solution is the special arrow syntax supported by GHC and others, similar to the do notation for monads. But for many simple cases the <hask>($<)</hask> combinator and its variants <hask>($<<), ($<<<), ($<<<<), ($<$)</hask><br />
are sufficient.<br />
<br />
To complete the development of the example, a last step is necessary:<br />
the removal of the redundant BASE element.<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = ( mkAbsHRefs $< computeBaseRef )<br />
      >>><br />
      removeBaseElement<br />
<br />
removeBaseElement :: ArrowXml a => a XmlTree XmlTree<br />
removeBaseElement<br />
    = processChildren<br />
      ( processChildren<br />
        ( none -- (1)<br />
          `when`<br />
          ( isElem >>> hasName "base" )<br />
        )<br />
        `when`<br />
        ( isElem >>> hasName "head" )<br />
      )<br />
</haskell><br />
<br />
In this function the children of the HEAD element are searched for<br />
a BASE element. This is removed by applying the null arrow <hask>none</hask> (1)<br />
to the input, which always returns the empty list.<br />
<hask>none `when` ...</hask> is the pattern for deleting nodes from a tree.<br />
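<br />
Why an empty result list deletes a node becomes visible in a list-based model of the arrows involved.<br />
The sketch below is an assumption-laden simplification: the <hask>Node</hask> type and the primed functions are invented, and the guard is a plain predicate instead of an arrow.<br />
<br />
<haskell><br />
-- Hypothetical, much simplified document tree.<br />
data Node = Elem String [Node]<br />
          | Text String<br />
          deriving (Eq, Show)<br />
<br />
-- Simplified list arrow: no result deletes a node, one result keeps<br />
-- or replaces it.<br />
type LA = Node -> [Node]<br />
<br />
none' :: LA<br />
none' _ = []<br />
<br />
-- Model of when: apply f where the predicate holds, else keep the node.<br />
when' :: LA -> (Node -> Bool) -> LA<br />
when' f p n = if p n then f n else [n]<br />
<br />
-- Model of processChildren: run the arrow over each child and keep<br />
-- whatever results come back.<br />
processChildren' :: LA -> LA<br />
processChildren' f (Elem name cs) = [Elem name (concatMap f cs)]<br />
processChildren' _ t              = [t]<br />
<br />
hasName' :: String -> Node -> Bool<br />
hasName' n (Elem m _) = n == m<br />
hasName' _ _          = False<br />
<br />
removeBase :: LA<br />
removeBase =<br />
    processChildren'<br />
      ( processChildren' (none' `when'` hasName' "base")<br />
        `when'` hasName' "head" )<br />
</haskell><br />
<br />
A BASE element inside HEAD disappears from the result, while a stray BASE element inside BODY is left alone, because only the children of HEAD are filtered.<br />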
<br />
The <hask>computeBaseRef</hask> function defined above contains an arrow pattern<br />
for selecting the right subtree that is rather common in HXT applications:<br />
<br />
<haskell><br />
isElem >>> hasName n1<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName n2<br />
...<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName nm<br />
</haskell><br />
<br />
For this pattern we will define a convenient function creating the<br />
arrow for selection<br />
<br />
<haskell><br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y) -- (1)<br />
      .<br />
      map (\ n -> isElem >>> hasName n) -- (2)<br />
</haskell><br />
<br />
The name list is mapped to the element checking arrow (2), and<br />
the resulting list of arrows is folded with <hask>getChildren</hask><br />
into a single arrow (1). <hask>computeBaseRef</hask> can then be simplified<br />
and becomes more readable:<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( getDescendents ["html","head","base"] -- (1)<br />
            >>><br />
            getAttrValue "href" -- (2)<br />
          )<br />
          ...<br />
      ...<br />
</haskell><br />
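<br />
The folding step in <hask>getDescendents</hask> can be reproduced on a toy tree, with Kleisli composition <hask>(>=>)</hask> in the list monad playing the role of <hask>(>>>)</hask>.<br />
The <hask>Node</hask> type and the primed names are hypothetical. As with <hask>foldl1</hask> in the original, the sketch assumes a non-empty name list.<br />
<br />
<haskell><br />
import Control.Monad ((>=>))<br />
<br />
-- Hypothetical, much simplified document tree.<br />
data Node = Elem String [Node]<br />
          | Text String<br />
          deriving (Eq, Show)<br />
<br />
type LA = Node -> [Node]<br />
<br />
getChildren' :: LA<br />
getChildren' (Elem _ cs) = cs<br />
getChildren' _           = []<br />
<br />
-- Filter arrow: succeeds with the node itself if the name matches.<br />
hasName' :: String -> LA<br />
hasName' n e@(Elem m _) | n == m = [e]<br />
hasName' _ _                     = []<br />
<br />
-- Same shape as getDescendents: map the names to filters, then glue<br />
-- neighbouring filters together with a getChildren' step.<br />
getDescendents' :: [String] -> LA<br />
getDescendents' = foldl1 (\x y -> x >=> getChildren' >=> y) . map hasName'<br />
</haskell><br />
<br />
Applied to an <hask>html</hask> element, <hask>getDescendents' ["html", "head", "base"]</hask> returns the BASE elements below HEAD, and the empty list if the path does not exist.<br />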
<br />
An even more general and flexible technique is the use of XPath expressions,<br />
available for the selection of document parts and defined in the module<br />
<hask>Text.XML.HXT.Arrow.XmlNodeSet</hask>.<br />
<br />
With XPath <hask>computeBaseRef</hask> can be simplified to<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( ( getXPathTrees "/html/head/base" -- (1)<br />
            >>><br />
            getAttrValue "href" -- (2)<br />
          )<br />
          ...<br />
</haskell><br />
<br />
Even the attribute selection can be expressed by XPath,<br />
so (1) and (2) can be combined into<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( xshow (getXPathTrees "/html/head/base/@href")<br />
          ...<br />
</haskell><br />
<br />
The extra <hask>xshow</hask> is required here to convert the<br />
XPath result, an XmlTree, into a string.<br />
<br />
XPath defines a<br />
full language for selecting parts of an XML document.<br />
Sometimes it's rather convenient to make selections of this<br />
type, but XPath evaluation is in general more expensive<br />
in time and space than a simple combination of arrows, as we have<br />
seen in <hask>getDescendents</hask>.<br />
<br />
=== Transform external references into absolute references: Refinement ===<br />
<br />
In the above example only A-HREF URLs are edited. Now we extend this<br />
to other element-attribute combinations.<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown ( editRef "a" "href" -- (2)<br />
                       >>><br />
                       editRef "img" "src" -- (3)<br />
                       >>><br />
                       editRef "link" "href" -- (4)<br />
                       >>><br />
                       editRef "script" "src" -- (5)<br />
                     )<br />
    where<br />
    editRef en an -- (1)<br />
        = processAttrl ( changeAttrValue (absHRef base)<br />
                         `when`<br />
                         hasName an<br />
                       )<br />
          `when`<br />
          ( isElem >>> hasName en )<br />
        where<br />
        absHRef :: String -> String -> String<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
<hask>editRef</hask> is parameterized by the element and attribute names (1).<br />
The arrow applied to every element is extended to a sequence of<br />
<hask>editRef</hask> calls ((2)-(5)). Notice that the document is still traversed only once.<br />
To process all possible HTML elements,<br />
this sequence should be extended by further element-attribute pairs.<br />
<br />
This can further be simplified into<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = foldl (>>>) this<br />
          .<br />
          map (\ (en, an) -> editRef en an)<br />
          $<br />
          [ ("a", "href")<br />
          , ("img", "src")<br />
          , ("link", "href")<br />
          , ("script", "src") -- and more<br />
          ]<br />
    editRef<br />
        = ...<br />
</haskell><br />
<br />
The <hask>foldl (>>>) this</hask> is defined in HXT as <hask>seqA</hask>,<br />
so the above code can be simplified to<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = seqA . map (uncurry editRef)<br />
          $<br />
          ...<br />
</haskell><br />
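<br />
The same fold can be tried on ordinary functions, which are arrows with <hask>id</hask> in the role of <hask>this</hask>.<br />
In this sketch <hask>seqA'</hask>, <hask>replaceChar</hask> and <hask>normalise</hask> are made-up names; only the shape <hask>foldl (>>>)</hask> over a table processed with <hask>map (uncurry ...)</hask> is taken from the code above.<br />
<br />
<haskell><br />
import Control.Arrow ((>>>))<br />
<br />
-- Model of seqA for plain functions: compose a list of edits from<br />
-- left to right, starting from the identity.<br />
seqA' :: [a -> a] -> a -> a<br />
seqA' = foldl (>>>) id<br />
<br />
-- A parameterized edit, in the role of editRef.<br />
replaceChar :: Char -> Char -> String -> String<br />
replaceChar from to = map (\c -> if c == from then to else c)<br />
<br />
-- The table of parameter pairs is turned into one combined edit.<br />
normalise :: String -> String<br />
normalise<br />
    = seqA' . map (uncurry replaceChar)<br />
      $<br />
      [ ('\\', '/')<br />
      , (' ', '_')<br />
      ]<br />
</haskell><br />
<br />
Adding a further edit only means adding another pair to the table; the input is still walked by a single composed function.<br />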
<br />
== More complex examples ==<br />
<br />
=== Serialization and deserialization to/from XML ===<br />
<br />
Examples can be found in [[HXT/Conversion of Haskell data from/to XML]]<br />
<br />
=== Practical examples of HXT ===<br />
<br />
More complex and complete examples of HXT in action<br />
can be found in [[HXT/Practical]]</div>Joehttps://wiki.haskell.org/index.php?title=Web/Literature/Practical_web_programming_in_Haskell&diff=32696Web/Literature/Practical web programming in Haskell2009-12-22T04:02:00Z<p>Joe: Fixed link.</p>
<hr />
<div>[[Category:Tutorials]]<br />
{{Template:Formal under construction}}<br />
<br />
== Introduction ==<br />
<br />
This tutorial aims to get you started with writing web applications<br />
in Haskell. We describe a relatively light-weight <br />
approach to Haskell web programming<br />
which uses a CGI library and an XHTML combinator library. <br />
<br />
We think that while the approach we describe here is not as sophisticated<br />
or innovative as some other approaches, it is simple, portable and easy<br />
to understand if you are already familiar with web programming in other <br />
languages.<br />
<br />
The tutorial starts with preliminaries such as how to install the<br />
necessary software and how to compile and run your web<br />
applications. We then show a number of working small example programs<br />
which introduce the basic features of the CGI and XHtml libraries. We<br />
then move on to how to use monad transformers to add application<br />
specific functionality such as sessions to the CGI monad, and how to<br />
create database-driven web applications.<br />
We also present FastCGI, and an approach to using dynamically<br />
loaded Haskell code.<br />
<br />
=== Other approaches ===<br />
<br />
Web Authoring System Haskell <br />
(WASH) [http://www.informatik.uni-freiburg.de/~thiemann/WASH/].<br />
Domain-specific embedded language. Type-safe forms handling. <br />
Threads continuation through client. This gives good<br />
back-button and session splitting properties.<br />
<br />
Haskell Application Server <br />
(HAppS) [http://happs.org/].<br />
Complete system including web server in one program.<br />
Uses XSLT for output.<br />
<br />
Haskell Server Pages <br />
(HSP) [http://www.cs.chalmers.se/~d00nibro/hsp/].<br />
Uses preprocessor to make XML tags into Haskell expressions.<br />
Dynamic compilation.<br />
<br />
<br />
=== Assumed knowledge ===<br />
<br />
This tutorial is not meant as an introduction to Haskell or web programming.<br />
We will assume that you have some familiarity with the following <br />
concepts:<br />
<br />
==== Haskell ====<br />
<br />
This tutorial is not meant as a first introduction to Haskell. If you<br />
want to learn about Haskell in general, have a look at the lists of <br />
[[books and tutorials]]. You may want to start with [[Haskell in 5 steps]].<br />
<br />
==== (X)HTML ====<br />
<br />
[http://www.w3.org/MarkUp/ HTML (HyperText Markup Language)] is <br />
"the lingua franca for publishing hypertext on the World Wide Web".<br />
The XHtml library which we use in this tutorial produces <br />
[http://www.w3.org/TR/xhtml1/ XHTML 1.0], which is <br />
[http://www.w3.org/TR/html401/ HTML 4.0] formulated as <br />
[http://www.w3.org/XML/ XML].<br />
<br />
The combinators in the XHtml library do not make much sense unless you<br />
understand at least some parts of HTML. <br />
<br />
==== CGI ====<br />
<br />
CGI (Common Gateway Interface) programs are programs which run on the<br />
web server. They are given input which comes from the user's browser, <br />
and their output is given to the browser.<br />
<br />
To really understand how the CGI library works, you probably need to know<br />
a thing or two about CGI. The authoritative resource on CGI is the<br />
[http://hoohoo.ncsa.uiuc.edu/cgi/interface.html CGI specification].<br />
<br />
<br />
== Required software ==<br />
<br />
=== Haskell compiler ===<br />
<br />
[[GHC]], the Glasgow Haskell <br />
Compiler, is the Haskell implementation that we will use in this tutorial. <br />
However, any Haskell implementation that supports Haskell98 and multi-parameter<br />
type classes should work.<br />
<br />
=== Libraries: xhtml and cgi ===<br />
<br />
If your Haskell implementation does not come with the <tt>xhtml</tt> and<br />
<tt>cgi</tt> packages, download them from <br />
[http://hackage.haskell.org/packages/hackage.html HackageDB].<br />
<br />
=== Web server ===<br />
<br />
You need to have access to a web server on which you can run CGI programs.<br />
The most convenient way to do this when learning and developing is to run<br />
a web server on your development machine. If you run the programs on some<br />
other machine you need to make sure that you compile your programs <br />
so that they can run on that machine. This normally means that the machines<br />
must have the same architecture and run the same operating system.<br />
<br />
==== Deploying statically linked applications ====<br />
<br />
Linking your applications statically by giving the flags <tt>-static<br />
-optl-static</tt> to GHC will avoid problems with missing libraries on<br />
the web server.<br />
<br />
For example, this simple program,<br />
<br />
<haskell><br />
import Database.SQLite<br />
main = print "hey, test this"<br />
</haskell><br />
<br />
when compiled with <tt>ghc A.hs --make</tt>, is dynamically linked against:<br />
<br />
<pre><br />
$ ldd A<br />
A:<br />
Start End Type Open Ref GrpRef Name<br />
0000000000000000 0000000000000000 exe 1 0 0 A<br />
0000000041a85000 0000000041ee5000 rlib 0 1 0 /usr/local/lib/libsqlite3.so.9.0<br />
0000000049b04000 0000000049f1d000 rlib 0 1 0 /usr/lib/libm.so.2.3<br />
0000000042213000 000000004264f000 rlib 0 1 0 /usr/local/lib/libgmp.so.7.0<br />
0000000047d0e000 00000000481e0000 rlib 0 1 0 /usr/lib/libc.so.42.0<br />
0000000047900000 0000000047900000 rtld 0 1 0 /usr/libexec/ld.so<br />
</pre><br />
<br />
Now, we can just pass some linker flags through to statically link this lot,<br />
<br />
<pre><br />
$ ghc A.hs --make -optl-static -no-recomp<br />
$ ldd A<br />
ldd: A: not a dynamic executable<br />
$ file A<br />
A: ELF 64-bit LSB executable, AMD64, version 1, for OpenBSD, statically linked, not stripped<br />
</pre><br />
<br />
You could also use the [[Haskell Web Server]].<br />
<br />
'''Caveats:'''<br />
<br />
* The <tt>-static</tt> flag in GHC 6.8.2 does not link the libraries in the correct order, resulting in a link failure (which you can hack around if you have to by shuffling <tt>-lpthread</tt> after <tt>-lrt</tt> in the gargantuan linker invocation). This problem should disappear with GHC 6.8.3.<br />
* Sometimes you will need to add <tt>extra-libraries</tt> fields to various libraries' <tt>.cabal</tt> files. This manifests as missing symbols. Note that many linkers are sensitive to the order of the <tt>-l</tt> arguments, so the order of libraries in this field matters.<br />
<br />
<br />
== Compiling and running web applications ==<br />
<br />
Use GHC to produce a binary executable called <tt>prog.cgi</tt> from the Haskell <br />
source code file <tt>prog.hs</tt>:<br />
<pre><br />
ghc --make -package cgi -package xhtml -o prog.cgi prog.hs<br />
</pre><br />
<br />
Put the compiled program in the cgi-bin directory,<br />
or give it the extension .cgi, depending on the configuration<br />
of the web server.<br />
<br />
Linking your applications statically <br />
by giving the flags <tt>-static -optl-static</tt> to GHC <br />
will avoid problems with missing libraries on the web server.<br />
<br />
To run the compiled program, visit the URL of the CGI <br />
program with your web browser.<br />
<br />
<br />
== Simple examples ==<br />
<br />
=== Hello World ===<br />
<br />
Here is a very simple example which just outputs some static HTML.<br />
The type signatures in this code are optional. We show them here <br />
for clarity, but omit them in some later examples.<br />
<br />
<haskell><br />
import Network.CGI<br />
import Text.XHtml<br />
<br />
page :: Html <br />
page = body << h1 << "Hello World!"<br />
<br />
cgiMain :: CGI CGIResult<br />
cgiMain = output $ renderHtml page<br />
<br />
main :: IO ()<br />
main = runCGI $ handleErrors cgiMain<br />
</haskell><br />
<br />
The <code>page</code> value is an HTML document which consists<br />
of a body containing a single heading element which contains the text<br />
"Hello World!". The CGI action <code>cgiMain</code> renders the HTML document<br />
as a string, and produces that string as output. The <code>main</code> function<br />
runs <code>cgiMain</code>, using the normal CGI protocol for input and output.<br />
It also uses <code>handleErrors</code> to output an error page in case <code>cgiMain</code><br />
throws an exception.<br />
<br />
Fans of one-liners may like this version better (<code>handleErrors</code> has been <br />
omitted since this simple program will not throw any exceptions):<br />
<br />
<haskell><br />
import Text.XHtml<br />
import Network.CGI<br />
<br />
main = runCGI $ output $ renderHtml $ body << h1 << "Hello World!"<br />
</haskell><br />
<br />
These are some of the important functions used in this example:<br />
<br />
<haskell><br />
-- creates a string containing the HTML document.<br />
renderHtml :: Html -> String<br />
<br />
-- outputs a string as the body of the HTTP response.<br />
output :: String -> CGI CGIResult<br />
<br />
-- Catches any exception thrown by the given CGI action, returns an <br />
-- error page with a 500 Internal Server Error, showing the exception <br />
-- information, and logs the error.<br />
handleErrors :: CGI CGIResult -> CGI CGIResult<br />
<br />
-- Runs a CGI action which produces a CGIResult, using the CGI protocol<br />
-- to get the inputs and send the outputs.<br />
runCGI :: CGI CGIResult -> IO ()<br />
</haskell><br />
<br />
==== HTML combinators ====<br />
<br />
See also [http://search.cpan.org/src/AUTRIJUS/Language-Haskell-0.01/hugs98-Nov2003/fptools/hslibs/text/html/doc/doc.htm].<br />
<br />
<code>Html</code> is the type of HTML fragments. It comes from the <code>Text.XHtml</code> module.<br />
There are functions for all XHTML 1.0 elements. <br />
Some examples:<br />
<br />
* header, body<br />
* h1, h2, ...<br />
* thediv<br />
* p<br />
* image<br />
<br />
The <code><<</code> operator is used for nesting HTML.<br />
<br />
<code>+++</code> concatenates HTML.<br />
<br />
Attributes are added to tags using the <code>!</code> operator.<br />
<br />
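The function <code>renderHtml</code> produces a string containing the<br />
document. <code>Text.XHtml</code> also exports the variants <code>showHtml</code> and <code>prettyHtml</code>, which differ in how much whitespace they add to the output.<br />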
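<br />
The three operators can be seen together in a short sketch. It uses only combinators that already appear in this tutorial<br />
plus <code>anchor</code> and <code>href</code> from <code>Text.XHtml</code>; the page content and the names <code>link</code> and <code>content</code> are made up.<br />
<br />
<haskell><br />
import Text.XHtml<br />
<br />
-- (!) attaches attributes to a tag, (<<) nests content inside it.<br />
link :: Html<br />
link = anchor ! [href "http://www.haskell.org/"] << "Haskell"<br />
<br />
-- (+++) puts fragments next to each other; strings are converted<br />
-- to Html automatically.<br />
content :: Html<br />
content = h1 << "Links"<br />
          +++ paragraph << ("See " +++ link +++ ".")<br />
<br />
page :: Html<br />
page = header << thetitle << "Combinators" +++ body << content<br />
</haskell><br />
<br />
Printing the result with <code>putStrLn (renderHtml page)</code> in GHCi shows the generated markup.<br />
Because <code><<</code> binds more tightly than <code>+++</code>, no parentheses are needed around <code>h1 << "Links"</code>.<br />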
<br />
<br />
=== Getting user input ===<br />
<br />
This program shows a form which asks the user for her name.<br />
When the form is submitted, the program greets the user by name.<br />
<br />
<haskell><br />
import Network.CGI<br />
import Text.XHtml<br />
<br />
inputForm = form << [paragraph << ("My name is " +++ textfield "name"),<br />
                     submit "" "Submit"]<br />
<br />
greet n = paragraph << ("Hello " ++ n ++ "!")<br />
<br />
page t b = header << thetitle << t +++ body << b<br />
<br />
cgiMain = do mn <- getInput "name"<br />
             let x = maybe inputForm greet mn<br />
             output $ renderHtml $ page "Input example" x<br />
<br />
main = runCGI $ handleErrors cgiMain<br />
</haskell><br />
<br />
<haskell><br />
-- Get the value of an input variable, for example from a form. <br />
-- If the variable has multiple values, the first one is returned.<br />
getInput :: String -> CGI (Maybe String)<br />
</haskell><br />
<br />
=== Cookies ===<br />
<br />
<haskell><br />
import Network.CGI<br />
import Text.XHtml<br />
<br />
import Control.Monad (liftM)<br />
import Data.Maybe (fromMaybe)<br />
<br />
hello :: Int -> Html<br />
hello 0 = h1 << "Welcome!"<br />
          +++ p << "This is the first time I see you."<br />
hello c = h1 << "Welcome back!"<br />
          +++ p << ("I have seen you " ++ show c ++ " times before.")<br />
<br />
page :: String -> Html -> Html<br />
page t b = header << thetitle << t +++ body << b<br />
<br />
cgiMain :: CGI CGIResult<br />
cgiMain = do c <- liftM (fromMaybe 0) $ readCookie "mycookie"<br />
             setCookie (newCookie "mycookie" (show (c+1)))<br />
             output $ renderHtml $ page "Cookie example" $ hello c<br />
<br />
main :: IO ()<br />
main = runCGI $ handleErrors cgiMain<br />
</haskell><br />
<br />
Here we use <code>newCookie</code>, <code>setCookie</code> and <code>readCookie</code> to store and retrieve a counter<br />
cookie in the browser. If you want to get the string value of a cookie, use <code>getCookie</code> instead of <code>readCookie</code>.<br />
<br />
<br />
=== File uploads ===<br />
<br />
FIXME: use a safer example<br />
<br />
<haskell><br />
-- Accepts file uploads and saves the files in the given directory.<br />
-- WARNING: this script is a SECURITY RISK and only for <br />
-- demo purposes. Do not put it on a public web server.<br />
<br />
import Network.CGI<br />
import Text.XHtml<br />
<br />
import qualified Data.ByteString.Lazy as BS<br />
<br />
import Control.Monad (liftM)<br />
import Data.Maybe (fromJust)<br />
<br />
uploadDir = "../upload"<br />
<br />
fileForm = form ! [method "post", enctype "multipart/form-data"]<br />
             << [afile "file", submit "" "Upload"]<br />
<br />
saveFile n =<br />
    do cont <- liftM fromJust $ getInputFPS "file"<br />
       let f = uploadDir ++ "/" ++ basename n<br />
       liftIO $ BS.writeFile f cont<br />
       return $ paragraph << ("Saved as " +++ anchor ! [href f] << f +++ ".")<br />
<br />
page t b = header << thetitle << t +++ body << b<br />
<br />
basename = reverse . takeWhile (`notElem` "/\\") . reverse<br />
<br />
cgiMain =<br />
    do mn <- getInputFilename "file"<br />
       h <- maybe (return fileForm) saveFile mn<br />
       output $ renderHtml $ page "Upload example" h<br />
<br />
main = runCGI $ handleErrors cgiMain<br />
</haskell><br />
<br />
We first output a file upload form, which should use the HTTP POST method, <br />
and the multipart/form-data content type. Here we see an example of the use of<br />
HTML attributes, added with the <code>!</code> operator.<br />
<br />
For efficiency reasons, we use Data.ByteString.Lazy to represent the file contents.<br />
getInputFPS gets the value of an input variable as a lazy ByteString.<br />
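<br />
Since the file name comes from the client, it is worth checking it more strictly than <code>basename</code> does.<br />
The following pure function is one possible tightening and not a complete defence; the name <code>safeName</code> and the<br />
set of allowed characters are assumptions of this sketch.<br />
<br />
<haskell><br />
import Data.Char (isAlphaNum, isAscii)<br />
<br />
-- Strip any directory part, as basename does in the example above.<br />
basename :: FilePath -> FilePath<br />
basename = reverse . takeWhile (`notElem` "/\\") . reverse<br />
<br />
-- Accept only a conservative set of characters, and reject names that<br />
-- are empty or start with a dot (".", ".." and hidden files).<br />
safeName :: FilePath -> Maybe FilePath<br />
safeName path<br />
    | null name        = Nothing<br />
    | head name == '.' = Nothing<br />
    | all allowed name = Just name<br />
    | otherwise        = Nothing<br />
  where<br />
    name      = basename path<br />
    allowed c = (isAscii c && isAlphaNum c) || c `elem` "._-"<br />
</haskell><br />
<br />
With this check, <code>safeName "C:\\tmp\\photo.jpg"</code> is <code>Just "photo.jpg"</code>, while <code>safeName ".."</code> and <code>safeName "a b.txt"</code> are <code>Nothing</code>.<br />
An upload handler could answer with an error page in the <code>Nothing</code> case instead of writing the file.<br />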
<br />
<br />
=== Error handling ===<br />
<br />
handleErrors catches all exceptions and <br />
outputs a default error page with some information about the exception.<br />
You can write your own exception handler if you want to do something else<br />
when an exception is thrown. It can be useful to set<br />
the response code, e.g. 404.<br />
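<br />
Setting the response code yourself might look like the following sketch. It relies on <code>setStatus</code> from <code>Network.CGI</code>;<br />
the <code>page</code> parameter, the <code>respond</code> function and the messages are invented for the example.<br />
<br />
<haskell><br />
import Network.CGI<br />
<br />
-- Pure part: decide on a status code, a reason phrase and a body.<br />
-- The requested name is deliberately not echoed back into the page.<br />
respond :: Maybe String -> (Int, String, String)<br />
respond (Just "home") = (200, "OK", "Welcome home.")<br />
respond (Just _)      = (404, "Not Found", "No such page.")<br />
respond Nothing       = (400, "Bad Request", "Missing parameter: page")<br />
<br />
cgiMain :: CGI CGIResult<br />
cgiMain = do<br />
    mpage <- getInput "page"<br />
    let (code, reason, text) = respond mpage<br />
    setStatus code reason<br />
    output text<br />
</haskell><br />
<br />
The action is run with <code>runCGI $ handleErrors cgiMain</code> as in the other examples. Keeping the decision in a pure function makes it easy to test without a web server.<br />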
<br />
=== Returning non-HTML ===<br />
<br />
Of course we do not have to output HTML. Use setHeader to set the value<br />
of the Content-type header, and you can output whatever string you like. <br />
<br />
In this example we return an image:<br />
<br />
<haskell><br />
import Network.CGI<br />
import System.IO<br />
import qualified Data.ByteString.Lazy as B<br />
<br />
main = do<br />
    b <- B.readFile "./img/test.jpg" -- read the image<br />
    runCGI $ handleErrors (cgiMain b)<br />
<br />
cgiMain :: B.ByteString -> CGI CGIResult<br />
cgiMain p = do<br />
    -- we need to set the appropriate content-type<br />
    setHeader "Content-type" "image/jpeg"<br />
    outputFPS p<br />
</haskell><br />
<br />
Examples: RSS<br />
<br />
=== Setting response headers ===<br />
<br />
You can use the <code>setHeader</code> function to set arbitrary HTTP response headers.<br />
You can also set the response code, as mentioned above.<br />
<br />
Example: output raw file data (with last-modified)<br />
<br />
<br />
<br />
== Going further ==<br />
<br />
This section explores some of the possibilities beyond basic web application<br />
programming.<br />
<br />
=== Extending the CGI monad with monad transformers ===<br />
<br />
At this point, you should be able to create many useful CGI scripts.<br />
As your scripts get more ambitious, however, you may find yourself<br />
needing to pass "global" parameters to your CGI actions (e.g. database<br />
connections, session information.) Rather than explicitly passing<br />
these values around, you can extend the CGI monad to do this work for<br />
you.<br />
<br />
The <hask>Network.CGI.Monad</hask> module defines a CGI monad<br />
transformer, allowing us to build a new monad that does everything the<br />
CGI monad does -- and more!<br />
<br />
For example, let's define a new CGI monad that provides a database<br />
connection (in this example, we use the<br />
<hask>Database.HSQL.PostgreSQL</hask> module for our database.) Since<br />
it will be used by the CGI application, I'll call the new monad "App".<br />
<br />
Should this not compile for you, you need to enable some extensions:<br />
<haskell><br />
{-# LANGUAGE GeneralizedNewtypeDeriving #-}<br />
{-# LANGUAGE FlexibleInstances #-}<br />
</haskell><br />
<br />
After importing the appropriate modules, we define a new type,<br />
<hask>AppT</hask>, that is made up of two monad transformers,<br />
<hask>CGIT</hask> and <hask>ReaderT</hask>. The <hask>CGIT</hask><br />
monad "wraps" the base monad "m". The <hask>CGIT</hask> monad, in<br />
turn, is wrapped by the <hask>ReaderT</hask> monad, which contains, in<br />
its environment, the database <hask>Connection</hask>.<br />
<br />
<hask>AppT</hask> takes two type parameters. The first is the base<br />
monad that the monad transformers are modifying. Usually this will be<br />
the <hask>IO</hask> monad. The second type is the data type that an<br />
action in the monad will return.<br />
<br />
<haskell><br />
import Control.Monad.Reader<br />
import Network.CGI<br />
import Network.CGI.Monad<br />
import Database.HSQL.PostgreSQL<br />
<br />
newtype AppT m a = App (ReaderT Connection (CGIT m) a)<br />
    deriving (Monad, MonadIO, MonadReader Connection)<br />
</haskell><br />
<br />
Like <hask>CGI</hask>, we make a type synonym that defines the most<br />
common use of this new monad.<br />
<br />
<haskell><br />
type App a = AppT IO a<br />
</haskell><br />
<br />
We're not quite finished defining <hask>App</hask> yet. In order to be<br />
used like the CGI monad, <hask>App</hask> needs to be an instance of<br />
the <hask>MonadCGI</hask> class. This class defines two functions that<br />
we must support.<br />
<br />
<haskell><br />
instance MonadCGI (AppT IO) where<br />
    cgiAddHeader n v = App $ lift $ cgiAddHeader n v<br />
    cgiGet x = App $ lift $ cgiGet x<br />
</haskell><br />
<br />
So now we have an App monad that gives us all the functionality of<br />
CGI, but also carries around a database connection. The last step is<br />
to define the function that creates the monad so we can run actions<br />
inside it.<br />
<br />
<haskell><br />
import Control.Exception (bracket)<br />
import System.IO (stdin, stdout)<br />
<br />
runApp :: App CGIResult -> IO ()<br />
runApp (App a) =<br />
    bracket (connect "host" "db" "user" "password")<br />
            disconnect<br />
            (\c -> do { env <- getCGIVars<br />
                      ; hRunCGI env stdin stdout (runCGIT (runReaderT a c))<br />
                      ; return () } )<br />
</haskell><br />
<br />
Either fill in your account/password information, or change<br />
<hask>runApp</hask> to accept the parameters as function arguments.<br />
The function uses <hask>bracket</hask> so that the database connection<br />
gets released properly when the monad ends or if an exception is<br />
thrown.<br />
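<br />
The same shape can be tried out without the CGI and database libraries. The following is only a sketch under stated assumptions: <hask>Conn</hask>, <hask>describeConn</hask> and <hask>runWithConn</hask> are hypothetical names invented for illustration, and the "connection" is a plain string. It shows an action reading its environment with <hask>ask</hask>, and a runner that acquires and releases the resource with <hask>bracket</hask>.<br />
<br />
<haskell><br />
import Control.Exception (bracket)<br />
import Control.Monad.Reader (ReaderT, ask, runReaderT)<br />
<br />
-- Hypothetical stand-in for a real database connection.<br />
newtype Conn = Conn String<br />
<br />
-- An action that reads the connection from the environment<br />
-- instead of receiving it as an explicit argument.<br />
describeConn :: ReaderT Conn IO String<br />
describeConn = do<br />
    Conn name <- ask<br />
    return ("connected to " ++ name)<br />
<br />
-- Acquire the resource, run the action, and always release it,<br />
-- even if the action throws an exception.<br />
runWithConn :: ReaderT Conn IO a -> IO a<br />
runWithConn action =<br />
    bracket (return (Conn "db"))   -- pretend to connect<br />
            (\_ -> return ())      -- pretend to disconnect<br />
            (runReaderT action)<br />
</haskell><br />
<br />
In the real <hask>App</hask> monad the only differences are the extra <hask>CGIT</hask> layer and the real <hask>connect</hask>/<hask>disconnect</hask> calls.<br />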
<br />
=== Templating ===<br />
<br />
There are times when you absolutely do not want to embed (X)HTML in Haskell. You can separate the code and the presentation (the Holy Grail of erm, web development). The code will be, well, Haskell, and the presentation will live inside templates. Fortunately, there's a very nice templating engine available for this, called [[HStringTemplate]].<br />
<br />
=== FastCGI ===<br />
<br />
[http://www.fastcgi.com/ FastCGI] is a standard for CGI-like programs that are not restarted<br />
for every request. This reduces the overhead involved in handling each <br />
request, and reduces the server's response time for each request.<br />
The overhead involved in starting a new process for each<br />
request can also include the need to set up new DB connections <br />
every time. With FastCGI, DB connections can be reused.<br />
<br />
Install FastCGI. Get a web server which can run FastCGI programs.<br />
Import Network.FastCGI. Use runFastCGI.<br />
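<br />
As a minimal sketch of those steps, assuming the <hask>fastcgi</hask> package is installed and that the web server is already configured to launch the compiled program (the name <hask>page</hask> is only illustrative):<br />
<br />
<haskell><br />
import Network.FastCGI<br />
<br />
-- A CGI action, written just as it would be for plain CGI.<br />
page :: CGI CGIResult<br />
page = output "Hello from a persistent process"<br />
<br />
-- runFastCGI runs the action once for every incoming request.<br />
main :: IO ()<br />
main = runFastCGI (handleErrors page)<br />
</haskell><br />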
<br />
See also a tutorial by Paul R Brown: [http://mult.ifario.us/p/wiring-haskell-into-a-fastcgi-web-server Wiring Haskell Into a FastCGI Web Server]<br />
<br />
Take a look at lightweight, minimalistic FastCGI-based web frameworks: [http://community.haskell.org/~sclv/hvac/ HVAC] (Haskell view and controller) and [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/kibro Kibro].<br />
<br />
=== SCGI ===<br />
<br />
[http://www.mems-exchange.org/software/scgi/ SCGI] is a simpler alternative to FastCGI for writing CGI-like programs in persistent processes, external to the web server. SCGI is less featureful than FastCGI, but has the advantage that it does not require an external library.<br />
<br />
Install [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/scgi-0.1 SCGI], import Network.SCGI, and use runSCGI. Everything else is then done inside a CGI monad as above.<br />
<br />
=== URL rewriting / dispatching ===<br />
<br />
An easy-to-use and expressive [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/UrlDisp URL dispatching library] is available on Hackage.<br />
<br />
=== Dynamic loading ===<br />
<br />
== Database-driven web-applications ==<br />
<br />
=== Database connectivity ===<br />
<br />
See [http://darcs.haskell.org/takusen/ Takusen] and [http://software.complete.org/software/projects/show/hdbc HDBC]. If you would like to write queries in Haskell (and not SQL), see also [http://haskelldb.sourceforge.net/ HaskellDB], which integrates with HDBC.<br />
<br />
==== Persistent DB connections with FastCGI ====<br />
<br />
FastCGI programs aren't restarted for each request; only the runFastCGI part is re-run. Everything (handles, data structures etc.) that you set up outside of that loop will be persistent. However, you need to handle errors in that code yourself, because you're operating outside of handleErrors.<br />
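<br />
A small sketch of this, assuming the <hask>fastcgi</hask> package: an <hask>IORef</hask> counter stands in for a database connection, since both are created once, before the request loop starts. The counter and the message text are illustrative only.<br />
<br />
<haskell><br />
import Control.Monad.IO.Class (liftIO)<br />
import Data.IORef (atomicModifyIORef', newIORef)<br />
import Network.FastCGI<br />
<br />
main :: IO ()<br />
main = do<br />
    -- Runs once per process: a stand-in for opening a DB connection.<br />
    counter <- newIORef (0 :: Int)<br />
    -- Runs once per request, reusing the value created above.<br />
    runFastCGI $ handleErrors $ do<br />
        n <- liftIO (atomicModifyIORef' counter (\k -> (k + 1, k + 1)))<br />
        output ("Request number " ++ show n ++ " served by this process")<br />
</haskell><br />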
<br />
=== Web services ===<br />
<br />
== Web frameworks ==<br />
<br />
[http://turbinado.org/Home Turbinado], an early stab at Ruby On Rails.<br />
<br />
== Existing applications ==<br />
<br />
[http://hackage.haskell.org/ HackageDB web interface].<br />
<br />
[http://tutorial.happstack.com/ Real World HAppS: The Cabalized, Self-Demoing HAppS Tutorial].<br />
<br />
[http://github.com/jgm/gitit/tree/master/ Gitit, a wiki written using Git, HAppS and Pandoc].<br />
<br />
[http://www.bringert.net/ Hope].<br />
<br />
<br />
''Authors: Björn Bringert''<br />
''Authors: Don Stewart''</div>Joehttps://wiki.haskell.org/index.php?title=What_a_Monad_is_not&diff=32551What a Monad is not2009-12-14T16:07:35Z<p>Joe: Made a complete sentence; hopefully kept the meaning what the author intended</p>
<hr />
<div>==Warning==<br />
This page is currently an unprocessed braindump. Feel free to dump additional stuff or massage stuff into didactic pleasures.<br />
<br />
Also, don't be surprised if you leave this page more confused than before. That just means that it has successfully destroyed your false assumptions, or that you've fallen for some horrible inside joke. Beware of [[Zygohistomorphic prepromorphisms]]. Go for [http://ertes.de/articles/monads.html warm and fuzzy], instead.<br />
<br />
==Monads are not a good choice as topic for your first Haskell blog entry==<br />
...just accept that they're [http://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/ burritos], and wait until later.<br />
<br />
==Monads are not a language feature==<br />
Really. They are defined in terms of Haskell, not Haskell in terms of them. Conversely,<br />
<br />
==Haskell doesn't need Monads==<br />
...well, apart from the Haskell standard defining the way IO is done in terms of Monads: It could be done differently and still work.<br />
<br />
==Monads are not impure==<br />
...In no way whatsoever. You don't even need flexible morals to claim it. To be more specific, it's IO that's impure. That makes the IO monad impure. But that's not a general property of monads - just IO. And even then, we can pretend that Haskell is a purely functional description language for imperative programs. But we didn't want to employ flexible morals, now did we?<br />
<br />
==Monads are not about state==<br />
While it is certainly possible to abstract away explicit state passing by using a Monad, that's not what a monad is.<br />
<br />
(some elaboration needed)<br />
<br />
==Monads are not about strictness==<br />
Monad operations (bind and return) have to be lazy, in fact, always! However,<br />
other operations can be specific to each monad.<br />
For instance, some are strict (like IO), and some are lazy (like []). Then there are some that come in multiple flavours, like State.<br />
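<br />
To see the "multiple flavours" of State in action, here is a small sketch assuming the <hask>mtl</hask> package (the name <hask>firstThree</hask> is made up for this example). With the lazy variant, a traversal of an infinite list can still be consumed incrementally:<br />
<br />
<haskell><br />
import qualified Control.Monad.State.Lazy as Lazy<br />
<br />
-- The lazy State monad produces the result list on demand,<br />
-- so taking a finite prefix of an infinite traversal terminates.<br />
firstThree :: [Int]<br />
firstThree = take 3 (Lazy.evalState (mapM return [1 ..]) ())<br />
</haskell><br />
<br />
Swapping in <hask>Control.Monad.State.Strict</hask> would be expected to make the same expression run forever, because each step forces the result of the previous one before continuing.<br />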
<br />
==Monads are not values==<br />
This point might be driven home best by pointing out that instance Monad Foo where ... is not a data type, but a declaration of a typeclass instance. However, to elaborate:<br />
<br />
Monads are not values in the same sense that addition and multiplication are not numbers: They capture a -- very specific -- relationship between values of a specific domain in a common abstraction. We're going to call the values that monads manage ''mobits'', somewhat like this:<br />
<br />
type Mobit m a = Monad m => m a<br />
<br />
The IO monad manages mobits representing side-effects ("IO actions").<br />
<br />
The List monad manages mobits representing multiple values ("[a]").<br />
<br />
The Reader monad manages mobits that are pure computations that use asks to propagate information instead of explicit arguments.<br />
<br />
...and while addition and multiplication are both monoids over the natural numbers, a monad is a monoid object in a category of endofunctors: return is the unit, and join is the binary operation. It couldn't be more simple.<br />
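<br />
The unit and associativity laws of that monoid can be spelled out for one concrete monad. This is only a sketch for the list monad; the function names are invented for illustration.<br />
<br />
<haskell><br />
import Control.Monad (join)<br />
<br />
-- Left unit: join . return == id<br />
leftUnit :: [Int] -> Bool<br />
leftUnit xs = join (return xs) == xs<br />
<br />
-- Right unit: join . fmap return == id<br />
rightUnit :: [Int] -> Bool<br />
rightUnit xs = join (fmap return xs) == xs<br />
<br />
-- Associativity: join . join == join . fmap join<br />
assoc :: [[[Int]]] -> Bool<br />
assoc xsss = join (join xsss) == join (fmap join xsss)<br />
</haskell><br />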
<br />
<br />
<br />
==Monads are not a replacement for applicative functors==<br />
Instead, every monad ''is'' an applicative functor (as well as a functor). It is considered good practice not to use >>= if all you need is <*>, or even fmap.<br />
<br />
Not confusing which features of monads are specific to monads only and which stem from applicative functors is vitally important for a deeper understanding of monads. As an example, the applicative functor interface of parser libraries can parse context-free grammars (and look just like EBNF), while the monadic interface can parse context-sensitive grammars: Monads allow you to influence further processing by inspecting the result of your parse. To understand why, have a look at the type of >>=. To understand why applicative functors by themselves are sufficient to track the current parsing position, have a look at the uu-parsinglib tutorial.<br />
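<br />
The difference shows up even in a tiny example with Maybe. This is a sketch, not a parser: <hask>addBoth</hask> and <hask>halveIfEven</hask> are made-up names. In the applicative version both steps are fixed up front; in the monadic version the second step is chosen by looking at the first result.<br />
<br />
<haskell><br />
import Text.Read (readMaybe)<br />
<br />
-- Applicative: neither effect can depend on the other's result.<br />
addBoth :: String -> String -> Maybe Int<br />
addBoth a b = (+) <$> readMaybe a <*> readMaybe b<br />
<br />
-- Monadic: what happens next is decided by inspecting n.<br />
halveIfEven :: String -> Maybe Int<br />
halveIfEven s = readMaybe s >>= \n -><br />
    if even n then Just (n `div` 2) else Nothing<br />
</haskell><br />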
<br />
The exact differences are elaborated in even greater detail in Brent Yorgey's excellent [[Typeclassopedia]].<br />
<br />
==Monads are not about ordering==<br />
Monads are commonly used to order sequences of computations. But this is misleading. Just as you can use monads for state, or strictness, you can use them to order computations. But there are also commutative monads, like Maybe, that don't order anything. So ordering is not in any way essential to what a monad is.<br />
<br />
I'll try to explain what's meant by ordering. Consider an expression like<br />
<br />
let x = a<br />
y = b<br />
in f x y<br />
<br />
That gives the same result as<br />
<br />
let y = b<br />
x = a<br />
in f x y<br />
<br />
It doesn't matter in what order we write the two assignments. But for doing I/O we'd like ordering. Monads allow us to express<br />
<br />
do<br />
x <- getChar<br />
y <- getChar<br />
return (x,y)<br />
<br />
and have it be different from<br />
<br />
do<br />
y <- getChar<br />
x <- getChar<br />
return (x,y)<br />
<br />
The second example returns a pair of characters in the opposite order to that in which they were entered.<br />
<br />
However, there are monads for which swapping the order of lines like this makes no difference: for example, the Maybe monad.<br />
<br />
So while it is correct to say that monads can be used to order operations, it would be wrong to say that monads are a mechanism for ordering operations.<br />
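<br />
The Maybe case can be checked directly. In this sketch (all names are illustrative) the two do-blocks differ only in the order of their binding lines, and they agree on every combination of inputs:<br />
<br />
<haskell><br />
pairXY, pairYX :: Maybe Int -> Maybe Char -> Maybe (Int, Char)<br />
pairXY a b = do { x <- a; y <- b; return (x, y) }<br />
pairYX a b = do { y <- b; x <- a; return (x, y) }<br />
<br />
-- Compare both orderings over all present/absent combinations.<br />
sameForAll :: Bool<br />
sameForAll = and [ pairXY a b == pairYX a b<br />
                 | a <- [Nothing, Just 1]<br />
                 , b <- [Nothing, Just 'c'] ]<br />
</haskell><br />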
<br />
This notion of commutativity is different from the familiar one in algebra where a+b=b+a. So it's not about the fact that<br />
<br />
(Just 2 >> Just 3) == Just 3<br />
<br />
(It took long to explain that, maybe someone can edit this down.)</div>Joehttps://wiki.haskell.org/index.php?title=What_a_Monad_is_not&diff=32550What a Monad is not2009-12-14T15:59:01Z<p>Joe: grammar touch-up</p>
<hr />
<div>==Warning==<br />
This page is currently an unprocessed braindump. Feel free to dump additional stuff or massage stuff into didactic pleasures.<br />
<br />
Also, don't be surprised if you leave this page more confused than before. That just means that it has successfully destroyed your false assumptions, or that you've fallen for some horrible inside joke. Beware of [[Zygohistomorphic prepromorphisms]]. Go for [http://ertes.de/articles/monads.html warm and fuzzy], instead.<br />
<br />
==Monads are not a good choice as topic for your first Haskell blog entry==<br />
...just accept that they're [http://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/ burritos], and wait until later.<br />
<br />
==Monads are not a language feature==<br />
Really. They are defined in terms of Haskell, not Haskell in terms of them. Conversely,<br />
<br />
==Haskell doesn't need Monads==<br />
...well, apart from the Haskell standard defining the way IO is done in terms of Monads: It could be done differently and still work.<br />
<br />
==Monads are not impure==<br />
...In no way whatsoever. You don't even need flexible morals to claim it. To be more specific, it's IO that's impure. That makes the IO monad impure. But that's not a general property of monads - just IO. And even then, we can pretend that Haskell is a purely functional description language for imperative programs. But we didn't want to employ flexible morals, now did we?<br />
<br />
==Monads are not about state==<br />
While it is certainly possible to abstract away explicit state passing by using a Monad, that's not what a monad is.<br />
<br />
(some elaboration needed)<br />
<br />
==Monads are not about strictness==<br />
Monad operations (bind and return) have to be lazy in fact, always! However<br />
other operations can be specific to each monad.<br />
For instance some are strict (like IO), and some are lazy (like []). Then there are some that come in multiple flavours, like State.<br />
<br />
==Monads are not values==<br />
This point might be driven home best by pointing out that instance Monad Foo where ... is not a data type, but a declaration of a typeclass instance. However, to elaborate:<br />
<br />
Monads are not values in the same sense that addition and multiplication are not numbers: They capture a -- very specific -- relationship between values of a specific domain in a common abstraction. We're going to call the values that monads manage ''mobits'', somewhat like this:<br />
<br />
type Mobit m a = Monad m => m a<br />
<br />
The IO monad manages mobits representing side-effects ("IO actions").<br />
<br />
The List monad manages mobits representing multiple values ("[a]")<br />
<br />
The Reader monad manages mobits that are pure computations that use asks to propagate information instead of explicit arguments.<br />
<br />
...and while addition and multiplication are both monoids over the positive natural numbers, a monad is a monoid object in a category of endofunctors: return is the unit, and join is the binary operation. It couldn't be more simple.<br />
<br />
<br />
<br />
==Monads are not a replacement for applicative functors==<br />
Instead, every monad ''is'' an applicative functor (as well as a functor). It is considered good practice not to use >>= if all you need is <*>, or even fmap.<br />
<br />
Not confusing which features of monads are specific to monads only and which stem from applicative functors is vitally important for a deeper understanding of monads. As an example, the applicative functor interface of parser libraries can parse context-free grammars (and look just like EBNF), while the monadic interface can parse context-sensitive grammars: Monads allow you to influence further processing by inspecting the result of your parse. To understand why, have a look at the type of >>=. To understand why applicative functors by themselves are sufficient to track the current parsing position, have a look at the uu-parsinglib tutorial.<br />
<br />
The exact differences are elaborated in even greater detail in Brent Yorgey's excellent [[Typeclassopedia]].<br />
<br />
==Monads are not about ordering==<br />
It's a commonplace that monads are about ordering sequences of computations. But this is misleading. Just as you can use monads for state, or strictness, you can use them to order computations. But there are also commutative monads, like Maybe, that don't order anything. So ordering is not in any way essential to what a monad is.<br />
<br />
I'll try to explain what's meant by ordering. Consider an expression like<br />
<br />
let x = a<br />
y = b<br />
in f x y<br />
<br />
That gives the same result as<br />
<br />
let y = b<br />
x = a<br />
in f x y<br />
<br />
It doesn't matter in what order we write the two assignments. But for doing I/O we'd like ordering. Monads allow us to express<br />
<br />
do<br />
x <- getChar<br />
y <- getChar<br />
return (x,y)<br />
<br />
and have it be different from<br />
<br />
do<br />
y <- getChar<br />
x <- getChar<br />
return (x,y)<br />
<br />
The second example returns a pair of characters in the opposite order to which they were entered.<br />
<br />
However, there are monads for which swapping the order of lines like this makes no difference. For example the Maybe monad.<br />
<br />
So while it is correct to say that monads can be used to order operations, it would be wrong to say that monads are a mechanism for ordering operations.<br />
<br />
This notion of commutativity is different from the familiar one in algebra where a+b=b+a. So it's not about the fact that<br />
<br />
(Just 2 >> Just 3) == Just 3<br />
<br />
(It took long to explain that, maybe someone can edit this down.)</div>Joehttps://wiki.haskell.org/index.php?title=What_a_Monad_is_not&diff=32548What a Monad is not2009-12-14T15:53:40Z<p>Joe: grammar touch-up</p>
<hr />
<div>==Warning==<br />
This page is currently an unprocessed braindump. Feel free to dump additional stuff or massage stuff into didactic pleasures.<br />
<br />
Also, don't be surprised if you leave this page more confused than before. That just means that it has successfully destroyed your false assumptions, or that you've fallen for some horrible inside joke. Beware of [[Zygohistomorphic prepromorphisms]]. Go for [http://ertes.de/articles/monads.html warm and fuzzy], instead.<br />
<br />
==Monads are not a good choice as topic for your first Haskell blog entry==<br />
...just accept that they're [http://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/ burritos], and wait until later.<br />
<br />
==Monads are not a language feature==<br />
Really. They are defined in terms of Haskell, not Haskell in terms of them. Conversely,<br />
<br />
==Haskell doesn't need Monads==<br />
...well, apart from the Haskell standard defining the way IO is done in terms of Monads: It could be done differently and still work.<br />
<br />
==Monads are not impure==<br />
...In no way whatsoever. You don't even need flexible morals to claim it. To be more specific, it's IO that's impure. That makes the IO monad impure. But that's not a general property of monads - just IO. And even then, we can pretend that Haskell is a purely functional description language for imperative programs. But we didn't want to employ flexible morals, now did we?<br />
<br />
==Monads are not about state==<br />
While it is certainly possible to abstract away explicit state passing by using a Monad, that's not what a monad is.<br />
<br />
(some elaboration needed)<br />
<br />
==Monads are not about strictness==<br />
Monad operations (bind and return) have to be lazy in fact, always! However<br />
other operations can be specific to each monad.<br />
For instance some are strict (like IO), and some are lazy (like []). Then there are some that come in multiple flavours, like State.<br />
<br />
==Monads are not values==<br />
This point might be driven home best by pointing out that instance Monad Foo where ... is not a data type, but a declaration of a typeclass instance. However, to elaborate:<br />
<br />
Monads are not values in the same sense that addition and multiplication are not numbers: They capture a -- very specific -- relationship between values of a specific domain in a common abstraction. We're going to call the values that monads manage ''mobits'', somewhat like this:<br />
<br />
type Mobit m a = Monad m => m a<br />
<br />
The IO monad manages mobits representing side-effects ("IO actions").<br />
<br />
The List monad manages mobits representing multiple values ("[a]")<br />
<br />
The Reader monad manages mobits that are pure computations that use asks to propagate information instead of explicit arguments.<br />
<br />
...and while addition and multiplication are both monoids over the positive natural numbers, a monad is a monoid object in a category of endofunctors: return is the unit, and join is the binary operation. It couldn't be more simple.<br />
<br />
<br />
<br />
==Monads are not a replacement for applicative functors==<br />
Instead, every monad ''is'' an applicative functor (as well as a functor). It is considered good practice not to use >>= if all you need is <*>, or even fmap.<br />
<br />
Not confusing which features of monads are specific to monads only and which stem from applicative functors is vitally important for a deeper understanding of monads. As an example, the applicative functor interface of parser libraries can parse context-free grammars (and look just like EBNF), while the monadic interface can parse context-sensitive grammars: Monads allow you to influence further processing by inspecting the result of your parse. To understand why, have a look at the type of >>=. To understand why applicative functors by themselves are sufficient to track the current parsing position, have a look at the uu-parsinglib tutorial.<br />
<br />
The exact differences are elaborated in even greater detail in Brent Yorgey's excellent [[Typeclassopedia]].<br />
<br />
==Monads are not about ordering==<br />
It's a commonplace that monads are about ordering sequences of computations. But this is misleading. Just as you can use monads for state, or strictness, you can use them to order computations. But there are also commutative monads, like Maybe, that don't order anything. So ordering is not in any way essential to what a monad is.<br />
<br />
I'll try to explain what's meant by ordering. Consider an expression like<br />
<br />
let x = a<br />
y = b<br />
in f x y<br />
<br />
That gives the same result as<br />
<br />
let y = b<br />
x = a<br />
in f x y<br />
<br />
It doesn't matter in what order we write the two assignments. But for doing I/O we'd like ordering. Monads allow us to express<br />
<br />
do<br />
x <- getChar<br />
y <- getChar<br />
return (x,y)<br />
<br />
and have it be different to<br />
<br />
do<br />
y <- getChar<br />
x <- getChar<br />
return (x,y)<br />
<br />
The second example returns a pair of characters in the opposite order to which they were entered.<br />
<br />
However, there are monads for which swapping the order of lines like this makes no difference. For example the Maybe monad.<br />
<br />
So while it is correct to say that monads can be used to order operations, it would be wrong to say that monads are a mechanism for ordering operations.<br />
<br />
This notion of commutativity is different from the familiar one in algebra where a+b=b+a. So it's not about the fact that<br />
<br />
(Just 2 >> Just 3) == Just 3<br />
<br />
(It took long to explain that, maybe someone can edit this down.)</div>Joehttps://wiki.haskell.org/index.php?title=AngloHaskell/2009&diff=29427AngloHaskell/20092009-08-04T14:31:19Z<p>Joe: Moved myself to "definite" list.</p>
<hr />
<div>AngloHaskell 2009 is taking place on the 7th of August at MSR Cambridge, with further activities on the 8th. It's free, and everyone is invited! Simply add your name to the wiki and we'll see you there :-)<br />
<br />
Organisational contact: Neil Mitchell, 07876 126 574. If you are lost or confused just give Neil a ring. If Neil's phone is busy, you can also drop Sam Martin a line on 07947 249 476. If you need help when you arrive at the train station, [[User:Peter McArthur|Peter McArthur]] (07804 596282) lives just nearby.<br />
<br />
We're still looking for people willing to put someone up for the night (even if on a floor); any offers would be much appreciated. Any volunteers?<br />
<br />
== Date and Venue ==<br />
<br />
7th-8th of August in Cambridge, UK, starting with talks at Microsoft Research and with more planning to happen below.<br />
<br />
=== Directions to MSR ===<br />
<br />
MSR has [http://research.microsoft.com/aboutmsr/visitmsr/cambridge/directions.aspx some directions], which can be best summarised as ‘get a taxi’. Here is (hopefully) a [http://earth.google.com/ Google Earth] [[Media:Microsoft_Research,_Cambridge.kmz|location]] of MSR, as well as a [http://maps.google.com/maps?q=CB3+0FB&ll=52.211499,0.117073&spn=0.02677,0.086517 Google Maps link]. (J J Thomson Avenue is immediately west of Clerk Maxwell Road.)<br />
<br />
If the weather is co-operative, the best way to get around Cambridge is by bike. If you're bringing a bike, you could ask [[User:Peter McArthur|Peter McArthur]] to be your guide.<br />
<br />
If you do take a taxi and the driver doesn't know where it is, tell him or her to drive down Madingley Road until you reach the West Cambridge site, J J Thomson Avenue. The Computer Laboratory (next door) has [http://www.cl.cam.ac.uk/UoCCL/contacts/#gettinghere marginally better instructions].<br />
<br />
The fastest way to MSR (on foot and public transport) from the station is to [http://maps.google.com/maps?saddr=CB1+2JW&daddr=Trumpington+Road,+Cambridge cut through to Trumpington Road via Bateman Street] (don't follow the driving directions!), and take the Citi 4 or Uni 4. There's a bus stop just across the road from Bateman Street.<br />
<br />
To get to the city centre by bus, take the Citi 1 or Citi 3. Do ask to make sure they're going in the right direction though! There are also a number of clearly marked shuttle busses between the centre and station running during the day every 10 minutes or so.<br />
<br />
To walk to the centre (20 minutes not carrying luggage), go straight down the road facing you when you come out of the station, bear right when the road ends at some traffic lights / a WW1 memorial / the botanic gardens, and keep walking straight (Hills Road / Regent St / St Andrews St) for quite a while until you reach a pedestrianised bit, at which point you are in the centre.<br />
<br />
From the city centre to MSR, you can catch the number 77 Madingley Road Park and Ride which goes from bus stop M on Emma St. (Or find your way to Pembroke or Silver Street, and catch the Citi 4 / Uni 4 from there.) (Note that the 77 doesn't stop by MSR any more, it goes to the park and ride from which you have to walk back, 10-15 mins. This caught me out the other day --SimonM)<br />
<br />
==== Parking ====<br />
<br />
To be verified:<br />
<br />
Some parking spaces will be available around the back of the MSR building. To get out again, drivers will need to talk to reception to obtain a token.<br />
<br />
== Attendees ==<br />
<br />
Per last year, all attendees should '''bring or make a nametag''' that identifies you by your real name and/or IRC name. If anyone wants to drag a roll of stickers and a pen along that'll help!<br />
<br />
If you can't make the start on Friday, or can only make it on Saturday, that's fine. If you're not sure where everyone's going to be, give one of the contacts a call or a text.<br />
<br />
=== Definite ===<br />
<br />
* Philippa Cowderoy<br />
* Neil Mitchell<br />
* Eric Kow<br />
* Tom Schrijvers<br />
* Eric Macaulay<br />
* Peter McArthur<br />
* Tristan Allwood (Friday only)<br />
* Neil Brown (Friday only)<br />
* Sam Martin<br />
* Thomas Schilling<br />
* Edwin Brady<br />
* Tony Cowderoy<br />
* Ashley Moran<br />
* Richard Smith<br />
* Tom Ellis<br />
* A O Van Emmenis (Friday only)<br />
* Joe Edmonds<br />
<br />
=== Possible ===<br />
<br />
* Lennart Augustsson<br />
* Magnus Therning<br />
* Michael Dever (Travelling over from Ireland, so if anyone else is going, get in touch :) )<br />
* Ganesh Sittampalam<br />
* Cal Paterson<br />
* Jón Fairbairn (probably only Friday afternoon)<br />
* Michael Furniss<br />
* James Rowe (Friday only)<br />
* Richard Barrell<br />
* Jon Pretty<br />
<br />
=== Wifi signup ===<br />
<br />
Wifi accounts are available on request. The signup deadline's the 31st of July. Everyone wanting an account should provide:<br />
<br />
* Full name<br />
* Institution<br />
* Country of residence<br />
* email address<br />
<br />
'''Signups here:'''<br />
<br />
If you'd prefer not to give details here, please email Philippa at ''flippa at flippac dot org'' with the subject "Wifi signup".<br />
<br />
* Philippa Cowderoy, flippac.org, UK, flippa at flippac dot org<br />
* Ganesh Sittampalam, Credit Suisse, UK, ganesh.sittampalam@credit-suisse.com<br />
* Neil Brown, University of Kent, UK, nccb2@kent.ac.uk<br />
* Eric Kow, University of Brighton, UK, kowey at darcs dot net<br />
* Peter McArthur, dysfunctor.org, UK, peter dot mcarthur at gmail dot com<br />
* Tristan Allwood, Imperial College, UK, tora@zonetora.co.uk<br />
* Edwin Brady, University of St Andrews, UK, eb@cs.st-and.ac.uk<br />
* Tony Cowderoy, MML, UK, tony dot cowderoy at mml-net dot com<br />
* Richard Barrell, ???, UK, mycatverbs at gmail dot com.<br />
<br />
== Lodging ==<br />
<br />
It's likely that there'll be people in need of crashspace and so forth, so please organise here! Both offers and requests are good.<br />
<br />
* I live in a studio flat near the station. I could accommodate one person, but if you value your personal space or privacy then this isn't the place for you. [[User:Peter McArthur|Peter McArthur]]<br />
<br />
=== Nearby Colleges ===<br />
<br />
Many of the undergraduate colleges offer cheap accommodation over the holidays. Locations near MSR include Churchill College ([http://www.cambridgerooms.co.uk/book/ online booking]), Wolfson Court (an annexe of Girton College), Fitzwilliam College, Robinson College, <del>New Hall</del> <ins>Murray Edwards</ins> (female only; recently renamed) and Burwells Field (an annexe of Trinity College). ([http://www.cam.ac.uk/map/v4/drawmap.cgi?mp=main;xx=900;yy=560;mt=c;mx=759;my=467;ms=75;tl=Microsoft%20Research This map] might prove useful.)<br />
<br />
=== Hostels ===<br />
<br />
There's a fairly inexpensive [http://www.yha.org.uk/find-accommodation/east-of-england/hostels/cambridge/index.aspx YHA hostel] in Cambridge.<br />
<br />
Another guest house right next to the station is Tenison Towers (01223 363924).<br />
<br />
== Programme ==<br />
<br />
Planning will be taking place on IRC as per previous years: #anglohaskell on irc.freenode.net<br />
<br />
If you're having trouble following things on IRC, the discussion page on the wiki might be a good place to leave comments and questions.<br />
<br />
In previous years in Cambridge we had talks during the day on the Friday, followed by pubbage in the evening and assorted activities on the Saturday. This seemed to work, so we'll follow a similar model this year. Sadly we can't have talk space at MSR on a Saturday.<br />
<br />
=== Timetable ===<br />
<br />
This is somewhat preliminary and subject to change as talks are confirmed or otherwise, but the overall structure should hold: <br />
<br />
{| class="wikitable"<br />
|-<br />
! Day !! Time !! Event<br />
|-<br />
| Friday || 10am || People start arriving at MS Research<br />
|-<br />
| || 10:30 am || Tea, coffee and biscuits<br />
|-<br />
| || 11am || Keynote<br />
|-<br />
| || shortly after || Talk 1<br />
|-<br />
| || ~11:30 am || Talk 2<br />
|-<br />
| || ~12:00 pm || Talk 3<br />
|-<br />
| || ~12:30 pm || Talk 4<br />
|-<br />
| || 1pm || Lunch<br />
|-<br />
| || 2pm || Future of Anglohaskell<br />
|-<br />
| || 2:??pm || More talks<br />
|-<br />
| || 3:30pm || Tea, coffee and biscuits<br />
|-<br />
| || 4pm || Remaining talks<br />
|-<br />
| || 4:??pm || Functional Grit - small talks that may grow into functional pearls. Open session, anyone can give a quick talk!<br />
|-<br />
| || When people get hungry or MSR kick us out || Food! Likely we'll head out for a curry<br />
|-<br />
| || Beer o'Clock || When everyone's finished eating, we'll head for a nearby pub<br />
|-<br />
| Saturday || 11am || Brunch, chat and impromptu hacking at [http://www.beerintheevening.com/pubs/s/13/1361/Regal/Cambridge The Regal] - at least someone will stay on until 1pm, next activity may start earlier though, anyone who may show up late should keep phone numbers for one or more of the contacts<br />
|-<br />
| || 1pm || Afternoon activities - probably punting if it's not raining, failing that we'll find something<br />
|-<br />
| || When everyone gets tired/hungry || We'll retire to a pub for food, drink, chat and perhaps hacking. A pub with wifi'll be preferred, so feel free to bring a laptop or PDA!<br />
|}<br />
<br />
=== Talks ===<br />
<br />
Volunteers please! Previously we have had a more practical set of talks than you might find at Fun in the Afternoon or an academic event. This was a good thing, and some of the best talks were from people who were far from considering themselves experts, so feel free to tell us about your experiences.<br />
<br />
In the event that more talks are offered than we have time for at MSR, we'll have to work out what we can do to find more time.<br />
<br />
Talks planned and/or offered:<br />
<br />
* Neil Mitchell - hopefully "Make Considered Harmful"<br />
* Tom Schrijvers - "Monadic Constraint Programming"<br />
* Tristan Allwood - "Using the GHC API to automatically find errors"<br />
* Neil Brown - "CSP Models on the Cheap" (like several others, I spoke last year -- so others should have priority if we get too many talks)<br />
* Sam Martin - "Functional languages in games development: plotting the coup"<br />
<br />
==== Abstracts ====<br />
<br />
People giving talks should add these as they have them :-)<br />
<br />
* Monadic Constraint Programming<br />
<br />
A constraint programming system combines two essential components: a constraint solver and a search engine. The constraint solver reasons about satisfiability of conjunctions of constraints, and the search engine controls the search for solutions by iteratively exploring a disjunctive search tree defined by the constraint program.<br />
<br />
The Monadic Constraint Programming framework gives a monadic definition of constraint programming where the solver is defined as a monad threaded through the monadic search tree. Search and search strategies can then be defined as first-class objects that can themselves be built or extended by composable search transformers. Search transformers give a powerful and unifying approach to viewing search in constraint programming, and the resulting constraint programming system is first class and extremely flexible. <br />
<br />
* CSP Models on the Cheap<br />
<br />
Hoare's Communicating Sequential Processes and the model-checker FDR provide a way to check implementations of concurrent programs against formal specifications and also to check for deadlock-freedom. The Communicating Haskell Processes (CHP) library already provides a way to implement CSP-style message-passing concurrency in Haskell using a CHP monad. In this talk, I discuss substituting the definition of the CHP monad for one that emits a formal model of the program, never requiring the full program to be executed and bypassing the need for source code analysis. This model can then be checked for deadlock or refinement of a specification. I will explain how several features of Haskell make this work possible, particularly monads, purity and lazy evaluation.<br />
<br />
* Make Considered Harmful<br />
<br />
The hardest part when writing a compiler for a functional language seems to be the make system - how to compile the compiler. GHC has rewritten its build system from scratch at least 3 times. Yhc died under 10,000 lines of Python Scons scripts. There have been many alternatives to make proposed (SCons, CMake ...) but none of them seem to work as well as one might hope. This talk discusses an alternative approach, writing a make system as a Haskell program with a suitable make library providing a convenient DSL. Practical experience suggests that this approach is the only sensible choice for a build system.<br />
<br />
* Functional languages in games development: plotting the coup<br />
<br />
As a games developer by trade, my experience of the industry leads me to suspect games development is approaching a tipping point where functional languages could enact a successful coup. The revolution would claim a chunk of C++-owned territory for the victor and mark an important milestone in the development of functional languages. It will not be easy. Games development is notoriously demanding and the successful functional language would need to meet stringent performance requirements, have clearly demonstrable 'killer apps', jump through hoops of fire and tell jokes at parties. This talk will discuss how close Haskell is to meeting these demands, the challenges that remain, evidence of functional languages already in games, and how Haskell compares against its nearest competitors.<br />
<br />
==== Functional Grit ====<br />
<br />
In previous years there has been a successful 'functional grit' section. It is usually an informal session for people to briefly talk about or demo works in progress: no need to pre-register, just turn up and talk. Think small stones that might turn into functional pearls. If there's time it'd be great to do it again this year.<br />
<br />
=== Future of Anglohaskell ===<br />
<br />
In previous years there's not really been much of a plan - the first year was classic benevolent opportunism when the GHC maintainer interviews brought a number of people together, and then someone offered to run the next year in the pub each year. There was a bit of a hiccup this year. At the same time, four years starts to seem like tradition. Time to work out what the tradition should really be, no?<br />
<br />
Rather than a talk, this'll be a discussion session. Any and all ideas welcome. Organisers for future years doubly so!<br />
<br />
=== Other activity ===<br />
<br />
After Friday's talks, food and drink would be a good idea! Curry is traditional and probably the default, but we're open to other suggestions. After that, we'll retreat to a pub for the evening.<br />
<br />
Repeating previous years, I suggest we go to [http://www.beerintheevening.com/pubs/s/13/1361/Regal/Cambridge The Regal] for brunch on Saturday to kick off with. That's the Wetherspoons from previous years. After that, punting again if it's not raining too much? Any suggestions for if it's wet?<br />
<br />
[[User:PhilippaCowderoy|PhilippaCowderoy]]<br />
<br />
[[Category:Events]]</div>Joehttps://wiki.haskell.org/index.php?title=AngloHaskell/2009&diff=29424AngloHaskell/20092009-08-04T13:49:18Z<p>Joe: Added a link for online booking.</p>
<hr />
<div>AngloHaskell 2009 is taking place on the 7th of August at MSR Cambridge, with further activities on the 8th. It's free, and everyone is invited! Simply add your name to the wiki and we'll see you there :-)<br />
<br />
Organisational contact: Neil Mitchell, 07876 126 574. If you are lost or confused just give Neil a ring. If Neil's phone is busy, you can also drop Sam Martin a line on 07947 249 476. If you need help when you arrive at the train station, [[User:Peter McArthur|Peter McArthur]] (07804 596282) lives just nearby.<br />
<br />
We're still looking for people willing to put someone up for the night (even if on a floor); any offers would be much appreciated. Any volunteers?<br />
<br />
== Date and Venue ==<br />
<br />
7th-8th of August in Cambridge, UK, starting with talks at Microsoft Research and with more planning to happen below.<br />
<br />
=== Directions to MSR ===<br />
<br />
MSR has [http://research.microsoft.com/aboutmsr/visitmsr/cambridge/directions.aspx some directions], which can be best summarised as ‘get a taxi’. Here is (hopefully) a [http://earth.google.com/ Google Earth] [[Media:Microsoft_Research,_Cambridge.kmz|location]] of MSR, as well as a [http://maps.google.com/maps?q=CB3+0FB&ll=52.211499,0.117073&spn=0.02677,0.086517 Google Maps link]. (J J Thomson Avenue is immediately west of Clerk Maxwell Road.)<br />
<br />
If the weather is co-operative, the best way to get around Cambridge is by bike. If you're bringing a bike, you could ask [[User:Peter McArthur|Peter McArthur]] to be your guide.<br />
<br />
If you do take a taxi and the driver doesn't know where it is, tell him or her to drive down Madingley Road until you reach the West Cambridge site, J J Thomson Avenue. The Computer Laboratory (next door) has [http://www.cl.cam.ac.uk/UoCCL/contacts/#gettinghere marginally better instructions].<br />
<br />
The fastest way to MSR (on foot and public transport) from the station is to [http://maps.google.com/maps?saddr=CB1+2JW&daddr=Trumpington+Road,+Cambridge cut through to Trumpington Road via Bateman Street] (don't follow the driving directions!), and take the Citi 4 or Uni 4. There's a bus stop just across the road from Bateman Street.<br />
<br />
To get to the city centre by bus, take the Citi 1 or Citi 3. Do ask to make sure they're going in the right direction though! There are also a number of clearly marked shuttle busses between the centre and station running during the day every 10 minutes or so.<br />
<br />
To walk to the centre (20 minutes not carrying luggage), go straight down the road facing you when you come out of the station, bear right when the road ends at some traffic lights / a WW1 memorial / the botanic gardens, and keep walking straight (Hills Road / Regent St / St Andrews St) for quite a while until you reach a pedestrianised bit, at which point you are in the centre.<br />
<br />
From the city centre to MSR, you can catch the number 77 Madingley Road Park and Ride which goes from bus stop M on Emma St. (Or find your way to Pembroke or Silver Street, and catch the Citi 4 / Uni 4 from there.) (Note that the 77 doesn't stop by MSR any more, it goes to the park and ride from which you have to walk back, 10-15 mins. This caught me out the other day --SimonM)<br />
<br />
==== Parking ====<br />
<br />
To be verified:<br />
<br />
Some parking spaces will be available around the back of the MSR building. To get out again, drivers will need to talk to reception to obtain a token.<br />
<br />
== Attendees ==<br />
<br />
Per last year, all attendees should '''bring or make a nametag''' that identifies you by your real name and/or IRC name. If anyone wants to drag a roll of stickers and a pen along that'll help!<br />
<br />
If you can't make the start on Friday, or can only make it on Saturday, that's fine. If you're not sure where everyone's going to be, give one of the contacts a call or a text.<br />
<br />
=== Definite ===<br />
<br />
* Philippa Cowderoy<br />
* Neil Mitchell<br />
* Eric Kow<br />
* Tom Schrijvers<br />
* Eric Macaulay<br />
* Peter McArthur<br />
* Tristan Allwood (Friday only)<br />
* Neil Brown (Friday only)<br />
* Sam Martin<br />
* Thomas Schilling<br />
* Edwin Brady<br />
* Tony Cowderoy<br />
* Ashley Moran<br />
* Richard Smith<br />
* Tom Ellis<br />
* A O Van Emmenis (Friday only)<br />
<br />
=== Possible ===<br />
<br />
* Lennart Augustsson<br />
* Magnus Therning<br />
* Michael Dever (Travelling over from Ireland, so if anyone else is going, get in touch :) )<br />
* Ganesh Sittampalam<br />
* Cal Paterson<br />
* Jón Fairbairn (probably only Friday afternoon)<br />
* Michael Furniss<br />
* James Rowe (Friday only)<br />
* Richard Barrell<br />
* Jon Pretty<br />
* Joe Edmonds<br />
<br />
=== Wifi signup ===<br />
<br />
Wifi accounts are available on request. The signup deadline's the 31st of July. Everyone wanting an account should provide:<br />
<br />
* Full name<br />
* Institution<br />
* Country of residence<br />
* email address<br />
<br />
'''Signups here:'''<br />
<br />
If you'd prefer not to give details here, please email Philippa at ''flippa at flippac dot org'' with the subject "Wifi signup".<br />
<br />
* Philippa Cowderoy, flippac.org, UK, flippa at flippac dot org<br />
* Ganesh Sittampalam, Credit Suisse, UK, ganesh.sittampalam@credit-suisse.com<br />
* Neil Brown, University of Kent, UK, nccb2@kent.ac.uk<br />
* Eric Kow, University of Brighton, UK, kowey at darcs dot net<br />
* Peter McArthur, dysfunctor.org, UK, peter dot mcarthur at gmail dot com<br />
* Tristan Allwood, Imperial College, UK, tora@zonetora.co.uk<br />
* Edwin Brady, University of St Andrews, UK, eb@cs.st-and.ac.uk<br />
* Tony Cowderoy, MML, UK, tony dot cowderoy at mml-net dot com<br />
* Richard Barrell, ???, UK, mycatverbs at gmail dot com.<br />
<br />
== Lodging ==<br />
<br />
It's likely that there'll be people in need of crashspace and so forth, so please organise here! Both offers and requests are good.<br />
<br />
* I live in a studio flat near the station. I could accommodate one person, but if you value your personal space or privacy then this isn't the place for you. [[User:Peter McArthur|Peter McArthur]]<br />
<br />
=== Nearby Colleges ===<br />
<br />
Many of the undergraduate colleges offer cheap accommodation over the holidays. Locations near MSR include Churchill College ([http://www.cambridgerooms.co.uk/book/ online booking]), Wolfson Court (an annexe of Girton College), Fitzwilliam College, Robinson College, <del>New Hall</del> <ins>Murray Edwards</ins> (female only; recently renamed) and Burrell's Field (an annexe of Trinity College). ([http://www.cam.ac.uk/map/v4/drawmap.cgi?mp=main;xx=900;yy=560;mt=c;mx=759;my=467;ms=75;tl=Microsoft%20Research This map] might prove useful.)<br />
<br />
=== Hostels ===<br />
<br />
There's a fairly inexpensive [http://www.yha.org.uk/find-accommodation/east-of-england/hostels/cambridge/index.aspx YHA hostel] in Cambridge.<br />
<br />
Another guest house right next to the station is Tenison Towers (01223 363924).<br />
<br />
== Programme ==<br />
<br />
Planning will be taking place on IRC as per previous years: #anglohaskell on irc.freenode.net<br />
<br />
If you're having trouble following things on IRC, the discussion page on the wiki might be a good place to leave comments and questions.<br />
<br />
In previous years in Cambridge we had talks during the day on a Friday, followed by pubbage in the evening and assorted activities on the Saturday. This seemed to work, so we'll follow a similar model this year. Sadly we can't have talk space at MSR on a Saturday.<br />
<br />
=== Timetable ===<br />
<br />
This is somewhat preliminary and subject to change as talks are confirmed or otherwise, but the overall structure should hold: <br />
<br />
{| class="wikitable"<br />
|-<br />
! Day !! Time !! Event<br />
|-<br />
| Friday || 10am || People start arriving at MS Research<br />
|-<br />
| || 10:30 am || Tea, coffee and biscuits<br />
|-<br />
| || 11am || Keynote<br />
|-<br />
| || shortly after || Talk 1<br />
|-<br />
| || ~11:30 am || Talk 2<br />
|-<br />
| || ~12:00 pm || Talk 3<br />
|-<br />
| || ~12:30 pm || Talk 4<br />
|-<br />
| || 1pm || Lunch<br />
|-<br />
| || 2pm || Future of Anglohaskell<br />
|-<br />
| || 2:??pm || More talks<br />
|-<br />
| || 3:30pm || Tea, coffee and biscuits<br />
|-<br />
| || 4pm || Remaining talks<br />
|-<br />
| || 4:??pm || Functional Grit - small talks that may grow into functional pearls. Open session, anyone can give a quick talk!<br />
|-<br />
| || When people get hungry or MSR kick us out || Food! Likely we'll head out for a curry<br />
|-<br />
| || Beer o'Clock || When everyone's finished eating, we'll head for a nearby pub<br />
|-<br />
| Saturday || 11am || Brunch, chat and impromptu hacking at [http://www.beerintheevening.com/pubs/s/13/1361/Regal/Cambridge The Regal] - at least someone will stay on until 1pm, next activity may start earlier though, anyone who may show up late should keep phone numbers for one or more of the contacts<br />
|-<br />
| || 1pm || Afternoon activities - probably punting if it's not raining, failing that we'll find something<br />
|-<br />
| || When everyone gets tired/hungry || We'll retire to a pub for food, drink, chat and perhaps hacking. A pub with wifi'll be preferred, so feel free to bring a laptop or PDA!<br />
|}<br />
<br />
=== Talks ===<br />
<br />
Volunteers please! Previously we have had a more practical set of talks than you might find at Fun in the Afternoon or an academic event. This was a good thing, and some of the best talks were from people who were far from considering themselves experts, so feel free to tell us about your experiences.<br />
<br />
In the event that more talks are offered than we have time for at MSR, we'll have to work out what we can do to find more time.<br />
<br />
Talks planned and/or offered:<br />
<br />
* Neil Mitchell - hopefully "Make Considered Harmful"<br />
* Tom Schrijvers - "Monadic Constraint Programming"<br />
* Tristan Allwood - "Using the GHC API to automatically find errors"<br />
* Neil Brown - "CSP Models on the Cheap" (like several others, I spoke last year -- so others should have priority if we get too many talks)<br />
* Sam Martin - "Functional languages in games development: plotting the coup"<br />
<br />
==== Abstracts ====<br />
<br />
People giving talks should add these as they have them :-)<br />
<br />
* Monadic Constraint Programming<br />
<br />
A constraint programming system combines two essential components: a constraint solver and a search engine. The constraint solver reasons about satisfiability of conjunctions of constraints, and the search engine controls the search for solutions by iteratively exploring a disjunctive search tree defined by the constraint program.<br />
<br />
The Monadic Constraint Programming framework gives a monadic definition of constraint programming where the solver is defined as a monad threaded through the monadic search tree. Search and search strategies can then be defined as first-class objects that can themselves be built or extended by composable search transformers. Search transformers give a powerful and unifying approach to viewing search in constraint programming, and the resulting constraint programming system is first class and extremely flexible. <br />
<br />
* CSP Models on the Cheap<br />
<br />
Hoare's Communicating Sequential Processes and the model-checker FDR provide a way to check implementations of concurrent programs against formal specifications and also to check for deadlock-freedom. The Communicating Haskell Processes (CHP) library already provides a way to implement CSP-style message-passing concurrency in Haskell using a CHP monad. In this talk, I discuss substituting the definition of the CHP monad for one that emits a formal model of the program, never requiring the full program to be executed and bypassing the need for source code analysis. This model can then be checked for deadlock or refinement of a specification. I will explain how several features of Haskell make this work possible, particularly monads, purity and lazy evaluation.<br />
<br />
* Make Considered Harmful<br />
<br />
The hardest part when writing a compiler for a functional language seems to be the make system - how to compile the compiler. GHC has rewritten its build system from scratch at least 3 times. Yhc died under 10,000 lines of Python Scons scripts. There have been many alternatives to make proposed (SCons, CMake ...) but none of them seem to work as well as one might hope. This talk discusses an alternative approach, writing a make system as a Haskell program with a suitable make library providing a convenient DSL. Practical experience suggests that this approach is the only sensible choice for a build system.<br />
<br />
* Functional languages in games development: plotting the coup<br />
<br />
As a games developer by trade, my experience of the industry leads me to suspect games development is approaching a tipping point where functional languages could enact a successful coup. The revolution would claim a chunk of C++-owned territory for the victor and mark an important milestone in the development of functional languages. It will not be easy. Games development is notoriously demanding and the successful functional language would need to meet stringent performance requirements, have clearly demonstrable 'killer apps', jump through hoops of fire and tell jokes at parties. This talk will discuss how close Haskell is to meeting these demands, the challenges that remain, evidence of functional languages already in games, and how Haskell compares against its nearest competitors.<br />
<br />
==== Functional Grit ====<br />
<br />
In previous years there has been a successful 'functional grit' section. It is usually an informal session for people to briefly talk about or demo works in progress: no need to pre-register, just turn up and talk. Think small stones that might turn into functional pearls. If there's time it'd be great to do it again this year.<br />
<br />
=== Future of Anglohaskell ===<br />
<br />
In previous years there's not really been much of a plan - the first year was classic benevolent opportunism when the GHC maintainer interviews brought a number of people together, and then someone offered to run the next year in the pub each year. There was a bit of a hiccup this year. At the same time, four years starts to seem like tradition. Time to work out what the tradition should really be, no?<br />
<br />
Rather than a talk, this'll be a discussion session. Any and all ideas welcome. Organisers for future years doubly so!<br />
<br />
=== Other activity ===<br />
<br />
After Friday's talks, food and drink would be a good idea! Curry is traditional and probably the default, but we're open to other suggestions. After that, we'll retreat to a pub for the evening.<br />
<br />
Repeating previous years, I suggest we go to [http://www.beerintheevening.com/pubs/s/13/1361/Regal/Cambridge The Regal] for brunch on Saturday to kick off with. That's the Wetherspoons from previous years. After that, punting again if it's not raining too much? Any suggestions for if it's wet?<br />
<br />
[[User:PhilippaCowderoy|PhilippaCowderoy]]<br />
<br />
[[Category:Events]]</div>Joehttps://wiki.haskell.org/index.php?title=AngloHaskell/2009&diff=29421AngloHaskell/20092009-08-03T11:23:27Z<p>Joe: Added myself.</p>
<hr />
<div>AngloHaskell 2009 is taking place on the 7th of August at MSR Cambridge, with further activities on the 8th. It's free, and everyone is invited! Simply add your name to the wiki and we'll see you there :-)<br />
<br />
Organisational contact: Neil Mitchell, 07876 126 574. If you are lost or confused just give Neil a ring. If Neil's phone is busy, you can also drop Sam Martin a line on 07947 249 476. If you need help when you arrive at the train station, [[User:Peter McArthur|Peter McArthur]] (07804 596282) lives just nearby.<br />
<br />
We're still looking for people willing to put someone up for the night (even if on a floor); any offers would be much appreciated. Any volunteers?<br />
<br />
== Date and Venue ==<br />
<br />
7th-8th of August in Cambridge, UK, starting with talks at Microsoft Research and with more planning to happen below.<br />
<br />
=== Directions to MSR ===<br />
<br />
MSR has [http://research.microsoft.com/aboutmsr/visitmsr/cambridge/directions.aspx some directions], which can be best summarised as ‘get a taxi’. Here is (hopefully) a [http://earth.google.com/ Google Earth] [[Media:Microsoft_Research,_Cambridge.kmz|location]] of MSR, as well as a [http://maps.google.com/maps?q=CB3+0FB&ll=52.211499,0.117073&spn=0.02677,0.086517 Google Maps link]. (J J Thomson Avenue is immediately west of Clerk Maxwell Road.)<br />
<br />
If the weather is co-operative, the best way to get around Cambridge is by bike. If you're bringing a bike, you could ask [[User:Peter McArthur|Peter McArthur]] to be your guide.<br />
<br />
If you do take a taxi and the driver doesn't know where it is, tell him or her to drive down Madingley Road until you reach the West Cambridge site, J J Thomson Avenue. The Computer Laboratory (next door) has [http://www.cl.cam.ac.uk/UoCCL/contacts/#gettinghere marginally better instructions].<br />
<br />
The fastest way to MSR (on foot and public transport) from the station is to [http://maps.google.com/maps?saddr=CB1+2JW&daddr=Trumpington+Road,+Cambridge cut through to Trumpington Road via Bateman Street] (don't follow the driving directions!), and take the Citi 4 or Uni 4. There's a bus stop just across the road from Bateman Street.<br />
<br />
To get to the city centre by bus, take the Citi 1 or Citi 3. Do ask to make sure they're going in the right direction though! There are also a number of clearly marked shuttle busses between the centre and station running during the day every 10 minutes or so.<br />
<br />
To walk to the centre (20 minutes not carrying luggage), go straight down the road facing you when you come out of the station, bear right when the road ends at some traffic lights / a WW1 memorial / the botanic gardens, and keep walking straight (Hills Road / Regent St / St Andrews St) for quite a while until you reach a pedestrianised bit, at which point you are in the centre.<br />
<br />
From the city centre to MSR, you can catch the number 77 Madingley Road Park and Ride which goes from bus stop M on Emma St. (Or find your way to Pembroke or Silver Street, and catch the Citi 4 / Uni 4 from there.) (Note that the 77 doesn't stop by MSR any more, it goes to the park and ride from which you have to walk back, 10-15 mins. This caught me out the other day --SimonM)<br />
<br />
==== Parking ====<br />
<br />
To be verified:<br />
<br />
Some parking spaces will be available around the back of the MSR building. To get out again, drivers will need to talk to reception to obtain a token.<br />
<br />
== Attendees ==<br />
<br />
Per last year, all attendees should '''bring or make a nametag''' that identifies you by your real name and/or IRC name. If anyone wants to drag a roll of stickers and a pen along that'll help!<br />
<br />
If you can't make the start on Friday, or can only make it on Saturday, that's fine. If you're not sure where everyone's going to be, give one of the contacts a call or a text.<br />
<br />
=== Definite ===<br />
<br />
* Philippa Cowderoy<br />
* Neil Mitchell<br />
* Eric Kow<br />
* Tom Schrijvers<br />
* Eric Macaulay<br />
* Peter McArthur<br />
* Tristan Allwood (Friday only)<br />
* Neil Brown (Friday only)<br />
* Sam Martin<br />
* Thomas Schilling<br />
* Edwin Brady<br />
* Tony Cowderoy<br />
* Ashley Moran<br />
* Richard Smith<br />
* Tom Ellis<br />
* A O Van Emmenis (Friday only)<br />
<br />
=== Possible ===<br />
<br />
* Lennart Augustsson<br />
* Magnus Therning<br />
* Michael Dever (Travelling over from Ireland, so if anyone else is going, get in touch :) )<br />
* Ganesh Sittampalam<br />
* Cal Paterson<br />
* Jón Fairbairn (probably only Friday afternoon)<br />
* Michael Furniss<br />
* James Rowe (Friday only)<br />
* Richard Barrell<br />
* Jon Pretty<br />
* Joe Edmonds<br />
<br />
=== Wifi signup ===<br />
<br />
Wifi accounts are available on request. The signup deadline's the 31st of July. Everyone wanting an account should provide:<br />
<br />
* Full name<br />
* Institution<br />
* Country of residence<br />
* email address<br />
<br />
'''Signups here:'''<br />
<br />
If you'd prefer not to give details here, please email Philippa at ''flippa at flippac dot org'' with the subject "Wifi signup".<br />
<br />
* Philippa Cowderoy, flippac.org, UK, flippa at flippac dot org<br />
* Ganesh Sittampalam, Credit Suisse, UK, ganesh.sittampalam@credit-suisse.com<br />
* Neil Brown, University of Kent, UK, nccb2@kent.ac.uk<br />
* Eric Kow, University of Brighton, UK, kowey at darcs dot net<br />
* Peter McArthur, dysfunctor.org, UK, peter dot mcarthur at gmail dot com<br />
* Tristan Allwood, Imperial College, UK, tora@zonetora.co.uk<br />
* Edwin Brady, University of St Andrews, UK, eb@cs.st-and.ac.uk<br />
* Tony Cowderoy, MML, UK, tony dot cowderoy at mml-net dot com<br />
* Richard Barrell, ???, UK, mycatverbs at gmail dot com.<br />
<br />
== Lodging ==<br />
<br />
It's likely that there'll be people in need of crashspace and so forth, so please organise here! Both offers and requests are good.<br />
<br />
* I live in a studio flat near the station. I could accommodate one person, but if you value your personal space or privacy then this isn't the place for you. [[User:Peter McArthur|Peter McArthur]]<br />
<br />
=== Nearby Colleges ===<br />
<br />
Many of the undergraduate colleges offer cheap accommodation over the holidays. Locations near MSR include Churchill College, Wolfson Court (an annexe of Girton College), Fitzwilliam College, Robinson College, <del>New Hall</del> <ins>Murray Edwards</ins> (female only; recently renamed) and Burrell's Field (an annexe of Trinity College). ([http://www.cam.ac.uk/map/v4/drawmap.cgi?mp=main;xx=900;yy=560;mt=c;mx=759;my=467;ms=75;tl=Microsoft%20Research This map] might prove useful.)<br />
<br />
=== Hostels ===<br />
<br />
There's a fairly inexpensive [http://www.yha.org.uk/find-accommodation/east-of-england/hostels/cambridge/index.aspx YHA hostel] in Cambridge.<br />
<br />
Another guest house right next to the station is Tenison Towers (01223 363924).<br />
<br />
== Programme ==<br />
<br />
Planning will be taking place on IRC as per previous years: #anglohaskell on irc.freenode.net<br />
<br />
If you're having trouble following things on IRC, the discussion page on the wiki might be a good place to leave comments and questions.<br />
<br />
In previous years in Cambridge we had talks during the day on a Friday, followed by pubbage in the evening and assorted activities on the Saturday. This seemed to work, so we'll follow a similar model this year. Sadly we can't have talk space at MSR on a Saturday.<br />
<br />
=== Timetable ===<br />
<br />
This is somewhat preliminary and subject to change as talks are confirmed or otherwise, but the overall structure should hold: <br />
<br />
{| class="wikitable"<br />
|-<br />
! Day !! Time !! Event<br />
|-<br />
| Friday || 10am || People start arriving at MS Research<br />
|-<br />
| || 10:30 am || Tea, coffee and biscuits<br />
|-<br />
| || 11am || Keynote<br />
|-<br />
| || shortly after || Talk 1<br />
|-<br />
| || ~11:30 am || Talk 2<br />
|-<br />
| || ~12:00 pm || Talk 3<br />
|-<br />
| || ~12:30 pm || Talk 4<br />
|-<br />
| || 1pm || Lunch<br />
|-<br />
| || 2pm || Future of Anglohaskell<br />
|-<br />
| || 2:??pm || More talks<br />
|-<br />
| || 3:30pm || Tea, coffee and biscuits<br />
|-<br />
| || 4pm || Remaining talks<br />
|-<br />
| || 4:??pm || Functional Grit - small talks that may grow into functional pearls. Open session, anyone can give a quick talk!<br />
|-<br />
| || When people get hungry or MSR kick us out || Food! Likely we'll head out for a curry<br />
|-<br />
| || Beer o'Clock || When everyone's finished eating, we'll head for a nearby pub<br />
|-<br />
| Saturday || 11am || Brunch, chat and impromptu hacking at [http://www.beerintheevening.com/pubs/s/13/1361/Regal/Cambridge The Regal] - at least someone will stay on until 1pm, next activity may start earlier though, anyone who may show up late should keep phone numbers for one or more of the contacts<br />
|-<br />
| || 1pm || Afternoon activities - probably punting if it's not raining, failing that we'll find something<br />
|-<br />
| || When everyone gets tired/hungry || We'll retire to a pub for food, drink, chat and perhaps hacking. A pub with wifi'll be preferred, so feel free to bring a laptop or PDA!<br />
|}<br />
<br />
=== Talks ===<br />
<br />
Volunteers please! Previously we have had a more practical set of talks than you might find at Fun in the Afternoon or an academic event. This was a good thing, and some of the best talks were from people who were far from considering themselves experts, so feel free to tell us about your experiences.<br />
<br />
In the event that more talks are offered than we have time for at MSR, we'll have to work out what we can do to find more time.<br />
<br />
Talks planned and/or offered:<br />
<br />
* Neil Mitchell - hopefully "Make Considered Harmful"<br />
* Tom Schrijvers - "Monadic Constraint Programming"<br />
* Tristan Allwood - "Using the GHC API to automatically find errors"<br />
* Neil Brown - "CSP Models on the Cheap" (like several others, I spoke last year -- so others should have priority if we get too many talks)<br />
* Sam Martin - "Functional languages in games development: plotting the coup"<br />
<br />
==== Abstracts ====<br />
<br />
People giving talks should add these as they have them :-)<br />
<br />
* Monadic Constraint Programming<br />
<br />
A constraint programming system combines two essential components: a constraint solver and a search engine. The constraint solver reasons about satisfiability of conjunctions of constraints, and the search engine controls the search for solutions by iteratively exploring a disjunctive search tree defined by the constraint program.<br />
<br />
The Monadic Constraint Programming framework gives a monadic definition of constraint programming where the solver is defined as a monad threaded through the monadic search tree. Search and search strategies can then be defined as first-class objects that can themselves be built or extended by composable search transformers. Search transformers give a powerful and unifying approach to viewing search in constraint programming, and the resulting constraint programming system is first class and extremely flexible. <br />
<br />
* CSP Models on the Cheap<br />
<br />
Hoare's Communicating Sequential Processes and the model-checker FDR provide a way to check implementations of concurrent programs against formal specifications and also to check for deadlock-freedom. The Communicating Haskell Processes (CHP) library already provides a way to implement CSP-style message-passing concurrency in Haskell using a CHP monad. In this talk, I discuss substituting the definition of the CHP monad for one that emits a formal model of the program, never requiring the full program to be executed and bypassing the need for source code analysis. This model can then be checked for deadlock or refinement of a specification. I will explain how several features of Haskell make this work possible, particularly monads, purity and lazy evaluation.<br />
<br />
* Make Considered Harmful<br />
<br />
The hardest part when writing a compiler for a functional language seems to be the make system - how to compile the compiler. GHC has rewritten its build system from scratch at least 3 times. Yhc died under 10,000 lines of Python Scons scripts. There have been many alternatives to make proposed (SCons, CMake ...) but none of them seem to work as well as one might hope. This talk discusses an alternative approach, writing a make system as a Haskell program with a suitable make library providing a convenient DSL. Practical experience suggests that this approach is the only sensible choice for a build system.<br />
<br />
* Functional languages in games development: plotting the coup<br />
<br />
As a games developer by trade, my experience of the industry leads me to suspect games development is approaching a tipping point where functional languages could enact a successful coup. The revolution would claim a chunk of C++-owned territory for the victor and mark an important milestone in the development of functional languages. It will not be easy. Games development is notoriously demanding and the successful functional language would need to meet stringent performance requirements, have clearly demonstrable 'killer apps', jump through hoops of fire and tell jokes at parties. This talk will discuss how close Haskell is to meeting these demands, the challenges that remain, evidence of functional languages already in games, and how Haskell compares against its nearest competitors.<br />
<br />
==== Functional Grit ====<br />
<br />
In previous years there has been a successful 'functional grit' section. It is usually an informal session where people briefly talk about or demo works in progress; there is no need to pre-register, just turn up and talk. Think small stones that might turn into functional pearls. If there's time it'd be great to do it again this year.<br />
<br />
=== Future of Anglohaskell ===<br />
<br />
In previous years there's not really been much of a plan - the first year was classic benevolent opportunism when the GHC maintainer interviews brought a number of people together, and then someone offered to run the next year in the pub each year. There was a bit of a hiccup this year. At the same time, four years starts to seem like tradition. Time to work out what the tradition should really be, no?<br />
<br />
Rather than a talk, this'll be a discussion session. Any and all ideas welcome. Organisers for future years doubly so!<br />
<br />
=== Other activity ===<br />
<br />
After Friday's talks, food and drink would be a good idea! Curry is traditional and probably the default, but we're open to other suggestions. After that, we'll retreat to a pub for the evening.<br />
<br />
Repeating previous years, I suggest we go to [http://www.beerintheevening.com/pubs/s/13/1361/Regal/Cambridge The Regal] for brunch on Saturday to kick off with. That's the Wetherspoons from previous years. After that, punting again if it's not raining too much? Any suggestions for if it's wet?<br />
<br />
[[User:PhilippaCowderoy|PhilippaCowderoy]]<br />
<br />
[[Category:Events]]</div>Joehttps://wiki.haskell.org/index.php?title=Xmonad/Using_xmonad_in_Gnome&diff=28718Xmonad/Using xmonad in Gnome2009-06-21T01:03:12Z<p>Joe: me too!</p>
<hr />
<div>{{xmonad}}<br />
[[Category:XMonad]]<br />
<br />
==Introduction==<br />
<br />
[[Image:Screen-nomeata-ewhm.png|200px|A screenshot of xmonad cooperating with gnome|center]]<br />
<br />
Xmonad makes an excellent drop-in replacement for Gnome's default window manager (metacity) giving you a slick tiling window manager. This guide will help you set up Gnome to use Xmonad 0.7.<br />
<br />
This is an update to the previous page on [[Xmonad/Using xmonad in Gnome/0.6]].<br />
<br />
==Setting up Gnome to use Xmonad==<br />
<br />
===.gnomerc===<br />
<br />
The easiest way is to let Gnome start Xmonad itself by setting the environment variable WINDOW_MANAGER to point to the xmonad executable before the Gnome session manager starts. The best way to do this is to edit ''~/.gnomerc'' to contain:<br />
<br />
export WINDOW_MANAGER=xmonad<br />
<br />
<br />
If Gnome is hanging on startup, try placing the following lines in ''~/.xsession'' instead:<br />
<br />
export WINDOW_MANAGER=xmonad<br />
exec gnome-session --purge-delay=3000<br />
<br />
(If the xmonad binary is not installed to one of the directories in your display manager $PATH environment, you must give the full path, i.e.:<br />
<br />
export WINDOW_MANAGER=${HOME}/bin/xmonad<br />
<br />
When using a directory outside your display manager path and customizing the configuration with ~/.xmonad/xmonad.hs, you will also want to modify your restart keybinding in xmonad.hs to use the full path. <br />
<br />
To edit the mod-q binding to use /path/to/xmonad, see the keybindings in the sample xmonad.hs included with xmonad documentation, source code for Config.hs, or the template xmonad.hs for your xmonad version from the [[Xmonad/Config_archive|xmonad config archive]].)<br />
<br />
===gnome-session===<br />
<br />
Another way to do this is to configure the session. Go to ''gnome-session-properties'' (or Desktop -> Preferences -> Sessions) -> Current Session, select Metacity and change style to Trash. Add ''xmonad'' to Startup Programs and run:<br />
<br />
$ killall metacity; xmonad &<br />
<br />
Close all programs, and in ''gnome-session-properties'', go to Session Options and click on Remember Currently Running Applications.<br />
<br />
===Ubuntu Intrepid===<br />
[http://ubuntuforums.org/showthread.php?t=975329 This forum thread] has instructions for making Gnome play nice with xmonad on intrepid.<br />
<br />
===Ubuntu Jaunty===<br />
At least 3 XMonad users have found that the <tt>~/.gnomerc</tt> will not work on Jaunty Ubuntu when one is upgrading from Intrepid; apparently the <tt>~/.gconf/</tt> directory is incompatible or something, so Gnome/Ubuntu will not read .gnomerc and any settings in it will be ignored. <br />
<br />
The work-around is essentially to remove .gconf entirely. On the next login, a fresh default .gconf will be created and .gnomerc will be read. This of course implies that one's settings and preferences will also be removed, and one will have to redo them. (Copying over selected directories from the old .gconf to the new one may or may not work.)<br />
<br />
Or alternatively, the following worked for me (without touching .gconf or .gnomerc or exports):<br />
Add an xmonad launcher in the gnome-session-properties and then execute:<br />
$ gconftool -t string -s /desktop/gnome/applications/window_manager/current xmonad<br />
$ gconftool -t string -s /desktop/gnome/session/required_components/windowmanager xmonad<br />
$ killall metacity; xmonad &<br />
<br />
To add an xmonad launcher you can put the following in /usr/share/applications/xmonad.desktop<br />
[Desktop Entry]<br />
Type=Application<br />
Encoding=UTF-8<br />
Name=Xmonad<br />
Exec=xmonad<br />
NoDisplay=true<br />
X-GNOME-WMName=Xmonad<br />
X-GNOME-Autostart-Phase=WindowManager<br />
X-GNOME-Provides=windowmanager<br />
X-GNOME-Autostart-Notify=false<br />
This lets gnome know that xmonad is a windowmanager and where to look for it.<br />
<br />
===Fedora 10 and further links===<br />
[http://thread.gmane.org/gmane.comp.lang.haskell.xmonad/6557 This mailing list thread] contains fedora 10 specific setup instructions, but also a bunch of other gnome setup links if you are having trouble with the .gnomerc/gnome-session methods.<br />
<br />
None of this worked for me (Colin Adams). <br />
What did work was to start gconf-editor and change desktop/gnome/session/required_components/windowmanager from metacity to $HOME/bin/xmonad .<br />
I then have to type "xmonad" from a terminal.<br />
<br />
==Configure Xmonad to interoperate with Gnome==<br />
<br />
<br />
[[Image:Screen-xmonad-gnome-darktheme4.jpg|200px|xmonad and gnome-panel|center]]<br />
<br />
=== Using the Config.Gnome module ===<br />
For xmonad-0.8 or greater, see [[Xmonad/Basic Desktop Environment Integration | Basic DE Integration]] for a simple three line <code>xmonad.hs</code> configuration that:<br />
* integrates docks and gnome-panel using ewmh's<br />
* allows gap-toggling<br />
* binds the gnome run dialog to mod-p, and mod-shift-q to save your session and logout <br />
* otherwise keeps xmonad defaults. <br />
<br />
It is a good starting point. You can then come back and add some of the features below once everything's working.<br />
<br />
<br />
=== No Config.Gnome module ===<br />
Put this in ''~/.xmonad/xmonad.hs'':<br />
<br />
<haskell><br />
import XMonad<br />
import XMonad.Hooks.ManageDocks<br />
import XMonad.Hooks.EwmhDesktops<br />
import qualified XMonad.StackSet as W<br />
<br />
main = xmonad $ defaultConfig<br />
{ manageHook = manageDocks <+> manageHook defaultConfig<br />
, logHook = ewmhDesktopsLogHook<br />
, layoutHook = ewmhDesktopsLayout $ avoidStruts<br />
$ layoutHook defaultConfig<br />
}<br />
</haskell><br />
<br />
This should set up Xmonad to make space for Gnome's panel and status bar automatically. Note that this requires xmonad-contrib to be installed. (The myKeys part isn't needed for a minimal install but that was something that gave me problems early on. See the XMonad.Util.EZConfig documentation for details.)<br />
<br />
Depending on your versions of nautilus and xmonad, step 4.1 (Disabling the nautilus desktop, see below) may be mandatory. (Otherwise, the nautilus desktop will be raised into the floating layer where it covers all your other windows.)<br />
<br />
Having done this, you should now be able to use Gnome with Xmonad, and most things will work. (An example of something that doesn't is iconifying windows from the panel window list.)<br />
<br />
Explanations of the various options are given below, along with some other things you might want to tweak.<br />
<br />
[[Image:screen-ohmega-tab-gnome-twopane.jpg|200px|A screenshot of xmonad cooperating with gnome|center]]<br />
<br />
==Tweak Gnome to work better with Xmonad==<br />
<br />
These are a few steps that greatly improve the experience of running Xmonad under Gnome. Note that on some systems the binary <tt>gconftool</tt> is called <tt>gconftool-2</tt>.<br />
<br />
===Disable the Nautilus desktop===<br />
<br />
This step is not required, but some users prefer to disable the desktop. From the command line execute:<br />
<br />
gconftool --type boolean --set /apps/nautilus/preferences/show_desktop false<br />
<br />
(Using recent gnome and xmonad I found that it was necessary.)<br />
<br />
===Changing desktop background===<br />
<br />
If you need to change the workspace background programmatically (i.e. from some extension setting in xmonad's configuration file), you can use the command:<br />
<br />
gconftool --type string --set /desktop/gnome/background/picture_filename "/path/to/your/image.png"<br />
<br />
==Tips on configuring Xmonad==<br />
<br />
All the configuration is done in <tt>~/.xmonad/xmonad.hs</tt>.<br />
<br />
===Change the mod key===<br />
<br />
The default ''mod key'' is ''alt'', which conflicts with Gnome keybindings. To keep the keyboard usable for things like dismissing dialogues, we rebind it to the left ''logo key'':<br />
<br />
<haskell><br />
main = xmonad defaultConfig<br />
{ modMask = mod4Mask<br />
}<br />
</haskell><br />
<br />
===Extended Window Manager Hints===<br />
<br />
[http://xmonad.org/xmonad-docs/xmonad-contrib/XMonad-Hooks-EwmhDesktops.html EwmhDesktops] makes it possible to let Gnome know about Xmonad windows and workspaces. ''EwmhDesktops'' has been enabled in the example configuration above. By itself, configuration looks like this:<br />
<br />
<haskell><br />
import XMonad.Hooks.EwmhDesktops<br />
main = xmonad defaultConfig<br />
{ logHook = ewmhDesktopsLogHook<br />
}<br />
</haskell><br />
<br />
===Key bindings for switching desktops===<br />
<br />
====In 1 dimension: CycleWS====<br />
<br />
Gnome lays out the desktops in a row by default, and uses Ctrl+Alt+Left/Right for switching desktops left/right. To get similar behaviour in Xmonad, you need to add some keybindings. The contrib module [http://xmonad.org/xmonad-docs/xmonad-contrib/XMonad-Actions-CycleWS.html XMonad.Actions.CycleWS] has some useful actions for cycling workspaces, and I use these keybindings:<br />
<br />
<haskell><br />
-- moving workspaces<br />
, ("M-<Left>", prevWS )<br />
, ("M-<Right>", nextWS )<br />
, ("M-S-<Left>", shiftToPrev )<br />
, ("M-S-<Right>", shiftToNext )<br />
</haskell><br />
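<br />
The list above is only a fragment. One way to wire it into a complete <tt>~/.xmonad/xmonad.hs</tt>, assuming xmonad-contrib's <tt>XMonad.Util.EZConfig</tt> is installed (its <hask>additionalKeysP</hask> understands the <tt>"M-<Left>"</tt> notation), might look like the sketch below. The choice of <hask>mod4Mask</hask> is an assumption carried over from the mod key tip; adapt it to your own setup.<br />
<br />
<haskell><br />
import XMonad<br />
import XMonad.Actions.CycleWS (nextWS, prevWS, shiftToNext, shiftToPrev)<br />
import XMonad.Util.EZConfig (additionalKeysP)<br />
<br />
main :: IO ()<br />
main = xmonad $ defaultConfig<br />
    { modMask = mod4Mask }            -- assumed: logo key as mod key<br />
    `additionalKeysP`<br />
    -- moving between workspaces, and moving windows along<br />
    [ ("M-<Left>",    prevWS )<br />
    , ("M-<Right>",   nextWS )<br />
    , ("M-S-<Left>",  shiftToPrev )<br />
    , ("M-S-<Right>", shiftToNext )<br />
    ]<br />
</haskell><br />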
<br />
[[Image:Xmonad-screen-gnome-brownblack5.jpg|center|200px]]<br />
<br />
====In 2 dimensions: Plane====<br />
<br />
If Gnome is configured to lay out desktops in more than one line, it's also possible to navigate with Ctrl+Alt+Up/Down. The contrib module XMonad.Actions.Plane, available in xmonad-0.8 or greater or in the [http://code.haskell.org/xmonad darcs] version of XMonad, provides support for this kind of navigation. To use it with 3 lines, for instance, you could use this configuration:<br />
<br />
<haskell><br />
[ ((keyMask .|. m, keySym), function 3 Finite direction)<br />
| (keySym, direction) <- zip [xK_Left .. xK_Down] $ enumFrom ToLeft<br />
, (keyMask, function) <- [(0, planeMove), (shiftMask, planeShift)]<br />
]<br />
</haskell><br />
<br />
===Logging out of the Gnome session vs. quitting Xmonad===<br />
<br />
When running Xmonad as above, it is launched by ''gnome-session'', the "Gnome session manager." Quitting Xmonad in this situation ''will not log you out.'' If you make no changes, using mod+shift+q will leave you with all your applications still running and no window manager to navigate them! There are several remedies for this.<br />
<br />
* Run 'xmonad &' from a command line.<br />
* Quit X using Alt-Ctrl-Backspace.<br />
* Rebind mod+shift+q<br />
<br />
====Rebind mod+shift+q====<br />
To avoid exiting Xmonad and being stuck with no window manager, you might rebind mod+shift+q to execute the gnome-session "log out" functionality. This will of course prevent you from "quitting" Xmonad in the normal way, which may or may not be desirable. When the session logs out, the X11 server is terminated, which will in turn terminate all running X11 applications, including Xmonad.<br />
<br />
(TODO: improve the description of changes that need to be made here.)<br />
<br />
     , ("M-S-q", spawn "gnome-session-save --gui --kill")<br />
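<br />
As a rough sketch of the change (assuming <tt>XMonad.Util.EZConfig</tt> from xmonad-contrib is available), the binding can be layered on top of the default configuration; <hask>additionalKeysP</hask> overrides an existing binding for the same key, so the default "quit xmonad" action is replaced:<br />
<br />
<haskell><br />
import XMonad<br />
import XMonad.Util.EZConfig (additionalKeysP)<br />
<br />
main :: IO ()<br />
main = xmonad $ defaultConfig<br />
    `additionalKeysP`<br />
    -- mod+shift+q now asks gnome-session to log out instead of quitting xmonad<br />
    [ ("M-S-q", spawn "gnome-session-save --gui --kill") ]<br />
</haskell><br />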
<br />
==== Configure rudimentary power management ====<br />
<br />
It might be useful to include bindings for hibernation, screen locking, and other assorted basic functions. While Gnome provides the capability to do so, its functionality is limited (Mod4 cannot be used as a mask), and you may prefer to have Xmonad manage it. Fortunately, these things can be controlled from the command line, and the following bindings may help. (NB: These are for one handed use of Dvorak control; make sure to bind them to something more fitting. They also use mod1 not to clash with mod4 by accident.)<br />
<br />
-- Lock Screen<br />
, ((modMask .|. shiftMask, xK_l), spawn "gnome-screensaver-command -l")<br />
-- Logout<br />
, ((modMask .|. mod1Mask .|. shiftMask, xK_l), spawn "gnome-session-save --gui --kill")<br />
-- Sleep<br />
, ((mod1Mask .|. shiftMask, xK_apostrophe), spawn "gnome-power-cmd.sh suspend")<br />
-- Reboot<br />
, ((mod1Mask .|. shiftMask, xK_comma), spawn "gnome-power-cmd.sh reboot")<br />
-- Deep Sleep<br />
, ((mod1Mask .|. shiftMask, xK_period), spawn "gnome-power-cmd.sh hibernate")<br />
-- Death<br />
, ((mod1Mask .|. shiftMask, xK_p), spawn "gnome-power-cmd.sh shutdown")<br />
<br />
<br />
====Configure the session manager to relaunch Xmonad====<br />
You can configure the Gnome Session Manager to restart Xmonad whenever it exits (i.e., if you haven't rebound mod-shift-q.) This is rarely a useful feature as xmonad has its builtin compile-and-restart (mod-q), but it will prevent you from accidentally ending up with no window manager and no way to launch one.<br />
<br />
However, as of version 0.7, XMonad does not itself communicate with any session managers in the way they prefer, so setting things up takes some hackery:<br />
<br />
(TBD: steps, see also [[Xmonad/Using_xmonad_in_Gnome/0.5#Preparing_your_GNOME_session]])</div>Joehttps://wiki.haskell.org/index.php?title=File_talk:Accordion.jpg&diff=27176File talk:Accordion.jpg2009-03-24T00:22:34Z<p>Joe: </p>
<hr />
<div>What web browser is that? XEmacs?</div>Joehttps://wiki.haskell.org/index.php?title=UrlDisp&diff=26444UrlDisp2009-02-12T23:15:50Z<p>Joe: grammar fix</p>
<hr />
<div>== What is UrlDisp ==<br />
<br />
<br />
=== Problem statement ===<br />
<br />
URLs are everywhere on the web. Most of them, however, are hard to remember, because they are meaningless to humans. This is wrong: URLs are part of the user interface, and therefore should be kept simple, meaningful and memorable.<br />
<br />
=== Solution ===<br />
<br />
UrlDisp provides (Fast)CGI programs a minimalistic domain-specific parser for URLs.<br />
<br />
The hierarchical part of the URL is tokenized and matched against rules defined using UrlDisp combinators. Every rule consists, basically, of a predicate and a CGI action. Once a predicate is satisfied, its action is performed; otherwise, alternatives are tried in order. The matching algorithm is backtracking.<br />
<br />
== Usage examples ==<br />
<br />
=== Basics ===<br />
<br />
A regular CGI action looks like this:<br />
<br />
<hask>output "hello, world!"</hask><br />
<br />
Adding a predicate:<br />
<br />
<hask><br />
-- if URL matches /hello, then output "hello, world!"<br />
h |/ "hello" *> output "hello, world!"</hask><br />
<br />
More examples:<br />
<br />
<hask><br />
-- if URL contains /hello, output "woot, it works!", otherwise check for<br />
-- /foo<br />
(h |/ "hello" *> output "woot, it works!") <|> (h |/ "foo" *> output "foo")<br />
</hask><br />
<br />
As you can see, the |/ combinator matches the current token against its right operand. h is a special predicate that matches anything; it is used to begin a string of combinators.<br />
<br />
One can also match against<br />
* URL parameters,<br />
* HTTP methods,<br />
* and also convert token into a variable which is an instance of Read<br />
<br />
There's also an API which is believed to be more human-readable.<br />
<br />
=== Extending UrlDisp ===<br />
<br />
The examples given above are not very interesting, since one usually wants to interact with the outside world. Let's see how to extend UrlDisp to handle database access.<br />
<br />
Wrapping UrlDisp around a ReaderT will do the trick.<br />
<br />
<hask><br />
{-# LANGUAGE FlexibleInstances, ScopedTypeVariables #-}<br />
import Network.UrlDisp<br />
import Database.HDBC<br />
import Database.HDBC.ODBC<br />
import Control.Exception (bracket)<br />
import Network.CGI<br />
import Network.CGI.Monad<br />
import Control.Monad.Reader (ReaderT, runReaderT, ask, lift, liftIO)<br />
<br />
instance MonadCGI (ReaderT Connection (CGIT IO)) where<br />
cgiAddHeader n v = lift $ cgiAddHeader n v<br />
cgiGet = lift . cgiGet<br />
<br />
-- connStr (the ODBC connection string) is assumed to be defined elsewhere<br />
-- once a request to "/db/" is sent,<br />
-- execute an SQL query and show its results<br />
main :: IO ()<br />
main = bracket (connectODBC connStr) disconnect<br />
(\c -> runCGI $ (flip runReaderT) c $ evalUrlDisp $<br />
((h |/ "db" *> m) <|> output "not found"))<br />
<br />
m :: UrlDisp (ReaderT Connection (CGIT IO)) CGIResult<br />
m = do<br />
v <- lift ask >>= \c -> liftIO (quickQuery' c "select * from ..." [])<br />
output $ show v<br />
</hask><br />
<br />
[[Category:Web]]<br />
[[Category:Packages]]</div>Joehttps://wiki.haskell.org/index.php?title=UrlDisp&diff=26443UrlDisp2009-02-12T23:13:25Z<p>Joe: grammar fix</p>
<hr />
<div>== What is UrlDisp ==<br />
<br />
<br />
=== Problem statement ===<br />
<br />
URLs are everywhere on the web. Most of them, however, are hard to remember, because they are meaningless to humans. This is wrong: URLs are part of the user interface, and therefore should be kept simple, meaningful and memorable.<br />
<br />
=== Solution ===<br />
<br />
UrlDisp provides (Fast)CGI programs a minimalistic domain-specific parser for URLs.<br />
<br />
The hierarchical part of the URL is tokenized and matched against rules defined using UrlDisp combinators. Every rule consists, basically, of a predicate and a CGI action. Once a predicate is satisfied, its action is performed; otherwise, alternatives are tried in order. The matching algorithm is backtracking.<br />
<br />
== Usage examples ==<br />
<br />
=== Basics ===<br />
<br />
A regular CGI action looks like this:<br />
<br />
<hask>output "hello, world!"</hask><br />
<br />
Adding a predicate:<br />
<br />
<hask><br />
-- if URL matches /hello, then output "hello, world!"<br />
h |/ "hello" *> output "hello, world!"</hask><br />
<br />
More examples:<br />
<br />
<hask><br />
-- if URL contains /hello, output "woot, it works!", otherwise check for<br />
-- /foo<br />
(h |/ "hello" *> output "woot, it works!") <|> (h |/ "foo" *> output "foo")<br />
</hask><br />
<br />
As you can see, the |/ combinator matches the current token against its right operand. h is a special predicate that matches anything; it is used to begin a string of combinators.<br />
<br />
One can also match against<br />
* URL parameters,<br />
* HTTP methods,<br />
* and also convert token into a variable which is an instance of Read<br />
<br />
There's also an API which is believed to be more human-readable.<br />
<br />
=== Extending UrlDisp ===<br />
<br />
The examples given above are not very interesting, since one usually wants to interact with the outside world. Let's see how to extend UrlDisp to handle database access.<br />
<br />
Wrapping UrlDisp around a ReaderT will do the trick.<br />
<br />
<hask><br />
{-# LANGUAGE FlexibleInstances, ScopedTypeVariables #-}<br />
import Network.UrlDisp<br />
import Database.HDBC<br />
import Database.HDBC.ODBC<br />
import Control.Exception (bracket)<br />
import Network.CGI<br />
import Network.CGI.Monad<br />
import Control.Monad.Reader (ReaderT, runReaderT, ask, lift, liftIO)<br />
<br />
instance MonadCGI (ReaderT Connection (CGIT IO)) where<br />
cgiAddHeader n v = lift $ cgiAddHeader n v<br />
cgiGet = lift . cgiGet<br />
<br />
-- connStr (the ODBC connection string) is assumed to be defined elsewhere<br />
-- once a request to "/db/" is sent,<br />
-- execute an SQL query and show its results<br />
main :: IO ()<br />
main = bracket (connectODBC connStr) disconnect<br />
(\c -> runCGI $ (flip runReaderT) c $ evalUrlDisp $<br />
((h |/ "db" *> m) <|> output "not found"))<br />
<br />
m :: UrlDisp (ReaderT Connection (CGIT IO)) CGIResult<br />
m = do<br />
v <- lift ask >>= \c -> liftIO (quickQuery' c "select * from ..." [])<br />
output $ show v<br />
</hask><br />
<br />
[[Category:Web]]<br />
[[Category:Packages]]</div>Joehttps://wiki.haskell.org/index.php?title=Maintaining_laziness&diff=25502Maintaining laziness2009-01-04T22:39:58Z<p>Joe: fixed typo</p>
<hr />
<div>One of Haskell's main features is [[non-strict semantics]], which is implemented by [[lazy evaluation]] in all popular Haskell compilers.<br />
However, many Haskell libraries found on [[Hackage]] are implemented just as if Haskell were a strict language.<br />
This leads to unnecessary inefficiencies, [[memory leak]]s and, we suspect, unintended semantics.<br />
In this article we want to go through some techniques on how to check lazy behaviour on functions,<br />
examples of typical constructs which break laziness without need,<br />
and finally we want to link to techniques that may yield the same effect without laziness.<br />
<br />
== Checking laziness ==<br />
<br />
If you want to check whether a function is lazy enough, you may feed it with undefined values.<br />
An undefined value can be <hask>undefined</hask>, <hask>error "reason"</hask>, or an infinite loop.<br />
The latter one has the advantage that it cannot be hidden by some hacks like "catching" the error in the IO monad.<br />
<br />
Examples:<br />
Check whether <hask>filter</hask> is lazy:<br />
<haskell><br />
filter even [0..]<br />
filter even ([0..5] ++ undefined)<br />
</haskell><br />
If the <hask>filter</hask> function is lazy<br />
then it keeps generating elements in the first case<br />
and it outputs a prefix of the output list, before breaking because of the undefined, in the second case.<br />
<br />
An automated unit test can check whether infinite or corrupted input data produces correct prefixes.<br />
Those tests usually do not fail by returning <hask>False</hask> but by leading to undefined results,<br />
either explicit <hask>undefined</hask> or an infinite loop.<br />
<haskell><br />
testFilter0 = filter even [0..100] `isPrefixOf` filter even [0..]<br />
testFilter1 = filter even [0..100] `isPrefixOf` filter even ([0..102]++undefined)<br />
testFilter2 = let x = filter even [0..] !! 100 in x==x<br />
testFilter3 = let x = filter even ([0..102]++undefined) !! 50 in x==x<br />
</haskell><br />
<br />
<br />
== Laziness breakers ==<br />
<br />
=== Maybe, Either, Exceptions ===<br />
<br />
Some laziness breakers are visible in type signatures:<br />
<haskell><br />
decodeUTF8 :: [Word8] -> Either Message String<br />
</haskell><br />
The <hask>Either</hask> type signals that the function marks decoding failure by using the <hask>Left</hask> constructor of <hask>Either</hask>.<br />
This function cannot be lazy, because when you access the first character of the result,<br />
it must already be computed, whether the result is <hask>Left</hask> or <hask>Right</hask>.<br />
For this decision, the complete input must be decoded.<br />
A better type signature is<br />
<haskell><br />
decodeUTF8 :: [Word8] -> (Maybe Message, String)<br />
</haskell><br />
where the <hask>String</hask> contains as many characters as could be decoded<br />
and <hask>Maybe Message</hask> gives the reason for the stop of the decoding.<br />
<hask>Nothing</hask> means the input was completely read,<br />
<hask>Just msg</hask> means the decoding was aborted for the reason described in <hask>msg</hask>.<br />
If you touch the first element of the pair, the complete decoding is triggered, and thus laziness is broken.<br />
This means you should first process the <hask>String</hask> and look at <hask>Maybe Message</hask> afterwards.<br />
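<br />
The difference can be sketched with a toy decoder. To keep the example short it accepts only 7-bit ASCII rather than real UTF-8, and the names <hask>decodeStrict</hask> and <hask>decodeLazy</hask> as well as the <hask>Message</hask> synonym are made up for this illustration:<br />
<haskell><br />
import Data.Char (chr)<br />
import Data.Word (Word8)<br />
<br />
type Message = String   -- assumption: a plain error text<br />
<br />
-- Either version: the constructor (Left or Right) is known<br />
-- only after the whole input has been inspected.<br />
decodeStrict :: [Word8] -> Either Message String<br />
decodeStrict [] = Right []<br />
decodeStrict (w:ws)<br />
   | w < 128   = fmap (chr (fromIntegral w) :) (decodeStrict ws)<br />
   | otherwise = Left ("invalid byte " ++ show w)<br />
<br />
-- Pair version: every character is emitted as soon as it is decoded,<br />
-- the status becomes available once decoding has stopped.<br />
decodeLazy :: [Word8] -> (Maybe Message, String)<br />
decodeLazy [] = (Nothing, [])<br />
decodeLazy (w:ws)<br />
   | w < 128   = let (msg, cs) = decodeLazy ws   -- lazy pattern match<br />
                 in  (msg, chr (fromIntegral w) : cs)<br />
   | otherwise = (Just ("invalid byte " ++ show w), [])<br />
</haskell><br />
With these definitions <hask>take 3 (snd (decodeLazy ([72,105,33] ++ undefined)))</hask> yields <hask>"Hi!"</hask>, whereas the corresponding call of <hask>decodeStrict</hask> runs into the <hask>undefined</hask> before it can return anything.<br />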
<br />
Instead of the unspecific pair type you should use the special type for asynchronous exceptions as found in the [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/explicit-exception explicit exception] package.<br />
<br />
<br />
Especially in parsers you may find a function, called Wadler's force function.<br />
It works as follows:<br />
<haskell><br />
force y =<br />
let Just x = y<br />
in Just x<br />
</haskell><br />
It looks like a complicated expression for <hask>y</hask><br />
with an added danger of failing unrecoverably when <hask>y</hask> is not <hask>Just</hask>.<br />
Its purpose is to use the lazy pattern matching of <hask>let</hask><br />
and to show the runtime system that we expect <hask>y</hask> to always be a <hask>Just</hask>.<br />
Then the runtime system need not wait until it can determine the right constructor, but can proceed immediately.<br />
This way a function can be made lazy even if it returns <hask>Maybe</hask>.<br />
It can however fail, if later it turns out, that <hask>y</hask> is actually <hask>Nothing</hask>. <!-- fail how? To be lazy? Or it is some hideous failure like 'head []'? --><br />
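<br />
A small experiment using only <hask>Data.Maybe</hask> may make this concrete. The names <hask>constructorKnown</hask> and <hask>unchanged</hask> are chosen just for this sketch:<br />
<haskell><br />
import Data.Maybe (isJust)<br />
<br />
force :: Maybe a -> Maybe a<br />
force y =<br />
   let Just x = y   -- lazy pattern, y is not inspected yet<br />
   in  Just x<br />
<br />
-- The constructor is known without looking at the argument.<br />
constructorKnown :: Bool<br />
constructorKnown = isJust (force (undefined :: Maybe Int))<br />
<br />
-- For an actual Just value, force changes nothing.<br />
unchanged :: Bool<br />
unchanged = force (Just 'a') == Just 'a'<br />
</haskell><br />
Both values evaluate to <hask>True</hask>. Note that <hask>force Nothing</hask> looks like a <hask>Just</hask>, too; the pattern match failure only shows up at run-time, when the contained value is demanded.<br />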
<br />
Using force-like functions is sometimes necessary,<br />
but should be avoided for data types with more than one constructor.<br />
It is better to use an interim data type with one constructor and lift to the multi-constructor datatype when needed.<br />
Consider parsers of type <hask>StateT [Word8] Maybe a</hask>.<br />
Now consider the parser combinator <haskell>many :: StateT [Word8] Maybe a -> StateT [Word8] Maybe [a]</haskell><br />
which parses as many elements of type <hask>a</hask> as possible.<br />
It shall be lazy and thus must be infallible and must not use the <hask>Maybe</hask>.<br />
It shall just return an empty list, if parsing of one element fails.<br />
A quick hack would be to define <hask>many</hask> using a <hask>force</hask> function.<br />
It would be better to show by the type, that <hask>many</hask> cannot fail:<br />
<haskell>many :: StateT [Word8] Maybe a -> StateT [Word8] Identity [a]</haskell>.<br />
<br />
=== Early decision ===<br />
<br />
==== List construction ====<br />
<br />
Be aware that the following two expressions are not equivalent.<br />
<haskell><br />
-- less lazy<br />
if b then f x else f y<br />
-- more lazy<br />
f (if b then x else y)<br />
</haskell><br />
Here <hask>if undefined then f x else f y</hask> is <hask>undefined</hask>,<br />
whereas <hask>f (if b then x else y)</hask> is <hask>f undefined</hask>,<br />
which is a difference in [[non-strict semantics]].<br />
Consider e.g. <hask>if b then 'a':x else 'a':y</hask>.<br />
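<br />
Spelled out as two functions (the names <hask>early</hask> and <hask>late</hask> are only used for this illustration), that last example might read:<br />
<haskell><br />
-- Decision first: the leading 'a' is hidden behind the test of b.<br />
early :: Bool -> String -> String -> String<br />
early b x y = if b then 'a':x else 'a':y<br />
<br />
-- Common part first: the leading 'a' is available before b is inspected.<br />
late :: Bool -> String -> String -> String<br />
late b x y = 'a' : (if b then x else y)<br />
</haskell><br />
For defined <hask>b</hask> both functions agree, but <hask>take 1 (late undefined "bc" "de")</hask> is <hask>"a"</hask>, whereas <hask>take 1 (early undefined "bc" "de")</hask> is undefined.<br />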
<br />
A common source of too much strictness is making decisions too early and thus duplicating code in the decision branches.<br />
Intuitively speaking, the bad thing about [[code duplication]] (stylistic questions put aside) is<br />
that the run-time system cannot see that in the branches some things are equal and do it in common before the critical decision.<br />
Actually, the compiler and run-time system could be "improved" to do so, but in order to keep things predictable, they do not do so.<br />
Even more, this behaviour is required by theory,<br />
since by pushing decisions to the inner of an expression you change the semantics of the expression.<br />
So we return to the question, what the programmer actually wants.<br />
<br />
Now, do you think this expression<br />
<haskell><br />
if b<br />
then [x]<br />
else y:ys<br />
</haskell><br />
is maximally lazy?<br />
It seems so, but actually it is not. In both branches we create non-empty lists, but the run-time system cannot see this.<br />
Again, <hask>null (if undefined then [x] else y:ys)</hask> is <hask>undefined</hask>,<br />
but we would like it to evaluate to <hask>False</hask>.<br />
Here we need lazy pattern matching as provided by <hask>let</hask>.<br />
<haskell><br />
let z:zs =<br />
      if b<br />
        then [x]<br />
        else y:ys<br />
in  z:zs<br />
</haskell><br />
This expression always returns the constructor <hask>(:)</hask> and thus <hask>null</hask> knows that the list is not empty.<br />
However, this is a little unsafe, because the pattern match in <hask>let z:zs</hask> fails if one of the branches of <hask>if</hask> yields an empty list.<br />
This error can only be caught at run-time, which is bad.<br />
We can avoid it by using the pair type, which has only a single constructor.<br />
<haskell><br />
let (z,zs) =<br />
      if b<br />
        then (x,[])<br />
        else (y,ys)<br />
in  z:zs<br />
</haskell><br />
which can be abbreviated to<br />
<haskell><br />
uncurry (:) (if b then (x,[]) else (y,ys))<br />
</haskell><br />
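<br />
Wrapped into a function, the pair-based variant can be tried out directly. This is only a sketch; the name <hask>lazyCons</hask> is made up for this example.<br />
<haskell><br />
-- The (:) constructor is produced before the condition is inspected,<br />
-- because uncurry accesses the pair lazily via fst and snd.<br />
lazyCons :: Bool -> a -> a -> [a] -> [a]<br />
lazyCons b x y ys = uncurry (:) (if b then (x, []) else (y, ys))<br />
</haskell><br />
Here <hask>null (lazyCons undefined 'a' 'b' "c")</hask> evaluates to <hask>False</hask>, since <hask>null</hask> only needs the outermost constructor.<br />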
<br />
<br />
Another example is the <hask>inits</hask> function.<br />
In the Haskell 98 report the implementation<br />
<haskell><br />
inits :: [a] -> [[a]]<br />
inits [] = [[]]<br />
inits (x:xs) = [[]] ++ map (x:) (inits xs)<br />
</haskell><br />
is suggested.<br />
However, you will find that <hask>inits undefined</hask> is <hask>undefined</hask>,<br />
although <hask>inits</hask> should always return the empty list as its first element.<br />
The following implementation does exactly this:<br />
<haskell><br />
inits :: [a] -> [[a]]<br />
inits xt =<br />
   [] :<br />
   case xt of<br />
      [] -> []<br />
      x:xs -> map (x:) (inits xs)<br />
</haskell><br />
See also the article on [[base cases and identities]].<br />
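<br />
The lazier behaviour can be checked with the following sketch. The function is renamed to <hask>initsLazy</hask> here (a name chosen for this example) in order to avoid a clash with <hask>Data.List.inits</hask>.<br />
<haskell><br />
initsLazy :: [a] -> [[a]]<br />
initsLazy xt =<br />
    [] :                          -- emitted before xt is inspected<br />
    case xt of<br />
        []     -> []<br />
        x : xs -> map (x :) (initsLazy xs)<br />
</haskell><br />
Now <hask>null (head (initsLazy (undefined :: [Int])))</hask> is <hask>True</hask>, and the function also works on infinite lists, e.g. <hask>take 3 (initsLazy [1..])</hask> gives <hask>[[],[1],[1,2]]</hask>.<br />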
<br />
<br />
==== Reader-Writer-State monad ====<br />
<br />
I do not know whether the following example can be simplified.<br />
In this form it occurred in a real application, namely the HTTP package.<br />
<br />
Consider the following action in the <hask>Control.Monad.RWS</hask> monad, which fetches a certain number of elements from a list.<br />
The state of the monad is the input list we fetch the elements from.<br />
The reader part provides an element that signals that the input is consumed.<br />
It is returned as a singleton list when the caller tries to read from completely consumed input.<br />
The writer allows us to log some information; however, the action considered here does not write anything to the log.<br />
<haskell><br />
getN :: Int -> RWS a [Int] [a] [a]<br />
getN n =<br />
   do input <- get<br />
      if null input<br />
        then asks (:[])<br />
        else let (fetched,rest) = splitAt n input<br />
             in  put rest >> return fetched<br />
</haskell><br />
As we learned as good imperative programmers, we only call <hask>splitAt</hask> when the input is non-empty,<br />
that is, only if there is something to fetch.<br />
This works even in many corner cases, but not in the following one.<br />
Although <hask>getN</hask> obviously does not log anything (i.e. it does not call <hask>tell</hask>),<br />
it needs to read the input in order to find out that nothing was logged:<br />
<haskell><br />
*Test> (\(_a,_s,w) -> w) $ runRWS (getN 5) '\n' undefined<br />
*** Exception: Prelude.undefined<br />
</haskell><br />
<br />
The problem is again that <hask>if</hask> checks the emptiness of the input,<br />
which is undefined, since the input is undefined.<br />
Thus we must ensure that the invoked monadic actions are run independently of the input.<br />
Only this way can the run-time system see that the logging stream is never touched.<br />
We start refactoring by calling <hask>put</hask> independently of <hask>input</hask>'s content.<br />
This also works for empty lists, since <hask>splitAt</hask> will just return empty lists in this case.<br />
<haskell><br />
getN :: Int -> RWS a [Int] [a] [a]<br />
getN n =<br />
   do input <- get<br />
      let (fetched,rest) = splitAt n input<br />
      put rest<br />
      if null input<br />
        then asks (:[])<br />
        else return fetched<br />
</haskell><br />
This doesn't resolve the problem. There is still a choice between <hask>asks</hask> and <hask>return</hask>.<br />
We have to pull out <hask>ask</hask> as well.<br />
<haskell><br />
getN :: Int -> RWS a [Int] [a] [a]<br />
getN n =<br />
   do input <- get<br />
      let (fetched,rest) = splitAt n input<br />
      put rest<br />
      endOfInput <- ask<br />
      return $<br />
         if null input<br />
           then [endOfInput]<br />
           else fetched<br />
</haskell><br />
Now things work as expected:<br />
<haskell><br />
*Test> (\(_a,_s,w) -> w) $ runRWS (getN 5) '\n' undefined<br />
[]<br />
</haskell><br />
We learn from this example that in Haskell it is sometimes better to call functions unconditionally, even if their results are not needed under some circumstances.<br />
Always remember that the [[Do notation considered harmful|do notation]] only looks imperative, but it is not imperative.<br />
E.g., <hask>endOfInput</hask> is only evaluated if the end of the input is really reached.<br />
Thus, the call to <hask>ask</hask> does not mean that an action is actually performed between <hask>put</hask> and <hask>return</hask>.<br />
<br />
<br />
=== Strict pattern matching in a recursion ===<br />
<br />
Consider the <hask>partition</hask> function, which sorts the elements that match a predicate into one list and the non-matching elements into another list.<br />
This function should also work on infinite lists,<br />
but the implementation shipped with GHC up to 6.2 [http://www.haskell.org/pipermail/libraries/2004-October/002645.html failed on infinite lists].<br />
What happened?<br />
<br />
The reason was too strict pattern matching.<br />
<br />
Let's first consider the following correct implementation:<br />
<haskell><br />
partition :: (a -> Bool) -> [a] -> ([a], [a])<br />
partition p =<br />
   foldr<br />
      (\x ~(y,z) -><br />
         if p x<br />
           then (x : y, z)<br />
           else (y, x : z))<br />
      ([],[])<br />
</haskell><br />
The usage of <hask>foldr</hask> seems to be reserved for advanced programmers.<br />
Formally <hask>foldr</hask> runs from the end to the start of the list.<br />
However, how can this work if there is a list without an end?<br />
That can be seen when applying the definition of <hask>foldr</hask>.<br />
<haskell><br />
foldr :: (a -> b -> b) -> b -> [a] -> b<br />
foldr _ b [] = b<br />
foldr f b (a:as) = f a (foldr f b as)<br />
</haskell><br />
If we expand this once for an infinite input list, we get<br />
<haskell><br />
partition p (a:as) =<br />
(\ ~(y,z) -> if p a then (a:y, z) else (y, a:z)) (foldr ... ([],[]) as)<br />
</haskell><br />
We see that whether <hask>a</hask> is prepended to the first or the second list<br />
depends only on <hask>p a</hask>, and neither on <hask>y</hask> nor on <hask>z</hask>.<br />
The laziness annotation <hask>~</hask> is crucial, since it tells, intuitively speaking,<br />
that we can rely on the recursive call of <hask>foldr</hask> to return a pair and not <hask>undefined</hask>.<br />
Omitting it would require the evaluation of the whole input list before the first output element can be determined.<br />
This fails for infinite lists and is inefficient for finite lists, and that was the bug in former implementations of <hask>partition</hask>.<br />
By the way, the expansion also shows that it would not help to omit the tilde and apply the above 'force' trick to the 'if-then-else' expression.<br />
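<br />
The effect of the lazy pattern can be tried out with the following sketch. The function is called <hask>partitionLazy</hask> here (a name chosen for this example) so that it does not clash with <hask>Data.List.partition</hask>.<br />
<haskell><br />
partitionLazy :: (a -> Bool) -> [a] -> ([a], [a])<br />
partitionLazy p =<br />
    foldr<br />
        -- the tilde defers matching the pair until y or z is needed<br />
        (\x ~(y, z) -> if p x then (x : y, z) else (y, x : z))<br />
        ([], [])<br />
</haskell><br />
With the tilde, <hask>take 3 (fst (partitionLazy even [1..]))</hask> yields <hask>[2,4,6]</hask>. Without it, the same expression would not terminate, because the pattern match forces the recursive call before any output is produced.<br />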
<br />
=== List reversal ===<br />
<br />
Any use of the list function <hask>reverse</hask> should alert you,<br />
since accessing the first element of a reversed list requires all nodes of the input list to be evaluated and stored in memory.<br />
Think twice whether it is really needed.<br />
The article [[Infinity and efficiency]] shows how to avoid list reversal.<br />
<br />
== Alternatives ==<br />
<br />
From the above issues you see that laziness is a fragile thing.<br />
Make one mistake and a function, carefully developed with laziness in mind, is no longer lazy.<br />
The type system will rarely help you hunt laziness breakers, and debuggers offer little support.<br />
<br />
Thus detecting laziness breakers will often require understanding a large portion of code, which is against the idea of modularity.<br />
<!-- ... and knowledge about low-level details of compilation or the runtime system.<br />
In principle that's not true.<br />
I have argued with how the runtime system may work,<br />
but it all follows from the non-strict semantics.<br />
--><br />
<br />
For your case you may prefer a different idiom that achieves the same goals in a safer way. See e.g. the [[Enumerator and iteratee]] pattern.<br />
<br />
[[Category:Idioms]]</div>Joehttps://wiki.haskell.org/index.php?title=AutoForms/Tutorial&diff=21019AutoForms/Tutorial2008-05-18T23:05:23Z<p>Joe: spelling fix</p>
<hr />
<div>IMPORTANT! Development of AutoForms has stopped. Instead use [http://lindstroem.wordpress.com/2008/05/03/introducing-wxgeneric/ WxGeneric] which is based upon AutoForms. However, this page still explains why you would want to generically construct GUIs.<br />
<br />
= Introduction =<br />
<br />
[http://autoforms.sourceforge.net AutoForms] is a library to ease the creation of Graphical User Interfaces (GUI). It does this by using generic programming to construct GUI components. The AutoForms user creates an ordinary [http://en.wikipedia.org/wiki/Algebraic_datatype algebraic data type (ADT)] that reflects the data model of an application. From this ADT AutoForms automatically derives a GUI component, by using the structure and identifiers of the ADT.<br />
<br />
In this article we will see how AutoForms is used via a practical example. We will explain how AutoForms is used (the interface), rather than how it is implemented. And you should be able to see some of the advantages and disadvantages of the AutoForms library. In comparison to conventional GUI libraries, we aim to show that AutoForms can:<br />
<br />
* speed up the creation of GUI applications<br />
* abstract over a larger set of standard dialogs - by standard dialog we mean dialogs like a file dialog or an error dialog<br />
* adapt more easily to changing requirements<br />
* provide stronger type safety<br />
<br />
Furthermore, we aim to show that:<br />
<br />
* while AutoForms is built upon the automatic construction of GUI applications, using an ADT as input, it is still possible to customize the application's look and functionality<br />
<br />
To understand this article the reader should have programming experience with [http://www.haskell.org Haskell] and should at least have looked at the [[AutoForms | AutoForms homepage]].<br />
<br />
We will achieve these goals by creating a GUI frontend for a [http://en.wikipedia.org/wiki/Cron Cron]-like daemon. We will start with a simple example and evolve more complex examples. This is not just to avoid presenting too many details at once, but also because it resembles a typical software development effort. The application was chosen because it is simple, yet practically useful. Also, different aspects of the AutoForms library are presented well with this kind of application.<br />
<br />
The reader is encouraged to download and run the examples. As the AutoForms library is built upon the cross-platform library [http://wxhaskell.sourceforge.net/ WxHaskell], it should work on MS Windows, Mac OS X, and Linux. The examples shown in this tutorial are included with the AutoForms library as well. If you [[AutoForms#Installation | install this library]] then you will be able to compile the HCron GUIs, shown in this article, by doing "cd src/Examples/HCron/ && make <editor name>". Or you can compile all of them by "cd src/Examples/HCron && make all".<br />
<br />
= The Cron Editors =<br />
== Cron ==<br />
Cron is a scheduling service for recurring jobs. It is normally only seen in Unix(-like) environments. Which jobs to run is decided by editing crontab files using a text editor. For example, the following crontab entry says that the job ''getMail'' should be executed every day at 20 minutes past 10 o'clock.<br />
<br />
20 10 * * * getMail<br />
<br />
This kind of service can be very useful. However, editing text files is not the easiest or most intuitive way to interface with a Cron-like service. A GUI should be able to do a much better job. Thus we want to create a GUI for a Cron-like service.<br />
<br />
We have decided to make our own Cron-like daemon, instead of using the standard Unix Cron, because:<br />
<br />
* our own tool can be cross platform.<br />
* parsing will be a lot simpler, as we can store the Crontab-like file as an ordinary Haskell ADT. This means that we can use Haskell's <hask>show</hask> and <hask>read</hask> functions for serialization and deserialization.<br />
* we can have features not found in Cron, such as handling one-off jobs and not just recurring jobs.<br />
<br />
== Our first program ==<br />
<br />
=== The daemon (service) ===<br />
Before creating the GUI, we will create the daemon to execute the batch jobs. We have decided to begin by creating a very simple daemon that can only execute jobs at a specified time. It cannot handle recurring jobs. We will later add more functionality to the daemon. Note that this resembles how much software is actually built: by starting with a simple prototype and then incrementally adding more functionality.<br />
<br />
We will not go into detail about how the daemon is implemented, as it is not the point of this tutorial. However, the interested reader can look at the [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Daemon1st.hs?revision=627&view=markup source code].<br />
<br />
What is interesting is the data type the daemon uses to store which jobs to execute. This is important as the GUI needs to manipulate this datatype. The type is defined in [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Entry1st.hs?revision=627&view=markup the Entry1st module] as:<br />
<br />
type HCron = [Entry Time]<br />
<br />
data Entry t = Entry { when :: t<br />
, command :: String } deriving (Show, Read, Eq, Ord)<br />
<br />
data Time = Time { year :: Int<br />
, month :: System.Time.Month<br />
, day :: Int<br />
, hour :: Int<br />
, minute :: Int } deriving (Show, Read, Eq, Ord)<br />
<br />
The module also defines:<br />
<br />
specLocation = "./hcron.spec"<br />
<br />
which is the file, where we will store our crontab entries.<br />
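<br />
The serialization via <hask>show</hask> and <hask>read</hask> can be sketched independently of AutoForms. The following is only an illustration: the local <hask>Month</hask> type stands in for <hask>System.Time.Month</hask> so that nothing beyond base is needed, and the helpers <hask>serialize</hask>, <hask>deserialize</hask> and <hask>sample</hask> are made up for this example and are not part of the HCron sources.<br />
<haskell><br />
import Data.Char (isSpace)<br />
<br />
-- stand-in for System.Time.Month (assumption of this sketch)<br />
data Month = January | February | March | April | May | June<br />
           | July | August | September | October | November | December<br />
           deriving (Show, Read, Eq, Ord, Enum, Bounded)<br />
<br />
data Time = Time { year :: Int, month :: Month, day :: Int<br />
                 , hour :: Int, minute :: Int }<br />
            deriving (Show, Read, Eq, Ord)<br />
<br />
data Entry t = Entry { when :: t, command :: String }<br />
               deriving (Show, Read, Eq, Ord)<br />
<br />
type HCron = [Entry Time]<br />
<br />
-- the derived Show instance already yields a parseable text format<br />
serialize :: HCron -> String<br />
serialize = show<br />
<br />
-- reads returns no parse for malformed input, which we map to Nothing<br />
deserialize :: String -> Maybe HCron<br />
deserialize s =<br />
    case reads s of<br />
        [(entries, rest)] | all isSpace rest -> Just entries<br />
        _                                    -> Nothing<br />
<br />
sample :: HCron<br />
sample = [Entry (Time 2008 May 18 10 20) "getMail"]<br />
</haskell><br />
A round trip <hask>deserialize (serialize sample)</hask> gives back <hask>Just sample</hask>, while malformed input yields <hask>Nothing</hask> instead of a run-time error.<br />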
<br />
=== The GUI ===<br />
Now that we have made the daemon and the HCron data type, we are ready for what AutoForms is all about - user interfaces. We want to create a GUI to edit the HCron data type and to store the result.<br />
<br />
We will show the code for our first GUI in small steps and explain each part as it is shown. Firstly,<br />
<br />
module Editor1st where<br />
<br />
import Entry1st<br />
import Graphics.UI.AF.WxFormAll<br />
import Control.Monad.Trans(liftIO)<br />
<br />
we need to import the data type defined above (Entry1st) and to import [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Graphics/UI/AF/WxFormAll.hs?revision=627&view=markup WxFormAll], which makes it possible to use the AutoForms library.<br />
<br />
To automatically generate GUIs, AutoForms uses the [http://www.cs.vu.nl/boilerplate/ Scrap Your Boilerplate 3 (SYB3)] approach to generic programming. SYB3 requires that we use the template-function ''derive'' for all data types we want to process generically. Thus we use derive on the Entry data type and its children:<br />
<br />
$(derive [<nowiki>''</nowiki>Entry,<nowiki>''</nowiki>Time,<nowiki>''</nowiki>Month])<br />
<br />
Finally we get to the main function, which constructs the GUI:<br />
<br />
main = startWx "Editor1st" $<br />
do entryValue :: HCron <- liftIO $ (readFile specLocation >>= readIO) `catch` (\_ -> return [])<br />
entry <- builderCom entryValue<br />
chState <- makeChangedState entry<br />
<br />
let saveFile = do x <- getValue entry<br />
liftIO $ writeFile specLocation $ show x<br />
setValue chState $ Unchanged x<br />
button "Save" saveFile >>= enabledWhen chState (== Changed)<br />
button "Quit" closeWindow<br />
<br />
In the first line ''startWx'' runs an AutoForms GUI. The second line reads the specification of jobs to execute and when from the file system. The file path is in the ''specLocation'' constant, which is defined in the Entry1st module. If this file cannot be read, it falls back to an empty specification, one with no jobs defined.<br />
<br />
The third line morphs an ADT (entryValue) to widgets. The fourth line creates state, which tells if the edited values have changed since they were last saved.<br />
<br />
''saveFile'' is an action which stores the edited values on disk. This action is executed whenever the save button is pressed. The save and quit buttons are created on the last two lines.<br />
<br />
As would be expected from a Haskell program, but not from conventional GUI toolkits, this GUI is constructed in a type-safe fashion.<br />
<br />
[http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Editor1st.hs?revision=627&view=markup Here you can find the complete code for the application]. If you installed AutoForms you can compile our GUI by doing "cd src/Examples/HCron && make Editor1st" from within the AutoForms source distribution.<br />
<br />
== Using standard dialogs ==<br />
In the last section we saw that you could easily create a GUI with AutoForms. However, editing some file using a nice GUI seems like a common task. Thus we should be able to reuse as much of the functionality as possible, i.e. we should not have to create save & quit buttons every time we need to edit some file. And with the ''editFile'' function we can avoid this:<br />
<br />
module Editor2nd where<br />
<br />
import Entry1st<br />
import Graphics.UI.AF.WxFormAll<br />
<br />
$(derive [<nowiki>''</nowiki>Entry,<nowiki>''</nowiki>Time,<nowiki>''</nowiki>Month])<br />
<br />
main = startWx "Editor2nd" $ editFile specLocation ([]::HCron)<br />
<br />
that is all. Five lines of code and you have a GUI. This would not have been possible with a conventional GUI toolkit, as you need to specify the GUI elements and layout manually.<br />
<br />
Again you can compile the application yourself, by doing "cd src/Examples/HCron && make Editor2nd" from within the AutoForms source code.<br />
<br />
== Editor with limits ==<br />
The current editor accepts a lot more values than make sense. For instance, you can specify that you want the zeroth day of the month. This should not be a big surprise, as Haskell only lets us specify type constraints when they can be statically checked. This is not good enough for a user-friendly GUI. We will therefore limit the possible values the GUI can have:<br />
<br />
instance TypePresentation (Entry Time) tp1 tp2 tp3 tp4 tp5 where<br />
mkCom x = limit timeLimit ("Incorrect time")<br />
(defaultCom x)<br />
where<br />
timeLimit :: Entry Time -> IO Bool<br />
timeLimit Entry { when = Time y month d h m } = return $<br />
y > 1970 && y < 2100 &&<br />
d >= 1 &&<br />
((month == February && d <= 28) ||<br />
(month == February && d == 29 && y `mod` 4 == 0) ||<br />
(month `elem` [ January, March, May, July, August<br />
, October, December] && d <= 31) ||<br />
(month `elem` [April, June, September, November] && d <= 30)<br />
) &&<br />
h >= 0 && h <= 23 &&<br />
m >= 0 && m <= 59<br />
<br />
We do this by making the ''Entry Time'' type an instance of the type class ''TypePresentation''. Reading the ''mkCom'' function backwards, we first call ''defaultCom'' which constructs a default GUI for ''Entry Time''. We then limit this default GUI to accept only sensible values using the timeLimit function. The GUI will complain with "Incorrect time" if the user inputs a non-sensible value.<br />
<br />
Not only is this specification succinct, it also manages to separate the set of legal values from laying out GUI elements and from the interactive aspects of the GUI. In conventional GUI toolkits this restriction of legal values is intermixed with the GUI element creation and/or layout of elements.<br />
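<br />
Because the set of legal values is separated from the GUI, the check itself can be written and tested as a pure function. The following sketch is not taken from the AutoForms sources: <hask>validTime</hask> and <hask>daysInMonth</hask> are hypothetical helpers, the local <hask>Month</hask> type stands in for <hask>System.Time.Month</hask>, and the full Gregorian leap year rule is an addition of this sketch.<br />
<haskell><br />
-- stand-in for System.Time.Month (assumption of this sketch)<br />
data Month = January | February | March | April | May | June<br />
           | July | August | September | October | November | December<br />
           deriving (Show, Eq, Enum, Bounded)<br />
<br />
-- number of days of a month in a given year<br />
daysInMonth :: Int -> Month -> Int<br />
daysInMonth y m<br />
    | m == February                               = if isLeap then 29 else 28<br />
    | m `elem` [April, June, September, November] = 30<br />
    | otherwise                                   = 31<br />
  where<br />
    isLeap = (y `mod` 4 == 0 && y `mod` 100 /= 0) || y `mod` 400 == 0<br />
<br />
-- same year, hour and minute bounds as timeLimit above<br />
validTime :: Int -> Month -> Int -> Int -> Int -> Bool<br />
validTime y m d h mi =<br />
    y > 1970 && y < 2100 &&<br />
    d >= 1 && d <= daysInMonth y m &&<br />
    h >= 0 && h <= 23 &&<br />
    mi >= 0 && mi <= 59<br />
</haskell><br />
Such a function could be wrapped with <hask>return</hask> to obtain the <hask>IO Bool</hask> result that ''limit'' expects in the instance above.<br />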
<br />
Finally, we need to add a default Time value by specializing GInstanceCreator:<br />
<br />
instance GInstanceCreator Time where<br />
gGenUpTo _ = [Time 2000 January 1 10 00]<br />
<br />
Again you can compile the application yourself, by doing "cd src/Examples/HCron && make Editor3rdLimit" and you can find the [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Editor3rdLimit.hs?revision=627&view=markup source code here].<br />
<br />
== Recurring tasks ==<br />
We set out to create a crontab-like GUI editor. But crontab handles recurring tasks and we do not. First we add recurring tasks to the Entry data type by adding a recurring field and changing the [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Entry2ndRecurring.hs?revision=627&view=markup daemon accordingly]:<br />
<br />
<haskell><br />
data Entry t = Entry { when :: t<br />
, recurring :: Maybe TimeDiff<br />
, command :: String } deriving (Show, Read, Eq, Ord)<br />
</haskell><br />
<br />
We also need to change the GUI. This is done by adding TimeDiff to the derive template function:<br />
<br />
<haskell><br />
$(derive [<nowiki>''</nowiki>Entry,<nowiki>''</nowiki>Time,<nowiki>''</nowiki>Month,<nowiki>''</nowiki>TimeDiff])<br />
</haskell><br />
<br />
That is, we need to tell AutoForms to construct GUI elements for the TimeDiff datatype. With conventional toolkits we would have to specify a new layout of elements, but with AutoForms it is done automatically. The changing requirements leave us with almost no extra work for the GUI part.<br />
<br />
The new GUI can be compiled by doing "cd src/Examples/HCron && make Editor4thRecurring.hs" and you can find the [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Editor4thRecurring.hs?revision=627&view=markup source code here].<br />
<br />
== Executing commands ==<br />
As pointy-haired bosses are known to do, we are presented with yet another change in requirements. The GUI should now be able to execute the commands directly. To facilitate this change, ''TypePresentation'' is specialized to:<br />
<br />
<haskell><br />
instance ( AutoForm WxAct ComH WxM SatCxt EC<br />
, Sat (SatCxt (Entry Time)), Sat (SatCxt [String]))<br />
=> TypePresentation (Entry Time) WxAct ComH WxM SatCxt EC where<br />
mkCom x = limit timeLimit ("Incorrect time") $<br />
label "Entry" entryCom<br />
where<br />
entryCom :: EC (Entry Time)<br />
entryCom = builderToCom $<br />
do entryHandle <- addCom $ defaultCom x -- don't use mkCom here, as this<br />
-- results in eternal recursion<br />
outputHandle <- addCom $ label "Command output" $ mkCom [""]<br />
button "Exec..." (do cmd <- getValue entryHandle<br />
(_, out) <- liftIO $ readCommand (Entry.command cmd) ""<br />
setValue outputHandle (lines out)<br />
)<br />
return entryHandle<br />
timeLimit :: Entry Time -> IO Bool<br />
...<br />
</haskell><br />
<br />
Instead of using 'mkCom' we now use our own component-creating function, named 'entryCom'. Our component consists of three sub-components: the entry (entryHandle) used for editing the command, an output component (outputHandle) where the output of executing the command is shown, and finally a button which, when pressed by the user, executes the command.<br />
<br />
'entryCom' returns entryHandle, as we still want the component to keep the same type as in the previous examples. That is, we do not want the type to reflect the button or the output component, but just the command entry. 'builderToCom' (see the top of the entryCom function) packages the three sub-components as one big component.<br />
<br />
The readCommand function is defined in [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Run.hs?revision=627&view=markup the Run module] and executes a command yielding a return code and the output.<br />
<br />
The new GUI can be compiled by doing "cd src/Examples/HCron && make Editor5thOutputWindow.hs" and you can find the [http://autoforms.svn.sourceforge.net/viewvc/autoforms/trunk/AForms/src/Examples/HCron/Editor5thOutputWindow.hs?revision=701&view=markup source code here].<br />
<br />
== Making a GUI for Cron ==<br />
Most people using Unix-like operating systems would probably prefer to keep using the ordinary Crontab daemon. Some may even suggest that just creating a new daemon (made to fit the AutoForms library) is a cop-out. To satisfy these people and to make a more useful application we have created a [http://autoforms.svn.sourceforge.net/viewvc/autoforms/tags/release-0.4/AForms/src/Examples/HCron/Editor6thCrontab.hs?revision=627&view=markup GUI for Crontab].<br />
<br />
How this GUI works is left as an exercise for the reader.<br />
<br />
= Future work =<br />
While the GUIs above did show that AutoForms has potential, the GUIs were still small and the development of AutoForms has been directed by the needs of this tutorial. Therefore we need to create larger and more complex applications. We need this both to verify that AutoForms can be used for larger applications and to direct the development of AutoForms. To begin this work, we are currently creating a [http://autoforms.svn.sourceforge.net/viewvc/autoforms/trunk/AForms/Examples/GhciGui/ GUI version of GHCi].<br />
<br />
Also the layout of GUI elements is not nearly as nice as it could be. This needs to be improved for AutoForms to become a serious alternative as a GUI toolkit.<br />
<br />
= Acknowledgements =<br />
<br />
I would like to thank Kido Takahiro as an early adopter of AutoForms, with his [https://sourceforge.net/projects/kamiariduki/ Kamiariduki project]. I would also like to thank him for valuable feedback.<br />
<br />
Also the creators of Scrap Your Boilerplate should be mentioned, without whom this project would not have been possible.<br />
<br />
Finally, the creators of the [http://clean.cs.ru.nl/gec/ GEC toolkit] should be mentioned, as many ideas were borrowed from them.<br />
<br />
<br />
<br />
[[Category:Tutorials]]<br />
[[Category:User interfaces]]</div>Joehttps://wiki.haskell.org/index.php?title=HXT&diff=16960HXT2007-11-22T21:26:22Z<p>Joe: grammar improvement</p>
<hr />
<div>[[Category:Tools]] [[Category:Tutorials]]<br />
<br />
== A gentle introduction to the Haskell XML Toolbox ==<br />
<br />
The [http://www.fh-wedel.de/~si/HXmlToolbox/index.html Haskell XML Toolbox (HXT)] is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain specific language consisting of a set of combinators for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace-aware XML parser that almost fully supports the XML 1.0 Standard. Extensions are a validator for RelaxNG and an XPath evaluator.<br />
<br />
__TOC__<br />
<br />
== Background ==<br />
<br />
The Haskell XML Toolbox is based on the ideas of [http://www.cs.york.ac.uk/fp/HaXml/ HaXml] and [http://www.flightlab.com/~joe/hxml/ HXML] but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CData parts and processing instructions. This data model makes it possible to use tree transformation functions as a uniform design of XML processing steps from parsing, DTD processing, entity processing, validation, namespace propagation, content processing and output.<br />
<br />
== Resources ==<br />
<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/index.html HXT Home] :<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/HXT-7.0.tar.gz HXT-7.0.tar.gz] : latest release<br />
; [http://darcs.fh-wedel.de/hxt/ darcs.fh-wedel.de/hxt] : darcs repository with head revision of HXT<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc_arrow/ Arrow API] : Haddock documentation of head revision with links to source files<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc/ Complete API] : Haddock documentation with arrows and old API based on filters<br />
<br />
== The basic concepts ==<br />
<br />
=== The basic data structures ===<br />
<br />
Processing of XML is a task of processing tree structures. This can be done in Haskell in a very elegant way by defining an appropriate tree data type, a Haskell DOM (document object model) structure. The tree structure in HXT is a rose tree with a special XNode data type for storing the XML node information.<br />
<br />
The generally useful tree structure (NTree) is separated from the node type (XNode). This allows for reusing the tree structure and the tree traversal and manipulation functions in other applications.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a] -- rose tree<br />
<br />
data XNode = XText String -- plain text node<br />
| ...<br />
| XTag QName XmlTrees -- element name and list of attributes<br />
| XAttr QName -- attribute name<br />
| ...<br />
<br />
type QName = ... -- qualified name<br />
<br />
type XmlTree = NTree XNode<br />
<br />
type XmlTrees = [XmlTree]<br />
</haskell><br />
<br />
=== The concept of filters ===<br />
<br />
Selecting, transforming and generating trees often requires routines, which compute not only a single result tree, but a (possibly empty) list of (sub-)trees. This leads to the idea of XML filters like in HaXml. Filters are functions, which take an XML tree as input and compute a list of result trees.<br />
<br />
<haskell><br />
type XmlFilter = XmlTree -> [XmlTree]<br />
</haskell><br />
<br />
More generally we can define a filter as<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
</haskell><br />
<br />
We will do this abstraction later, when introducing arrows. Many of the functions in the following motivating examples can be generalised this way. But for getting the idea, the <hask>XmlFilter</hask> is sufficient.<br />
<br />
The filter functions are used so frequently that the idea of defining a domain specific language with filters as the basic processing units comes up. In such a DSL the basic filters are predicates, selectors, constructors and transformers, all working on the HXT DOM tree structure. For a DSL it becomes necessary to define an appropriate set of combinators for building more complex functions from simpler ones. Of course, filter composition, like (.), becomes one of the most frequently used combinators. There are more complex filters for traversal of a whole tree and selection or transformation of several nodes. We will see a few first examples in the following part.<br />
<br />
The first task is to build filters from pure functions, to define a lift operator. Pure functions are lifted to filters in the following way:<br />
<br />
Predicates are lifted by mapping False to the empty list and True to the single element list, containing the input tree.<br />
<br />
<haskell><br />
p :: XmlTree -> Bool -- pure function<br />
p t = ...<br />
<br />
pf :: XmlTree -> [XmlTree] -- or XmlFilter<br />
pf t<br />
| p t = [t]<br />
| otherwise = []<br />
</haskell><br />
<br />
The combinator for this type of lifting is called <hask>isA</hask>; it works on any type and is defined as<br />
<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
| p x = [x]<br />
| otherwise = []<br />
</haskell><br />
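<br />
Since <hask>isA</hask> works on any type, it can be tried out without any XML at all. In the following sketch the filters <hask>isEven</hask> and <hask>keepEven</hask> are hypothetical examples that only serve to illustrate the lifting.<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
<br />
-- a predicate lifted to a filter on numbers<br />
isEven :: Int -> [Int]<br />
isEven = isA even<br />
<br />
-- applying a filter to every element of a list and collecting all results<br />
keepEven :: [Int] -> [Int]<br />
keepEven = concatMap isEven<br />
</haskell><br />
Here <hask>isEven 4</hask> is <hask>[4]</hask>, <hask>isEven 3</hask> is <hask>[]</hask>, and <hask>keepEven [1..6]</hask> is <hask>[2,4,6]</hask>.<br />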
<br />
A predicate for filtering text nodes looks like this<br />
<br />
<haskell><br />
isXText :: XmlFilter -- XmlTree -> [XmlTree]<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _ = []<br />
</haskell><br />
<br />
Transformers -- functions that map a tree into another tree -- are lifted in a trivial way:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t = expr(t)<br />
<br />
ff :: XmlTree -> [XmlTree]<br />
ff t = [expr(t)]<br />
</haskell><br />
<br />
The combinator for this lifting is called <hask>arr</hask>; it comes from the <hask>Control.Arrow</hask> module of the base library shipped with GHC.<br />
<br />
Partial functions, functions that can't always compute a result, are usually lifted to totally defined filters:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t<br />
    | p t       = expr(t)<br />
    | otherwise = error "f not defined"<br />
<br />
ff :: XmlFilter<br />
ff t<br />
    | p t       = [expr(t)]<br />
    | otherwise = []<br />
</haskell><br />
<br />
This is a rather comfortable situation, with these filters we don't have to deal with illegal argument errors. Illegal arguments are just mapped to the empty list.<br />
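<br />
The same idea can be sketched with a partial function on integers. The names <hask>halve</hask> and <hask>halveF</hask> are made up for this illustration and are not part of HXT.<br />
<br />
```haskell
-- A partial function: only defined for even numbers.
halve :: Int -> Int
halve n
    | even n    = n `div` 2
    | otherwise = error "halve: not defined for odd numbers"

-- The same function lifted to a total filter: odd inputs yield no result.
halveF :: Int -> [Int]
halveF n
    | even n    = [n `div` 2]
    | otherwise = []

main :: IO ()
main = do
    print (halve 10)                   -- 5
    print (halveF 10)                  -- [5]
    print (halveF 7)                   -- []
    -- illegal arguments silently drop out of the result
    print (concatMap halveF [1 .. 6])  -- [1,2,3]
```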
<br />
When processing trees, it is often the case that no result, exactly one result, or more than one result is possible. Functions returning such a collection of results are often, a bit imprecisely, called ''nondeterministic'' functions. These functions, e.g. selecting all children of a node or all grandchildren, are exactly our filters. In this context lists rather than sets are the appropriate result type, because the ordering in XML is important and duplicates are possible.<br />
<br />
Working with filters is rather similar to working with binary relations, and working with relations is rather natural and comfortable, as database people know very well.<br />
<br />
Two first examples for working with ''nondeterministic'' functions are selecting the children and the grandchildren of an XmlTree which can be implemented by<br />
<br />
<haskell><br />
getChildren :: XmlFilter<br />
getChildren (NTree n cs)<br />
    = cs<br />
<br />
getGrandChildren :: XmlFilter<br />
getGrandChildren (NTree n cs)<br />
    = concat [ getChildren c | c <- cs ]<br />
</haskell><br />
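<br />
To try these two selectors without HXT, the following sketch uses a simplified tree type. <hask>NTree</hask> mirrors the shape used above, but <hask>Node</hask>, <hask>Tree</hask>, <hask>label</hask> and <hask>doc</hask> are stand-ins assumed for this example, not the real HXT types.<br />
<br />
```haskell
-- Simplified stand-ins for HXT's tree types (assumptions for this sketch).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data Node    = Elem String | Text String deriving (Eq, Show)

type Tree   = NTree Node
type Filter = Tree -> [Tree]

getChildren :: Filter
getChildren (NTree _ cs) = cs

getGrandChildren :: Filter
getGrandChildren (NTree _ cs) = concat [ getChildren c | c <- cs ]

-- the node label of a tree, used only for printing
label :: Tree -> Node
label (NTree n _) = n

-- models <p>Hello <b>big</b> world</p>
doc :: Tree
doc = NTree (Elem "p")
        [ NTree (Text "Hello ") []
        , NTree (Elem "b") [ NTree (Text "big") [] ]
        , NTree (Text " world") []
        ]

main :: IO ()
main = do
    print (map label (getChildren doc))       -- three results, in document order
    print (map label (getGrandChildren doc))  -- one result: the text inside <b>
    print (map label (getChildren (NTree (Text "leaf") [])))  -- a leaf has no children
```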
<br />
=== Filter combinators ===<br />
<br />
Composition of filters (like function composition) is the most important combinator. We will use the infix operator <hask>(>>>)</hask> for filter composition and reverse the arguments, so we can read composition sequences from left to right, like with pipes in Unix. Composition is defined as follows:<br />
<br />
<haskell><br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
</haskell><br />
<br />
This definition corresponds 1-1 to the composition of binary relations. With help of the <hask>(>>>)</hask> operator the definition of <hask>getGrandChildren</hask> becomes rather simple:<br />
<br />
<haskell><br />
getGrandChildren :: XmlFilter<br />
getGrandChildren = getChildren >>> getChildren<br />
</haskell><br />
<br />
Selecting all text nodes of the children of an element can also be formulated very easily with the help of <hask>(>>>)</hask><br />
<br />
<haskell><br />
getTextChildren :: XmlFilter<br />
getTextChildren = getChildren >>> isXText<br />
</haskell><br />
<br />
When used to combine predicate filters, the <hask>(>>>)</hask> serves as a logical "and" operator or, from the relational view, as an intersection operator: <hask>isA p1 >>> isA p2</hask> selects all values for which p1 and p2 both hold.<br />
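<br />
Both readings of composition can be checked on integers. The sketch below generalises the operator from <hask>XmlFilter</hask> to arbitrary types; the fixity declaration and the helper <hask>divisors</hask> are assumptions made for this example.<br />
<br />
```haskell
infixr 1 >>>

-- Filter composition, generalised from XmlFilter to arbitrary types.
(>>>) :: (a -> [b]) -> (b -> [c]) -> (a -> [c])
(f >>> g) x = concat [ g y | y <- f x ]

isA :: (a -> Bool) -> (a -> [a])
isA p x
    | p x       = [x]
    | otherwise = []

-- a "nondeterministic" function: all proper divisors of n
divisors :: Int -> [Int]
divisors n = [ d | d <- [1 .. n - 1], n `mod` d == 0 ]

main :: IO ()
main = do
    -- logical "and": only values that are even and greater than 4 survive
    print (concatMap (isA even >>> isA (> 4)) [1 .. 8 :: Int])  -- [6,8]
    -- relational composition: divisors of divisors; duplicates are kept
    print ((divisors >>> divisors) 6)                           -- [1,1]
```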
<br />
The dual operator to <hask>(>>>)</hask> is the logical "or" (thinking in sets: the union operator). For this we define a sum operator <hask>(<+>)</hask>. The sum of two filters is defined as follows:<br />
<br />
<haskell><br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f <+> g) t = f t ++ g t<br />
</haskell><br />
<br />
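Example: <hask>isA p1 <+> isA p2</hask> is the logical "or" for filters. Because the results are concatenated, a value satisfying both predicates appears twice in the result.<br />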
<br />
Combining elementary filters with (>>>) and (<+>) leads to more complex functionality. For example, selecting all text nodes within two levels of depth (in left to right order) can be formulated with:<br />
<br />
<haskell><br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
</haskell><br />
<br />
'''Exercise:''' Are these filters equivalent or what's the difference between the two filters?<br />
<br />
<haskell><br />
getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )<br />
</haskell><br />
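<br />
One way to explore the exercise is to run both filters on a small tree and compare the results. The sketch below uses a simplified tree model; <hask>Node</hask>, <hask>texts</hask>, <hask>doc</hask>, <hask>v1</hask> and <hask>v2</hask> are names assumed for this example, and the fixities follow those of <hask>Control.Arrow</hask>.<br />
<br />
```haskell
infixr 1 >>>
infixr 5 <+>

-- Simplified stand-ins for HXT's tree types (assumptions for this sketch).
data NTree a = NTree a [NTree a]
data Node    = Elem String | Text String

type Tree   = NTree Node
type Filter = Tree -> [Tree]

(>>>), (<+>) :: Filter -> Filter -> Filter
(f >>> g) t = concat [ g t' | t' <- f t ]
(f <+> g) t = f t ++ g t

getChildren, isXText :: Filter
getChildren (NTree _ cs)     = cs
isXText t@(NTree (Text _) _) = [t]
isXText _                    = []

-- the strings of all text nodes in a result list
texts :: [Tree] -> [String]
texts ts = [ s | NTree (Text s) _ <- ts ]

-- models <r><a>1</a>2<b>3</b></r>
doc :: Tree
doc = NTree (Elem "r")
        [ NTree (Elem "a") [ NTree (Text "1") [] ]
        , NTree (Text "2") []
        , NTree (Elem "b") [ NTree (Text "3") [] ]
        ]

v1, v2 :: Filter
v1 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )
v2 = ( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )

main :: IO ()
main = do
    print (texts (v1 doc))  -- ["1","2","3"]
    print (texts (v2 doc))  -- ["2","1","3"]
```
<br />
On this input both filters select the same nodes, but only the first one returns them in document order.<br />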
<br />
Of course we need choice combinators. The first idea is an if-then-else filter, <br />
built up from three simpler filters. But often it's easier and more elegant to work with simpler binary combinators for choice. So we will introduce the simpler ones first.<br />
<br />
One of these choice combinators is called <hask>orElse</hask> and is defined as<br />
follows:<br />
<br />
<haskell><br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
    | null res1 = g t<br />
    | otherwise = res1<br />
    where<br />
    res1 = f t<br />
</haskell><br />
<br />
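The meaning is the following: If f computes a non-empty list as result, f succeeds and this list is the result; otherwise g is applied to the input and this yields the result. There are two other simple choice combinators usually written in infix notation, <hask> g `guards` f</hask> and <hask>f `when` g</hask>. <hask>g `guards` f</hask> applies f only if g succeeds on the input and otherwise yields the empty list; <hask>f `when` g</hask> applies f if g succeeds and otherwise passes the input through unchanged:<br />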
<br />
<haskell><br />
guards :: XmlFilter -> XmlFilter -> XmlFilter<br />
guards g f t<br />
    | null (g t) = []<br />
    | otherwise  = f t<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
</haskell><br />
<br />
These choice operators become useful when transforming and manipulating trees.<br />
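<br />
The behaviour of the three choice combinators can be compared side by side on integers. In this sketch the types are generalised beyond <hask>XmlFilter</hask>, and <hask>double</hask> is a helper assumed for the example.<br />
<br />
```haskell
orElse :: (a -> [b]) -> (a -> [b]) -> (a -> [b])
orElse f g x
    | null res1 = g x
    | otherwise = res1
    where
      res1 = f x

guards :: (a -> [b]) -> (a -> [c]) -> (a -> [c])
guards g f x
    | null (g x) = []
    | otherwise  = f x

when :: (a -> [a]) -> (a -> [b]) -> (a -> [a])
when f g x
    | null (g x) = [x]
    | otherwise  = f x

isA :: (a -> Bool) -> (a -> [a])
isA p x
    | p x       = [x]
    | otherwise = []

double :: Int -> [Int]
double x = [2 * x]

main :: IO ()
main = do
    -- first filter wins if it succeeds, otherwise the second is used
    print (map (isA even `orElse` double) [3, 4])  -- [[6],[4]]
    -- double runs only where the guard succeeds
    print (map (isA even `guards` double) [3, 4])  -- [[],[8]]
    -- double runs where the condition holds, other inputs pass through
    print (map (double `when` isA even)   [3, 4])  -- [[3],[8]]
```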
<br />
=== Tree traversal filter ===<br />
<br />
A very basic operation on tree structures is the traversal of all nodes and the selection and/or transformation of nodes. These traversal filters serve as control structures for processing whole trees. They correspond to the map and fold combinators for lists.<br />
<br />
The simplest traversal filter does a top down search of all nodes with a special feature. This filter, called <hask>deep</hask>, is defined as follows:<br />
<br />
<haskell><br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
</haskell><br />
<br />
When <hask>deep</hask> is applied to a predicate filter, a top-down search is done and all subtrees satisfying the predicate are collected. Because of the use of <hask>orElse</hask>, the descent into a subtree stops as soon as a matching node is found.<br />
<br />
'''Example:''' Selecting all plain text nodes of a document can be formulated with:<br />
<br />
<haskell><br />
deep isXText<br />
</haskell><br />
<br />
'''Example:''' Selecting all "top level" tables in an HTML document looks like<br />
this:<br />
<br />
<haskell><br />
deep (isElem >>> hasName "table")<br />
</haskell><br />
<br />
A variant of <hask>deep</hask>, called <hask>multi</hask>, performs a complete search, where the tree traversal does not stop when a node is found.<br />
<br />
<haskell><br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
</haskell><br />
<br />
'''Example:''' To select all tables in an HTML document, even nested ones, <hask>multi</hask> has to be used instead of <hask>deep</hask>:<br />
<br />
<hask>multi (isElem >>> hasName "table")</hask><br />
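<br />
The difference between the two traversals shows up as soon as tables are nested. The following sketch rebuilds the combinators on a simplified tree model; the <hask>Node</hask> type, this version of <hask>hasName</hask> and the sample <hask>doc</hask> are assumptions for the example, not the HXT definitions.<br />
<br />
```haskell
infixr 1 >>>
infixr 5 <+>

-- Simplified stand-ins for HXT's tree types (assumptions for this sketch).
data NTree a = NTree a [NTree a]
data Node    = Elem String | Text String

type Tree   = NTree Node
type Filter = Tree -> [Tree]

(>>>), (<+>), orElse :: Filter -> Filter -> Filter
(f >>> g) t = concat [ g t' | t' <- f t ]
(f <+> g) t = f t ++ g t
orElse f g t
    | null res1 = g t
    | otherwise = res1
    where
      res1 = f t

getChildren :: Filter
getChildren (NTree _ cs) = cs

-- succeeds for element nodes with the given name
hasName :: String -> Filter
hasName n t@(NTree (Elem m) _) | n == m = [t]
hasName _ _                             = []

deep, multi :: Filter -> Filter
deep  f = f `orElse` (getChildren >>> deep f)
multi f = f <+> (getChildren >>> multi f)

-- models <html><table><td><table/></td></table><p/></html>
doc :: Tree
doc = NTree (Elem "html")
        [ NTree (Elem "table")
            [ NTree (Elem "td") [ NTree (Elem "table") [] ] ]
        , NTree (Elem "p") []
        ]

main :: IO ()
main = do
    print (length (deep  (hasName "table") doc))  -- 1: only the outer table
    print (length (multi (hasName "table") doc))  -- 2: outer and nested table
```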
<br />
=== Arrows ===<br />
<br />
We've already seen that the filters <hask>a -> [b]</hask> are a very<br />
powerful and sometimes more elegant way to process XML than pure<br />
functions. This is the good news. The bad news is that filters are not<br />
general enough. Of course we sometimes want to do some I/O and we want<br />
to stay at the filter level. So we need something like<br />
<br />
<haskell><br />
type XmlIOFilter = XmlTree -> IO [XmlTree]<br />
</haskell><br />
<br />
for working in the IO monad.<br />
<br />
Sometimes it's appropriate to thread some state through the computation<br />
like in state monads. This leads to a type like<br />
<br />
<haskell><br />
type XmlStateFilter state = state -> XmlTree -> (state, [XmlTree])<br />
</haskell><br />
<br />
And in real world applications we need both extensions at the same<br />
time. Of course I/O is necessary but usually there are also some<br />
global options and variables for controlling the computations. In HXT,<br />
for instance there are variables for controlling trace output, options<br />
for setting the default encoding scheme for input data and a base URI<br />
for accessing documents, which are addressed in a content or in a DTD<br />
part by relative URIs. So we need something like<br />
<br />
<haskell><br />
type XmlIOStateFilter state = state -> XmlTree -> IO (state, [XmlTree])<br />
</haskell><br />
<br />
We want to work with all four filter variants, and in the future<br />
perhaps with even more general filters, but of course not with four<br />
sets of filter names, e.g. <hask>deep, deepST, deepIO, deepIOST</hask>.<br />
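<br />
The problem can be made concrete with composition alone: without overloading, each filter variant needs its own definition. The sketch below shows this for the pure and the IO variant; <hask>compose</hask>, <hask>composeIO</hask>, <hask>successors</hask> and <hask>traceF</hask> are hypothetical names for this illustration.<br />
<br />
```haskell
type Filter   a b = a -> [b]
type IOFilter a b = a -> IO [b]

-- composition of pure filters
compose :: Filter a b -> Filter b c -> Filter a c
compose f g x = concatMap g (f x)

-- the same idea for IO filters needs a second, separate definition
composeIO :: IOFilter a b -> IOFilter b c -> IOFilter a c
composeIO f g x = do
    ys  <- f x
    zss <- mapM g ys
    return (concat zss)

-- a pure filter and the same filter embedded into IO
successors :: Filter Int Int
successors n = [n, n + 1]

successorsIO :: IOFilter Int Int
successorsIO = return . successors

-- an IO filter that reports every value passing through
traceF :: Show a => IOFilter a a
traceF x = do
    putStrLn ("visiting " ++ show x)
    return [x]

main :: IO ()
main = do
    print (compose successors successors 1)  -- [1,2,2,3]
    rs <- composeIO successorsIO traceF 1    -- prints one line per value
    print rs                                 -- [1,2]
```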
<br />
This is the point where <hask>newtype</hask>s and <hask>class</hask>es<br />
come in. Classes are needed for overloading names and<br />
<hask>newtype</hask>s are needed to declare instances. Further the<br />
restriction of <hask>XmlTree</hask> as argument and result type is<br />
not necessary and hinders reuse in many cases.<br />
<br />
A filter discussed above has all features of an arrow. Arrows are<br />
introduced for generalising the concept of functions and function<br />
combination to more general kinds of computation than pure functions.<br />
<br />
A basic set of combinators for arrows is defined in the classes in the<br />
<hask>Control.Arrow</hask> module, containing the above mentioned <hask>(>>>), (<+>), arr</hask>.<br />
<br />
In HXT the additional classes for filters working with lists as result type are<br />
defined in <hask>Control.Arrow.ArrowList</hask>. The choice operators are<br />
in <hask>Control.Arrow.ArrowIf</hask>, tree filters, like <hask>getChildren, deep, multi, ...</hask> in<br />
<hask>Control.Arrow.ArrowTree</hask> and the elementary XML specific<br />
filters in <hask>Text.XML.HXT.XmlArrow</hask>.<br />
<br />
In HXT there are four types instantiated with these classes for<br />
pure list arrows, list arrows with a state, list arrows with IO<br />
and list arrows with a state and IO.<br />
<br />
<haskell><br />
newtype LA a b = LA { runLA :: (a -> [b]) }<br />
<br />
newtype SLA s a b = SLA { runSLA :: (s -> a -> (s, [b])) }<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: (a -> IO [b]) }<br />
<br />
newtype IOSLA s a b = IOSLA { runIOSLA :: (s -> a -> IO (s, [b])) }<br />
</haskell><br />
<br />
The first one and the last one are those used most frequently in the<br />
toolbox, and of course there are lifting functions for converting the<br />
special arrows into the more general ones.<br />
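<br />
How a <hask>newtype</hask> plus class instances yields the overloaded combinators can be sketched for the pure list arrow using only the classes from <hask>base</hask>. This is a minimal reconstruction for illustration, not HXT's actual source; HXT defines further classes and instances on top of these, and <hask>isA</hask> is given here as a plain function rather than a class method.<br />
<br />
```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow (Arrow (..), ArrowZero (..), ArrowPlus (..), (>>>))

newtype LA a b = LA { runLA :: a -> [b] }

instance Category LA where
    id          = LA (\x -> [x])
    LA g . LA f = LA (\x -> concatMap g (f x))   -- filter composition

instance Arrow LA where
    arr f        = LA (\x -> [f x])              -- lift a pure function
    first (LA f) = LA (\(x, c) -> [ (y, c) | y <- f x ])

instance ArrowZero LA where
    zeroArrow = LA (const [])                    -- the filter that always fails

instance ArrowPlus LA where
    LA f <+> LA g = LA (\x -> f x ++ g x)        -- sum of two filters

-- predicate lifting for this arrow type
isA :: (a -> Bool) -> LA a a
isA p = LA (\x -> [ x | p x ])

main :: IO ()
main = do
    print (runLA (isA even >>> arr (* 2)) (4 :: Int))    -- [8]
    print (runLA (isA even >>> arr (* 2)) (5 :: Int))    -- []
    print (runLA (arr (+ 1) <+> arr (* 10)) (3 :: Int))  -- [4,30]
```
<br />
Once the instances exist, <hask>(>>>)</hask>, <hask>(<+>)</hask> and <hask>arr</hask> are the overloaded names from <hask>Control.Arrow</hask>, so the same code could be reused for another arrow type with its own instances.<br />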
<br />
Don't worry about all these conceptual details. Let's have a look into some<br />
''Hello world'' examples.<br />
<br />
== Getting started: Hello world examples ==<br />
<br />
=== copyXML ===<br />
<br />
The first complete example is a program for<br />
copying an XML document<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
import System.Environment<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      [src, dst] <- getArgs<br />
      runX ( readDocument [(a_validate, v_0)] src<br />
             >>><br />
             writeDocument [] dst<br />
           )<br />
      return ()<br />
</haskell><br />
<br />
The interesting part of this example is<br />
the call of <hask>runX</hask>. <hask>runX</hask> executes an<br />
arrow. This arrow is one of the more powerful list arrows with IO and<br />
a HXT system state.<br />
<br />
The arrow itself is a composition of <hask>readDocument</hask> and<br />
<hask>writeDocument</hask>.<br />
<hask>readDocument</hask> is an arrow for reading, DTD processing and<br />
validation of documents. Its behaviour can be controlled by a list of<br />
options. Here we turn off the validation step. The <hask>src</hask>, a file<br />
name or a URI, is read and parsed and a document tree is built. This<br />
tree is ''piped'' into the output arrow. This one also is<br />
controlled by a set of options. Here all the defaults are used.<br />
<hask>writeDocument</hask> converts the tree into a string and writes<br />
it to the <hask>dst</hask>.<br />
<br />
We've omitted here the boring stuff of option parsing and error<br />
handling.<br />
<br />
Compilation and a test run look like this:<br />
<br />
<pre><br />
hobel > ghc -o copyXml -package hxt CopyXML.hs<br />
hobel > cat hello.xml<br />
<hello>world</hello><br />
hobel > copyXml hello.xml -<br />
<?xml version="1.0" encoding="UTF-8"?><br />
<hello>world</hello><br />
hobel ><br />
</pre><br />
<br />
The mini XML document in file <tt>hello.xml</tt> is read and<br />
a document tree is built. Then this tree is converted into a string<br />
and written to standard output (filename: <tt>-</tt>). It is decorated<br />
with an XML declaration containing the version and the output<br />
encoding.<br />
<br />
For processing HTML documents there is an HTML parser, which tries to<br />
parse and interpret almost anything as HTML. The HTML parser can be<br />
selected by calling<br />
<br />
<hask>readDocument [(a_parse_html, v_1), ...]</hask><br />
<br />
with the appropriate option.<br />
<br />
=== Pattern for a main program ===<br />
<br />
A more realistic pattern for a simple Unix filter like program has<br />
the following structure:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
<br />
import System.IO<br />
import System.Environment<br />
import System.Console.GetOpt<br />
import System.Exit<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      argv <- getArgs<br />
      (al, src, dst) <- cmdlineOpts argv<br />
      [rc] <- runX (application al src dst)<br />
      if rc >= c_err<br />
         then exitWith (ExitFailure (0-1))<br />
         else exitWith ExitSuccess<br />
<br />
-- | the dummy for the boring stuff of option evaluation,<br />
-- usually done with 'System.Console.GetOpt'<br />
<br />
cmdlineOpts :: [String] -> IO (Attributes, String, String)<br />
cmdlineOpts argv<br />
    = return ([(a_validate, v_0)], argv!!0, argv!!1)<br />
<br />
-- | the main arrow<br />
<br />
application :: Attributes -> String -> String -> IOSArrow b Int<br />
application al src dst<br />
    = readDocument al src<br />
      >>><br />
      processChildren (processDocumentRootElement `when` isElem) -- (1)<br />
      >>><br />
      writeDocument al dst<br />
      >>><br />
      getErrStatus<br />
<br />
<br />
-- | the dummy for the real processing: the identity filter<br />
<br />
processDocumentRootElement :: IOSArrow XmlTree XmlTree<br />
processDocumentRootElement<br />
    = this -- substitute this by the real application<br />
</haskell><br />
<br />
This program has the same functionality as our first example,<br />
but it separates the arrow from the boring option evaluation and<br />
return code computation.<br />
<br />
The interesting line is (1).<br />
<hask>readDocument</hask> generates a tree structure with a so-called extra<br />
root node. This root node sits above the XML document root<br />
element. It is necessary<br />
because other nodes may appear on the same tree level as the XML<br />
root, for instance comments, processing instructions or whitespace.<br />
<br />
Furthermore the artificial root node serves for storing meta<br />
information about the document in the attribute list, like the<br />
document name, the encoding scheme, the HTTP transfer headers and<br />
other information.<br />
<br />
To process the real XML root element, we have to take the children of<br />
the root node, select the XML root element and process it, while<br />
leaving all other children unchanged. This is done with<br />
<hask>processChildren</hask> and the <hask>when</hask> choice<br />
operator. <hask>processChildren</hask> applies a filter elementwise to<br />
all children of a node. The results of processing the list of children form<br />
the children of the result node.<br />
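<br />
The interplay of <hask>processChildren</hask> and <hask>when</hask> can be sketched on the simplified filter model from the first part. All definitions below are stand-ins assumed for this illustration, in particular the <hask>Node</hask> type and the <hask>rename</hask> filter; they are not the HXT arrows of the same name.<br />
<br />
```haskell
-- Simplified stand-ins for HXT's tree types (assumptions for this sketch).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data Node    = Root | Elem String | Comment String deriving (Eq, Show)

type Tree   = NTree Node
type Filter = Tree -> [Tree]

isElem :: Filter
isElem t@(NTree (Elem _) _) = [t]
isElem _                    = []

when :: Filter -> Filter -> Filter
when f g t
    | null (g t) = [t]
    | otherwise  = f t

-- apply a filter to every child; the results become the new children
processChildren :: Filter -> Filter
processChildren f (NTree n cs) = [NTree n (concatMap f cs)]

-- a stand-in for the "real processing": rename an element
rename :: String -> Filter
rename new (NTree (Elem _) cs) = [NTree (Elem new) cs]
rename _   t                   = [t]

-- an extra root node above a comment and the document root element
doc :: Tree
doc = NTree Root
        [ NTree (Comment "generated") []
        , NTree (Elem "hello") []
        ]

main :: IO ()
main = mapM_ print (processChildren (rename "greeting" `when` isElem) doc)
```
<br />
The comment node passes through unchanged, and only the element below the root is renamed.<br />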
<br />
The structure of the internal document tree can be made visible,<br />
e.g. by adding the option pair <hask>(a_show_tree, v_1)</hask> to the<br />
<hask>writeDocument</hask> arrow. This will emit the tree in a readable<br />
text representation instead of the real document.<br />
<br />
In the next section we will give examples for the<br />
<hask>processDocumentRootElement</hask> arrow.<br />
<br />
== Selection examples ==<br />
<br />
=== Selecting text from an HTML document ===<br />
<br />
Selecting all the plain text of an XML/HTML document<br />
can be formulated with<br />
<br />
<haskell><br />
selectAllText :: ArrowXml a => a XmlTree XmlTree<br />
selectAllText<br />
    = deep isXText<br />
</haskell><br />
<br />
<hask>deep</hask> traverses the whole tree, stops the traversal when<br />
a node is a text node (<hask>isXText</hask>) and returns all the text nodes.<br />
There are two other traversal operators, <hask>deepest</hask> and <hask>multi</hask>.<br />
In this case, where the selected nodes are all leaves, these would give the same result.<br />
<br />
=== Selecting text and ALT attribute values ===<br />
<br />
Let's take a bit more complex task: We want to select all text, but also the values of the <tt>alt</tt> attributes<br />
of image tags.<br />
<br />
<haskell><br />
selectAllTextAndAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndAltValues<br />
    = deep<br />
      ( isXText                       -- (1)<br />
        <+><br />
        ( isElem >>> hasName "img"    -- (2)<br />
          >>><br />
          getAttrValue "alt"          -- (3)<br />
          >>><br />
          mkText                      -- (4)<br />
        )<br />
      )<br />
</haskell><br />
<br />
The whole tree is searched for text nodes (1) and for image elements (2), from the image elements<br />
the alt attribute values are selected as plain text (3), this text is transformed into a text node (4).<br />
<br />
=== Selecting text and ALT attributes values (2) ===<br />
<br />
Let's refine the above filter one step further. The text from the alt attributes shall be marked in the output<br />
by surrounding double square brackets. Empty alt values shall be ignored.<br />
<br />
<haskell><br />
selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndRealAltValues<br />
    = deep<br />
      ( isXText<br />
        <+><br />
        ( isElem >>> hasName "img"<br />
          >>><br />
          getAttrValue "alt"<br />
          >>><br />
          isA significant            -- (1)<br />
          >>><br />
          arr addBrackets            -- (2)<br />
          >>><br />
          mkText<br />
        )<br />
      )<br />
    where<br />
    significant :: String -> Bool<br />
    significant = not . all (`elem` " \n\r\t")<br />
<br />
    addBrackets :: String -> String<br />
    addBrackets s<br />
        = " [[ " ++ s ++ " ]] "<br />
</haskell><br />
<br />
This example shows two combinators for building arrows from pure functions.<br />
The first one <hask>isA</hask> removes all empty or whitespace values from alt attributes (1),<br />
the other <hask>arr</hask> lifts the editing function to the arrow level (2).<br />
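<br />
The two pure helper functions can be tested in isolation, without any XML. In the sketch below <hask>altText</hask> is a hypothetical name that models the <hask>isA significant >>> arr addBrackets</hask> part of the pipeline on plain strings.<br />
<br />
```haskell
significant :: String -> Bool
significant = not . all (`elem` " \n\r\t")

addBrackets :: String -> String
addBrackets s = " [[ " ++ s ++ " ]] "

isA :: (a -> Bool) -> (a -> [a])
isA p x
    | p x       = [x]
    | otherwise = []

-- models: isA significant >>> arr addBrackets
altText :: String -> [String]
altText s = [ addBrackets v | v <- isA significant s ]

main :: IO ()
main = do
    print (altText "")          -- []  (empty alt value is dropped)
    print (altText " \t")       -- []  (whitespace only is dropped)
    print (altText "HXT logo")  -- [" [[ HXT logo ]] "]
```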
<br />
== Document construction examples ==<br />
<br />
=== The ''Hello World'' document ===<br />
<br />
The first document, of course, is a ''Hello World'' document:<br />
<br />
<haskell><br />
helloWorld :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld<br />
    = mkelem "html" []                   -- (1)<br />
      [ mkelem "head" []<br />
        [ mkelem "title" []<br />
          [ txt "Hello World" ]          -- (2)<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]      -- (3)<br />
        [ mkelem "h1" []<br />
          [ txt "Hello World" ]          -- (4)<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
The main arrows for document construction are <hask>mkelem</hask><br />
and its variants (<hask>selem, aelem, eelem</hask>) for element creation, <hask>attr</hask> and <hask>sattr</hask> for attributes and <hask>mkText</hask><br />
and <hask>txt</hask> for text nodes. <hask>mkelem</hask> takes three arguments: the element name (or tag name), a list of arrows for the construction of attributes, not empty in (3), and a list of arrows for the contents. Text content is generated in (2) and (4).<br />
<br />
To write this document to a file use the following arrow<br />
<br />
<haskell><br />
root [] [helloWorld] -- (1)<br />
>>><br />
writeDocument [(a_indent, v_1)] "hello.xml" -- (2)<br />
</haskell><br />
<br />
When this arrow is executed, the <hask>helloWorld</hask><br />
document is wrapped into a so called root node (1). This complete<br />
document is written to "hello.xml" (2).<br />
<hask>writeDocument</hask> and its variants always expect<br />
a whole document tree with such a root node. Before writing, the document is<br />
indented (<hask>(a_indent, v_1)</hask>) by inserting extra whitespace<br />
text nodes, and an XML declaration with version and encoding is added. Without the indent option, the whole document would appear on a single line. With indentation the output looks like this:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<html><br />
  <head><br />
    <title>Hello World</title><br />
  </head><br />
  <body class="haskell"><br />
    <h1>Hello World</h1><br />
  </body><br />
</html><br />
</pre><br />
<br />
The code can be shortened a bit by using some of the<br />
convenience functions:<br />
<br />
<haskell><br />
helloWorld2 :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld2<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]<br />
        [ selem "h1"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
In the above two examples the arrow input is totally ignored, because<br />
of the use of the constant arrow <hask>txt "..."</hask>.<br />
<br />
=== A page about all images within a HTML page ===<br />
<br />
A somewhat more interesting task is the construction of a page<br />
containing a table of all images within a page, including image URLs, geometry and ALT attributes.<br />
<br />
The program for this has a frame similar to the <hask>helloWorld</hask> program,<br />
but the rows of the table must be filled in from the input document.<br />
In the first step we will generate a table with a single column containing<br />
the URL of the image.<br />
<br />
<haskell><br />
imageTable :: ArrowXml a => a XmlTree XmlTree<br />
imageTable<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Images in Page" ]<br />
        ]<br />
      , selem "body"<br />
        [ selem "h1"<br />
          [ txt "Images in Page" ]<br />
        , selem "table"<br />
          [ collectImages           -- (1)<br />
            >>><br />
            genTableRows            -- (2)<br />
          ]<br />
        ]<br />
      ]<br />
    where<br />
    collectImages                   -- (1)<br />
        = deep ( isElem<br />
                 >>><br />
                 hasName "img"<br />
               )<br />
    genTableRows                    -- (2)<br />
        = selem "tr"<br />
          [ selem "td"<br />
            [ getAttrValue "src" >>> mkText ]<br />
          ]<br />
</haskell><br />
<br />
With (1) the image elements are collected, and with (2)<br />
the HTML code for an image element is built.<br />
<br />
Applied to <tt>http://www.haskell.org/</tt> we get the following result<br />
(at the time of writing this page):<br />
<br />
<pre><br />
<html><br />
  <head><br />
    <title>Images in Page</title><br />
  </head><br />
  <body><br />
    <h1>Images in Page</h1><br />
    <table><br />
      <tr><br />
        <td>/haskellwiki_logo.png</td><br />
      </tr><br />
      <tr><br />
        <td>/sitewiki/images/1/10/Haskelllogo-small.jpg</td><br />
      </tr><br />
      <tr><br />
        <td>/haskellwiki_logo_small.png</td><br />
      </tr><br />
    </table><br />
  </body><br />
</html><br />
</pre><br />
<br />
When generating HTML, often there are constant parts within the page,<br />
in the example e.g. the page header. It's possible to write these<br />
parts as a string containing plain HTML and then read this with<br />
a simple XML contents parser called <hask>xread</hask>.<br />
<br />
The example above could then be rewritten as<br />
<br />
<haskell><br />
imageTable<br />
    = selem "html"<br />
      [ pageHeader<br />
      , ...<br />
      ]<br />
    where<br />
    pageHeader<br />
        = constA "<head><title>Images in Page</title></head>"<br />
          >>><br />
          xread<br />
    ...<br />
</haskell><br />
<br />
<hask>xread</hask> is a very primitive arrow. It does not run in the<br />
IO monad, so it can be used in any context, but therefore the error handling<br />
is very limited. <hask>xread</hask> parses an XML element content.<br />
<br />
=== A page about all images within a HTML page: 1. Refinement ===<br />
<br />
The next refinement step is the extension of the table such that<br />
it contains four columns, one for the image itself, one for the URL,<br />
the geometry and the ALT text. The extended <hask>genTableRows</hask><br />
has the following form:<br />
<br />
<haskell><br />
genTableRows<br />
    = selem "tr"<br />
      [ selem "td"                          -- (1)<br />
        [ this                              -- (1.1)<br />
        ]<br />
      , selem "td"                          -- (2)<br />
        [ getAttrValue "src"<br />
          >>><br />
          mkText<br />
          >>><br />
          mkelem "a"                        -- (2.1)<br />
          [ attr "href" this ]<br />
          [ this ]<br />
        ]<br />
      , selem "td"                          -- (3)<br />
        [ ( getAttrValue "width"<br />
            &&&                             -- (3.1)<br />
            getAttrValue "height"<br />
          )<br />
          >>><br />
          arr2 geometry                     -- (3.2)<br />
          >>><br />
          mkText<br />
        ]<br />
      , selem "td"                          -- (4)<br />
        [ getAttrValue "alt"<br />
          >>><br />
          mkText<br />
        ]<br />
      ]<br />
    where<br />
    geometry :: String -> String -> String<br />
    geometry "" ""<br />
        = ""<br />
    geometry w h<br />
        = w ++ "x" ++ h<br />
</haskell><br />
<br />
In (1) the identity arrow <hask>this</hask> is used for<br />
inserting the whole image element (<hask>this</hask> value) into the first column.<br />
(2) is the column from the previous example but the URL has been made active<br />
by embedding the URL in an A-element (2.1). In (3) there are two<br />
new combinators. <hask>(&&&)</hask> (3.1) is an arrow for applying two<br />
arrows to the same input and combining the results into a pair. <hask>arr2</hask> (3.2)<br />
works like <hask>arr</hask> but it lifts a binary function into an arrow<br />
accepting a pair of values. <hask>arr2 f</hask> is a shortcut for<br />
<hask>arr (uncurry f)</hask>. So width and height are combined into an X11 like<br />
geometry spec. (4) adds the ALT-text.<br />
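<br />
The pairing step can be tried with ordinary functions, which are also an instance of the <hask>Arrow</hask> class. In this sketch an image element is modelled as a plain attribute list; <hask>Attrs</hask>, <hask>attrValue</hask> and <hask>imgGeometry</hask> are names assumed for the example, and <hask>uncurry geometry</hask> plays the role of <hask>arr2 geometry</hask>. With list arrows, <hask>(&&&)</hask> additionally pairs up all combinations of results, which plain functions cannot show.<br />
<br />
```haskell
import Control.Arrow ((&&&), (>>>))
import Data.Maybe (fromMaybe)

-- an image element modelled as a list of attribute name/value pairs
type Attrs = [(String, String)]

-- like the attribute selector in the text: a missing attribute yields ""
attrValue :: String -> Attrs -> String
attrValue name = fromMaybe "" . lookup name

geometry :: String -> String -> String
geometry "" "" = ""
geometry w  h  = w ++ "x" ++ h

-- both selectors see the same input, the pair is fed into geometry
imgGeometry :: Attrs -> String
imgGeometry = (attrValue "width" &&& attrValue "height") >>> uncurry geometry

main :: IO ()
main = do
    print (imgGeometry [("src", "a.png"), ("width", "88"), ("height", "31")])  -- "88x31"
    print (imgGeometry [("src", "b.png")])                                     -- ""
```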
<br />
=== A page about all images within a HTML page: 2. Refinement ===<br />
<br />
The generated HTML page is not yet very useful, because it usually<br />
contains relative HREFs to the images, so the links do not work.<br />
We have to transform the SRC attribute values into absolute URLs.<br />
This can be done with the following code:<br />
<br />
<haskell><br />
imageTable2 :: IOStateArrow s XmlTree XmlTree<br />
imageTable2<br />
    = ...<br />
      ...<br />
      , selem "table"<br />
        [ collectImages<br />
          >>><br />
          mkAbsImageRef                  -- (1)<br />
          >>><br />
          genTableRows<br />
        ]<br />
      ...<br />
<br />
mkAbsImageRef :: IOStateArrow s XmlTree XmlTree -- (1)<br />
mkAbsImageRef<br />
    = processAttrl ( mkAbsRef            -- (2)<br />
                     `when`<br />
                     hasName "src"       -- (3)<br />
                   )<br />
    where<br />
    mkAbsRef                             -- (4)<br />
        = replaceChildren<br />
          ( xshow getChildren            -- (5)<br />
            >>><br />
            ( mkAbsURI `orElse` this )   -- (6)<br />
            >>><br />
            mkText                       -- (7)<br />
          )<br />
</haskell><br />
<br />
The <hask>imageTable2</hask> is extended by an arrow <hask>mkAbsImageRef</hask><br />
(1). This arrow uses the global system state of HXT, in which the base URL<br />
of a document is stored. For editing the SRC attribute value, the attribute list<br />
of the image elements is processed with <hask>processAttrl</hask>.<br />
With the <hask>`when` hasName "src"</hask> only SRC attributes are manipulated (3). The real work is done in (4): The URL is selected with <hask>getChildren</hask>, a text node, and converted into a string (<hask>xshow</hask>), the URL is transformed into an absolute URL<br />
with <hask>mkAbsURI</hask> (6). This arrow may fail, e.g. in case of illegal<br />
URLs. In this case the URL remains unchanged (<hask>`orElse` this</hask>).<br />
The resulting String value is converted into a text node forming the new<br />
attribute value node (7).<br />
<br />
Because of the use of the global HXT state in <hask>mkAbsURI</hask>,<br />
<hask>mkAbsImageRef</hask> and <hask>imageTable2</hask> need to have the more specialized signature <hask>IOStateArrow s XmlTree XmlTree</hask>.<br />
<br />
== Transformation examples ==<br />
<br />
=== Decorating external references of an HTML document ===<br />
<br />
In the following examples, we want to decorate the external references<br />
in an HTML page with a small icon, as is done in many wikis.<br />
For this task the document tree has to be traversed, and all parts<br />
except the interesting A-elements remain unchanged. At the end of the list of children of an A-element we add an image element.<br />
<br />
Here is the first version:<br />
<br />
<haskell><br />
addRefIcon :: ArrowXml a => a XmlTree XmlTree<br />
addRefIcon<br />
    = processTopDown                     -- (1)<br />
      ( addImg                           -- (2)<br />
        `when`<br />
        isExternalRef                    -- (3)<br />
      )<br />
    where<br />
    isExternalRef                        -- (4)<br />
        = isElem<br />
          >>><br />
          hasName "a"<br />
          >>><br />
          hasAttr "href"<br />
          >>><br />
          getAttrValue "href"<br />
          >>><br />
          isA isExtRef<br />
        where<br />
        isExtRef                         -- (4.1)<br />
            = isPrefixOf "http:"         -- or something more precise; needs import Data.List (isPrefixOf)<br />
<br />
    addImg<br />
        = replaceChildren                -- (5)<br />
          ( getChildren                  -- (6)<br />
            <+><br />
            imgElement                   -- (7)<br />
          )<br />
<br />
    imgElement<br />
        = mkelem "img"                   -- (8)<br />
          [ sattr "src" "/icons/ref.png" -- (9)<br />
          , sattr "alt" "external ref"<br />
          ] []                           -- (10)<br />
</haskell><br />
<br />
The traversal is done with <hask>processTopDown</hask> (1).<br />
This arrow applies an arrow to all nodes of the whole document tree.<br />
The transformation arrow applies the <hask>addImg</hask> (2) to<br />
all A-elements (3),(4). This arrow uses a somewhat simplified test (4.1)<br />
for external URLs.<br />
<hask>addImg</hask> manipulates all children (5) of the A-elements by<br />
selecting the current children (6) and adding an image element (7).<br />
The image element is constructed with <hask>mkelem</hask> (8). This takes<br />
an element name, a list of arrows for computing the attributes and a<br />
list of arrows for computing the contents. The content of the image element is<br />
empty (10). The attributes are constructed with <hask>sattr</hask> (9).<br />
<hask>sattr</hask> ignores the arrow input and builds an attribute from<br />
the name-value pair of arguments.<br />
<br />
=== Transform external references into absolute references ===<br />
<br />
In the following example we will develop a program for<br />
editing a HTML page such that all references to external documents<br />
(images, hypertext refs, style refs, ...) become absolute references.<br />
We will see some new, but very useful combinators in the solution.<br />
<br />
The task seems to be rather trivial. In a tree traversal<br />
all references are edited with respect to the document base.<br />
But in HTML there is a BASE element, allowed in the content of HEAD<br />
with a HREF attribute, which defines the document base. Again this<br />
href can be a relative URL.<br />
<br />
We start the development with the editing arrow. This gets<br />
the real document base as argument.<br />
<br />
<haskell><br />
mkAbsHRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsHRefs base<br />
    = processTopDown editHRef                   -- (1)<br />
    where<br />
    editHRef<br />
        = processAttrl                          -- (3)<br />
            ( changeAttrValue (absHRef base)    -- (5)<br />
              `when`<br />
              hasName "href"                    -- (4)<br />
            )<br />
          `when`<br />
          ( isElem >>> hasName "a" )            -- (2)<br />
        where<br />
<br />
        absHRef :: String -> String -> String   -- (5)<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
The tree is traversed (1) and for every A element (2) the attribute<br />
list is processed (3). All HREF attribute values (4) are manipulated<br />
by <hask>changeAttrValue</hask>, called with a string function (5).<br />
<hask>expandURIString</hask> is a pure function defined in HXT for computing<br />
an absolute URI.<br />
In this first step we only edit A-HREF attribute values. We will refine this<br />
later.<br />
<br />
The second step is the complete computation of the base URL.<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( isElem >>> hasName "html"                -- (0)<br />
            >>><br />
            getChildren                              -- (1)<br />
            >>><br />
            isElem >>> hasName "head"                -- (2)<br />
            >>><br />
            getChildren                              -- (3)<br />
            >>><br />
            isElem >>> hasName "base"                -- (4)<br />
            >>><br />
            getAttrValue "href"                      -- (5)<br />
          )<br />
          &&&<br />
          getBaseURI                                 -- (6)<br />
        )<br />
        >>> expandURI                                -- (7)<br />
      )<br />
      `orElse` getBaseURI                            -- (8)<br />
</haskell><br />
<br />
Input to this arrow is the HTML element. (0) to (5) is the arrow for selecting<br />
the BASE element's HREF value. Parallel to this, the system base URL is read<br />
with <hask>getBaseURI</hask> (6), as in the examples above. The resulting<br />
pair of strings is piped into <hask>expandURI</hask> (7), the arrow version of<br />
<hask>expandURIString</hask>. This arrow ((0) to (7)) fails in the absence<br />
of a BASE element. In this case we take the plain document base (8).<br />
The selection of the BASE element is not yet very handy. We will define<br />
a more general and elegant function later, allowing an element path as selection argument.<br />
<br />
In the third step, we will combine the two arrows. For this we will use<br />
a new combinator <hask>($<)</hask>. We need it for the following reason:<br />
the arrow input (the document) is required twice,<br />
once for computing the document base and once for editing the<br />
whole document, and the extra string parameter<br />
for editing should of course be computed with the arrow defined above.<br />
<br />
The combined arrow, our main arrow, looks like this<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = mkAbsHRefs $< computeBaseRef                   -- (1)<br />
</haskell><br />
<br />
In (1) the arrow input is first piped into <hask>computeBaseRef</hask>;<br />
the result is then used in <hask>mkAbsHRefs</hask> as the extra string parameter<br />
when processing the document. Internally the <hask>($<)</hask> combinator<br />
is defined by the basic combinators <hask>(&&&), (>>>)</hask> and <hask>app</hask>. In more complex computations<br />
this pattern occurs rather frequently, so <hask>($<)</hask> becomes very useful.<br />
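<br />
The behaviour of <hask>($<)</hask> can be sketched without HXT on plain list functions. This is a simplified model under stated assumptions, not HXT's implementation: <hask>applyWith</hask>, <hask>Doc</hask>, <hask>getBase</hask> and <hask>prefixLinks</hask> are hypothetical names introduced only for this illustration.<br />
<br />
```haskell
-- A list-valued "filter" in the sense used throughout this article.
type Filter a b = a -> [b]

-- | Hypothetical stand-in for ($<): run g on the input, then run
-- (f c) on the SAME input for every result c that g produced.
applyWith :: (c -> Filter a b) -> Filter a c -> Filter a b
applyWith f g x = concat [ f c x | c <- g x ]

-- Toy "document" (an assumption for this sketch): an optional base
-- entry plus a list of relative links.
data Doc = Doc { docBase :: Maybe String, docLinks :: [String] }
  deriving (Eq, Show)

-- Plays the role of computeBaseRef: fails (empty list) without a base.
getBase :: Filter Doc String
getBase d = maybe [] (: []) (docBase d)

-- Plays the role of mkAbsHRefs: needs the base as an extra parameter.
prefixLinks :: String -> Filter Doc Doc
prefixLinks base d = [ d { docLinks = map (base ++) (docLinks d) } ]

-- The document is used twice: once to compute the base, once to be edited.
toAbsLinks :: Filter Doc Doc
toAbsLinks = prefixLinks `applyWith` getBase
```
<br />
In this model, a document without a base entry makes <hask>getBase</hask> fail, so <hask>toAbsLinks</hask> yields the empty list, which is the situation the <hask>orElse</hask> in <hask>computeBaseRef</hask> guards against.<br />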
<br />
Programming with arrows is one style of point free programming. Point free<br />
programming often becomes unwieldy when values are used more than once.<br />
One solution is the special arrow syntax supported by ghc and others, similar to the do notation for monads. But for many simple cases the <hask>($<)</hask> combinator and its variants <hask>($<<), ($<<<), ($<<<<), ($<$)</hask><br />
are sufficient.<br />
<br />
To complete the development of the example, a last step is necessary:<br />
the removal of the redundant BASE element.<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = ( mkAbsHRefs $< computeBaseRef )<br />
      >>><br />
      removeBaseElement<br />
<br />
removeBaseElement :: ArrowXml a => a XmlTree XmlTree<br />
removeBaseElement<br />
    = processChildren<br />
        ( processChildren<br />
            ( none                                   -- (1)<br />
              `when`<br />
              ( isElem >>> hasName "base" )<br />
            )<br />
          `when`<br />
          ( isElem >>> hasName "head" )<br />
        )<br />
</haskell><br />
<br />
In this function the children of the HEAD element are searched for<br />
a BASE element. This is removed by applying the null arrow <hask>none</hask> (1)<br />
to the input, which always returns the empty list.<br />
<hask>none `when` ...</hask> is the pattern for deleting nodes from a tree.<br />
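<br />
The deletion pattern can be tried out on a toy rose tree without HXT. This is only a sketch: the <hask>Tree</hask> type and the simplified <hask>hasName</hask> and <hask>processChildren</hask> below are assumptions that mimic the HXT combinators of the same name on plain list functions.<br />
<br />
```haskell
-- Toy stand-in for XmlTree: a label (playing the element name) and children.
data Tree = Tree String [Tree] deriving (Eq, Show)

type Filter = Tree -> [Tree]

-- The null filter: always fails, i.e. returns the empty list.
none :: Filter
none _ = []

-- Succeeds with the input when the label matches.
hasName :: String -> Filter
hasName n t@(Tree m _) = [ t | n == m ]

-- f `when` g: apply f where g succeeds, otherwise keep the input unchanged.
when :: Filter -> Filter -> Filter
when f g t
  | null (g t) = [t]
  | otherwise  = f t

-- Apply a filter to every child and rebuild the node from all results.
processChildren :: Filter -> Filter
processChildren f (Tree n cs) = [Tree n (concatMap f cs)]

-- Delete "base" nodes, but only directly below "head".
removeBase :: Filter
removeBase =
  processChildren
    ( processChildren (none `when` hasName "base")
      `when` hasName "head"
    )
```
<br />
Because <hask>none</hask> returns no result, <hask>processChildren</hask> has nothing to put back into the child list, so the matching node simply disappears from the rebuilt parent.<br />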
<br />
The <hask>computeBaseRef</hask> function defined above contains an arrow pattern<br />
for selecting the right subtree that is rather common in HXT applications<br />
<br />
<haskell><br />
isElem >>> hasName n1<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName n2<br />
...<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName nm<br />
</haskell><br />
<br />
For this pattern we will define a convenient function creating the<br />
arrow for selection<br />
<br />
<haskell><br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)      -- (1)<br />
      .<br />
      map (\ n -> isElem >>> hasName n)              -- (2)<br />
</haskell><br />
<br />
The name list is mapped to the element checking arrows (2), and<br />
the resulting list of arrows is folded with <hask>getChildren</hask><br />
into a single arrow (1). <hask>computeBaseRef</hask> can then be simplified<br />
and becomes more readable:<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( getDescendents ["html","head","base"]    -- (1)<br />
            >>><br />
            getAttrValue "href"                      -- (2)<br />
          )<br />
          ...<br />
          ...<br />
</haskell><br />
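<br />
The fold behind <hask>getDescendents</hask> can be checked on a toy tree. The sketch below assumes a minimal <hask>Tree</hask> type and list-function versions of <hask>(>>>)</hask>, <hask>getChildren</hask> and <hask>hasName</hask>; they are stand-ins for illustration, not the HXT definitions.<br />
<br />
```haskell
-- Toy stand-in for XmlTree.
data Tree = Tree String [Tree] deriving (Eq, Show)

type Filter = Tree -> [Tree]

infixr 1 >>>

-- Left-to-right filter composition, as defined earlier in the article.
(>>>) :: Filter -> Filter -> Filter
(f >>> g) t = concatMap g (f t)

getChildren :: Filter
getChildren (Tree _ cs) = cs

hasName :: String -> Filter
hasName n t@(Tree m _) = [ t | n == m ]

-- Same fold as in the article: check a name, step down, check the next name.
-- Like the original, this uses foldl1 and is undefined for an empty path.
getDescendents :: [String] -> Filter
getDescendents = foldl1 (\x y -> x >>> getChildren >>> y) . map hasName

doc :: Tree
doc = Tree "html"
  [ Tree "head" [Tree "title" [], Tree "base" []]
  , Tree "body" []
  ]
```
<br />
With this model, <hask>getDescendents ["html","head","base"] doc</hask> selects the single BASE node, while a path that does not exist in the tree yields the empty list.<br />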
<br />
An even more general and flexible technique is to use XPath expressions,<br />
available for selecting document parts and defined in the module<br />
<hask>Text.XML.HXT.Arrow.XmlNodeSet</hask>.<br />
<br />
With XPath <hask>computeBaseRef</hask> can be simplified to<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( ( getXPathTrees "/html/head/base"          -- (1)<br />
            >>><br />
            getAttrValue "href"                      -- (2)<br />
          )<br />
          ...<br />
</haskell><br />
<br />
Even the attribute selection can be expressed by XPath,<br />
so (1) and (2) can be combined into<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( xshow (getXPathTrees "/html/head/base/@href")<br />
          ...<br />
</haskell><br />
<br />
The extra <hask>xshow</hask> is required here to convert the<br />
XPath result, an XmlTree, into a string.<br />
<br />
XPath defines a<br />
full language for selecting parts of an XML document.<br />
Sometimes it is rather comfortable to make selections of this<br />
type, but XPath evaluation is in general more expensive<br />
in time and space than a simple combination of arrows, as we have<br />
seen in <hask>getDescendents</hask>.<br />
<br />
=== Transform external references into absolute references: Refinement ===<br />
<br />
In the above example only A-HREF URLs are edited. Now we extend this<br />
to other element-attribute combinations.<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown ( editRef "a" "href"            -- (2)<br />
                       >>><br />
                       editRef "img" "src"           -- (3)<br />
                       >>><br />
                       editRef "link" "href"         -- (4)<br />
                       >>><br />
                       editRef "script" "src"        -- (5)<br />
                     )<br />
    where<br />
    editRef en an                                    -- (1)<br />
        = processAttrl ( changeAttrValue (absHRef base)<br />
                         `when`<br />
                         hasName an<br />
                       )<br />
          `when`<br />
          ( isElem >>> hasName en )<br />
        where<br />
        absHRef :: String -> String -> String<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
<hask>editRef</hask> is parameterized by the element and attribute names.<br />
The arrow applied to every element is extended to a sequence of<br />
<hask>editRef</hask>'s ((2)-(5)). Notice that the document is still traversed only once.<br />
To process all possible HTML elements,<br />
this sequence should be extended by further element-attribute pairs.<br />
<br />
This can further be simplified into<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = foldl (>>>) this<br />
          .<br />
          map (\ (en, an) -> editRef en an)<br />
          $<br />
          [ ("a", "href")<br />
          , ("img", "src")<br />
          , ("link", "href")<br />
          , ("script", "src")           -- and more<br />
          ]<br />
    editRef<br />
        = ...<br />
</haskell><br />
<br />
The <hask>foldl (>>>) this</hask> is defined in HXT as <hask>seqA</hask>,<br />
so the above code can be simplified to<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = seqA . map (uncurry editRef)<br />
          $<br />
          ...<br />
</haskell><br />
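<br />
What <hask>foldl (>>>) this</hask> does can be seen on plain list functions. The following sketch is a model under assumptions: <hask>seqAll</hask> is a hypothetical local name for the fold, attributes are modelled as simple (name, value) pairs, and <hask>editAttr</hask> and the base URL are made up for the illustration.<br />
<br />
```haskell
-- Filters that keep the type of their input.
type Filter a = a -> [a]

-- The identity filter.
this :: Filter a
this x = [x]

infixr 1 >>>

(>>>) :: Filter a -> Filter a -> Filter a
(f >>> g) x = concatMap g (f x)

-- | Hypothetical local name for the fold that the article calls seqA:
-- run all filters one after the other, starting from the identity.
seqAll :: [Filter a] -> Filter a
seqAll = foldl (>>>) this

-- Toy attribute model: edit the value when the attribute name matches.
editAttr :: String -> (String -> String) -> Filter (String, String)
editAttr name f (n, v)
  | n == name = [(n, f v)]
  | otherwise = [(n, v)]

-- One combined filter built from a list of attribute names.
absolutize :: Filter (String, String)
absolutize =
  seqAll (map (\n -> editAttr n ("http://example.org/" ++)) ["href", "src"])
```
<br />
An empty list of filters folds to <hask>this</hask>, so the combined filter then leaves its input untouched.<br />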
<br />
== More complex examples ==<br />
<br />
''' to be done '''<br />
<br />
=== Automatic read/writing between xml and Haskell data types ===<br />
<br />
'''Question''': is there any way to write/read Haskell types to/from XML in HXT? HaXml has readXml and showXml, but I can't find any similar mechanism in HXT. Help! -- AlsonKemp<br />
<br />
==== Serializing to Xml ====<br />
<br />
We can create an HXT tree from a single-layer data class as follows:<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
import System.IO<br />
import Data.Char<br />
import Data.Maybe (fromMaybe)<br />
import Text.XML.HXT.Arrow<br />
import Data.Generics<br />
<br />
-- our data class we'll convert into xml<br />
data Config =<br />
    Config { username :: String,<br />
             logNumDays :: Int,<br />
             oleDbString :: String }<br />
    deriving (Show, Typeable, Data)<br />
<br />
-- helper function adapted from http://www.defmacro.org/ramblings/haskell-web.html<br />
-- (gshow replaced by gshow')<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr (toConstr t)) (cast t)<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
-- (note: this foldr emits the fields in reverse declaration order)<br />
xmlSerialize object = "<" ++ show (toConstr object) ++ ">" ++<br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show (toConstr object) ++ ">"<br />
<br />
-- function to create HXT tree arrow from single-layer Haskell data type:<br />
createHxtArrow object = runLA ( constA ( xmlSerialize object ) >>> xread )<br />
<br />
-- create a config object to serialize:<br />
<br />
createConfig = Config { username = "test", logNumDays = 3, oleDbString = "qsdf" }<br />
<br />
-- test function, using our Config data type<br />
testConversion = createHxtArrow createConfig ()<br />
</haskell><br />
<br />
-- hughperkins<br />
<br />
==== Deserializing from Xml ====<br />
<br />
Here's a solution to deserialize a simple haskell data type containing Strings and Ints.<br />
<br />
It's not really pretty, but it works.<br />
<br />
Basically, we just convert the incoming xml into gread-compatible format, then use gread :-D<br />
<br />
Currently it works for a simple single-layer Haskell data type containing Ints and Strings. You can add new child data types by adding to the case statement in xmlToGShowFormat.<br />
<br />
If someone has a more elegant solution, please let me know ( hughperkins@gmail.com )<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
module ParseXml<br />
where<br />
<br />
import System.IO<br />
import Data.Char<br />
import Data.List<br />
import Data.Maybe<br />
import Data.Generics hiding (Unit)<br />
import Text.XML.HXT.Arrow hiding (when)<br />
<br />
data Config = Config{ name :: String, age :: Int }<br />
--data Config = Config{ age :: Int }<br />
    deriving( Data, Show, Typeable, Ord, Eq, Read )<br />
<br />
createConfig = Config "qsdfqsdf" 3<br />
--createConfig = Config 3<br />
<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr (toConstr t)) (cast t)<br />
<br />
-- helper function from http://www.defmacro.org/ramblings/haskell-web.html<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show (toConstr object) ++ ">" ++<br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show (toConstr object) ++ ">"<br />
<br />
-- parse xml to HXT tree, and obtain the value of node "fieldname"<br />
-- returns a string<br />
getValue xml fieldname<br />
    | length resultlist > 0 = Just (head resultlist)<br />
    | otherwise             = Nothing<br />
    where resultlist = (runLA ( constA xml >>> xread >>> deep ( hasName fieldname ) >>> getChildren >>> getText )) []<br />
<br />
-- parse templateobject to get list of field names<br />
-- apply these to xml to get list of values<br />
-- return the constructor and values as a string in gread-compatible format<br />
xmlToGShowFormat :: Data a => String -> a -> String<br />
xmlToGShowFormat xml templateobject =<br />
    go<br />
    where mainconstructorname = (showConstr $ toConstr templateobject)<br />
          fields = constrFields $ toConstr templateobject<br />
          values = map ( \fieldname -> getValue xml fieldname ) fields<br />
          datatypes = gmapQ (dataTypeOf) templateobject<br />
          constrs = gmapQ (toConstr) templateobject<br />
          datatypereps = gmapQ (dataTypeRep . dataTypeOf) templateobject<br />
          fieldtogshowformat (value,datatyperep) = case datatyperep of<br />
              IntRep -> "(" ++ fromJust value ++ ")"<br />
              _      -> show (fromJust value)<br />
          formattedfieldlist = map (fieldtogshowformat) (zip values datatypereps)<br />
          go = "(" ++ mainconstructorname ++ " " ++ (concat $ intersperse " " formattedfieldlist ) ++ ")"<br />
<br />
xmlDeserialize xml templateobject = fst $ head $ gread ( xmlToGShowFormat xml templateobject )<br />
<br />
dotest = xmlDeserialize (xmlSerialize createConfig) createConfig :: Config<br />
dotest' = xmlDeserialize ("<Config><age>12</age><name>test name!</name></Config>") createConfig :: Config<br />
</haskell><br />
<br />
-- hughperkins</div>Joehttps://wiki.haskell.org/index.php?title=HXT&diff=16959HXT2007-11-22T21:24:16Z<p>Joe: fixed typo</p>
<hr />
<div>[[Category:Tools]] [[Category:Tutorials]]<br />
<br />
== A gentle introduction to the Haskell XML Toolbox ==<br />
<br />
The [http://www.fh-wedel.de/~si/HXmlToolbox/index.html Haskell XML Toolbox (HXT)] is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain specific language consisting of a set of combinators for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace aware XML-Parser that supports almost fully the XML 1.0 Standard. Extensions are a validator for RelaxNG and an XPath evaluator.<br />
<br />
__TOC__<br />
<br />
== Background ==<br />
<br />
The Haskell XML Toolbox bases on the ideas of [http://www.cs.york.ac.uk/fp/HaXml/ HaXml] and [http://www.flightlab.com/~joe/hxml/ HXML] but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CData parts and processing instructions. This data model makes it possible to use tree transformation functions as a uniform design of XML processing steps from parsing, DTD processing, entity processing, validation, namespace propagation, content processing and output.<br />
<br />
== Resources ==<br />
<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/index.html HXT Home] :<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/HXT-7.0.tar.gz HXT-7.0.tar.gz] : latest release<br />
; [http://darcs.fh-wedel.de/hxt/ darcs.fh-wedel.de/hxt] : darcs repository with head revision of HXT<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc_arrow/ Arrow API] : Haddock documentation of head revision with links to source files<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc/ Complete API] : Haddock documentation with arrows and old API based on filters<br />
<br />
== The basic concepts ==<br />
<br />
=== The basic data structures ===<br />
<br />
Processing of XML is a task of processing tree structures. This can be done in Haskell in a very elegant way by defining an appropriate tree data type, a Haskell DOM (document object model) structure. The tree structure in HXT is a rose tree with a special XNode data type for storing the XML node information.<br />
<br />
The generally useful tree structure (NTree) is separated from the node type (XNode). This allows for reusing the tree structure and the tree traversal and manipulation functions in other applications.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a]     -- rose tree<br />
<br />
data XNode = XText String            -- plain text node<br />
           | ...<br />
           | XTag QName XmlTrees     -- element name and list of attributes<br />
           | XAttr QName             -- attribute name<br />
           | ...<br />
<br />
type QName = ...                     -- qualified name<br />
<br />
type XmlTree = NTree XNode<br />
<br />
type XmlTrees = [XmlTree]<br />
</haskell><br />
<br />
=== The concept of filters ===<br />
<br />
Selecting, transforming and generating trees often requires routines, which compute not only a single result tree, but a (possibly empty) list of (sub-)trees. This leads to the idea of XML filters like in HaXml. Filters are functions, which take an XML tree as input and compute a list of result trees.<br />
<br />
<haskell><br />
type XmlFilter = XmlTree -> [XmlTree]<br />
</haskell><br />
<br />
More generally we can define a filter as<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
</haskell><br />
<br />
We will do this abstraction later, when introducing arrows. Many of the functions in the following motivating examples can be generalised this way. But for getting the idea, the <hask>XmlFilter</hask> is sufficient.<br />
<br />
The filter functions are used so frequently that the idea of defining a domain specific language with filters as the basic processing units comes up. In such a DSL the basic filters are predicates, selectors, constructors and transformers, all working on the HXT DOM tree structure. For a DSL it becomes necessary to define an appropriate set of combinators for building more complex functions from simpler ones. Of course filter composition, like (.), becomes one of the most frequently used combinators. There are also more complex filters for traversal of a whole tree and selection or transformation of several nodes. We will see a few first examples in the following part.<br />
<br />
The first task is to build filters from pure functions, to define a lift operator. Pure functions are lifted to filters in the following way:<br />
<br />
Predicates are lifted by mapping False to the empty list and True to the single element list, containing the input tree.<br />
<br />
<haskell><br />
p :: XmlTree -> Bool                 -- pure function<br />
p t = ...<br />
<br />
pf :: XmlTree -> [XmlTree]           -- or XmlFilter<br />
pf t<br />
    | p t       = [t]<br />
    | otherwise = []<br />
</haskell><br />
<br />
The combinator for this type of lifting is called <hask>isA</hask>, it works on any type and is defined as<br />
<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
</haskell><br />
<br />
A predicate for filtering text nodes looks like this<br />
<br />
<haskell><br />
isXText :: XmlFilter                 -- XmlTree -> [XmlTree]<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _ = []<br />
</haskell><br />
<br />
Transformers -- functions that map a tree into another tree -- are lifted in a trivial way:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t = expr(t)<br />
<br />
ff :: XmlTree -> [XmlTree]<br />
ff t = [expr(t)]<br />
</haskell><br />
<br />
This basic function is called <hask>arr</hask>, it comes from the Control.Arrow module of the basic library package of ghc.<br />
<br />
Partial functions, functions that can't always compute a result, are usually lifted to totally defined filters:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t<br />
    | p t       = expr(t)<br />
    | otherwise = error "f not defined"<br />
<br />
ff :: XmlFilter<br />
ff t<br />
    | p t       = [expr(t)]<br />
    | otherwise = []<br />
</haskell><br />
<br />
This is a rather comfortable situation, with these filters we don't have to deal with illegal argument errors. Illegal arguments are just mapped to the empty list.<br />
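<br />
The two kinds of lifting can be tried on ordinary values instead of XML trees. In this sketch the generalised <hask>Filter a b</hask> type from above is used; <hask>firstWord</hask> and <hask>firstWordF</hask> are hypothetical example functions chosen only to show a partial function and its total filter counterpart.<br />
<br />
```haskell
-- The generalised filter type from the article.
type Filter a b = a -> [b]

-- Lifting a predicate: False becomes [], True becomes the one-element list.
isA :: (a -> Bool) -> Filter a a
isA p x
  | p x       = [x]
  | otherwise = []

-- A partial pure function: crashes on input without any word,
-- because head is applied to an empty list.
firstWord :: String -> String
firstWord = head . words

-- The lifted, totally defined filter: illegal input maps to [].
firstWordF :: Filter String String
firstWordF s = case words s of
  (w : _) -> [w]
  []      -> []
```
<br />
A caller of <hask>firstWordF</hask> never has to handle an exception; an input without words just contributes nothing to the result list.<br />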
<br />
When processing trees, it is often the case that no result, exactly one result, or more than one result is possible. Functions that return a set of results are often, a bit imprecisely, called ''nondeterministic'' functions. Such functions, e.g. selecting all children of a node or all grandchildren, are exactly our filters. In this context lists rather than sets of values are the appropriate result type, because the ordering in XML is important and duplicates are possible.<br />
<br />
Working with filters is rather similar to working with binary relations, and working with relations is rather natural and comfortable, database people do know this very well.<br />
<br />
Two first examples for working with ''nondeterministic'' functions are selecting the children and the grandchildren of an XmlTree which can be implemented by<br />
<br />
<haskell><br />
getChildren :: XmlFilter<br />
getChildren (NTree n cs)<br />
    = cs<br />
<br />
getGrandChildren :: XmlFilter<br />
getGrandChildren (NTree n cs)<br />
    = concat [ getChildren c | c <- cs ]<br />
</haskell><br />
<br />
=== Filter combinators ===<br />
<br />
Composition of filters (like function composition) is the most important combinator. We will use the infix operator <hask>(>>>)</hask> for filter composition and reverse the arguments, so we can read composition sequences from left to right, like with pipes in Unix. Composition is defined as follows:<br />
<br />
<haskell><br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
</haskell><br />
<br />
This definition corresponds 1-1 to the composition of binary relations. With help of the <hask>(>>>)</hask> operator the definition of <hask>getGrandChildren</hask> becomes rather simple:<br />
<br />
<haskell><br />
getGrandChildren :: XmlFilter<br />
getGrandChildren = getChildren >>> getChildren<br />
</haskell><br />
<br />
Selecting all text nodes of the children of an element can also be formulated very easily with the help of <hask>(>>>)</hask><br />
<br />
<haskell><br />
getTextChildren :: XmlFilter<br />
getTextChildren = getChildren >>> isXText<br />
</haskell><br />
<br />
When used to combine predicate filters, the <hask>(>>>)</hask> serves as a logical "and" operator or, from the relational view, as an intersection operator: <hask>isA p1 >>> isA p2</hask> selects all values for which p1 and p2 both hold.<br />
<br />
The dual operator to <hask>(>>>)</hask> is the logical or, (thinking in sets: The union operator). For this we define a sum operator <hask>(<+>)</hask>. The sum of two filters is defined as follows:<br />
<br />
<haskell><br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f <+> g) t = f t ++ g t<br />
</haskell><br />
<br />
Example: <hask>isA p1 <+> isA p2</hask> is the logical or for filters.<br />
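<br />
Both readings can be checked on integers. The sketch below re-defines <hask>isA</hask>, <hask>(>>>)</hask> and <hask>(<+>)</hask> for plain list functions, following the definitions given above; <hask>evenAndBig</hask> and <hask>evenOrBig</hask> are hypothetical example names. It also shows that the "or" works on lists, not sets: a value satisfying both predicates appears twice.<br />
<br />
```haskell
type Filter a = a -> [a]

isA :: (a -> Bool) -> Filter a
isA p x
  | p x       = [x]
  | otherwise = []

infixr 1 >>>
infixr 5 <+>

-- Composition: feed every result of f into g.
(>>>) :: Filter a -> Filter a -> Filter a
(f >>> g) x = concatMap g (f x)

-- Sum: collect the results of both filters, f first.
(<+>) :: Filter a -> Filter a -> Filter a
(f <+> g) x = f x ++ g x

-- Logical "and": only values passing both predicates survive.
evenAndBig :: Filter Int
evenAndBig = isA even >>> isA (> 10)

-- Logical "or": values passing at least one predicate survive.
evenOrBig :: Filter Int
evenOrBig = isA even <+> isA (> 10)
```
<br />
Here <hask>evenOrBig 12</hask> returns <hask>[12,12]</hask>, because 12 passes both predicates and <hask>(<+>)</hask> simply concatenates the two result lists.<br />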
<br />
Combining elementary filters with (>>>) and (<+>) leads to more complex functionality. For example, selecting all text nodes within two levels of depth (in left to right order) can be formulated with:<br />
<br />
<haskell><br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
</haskell><br />
<br />
'''Exercise:''' Are these filters equivalent or what's the difference between the two filters?<br />
<br />
<haskell><br />
getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )<br />
</haskell><br />
<br />
Of course we need choice combinators. The first idea is an if-then-else filter, <br />
built up from three simpler filters. But often it's easier and more elegant to work with simpler binary combinators for choice. So we will introduce the simpler ones first.<br />
<br />
One of these choice combinators is called <hask>orElse</hask> and is defined as<br />
follows:<br />
<br />
<haskell><br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
    | null res1 = g t<br />
    | otherwise = res1<br />
    where<br />
    res1 = f t<br />
</haskell><br />
<br />
The meaning is the following: If f computes a non-empty list as result, f succeeds and this list is the result, else g is applied to the input and this yields the result. There are two other simple choice combinators usually written in infix notation, <hask> g `guards` f</hask> and <hask>f `when` g</hask>:<br />
<br />
<haskell><br />
guards :: XmlFilter -> XmlFilter -> XmlFilter<br />
guards g f t<br />
    | null (g t) = []<br />
    | otherwise  = f t<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
</haskell><br />
<br />
These choice operators become useful when transforming and manipulating trees.<br />
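<br />
The difference between the three choice combinators is easiest to see on small numbers. The sketch below copies the definitions above, generalised from <hask>XmlFilter</hask> to an arbitrary type; <hask>halve</hask> and the three combined filters are hypothetical examples.<br />
<br />
```haskell
type Filter a = a -> [a]

isA :: (a -> Bool) -> Filter a
isA p x
  | p x       = [x]
  | otherwise = []

orElse :: Filter a -> Filter a -> Filter a
orElse f g x
  | null res  = g x
  | otherwise = res
  where
    res = f x

guards :: Filter a -> Filter a -> Filter a
guards g f x
  | null (g x) = []
  | otherwise  = f x

when :: Filter a -> Filter a -> Filter a
when f g x
  | null (g x) = [x]
  | otherwise  = f x

halve :: Filter Int
halve x = [x `div` 2]

-- when: odd numbers pass through unchanged.
halveIfEven :: Filter Int
halveIfEven = halve `when` isA even

-- guards: odd numbers are dropped.
halveOnlyEven :: Filter Int
halveOnlyEven = isA even `guards` halve

-- orElse: the second filter runs only if the first one fails.
evenOrSucc :: Filter Int
evenOrSucc = isA even `orElse` (\x -> [x + 1])
```
<br />
For the input 7, <hask>halveIfEven</hask> yields <hask>[7]</hask>, <hask>halveOnlyEven</hask> yields <hask>[]</hask> and <hask>evenOrSucc</hask> yields <hask>[8]</hask>.<br />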
<br />
=== Tree traversal filter ===<br />
<br />
A very basic operation on tree structures is the traversal of all nodes and the selection and/or transformation of nodes. These traversal filters serve as control structures for processing whole trees. They correspond to the map and fold combinators for lists.<br />
<br />
The simplest traversal filter does a top down search of all nodes with a special feature. This filter, called <hask>deep</hask>, is defined as follows:<br />
<br />
<haskell><br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
</haskell><br />
<br />
When <hask>deep</hask> is applied to a predicate filter, a top down search is done and all subtrees satisfying the predicate are collected. Because of the use of <hask>orElse</hask>, the descent into the tree stops as soon as a matching subtree is found.<br />
<br />
'''Example:''' Selecting all plain text nodes of a document can be formulated with:<br />
<br />
<haskell><br />
deep isXText<br />
</haskell><br />
<br />
'''Example:''' Selecting all "top level" tables in an HTML document looks like<br />
this:<br />
<br />
<haskell><br />
deep (isElem >>> hasName "table")<br />
</haskell><br />
<br />
A variant of <hask>deep</hask>, called <hask>multi</hask>, performs a complete search, where the tree traversal does not stop, when a node is found.<br />
<br />
<haskell><br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
</haskell><br />
<br />
'''Example:''' To select all tables in an HTML document, even nested ones, <hask>multi</hask> has to be used instead of <hask>deep</hask>:<br />
<br />
<hask>multi (isElem >>> hasName "table")</hask><br />
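<br />
The difference between <hask>deep</hask> and <hask>multi</hask> can be observed on a toy tree with a nested table. This is a sketch under assumptions: the <hask>Tree</hask> type, the simplified <hask>hasName</hask> and the sample <hask>page</hask> are made up for the illustration, while <hask>deep</hask> and <hask>multi</hask> follow the definitions above.<br />
<br />
```haskell
-- Toy stand-in for XmlTree.
data Tree = Tree String [Tree] deriving (Eq, Show)

type Filter = Tree -> [Tree]

infixr 1 >>>
infixr 5 <+>

(>>>) :: Filter -> Filter -> Filter
(f >>> g) t = concatMap g (f t)

(<+>) :: Filter -> Filter -> Filter
(f <+> g) t = f t ++ g t

orElse :: Filter -> Filter -> Filter
orElse f g t
  | null res  = g t
  | otherwise = res
  where
    res = f t

getChildren :: Filter
getChildren (Tree _ cs) = cs

hasName :: String -> Filter
hasName n t@(Tree m _) = [ t | n == m ]

-- Stops descending below a match.
deep :: Filter -> Filter
deep f = f `orElse` (getChildren >>> deep f)

-- Keeps descending below a match.
multi :: Filter -> Filter
multi f = f <+> (getChildren >>> multi f)

-- Two top level tables; the first one contains a nested table.
page :: Tree
page = Tree "body"
  [ Tree "table" [ Tree "tr" [ Tree "table" [] ] ]
  , Tree "p" [ Tree "table" [] ]
  ]

topLevelTables, allTables :: Int
topLevelTables = length (deep  (hasName "table") page)
allTables      = length (multi (hasName "table") page)
```
<br />
On this sample, <hask>deep</hask> finds the two outermost tables, while <hask>multi</hask> also reports the nested one and finds three.<br />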
<br />
=== Arrows ===<br />
<br />
We've already seen that the filters <hask>a -> [b]</hask> are a very<br />
powerful and sometimes more elegant way to process XML than pure<br />
functions. This is the good news. The bad news is that filters are not<br />
general enough. Of course we sometimes want to do some I/O and we want<br />
to stay on the filter level. So we need something like<br />
<br />
<haskell><br />
type XmlIOFilter = XmlTree -> IO [XmlTree]<br />
</haskell><br />
<br />
for working in the IO monad.<br />
<br />
Sometimes it's appropriate to thread some state through the computation<br />
like in state monads. This leads to a type like<br />
<br />
<haskell><br />
type XmlStateFilter state = state -> XmlTree -> (state, [XmlTree])<br />
</haskell><br />
<br />
And in real world applications we need both extensions at the same<br />
time. Of course I/O is necessary but usually there are also some<br />
global options and variables for controlling the computations. In HXT,<br />
for instance there are variables for controlling trace output, options<br />
for setting the default encoding scheme for input data and a base URI<br />
for accessing documents, which are addressed in a content or in a DTD<br />
part by relative URIs. So we need something like<br />
<br />
<haskell><br />
type XmlIOStateFilter state = state -> XmlTree -> IO (state, [XmlTree])<br />
</haskell><br />
<br />
We want to work with all four filter variants, and in the future<br />
perhaps with even more general filters, but of course not with four<br />
sets of filter names, e.g. <hask>deep, deepST, deepIO, deepIOST</hask>.<br />
<br />
This is the point where <hask>newtype</hask>s and <hask>class</hask>es<br />
come in. Classes are needed for overloading names and<br />
<hask>newtype</hask>s are needed to declare instances. Furthermore, the<br />
restriction of <hask>XmlTree</hask> as argument and result type is<br />
not necessary and hinders reuse in many cases.<br />
<br />
A filter discussed above has all features of an arrow. Arrows are<br />
introduced for generalising the concept of functions and function<br />
combination to more general kinds of computation than pure functions.<br />
<br />
A basic set of combinators for arrows is defined in the classes in the<br />
<hask>Control.Arrow</hask> module, containing the above mentioned <hask>(>>>), (<+>), arr</hask>.<br />
<br />
In HXT the additional classes for filters working with lists as result type are<br />
defined in <hask>Control.Arrow.ArrowList</hask>. The choice operators are<br />
in <hask>Control.Arrow.ArrowIf</hask>, tree filters like <hask>getChildren, deep, multi, ...</hask> are in<br />
<hask>Control.Arrow.ArrowTree</hask>, and the elementary XML specific<br />
filters are in <hask>Text.XML.HXT.Arrow.XmlArrow</hask>.<br />
<br />
In HXT there are four types instantiated with these classes for<br />
pure list arrows, list arrows with a state, list arrows with IO<br />
and list arrows with a state and IO.<br />
<br />
<haskell><br />
newtype LA a b = LA { runLA :: (a -> [b]) }<br />
<br />
newtype SLA s a b = SLA { runSLA :: (s -> a -> (s, [b])) }<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: (a -> IO [b]) }<br />
<br />
newtype IOSLA s a b = IOSLA { runIOSLA :: (s -> a -> IO (s, [b])) }<br />
</haskell><br />
<br />
The first one and the last one are those used most frequently in the<br />
toolbox, and of course there are lifting functions for converting the<br />
special arrows into the more general ones.<br />
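<br />
The idea of lifting a special arrow into a more general one can be sketched for the first and third type. The two <hask>newtype</hask>s are repeated from above; <hask>liftLA</hask> and <hask>composeIO</hask> are hypothetical names for this illustration and are not claimed to be the functions HXT provides.<br />
<br />
```haskell
newtype LA a b = LA { runLA :: a -> [b] }

newtype IOLA a b = IOLA { runIOLA :: a -> IO [b] }

-- | Hypothetical lifting function: a pure list arrow becomes an
-- IO list arrow that performs no I/O at all.
liftLA :: LA a b -> IOLA a b
liftLA (LA f) = IOLA (return . f)

-- | Hypothetical composition for IO list arrows: run the second arrow
-- on every result of the first one, in order, and collect all results.
composeIO :: IOLA a b -> IOLA b c -> IOLA a c
composeIO (IOLA f) (IOLA g) = IOLA $ \x -> do
  ys  <- f x
  zss <- mapM g ys
  return (concat zss)

-- A pure arrow: split a line into words.
splitWords :: LA String String
splitWords = LA words

-- An arrow with I/O: print each word and return its length.
traceLength :: IOLA String Int
traceLength = IOLA $ \w -> do
  putStrLn w
  return [length w]

-- Pure and effectful steps combined on the IO level.
wordLengths :: IOLA String Int
wordLengths = liftLA splitWords `composeIO` traceLength
```
<br />
Once lifted, the pure step composes with effectful steps like any other arrow of the more general type, which is why a single set of combinator names is enough.<br />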
<br />
Don't worry about all these conceptual details. Let's have a look at some<br />
''Hello world'' examples.<br />
<br />
== Getting started: Hello world examples ==<br />
<br />
=== copyXML ===<br />
<br />
The first complete example is a program for<br />
copying an XML document<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
import System.Environment<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      [src, dst] <- getArgs<br />
      runX ( readDocument [(a_validate, v_0)] src<br />
             >>><br />
             writeDocument [] dst<br />
           )<br />
      return ()<br />
</haskell><br />
<br />
The interesting part of this example is<br />
the call of <hask>runX</hask>. <hask>runX</hask> executes an<br />
arrow. This arrow is one of the more powerful list arrows with IO and<br />
an HXT system state.<br />
<br />
The arrow itself is a composition of <hask>readDocument</hask> and<br />
<hask>writeDocument</hask>.<br />
<hask>readDocument</hask> is an arrow for reading, DTD processing and<br />
validation of documents. Its behaviour can be controlled by a list of<br />
options. Here we turn off the validation step. The <hask>src</hask>, a file<br />
name or a URI, is read and parsed and a document tree is built. This<br />
tree is ''piped'' into the output arrow. This one is also<br />
controlled by a set of options. Here all the defaults are used.<br />
<hask>writeDocument</hask> converts the tree into a string and writes<br />
it to the <hask>dst</hask>.<br />
<br />
We've omitted here the boring stuff of option parsing and error<br />
handling.<br />
<br />
Compilation and a test run looks like this:<br />
<br />
<pre><br />
hobel > ghc -o copyXml -package hxt CopyXML.hs<br />
hobel > cat hello.xml<br />
<hello>world</hello><br />
hobel > copyXml hello.xml -<br />
<?xml version="1.0" encoding="UTF-8"?><br />
<hello>world</hello><br />
hobel ><br />
</pre><br />
<br />
The mini XML document in file <tt>hello.xml</tt> is read and<br />
a document tree is built. Then this tree is converted into a string<br />
and written to standard output (filename: <tt>-</tt>). It is decorated<br />
with an XML declaration containing the version and the output<br />
encoding.<br />
<br />
For processing HTML documents there is an HTML parser, which tries to<br />
parse and interpret nearly anything as HTML. The HTML parser can be<br />
selected by calling<br />
<br />
<hask>readDocument [(a_parse_html, v_1), ...]</hask><br />
<br />
with the appropriate option.<br />
<br />
=== Pattern for a main program ===<br />
<br />
A more realistic pattern for a simple Unix filter-like program has<br />
the following structure:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
<br />
import System.IO<br />
import System.Environment<br />
import System.Console.GetOpt<br />
import System.Exit<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      argv <- getArgs<br />
      (al, src, dst) <- cmdlineOpts argv<br />
      [rc] <- runX (application al src dst)<br />
      if rc >= c_err<br />
         then exitWith (ExitFailure (0-1))<br />
         else exitWith ExitSuccess<br />
<br />
-- | the dummy for the boring stuff of option evaluation,<br />
-- usually done with 'System.Console.GetOpt'<br />
<br />
cmdlineOpts :: [String] -> IO (Attributes, String, String)<br />
cmdlineOpts argv<br />
    = return ([(a_validate, v_0)], argv!!0, argv!!1)<br />
<br />
-- | the main arrow<br />
<br />
application :: Attributes -> String -> String -> IOSArrow b Int<br />
application al src dst<br />
    = readDocument al src<br />
      >>><br />
      processChildren (processDocumentRootElement `when` isElem) -- (1)<br />
      >>><br />
      writeDocument al dst<br />
      >>><br />
      getErrStatus<br />
<br />
<br />
-- | the dummy for the real processing: the identity filter<br />
<br />
processDocumentRootElement :: IOSArrow XmlTree XmlTree<br />
processDocumentRootElement<br />
    = this -- substitute this by the real application<br />
</haskell><br />
<br />
This program has the same functionality as our first example,<br />
but it separates the arrow from the boring option evaluation and<br />
return code computation.<br />
<br />
The interesting line is (1).<br />
<hask>readDocument</hask> generates a tree structure with a so-called extra<br />
root node. This root node is a node above the XML document root<br />
element. It is necessary<br />
because other nodes may occur on the same tree level as the XML<br />
root element, for instance comments, processing instructions or whitespace.<br />
<br />
Furthermore the artificial root node serves for storing meta<br />
information about the document in the attribute list, like the<br />
document name, the encoding scheme, the HTTP transfer headers and<br />
other information.<br />
<br />
To process the real XML root element, we have to take the children of<br />
the root node, select the XML root element and process it, but<br />
leave all other children unchanged. This is done with<br />
<hask>processChildren</hask> and the <hask>when</hask> choice<br />
operator. <hask>processChildren</hask> applies a filter elementwise to<br />
all children of a node. All results from processing the list of children form<br />
the new list of children of the result node.<br />
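<br />
The combination of <hask>processChildren</hask> and <hask>when</hask> can be tried in isolation with a pure arrow. The following is only a sketch, assuming the same <hask>Text.XML.HXT.Arrow</hask> import as above; the names <hask>markElements</hask> and <hask>demo</hask>, the attribute <tt>seen</tt> and the input string are made up for illustration. <hask>runLA</hask> runs a pure list arrow and <hask>xread</hask> parses a string into trees without the extra root node.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- Add an attribute to every child that is an element;<br />
-- all other children pass through `when` unchanged.<br />
markElements :: ArrowXml a => a XmlTree XmlTree<br />
markElements<br />
    = processChildren (addAttr "seen" "yes" `when` isElem)<br />
<br />
-- parse a fragment, edit its children and show the result as a string<br />
demo :: [String]<br />
demo<br />
    = runLA ( xread<br />
              >>><br />
              markElements<br />
              >>><br />
              xshow this<br />
            ) "<r>text<a/><b/></r>"<br />
</haskell><br />
<br />
In this sketch only the two element children <tt>a</tt> and <tt>b</tt> should receive the new attribute, while the text child stays as it is.<br />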
<br />
The structure of the internal document tree can be made visible<br />
e.g. by adding the option pair <hask>(a_show_tree, v_1)</hask> to the<br />
<hask>writeDocument</hask> arrow. This will emit the tree in a readable<br />
text representation instead of the real document.<br />
<br />
In the next section we will give examples for the<br />
<hask>processDocumentRootElement</hask> arrow.<br />
<br />
== Selection examples ==<br />
<br />
=== Selecting text from an HTML document ===<br />
<br />
Selecting all the plain text of an XML/HTML document<br />
can be formulated with<br />
<br />
<haskell><br />
selectAllText :: ArrowXml a => a XmlTree XmlTree<br />
selectAllText<br />
    = deep isXText<br />
</haskell><br />
<br />
<hask>deep</hask> traverses the whole tree, stops the traversal when<br />
a node is a text node (<hask>isXText</hask>) and returns all the text nodes.<br />
There are two other traversal operators, <hask>deepest</hask> and <hask>multi</hask>.<br />
In this case, where the selected nodes are all leaves, these would give the same result.<br />
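<br />
The three traversal operators only differ when a selected node contains further matching nodes. The following sketch shows this with nested elements; it assumes the <hask>Text.XML.HXT.Arrow</hask> import used on this page, and the input string and the helper <hask>idsWith</hask> are hypothetical.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- made-up sample input: a div element nested inside another div<br />
nested :: String<br />
nested = "<div id=\"outer\"><p><div id=\"inner\"/></p></div>"<br />
<br />
-- run one traversal operator and collect the ids of the selected divs<br />
idsWith :: (LA XmlTree XmlTree -> LA XmlTree XmlTree) -> [String]<br />
idsWith traversal<br />
    = runLA ( xread<br />
              >>><br />
              traversal (isElem >>> hasName "div")<br />
              >>><br />
              getAttrValue "id"<br />
            ) nested<br />
<br />
-- idsWith deep     selects the topmost match only:   ["outer"]<br />
-- idsWith deepest  selects the innermost match only: ["inner"]<br />
-- idsWith multi    selects all matches:              ["outer","inner"]<br />
</haskell><br />
<br />
<hask>deep</hask> stops descending at the first match on a path, <hask>deepest</hask> prefers matches further down, and <hask>multi</hask> keeps searching inside matched nodes.<br />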
<br />
=== Selecting text and ALT attribute values ===<br />
<br />
Let's take a slightly more complex task: we want to select all text, but also the values of the <tt>alt</tt> attributes<br />
of image tags.<br />
<br />
<haskell><br />
selectAllTextAndAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndAltValues<br />
    = deep<br />
      ( isXText                       -- (1)<br />
        <+><br />
        ( isElem >>> hasName "img"    -- (2)<br />
          >>><br />
          getAttrValue "alt"          -- (3)<br />
          >>><br />
          mkText                      -- (4)<br />
        )<br />
      )<br />
</haskell><br />
<br />
The whole tree is searched for text nodes (1) and for image elements (2). From the image elements<br />
the alt attribute values are selected as plain text (3), and this text is transformed into a text node (4).<br />
<br />
=== Selecting text and ALT attribute values (2) ===<br />
<br />
Let's refine the above filter one step further. The text from the alt attributes shall be marked in the output<br />
by surrounding double square brackets. Empty alt values shall be ignored.<br />
<br />
<haskell><br />
selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndRealAltValues<br />
    = deep<br />
      ( isXText<br />
        <+><br />
        ( isElem >>> hasName "img"<br />
          >>><br />
          getAttrValue "alt"<br />
          >>><br />
          isA significant      -- (1)<br />
          >>><br />
          arr addBrackets      -- (2)<br />
          >>><br />
          mkText<br />
        )<br />
      )<br />
    where<br />
    significant :: String -> Bool<br />
    significant = not . all (`elem` " \n\r\t")<br />
<br />
    addBrackets :: String -> String<br />
    addBrackets s<br />
        = " [[ " ++ s ++ " ]] "<br />
</haskell><br />
<br />
This example shows two combinators for building arrows from pure functions.<br />
The first one, <hask>isA</hask>, turns a predicate into a filter and here removes all empty or whitespace-only values of alt attributes (1);<br />
the other, <hask>arr</hask>, lifts the editing function to the arrow level (2).<br />
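<br />
Both combinators work on arbitrary values, not only on XML trees, so they can be tested directly on strings. This is a small sketch under the assumption that <hask>Text.XML.HXT.Arrow</hask> is imported as elsewhere on this page; the name <hask>bracketIfSignificant</hask> is made up.<br />
<br />
<haskell><br />
import Data.Char (isSpace)<br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- isA keeps the input only if the predicate holds,<br />
-- arr applies a pure function to whatever gets through<br />
bracketIfSignificant :: LA String String<br />
bracketIfSignificant<br />
    = isA (not . all isSpace)<br />
      >>><br />
      arr (\ s -> "[[ " ++ s ++ " ]]")<br />
<br />
-- runLA bracketIfSignificant "logo"    gives ["[[ logo ]]"]<br />
-- runLA bracketIfSignificant "  \n"    gives []<br />
</haskell><br />
<br />
A failing <hask>isA</hask> test yields the empty result list, so everything composed after it with <hask>>>></hask> is skipped for that input.<br />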
<br />
== Document construction examples ==<br />
<br />
=== The ''Hello World'' document ===<br />
<br />
The first document, of course, is a ''Hello World'' document:<br />
<br />
<haskell><br />
helloWorld :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld<br />
    = mkelem "html" []                     -- (1)<br />
      [ mkelem "head" []<br />
        [ mkelem "title" []<br />
          [ txt "Hello World" ]            -- (2)<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]        -- (3)<br />
        [ mkelem "h1" []<br />
          [ txt "Hello World" ]            -- (4)<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
The main arrows for document construction are <hask>mkelem</hask><br />
and its variants (<hask>selem, aelem, eelem</hask>) for element creation, <hask>attr</hask> and <hask>sattr</hask> for attributes and <hask>mkText</hask><br />
and <hask>txt</hask> for text nodes. <hask>mkelem</hask> takes three arguments: the element name (or tag name), a list of arrows for the construction of attributes, not empty in (3), and a list of arrows for the contents. Text content is generated in (2) and (4).<br />
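<br />
The variants of <hask>mkelem</hask> save the empty argument lists. The following sketch builds a small fragment with them and renders it to a string; it assumes the <hask>Text.XML.HXT.Arrow</hask> import used on this page, and the names <hask>snippet</hask> and <hask>rendered</hask> as well as the file name are made up.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
snippet :: ArrowXml a => a n XmlTree<br />
snippet<br />
    = selem "p"                                  -- content only, no attributes<br />
      [ txt "see "<br />
      , aelem "img" [ sattr "src" "logo.png" ]   -- attributes only, no content<br />
      , eelem "br"                               -- neither attributes nor content<br />
      ]<br />
<br />
-- the constructors ignore their input, so the unit value will do<br />
rendered :: [String]<br />
rendered = runLA (xshow snippet) ()<br />
</haskell><br />
<br />
<hask>xshow</hask> converts the trees produced by an arrow into a single string, which is handy for inspecting constructed fragments without writing a file.<br />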
<br />
To write this document to a file, use the following arrow:<br />
<br />
<haskell><br />
root [] [helloWorld] -- (1)<br />
>>><br />
writeDocument [(a_indent, v_1)] "hello.xml" -- (2)<br />
</haskell><br />
<br />
When this arrow is executed, the <hask>helloWorld</hask><br />
document is wrapped into a so-called root node (1). This complete<br />
document is written to "hello.xml" (2).<br />
<hask>writeDocument</hask> and its variants always expect<br />
a whole document tree with such a root node. Before writing, the document is<br />
indented (<hask>(a_indent, v_1)</hask>) by inserting extra whitespace<br />
text nodes, and an XML declaration with version and encoding is added. If the indent option were not given, the whole document would appear on a single line. With indentation, the output looks like this:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<html><br />
  <head><br />
    <title>Hello World</title><br />
  </head><br />
  <body class="haskell"><br />
    <h1>Hello World</h1><br />
  </body><br />
</html><br />
</pre><br />
<br />
The code can be shortened a bit by using some of the<br />
convenience functions:<br />
<br />
<haskell><br />
helloWorld2 :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld2<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]<br />
        [ selem "h1"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
In the above two examples the arrow input is totally ignored, because<br />
of the use of the constant arrow <hask>txt "..."</hask>.<br />
<br />
=== A page about all images within an HTML page ===<br />
<br />
A slightly more interesting task is the construction of a page<br />
containing a table of all images within a page, including image URLs, geometry and ALT attributes.<br />
<br />
The program for this has a frame similar to the <hask>helloWorld</hask> program,<br />
but the rows of the table must be filled in from the input document.<br />
In the first step we will generate a table with a single column containing<br />
the URL of the image.<br />
<br />
<haskell><br />
imageTable :: ArrowXml a => a XmlTree XmlTree<br />
imageTable<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Images in Page" ]<br />
        ]<br />
      , selem "body"<br />
        [ selem "h1"<br />
          [ txt "Images in Page" ]<br />
        , selem "table"<br />
          [ collectImages           -- (1)<br />
            >>><br />
            genTableRows            -- (2)<br />
          ]<br />
        ]<br />
      ]<br />
    where<br />
    collectImages                   -- (1)<br />
        = deep ( isElem<br />
                 >>><br />
                 hasName "img"<br />
               )<br />
    genTableRows                    -- (2)<br />
        = selem "tr"<br />
          [ selem "td"<br />
            [ getAttrValue "src" >>> mkText ]<br />
          ]<br />
</haskell><br />
<br />
With (1) the image elements are collected, and with (2)<br />
the HTML code for an image element is built.<br />
<br />
Applied to <tt>http://www.haskell.org/</tt> we get the following result<br />
(at the time of writing this page):<br />
<br />
<pre><br />
<html><br />
  <head><br />
    <title>Images in Page</title><br />
  </head><br />
  <body><br />
    <h1>Images in Page</h1><br />
    <table><br />
      <tr><br />
        <td>/haskellwiki_logo.png</td><br />
      </tr><br />
      <tr><br />
        <td>/sitewiki/images/1/10/Haskelllogo-small.jpg</td><br />
      </tr><br />
      <tr><br />
        <td>/haskellwiki_logo_small.png</td><br />
      </tr><br />
    </table><br />
  </body><br />
</html><br />
</pre><br />
<br />
When generating HTML, there are often constant parts within the page,<br />
in this example the page header. It's possible to write these<br />
parts as a string containing plain HTML and then read this with<br />
a simple XML contents parser called <hask>xread</hask>.<br />
<br />
The example above could then be rewritten as<br />
<br />
<haskell><br />
imageTable<br />
    = selem "html"<br />
      [ pageHeader<br />
      , ...<br />
      ]<br />
    where<br />
    pageHeader<br />
        = constA "<head><title>Images in Page</title></head>"<br />
          >>><br />
          xread<br />
    ...<br />
</haskell><br />
<br />
<hask>xread</hask> is a very primitive arrow. It does not run in the<br />
IO monad, so it can be used in any context, but as a consequence its error handling<br />
is very limited. <hask>xread</hask> parses the content of an XML element.<br />
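<br />
Because element content may consist of several nodes, <hask>xread</hask> can deliver more than one tree. The following sketch illustrates this with a pure arrow; it assumes the <hask>Text.XML.HXT.Arrow</hask> import used on this page, and the input string and the name <hask>contentNames</hask> are made up.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- parse a piece of mixed content and list the names<br />
-- of the elements found on the top level<br />
contentNames :: [String]<br />
contentNames<br />
    = runLA ( xread<br />
              >>><br />
              isElem<br />
              >>><br />
              getName<br />
            ) "<head/>some text<body/>"<br />
</haskell><br />
<br />
The text node between the two elements is also part of the result of <hask>xread</hask>, but it is filtered out here by <hask>isElem</hask>, so only the names <tt>head</tt> and <tt>body</tt> should remain.<br />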
<br />
=== A page about all images within an HTML page: 1. Refinement ===<br />
<br />
The next refinement step is the extension of the table such that<br />
it contains four columns: one for the image itself, one for the URL,<br />
one for the geometry and one for the ALT text. The extended <hask>genTableRows</hask><br />
has the following form:<br />
<br />
<haskell><br />
genTableRows<br />
    = selem "tr"<br />
      [ selem "td"                          -- (1)<br />
        [ this                              -- (1.1)<br />
        ]<br />
      , selem "td"                          -- (2)<br />
        [ getAttrValue "src"<br />
          >>><br />
          mkText<br />
          >>><br />
          mkelem "a"                        -- (2.1)<br />
          [ attr "href" this ]<br />
          [ this ]<br />
        ]<br />
      , selem "td"                          -- (3)<br />
        [ ( getAttrValue "width"<br />
            &&&                             -- (3.1)<br />
            getAttrValue "height"<br />
          )<br />
          >>><br />
          arr2 geometry                     -- (3.2)<br />
          >>><br />
          mkText<br />
        ]<br />
      , selem "td"                          -- (4)<br />
        [ getAttrValue "alt"<br />
          >>><br />
          mkText<br />
        ]<br />
      ]<br />
    where<br />
    geometry :: String -> String -> String<br />
    geometry "" ""<br />
        = ""<br />
    geometry w h<br />
        = w ++ "x" ++ h<br />
</haskell><br />
<br />
In (1) the identity arrow <hask>this</hask> is used for<br />
inserting the whole image element (the <hask>this</hask> value) into the first column.<br />
(2) is the column from the previous example, but the URL has been made active<br />
by embedding it in an A element (2.1). In (3) there are two<br />
new combinators. <hask>(&&&)</hask> (3.1) is an arrow for applying two<br />
arrows to the same input and combining the results into a pair. <hask>arr2</hask> (3.2)<br />
works like <hask>arr</hask>, but it lifts a binary function into an arrow<br />
accepting a pair of values. <hask>arr2 f</hask> is a shortcut for<br />
<hask>arr (uncurry f)</hask>. So width and height are combined into an X11-like<br />
geometry spec. (4) adds the ALT text.<br />
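<br />
The interplay of <hask>(&&&)</hask> and <hask>arr2</hask> can be checked on a single image element. This is a minimal sketch, assuming the <hask>Text.XML.HXT.Arrow</hask> import used on this page; the names <hask>geometryOf</hask> and <hask>demo</hask> and the sample element are hypothetical.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- both attribute selectors see the same element;<br />
-- their results are paired and then merged by a binary function<br />
geometryOf :: ArrowXml a => a XmlTree String<br />
geometryOf<br />
    = ( getAttrValue "width" &&& getAttrValue "height" )<br />
      >>><br />
      arr2 (\ w h -> w ++ "x" ++ h)<br />
<br />
demo :: [String]<br />
demo<br />
    = runLA (xread >>> geometryOf)<br />
      "<img src=\"a.png\" width=\"32\" height=\"16\"/>"<br />
<br />
-- demo gives ["32x16"]<br />
</haskell><br />
<br />
For list arrows <hask>(&&&)</hask> pairs every result of the first arrow with every result of the second one; here each side delivers exactly one string.<br />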
<br />
=== A page about all images within an HTML page: 2. Refinement ===<br />
<br />
The generated HTML page is not yet very useful, because it usually<br />
contains relative HREFs to the images, so the links do not work.<br />
We have to transform the SRC attribute values into absolute URLs.<br />
This can be done with the following code:<br />
<br />
<haskell><br />
imageTable2 :: IOStateArrow s XmlTree XmlTree<br />
imageTable2<br />
    = ...<br />
      ...<br />
      , selem "table"<br />
        [ collectImages<br />
          >>><br />
          mkAbsImageRef                        -- (1)<br />
          >>><br />
          genTableRows<br />
        ]<br />
      ...<br />
<br />
mkAbsImageRef :: IOStateArrow s XmlTree XmlTree -- (1)<br />
mkAbsImageRef<br />
    = processAttrl ( mkAbsRef                  -- (2)<br />
                     `when`<br />
                     hasName "src"             -- (3)<br />
                   )<br />
    where<br />
    mkAbsRef                                   -- (4)<br />
        = replaceChildren<br />
          ( xshow getChildren                  -- (5)<br />
            >>><br />
            ( mkAbsURI `orElse` this )         -- (6)<br />
            >>><br />
            mkText                             -- (7)<br />
          )<br />
</haskell><br />
<br />
The <hask>imageTable2</hask> is extended by an arrow <hask>mkAbsImageRef</hask><br />
(1). This arrow uses the global system state of HXT, in which the base URL<br />
of a document is stored. For editing the SRC attribute value, the attribute list<br />
of the image elements is processed with <hask>processAttrl</hask>.<br />
With the <hask>`when` hasName "src"</hask> only SRC attributes are manipulated (3). The real work is done in (4): the URL, a text node, is selected with <hask>getChildren</hask> and converted into a string with <hask>xshow</hask> (5), then it is transformed into an absolute URL<br />
with <hask>mkAbsURI</hask> (6). This arrow may fail, e.g. in case of illegal<br />
URLs. In this case the URL remains unchanged (<hask>`orElse` this</hask>).<br />
The resulting String value is converted into a text node forming the new<br />
attribute value node (7).<br />
<br />
Because of the use of the global HXT state in <hask>mkAbsURI</hask>,<br />
<hask>mkAbsImageRef</hask> and <hask>imageTable2</hask> need to have the more specialized signature <hask>IOStateArrow s XmlTree XmlTree</hask>.<br />
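<br />
The fallback pattern with <hask>orElse</hask> is independent of the system state and can be tried with a pure arrow. The following is only an illustration, assuming the <hask>Text.XML.HXT.Arrow</hask> import used on this page; the name <hask>altOrDefault</hask>, the default text and the sample element are made up.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- take the alt text if the attribute exists,<br />
-- otherwise fall back to a constant string<br />
altOrDefault :: ArrowXml a => a XmlTree String<br />
altOrDefault<br />
    = ( hasAttr "alt" >>> getAttrValue "alt" )<br />
      `orElse`<br />
      constA "(no alt text)"<br />
<br />
demo :: [String]<br />
demo = runLA (xread >>> altOrDefault) "<img src=\"a.png\"/>"<br />
<br />
-- demo gives ["(no alt text)"]<br />
</haskell><br />
<br />
The second arrow of <hask>orElse</hask> is only run when the first one delivers no result at all. The <hask>hasAttr</hask> test is needed here because <hask>getAttrValue</hask> returns the empty string for a missing attribute instead of failing.<br />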
<br />
== Transformation examples ==<br />
<br />
=== Decorating external references of an HTML document ===<br />
<br />
In the following examples, we want to decorate the external references<br />
in an HTML page with a small icon, as is done in many wikis.<br />
For this task the document tree has to be traversed; all parts<br />
except the interesting A elements remain unchanged. At the end of the list of children of an A element we add an image element.<br />
<br />
Here is the first version:<br />
<br />
<haskell><br />
addRefIcon :: ArrowXml a => a XmlTree XmlTree<br />
addRefIcon<br />
    = processTopDown                    -- (1)<br />
      ( addImg                          -- (2)<br />
        `when`<br />
        isExternalRef                   -- (3)<br />
      )<br />
    where<br />
    isExternalRef                       -- (4)<br />
        = isElem<br />
          >>><br />
          hasName "a"<br />
          >>><br />
          hasAttr "href"<br />
          >>><br />
          getAttrValue "href"<br />
          >>><br />
          isA isExtRef<br />
        where<br />
        isExtRef                        -- (4.1)<br />
            = isPrefixOf "http:"        -- or something more precise<br />
<br />
    addImg<br />
        = replaceChildren               -- (5)<br />
          ( getChildren                 -- (6)<br />
            <+><br />
            imgElement                  -- (7)<br />
          )<br />
<br />
    imgElement<br />
        = mkelem "img"                  -- (8)<br />
          [ sattr "src" "/icons/ref.png" -- (9)<br />
          , sattr "alt" "external ref"<br />
          ] []                          -- (10)<br />
</haskell><br />
<br />
The traversal is done with <hask>processTopDown</hask> (1).<br />
This arrow applies an arrow to all nodes of the whole document tree.<br />
The transformation arrow applies the <hask>addImg</hask> (2) to<br />
all A-elements (3),(4). The selection uses a somewhat simplified test (4.1)<br />
for external URLs.<br />
<hask>addImg</hask> manipulates all children (5) of the A-elements by<br />
selecting the current children (6) and adding an image element (7).<br />
The image element is constructed with <hask>mkelem</hask> (8). This takes<br />
an element name, a list of arrows for computing the attributes and a<br />
list of arrows for computing the contents. The content of the image element is<br />
empty (10). The attributes are constructed with <hask>sattr</hask> (9).<br />
<hask>sattr</hask> ignores the arrow input and builds an attribute from<br />
the name-value pair of arguments.<br />
<br />
=== Transform external references into absolute references ===<br />
<br />
In the following example we will develop a program for<br />
editing an HTML page such that all references to external documents<br />
(images, hypertext refs, style refs, ...) become absolute references.<br />
We will see some new, but very useful combinators in the solution.<br />
<br />
The task seems to be rather trivial. In a tree traversal<br />
all references are edited with respect to the document base.<br />
But in HTML there is a BASE element, allowed in the content of HEAD<br />
with a HREF attribute, which defines the document base. Again this<br />
href can be a relative URL.<br />
<br />
We start the development with the editing arrow. This gets<br />
the real document base as argument.<br />
<br />
<haskell><br />
mkAbsHRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsHRefs base<br />
    = processTopDown editHRef                   -- (1)<br />
    where<br />
    editHRef<br />
        = processAttrl                          -- (3)<br />
          ( changeAttrValue (absHRef base)      -- (5)<br />
            `when`<br />
            hasName "href"                      -- (4)<br />
          )<br />
          `when`<br />
          ( isElem >>> hasName "a" )            -- (2)<br />
        where<br />
<br />
        absHRef :: String -> String -> String   -- (5)<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
The tree is traversed (1), and for every A element (2) the attribute<br />
list is processed (3). All HREF attribute values (4) are manipulated<br />
by <hask>changeAttrValue</hask> called with a string function (5).<br />
<hask>expandURIString</hask> is a pure function defined in HXT for computing<br />
an absolute URI.<br />
In this first step we only edit A-HREF attribute values. We will refine this<br />
later.<br />
<br />
The second step is the complete computation of the base URL.<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( isElem >>> hasName "html"    -- (0)<br />
            >>><br />
            getChildren                  -- (1)<br />
            >>><br />
            isElem >>> hasName "head"    -- (2)<br />
            >>><br />
            getChildren                  -- (3)<br />
            >>><br />
            isElem >>> hasName "base"    -- (4)<br />
            >>><br />
            getAttrValue "href"          -- (5)<br />
          )<br />
          &&&<br />
          getBaseURI                     -- (6)<br />
        )<br />
        >>> expandURI                    -- (7)<br />
      )<br />
      `orElse` getBaseURI                -- (8)<br />
</haskell><br />
<br />
Input to this arrow is the HTML element. (0) to (5) is the arrow for selecting<br />
the BASE element's HREF value. Parallel to this, the system base URL is read<br />
with <hask>getBaseURI</hask> (6), like in examples above. The resulting <br />
pair of strings is piped into <hask>expandURI</hask> (7), the arrow version of<br />
<hask>expandURIString</hask>. This arrow ((0) to (7)) fails in the absence<br />
of a BASE element. In this case we take the plain document base (8).<br />
The selection of the BASE element is not yet very handy. We will define<br />
a more general and elegant function later, allowing an element path as selection argument.<br />
<br />
In the third step, we will combine the two arrows. For this we will use<br />
a new combinator <hask>($<)</hask>. The need for this new combinator<br />
arises as follows: we need the arrow input (the document) twice,<br />
once for computing the document base and once for editing the<br />
whole document, and we want to compute the extra string parameter<br />
for editing with the arrow defined above.<br />
<br />
The combined arrow, our main arrow, looks like this<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = mkAbsHRefs $< computeBaseRef -- (1)<br />
</haskell><br />
<br />
In (1) the arrow input is first piped into <hask>computeBaseRef</hask>;<br />
the result is used in <hask>mkAbsHRefs</hask> as extra string parameter<br />
when processing the document. Internally the <hask>($<)</hask> combinator<br />
is defined by the basic combinators <hask>(&&&), (>>>)</hask> and <hask>app</hask>, but in slightly more complex computations<br />
this pattern occurs rather frequently, so <hask>($<)</hask> becomes very useful.<br />
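<br />
A small pure example may help to see what <hask>($<)</hask> does: a value is computed from the input and then used as a parameter of an arrow that processes the same input. The sketch assumes the <hask>Text.XML.HXT.Arrow</hask> import used on this page; the names <hask>tagWithOwnName</hask> and <hask>demo</hask>, the attribute <tt>tag</tt> and the sample element are made up.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- getName computes the element name from the input tree,<br />
-- the name is then passed as attribute value to addAttr,<br />
-- which is applied to the same input tree<br />
tagWithOwnName :: ArrowXml a => a XmlTree XmlTree<br />
tagWithOwnName<br />
    = ( \ n -> addAttr "tag" n ) $< getName<br />
<br />
demo :: [String]<br />
demo<br />
    = runLA ( xread<br />
              >>><br />
              tagWithOwnName<br />
              >>><br />
              xshow this<br />
            ) "<item/>"<br />
</haskell><br />
<br />
The left argument of <hask>($<)</hask> is an ordinary Haskell function returning an arrow, so any parameterized arrow can be used there.<br />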
<br />
Programming with arrows is one style of point-free programming. Point-free<br />
programming often becomes unwieldy when values are used more than once.<br />
One solution is the special arrow syntax supported by ghc and others, similar to the do notation for monads. But for many simple cases the <hask>($<)</hask> combinator and its variants <hask>($<<), ($<<<), ($<<<<), ($<$)</hask><br />
are sufficient.<br />
<br />
To complete the development of the example, a last step is necessary:<br />
the removal of the redundant BASE element.<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = ( mkAbsHRefs $< computeBaseRef )<br />
      >>><br />
      removeBaseElement<br />
<br />
removeBaseElement :: ArrowXml a => a XmlTree XmlTree<br />
removeBaseElement<br />
    = processChildren<br />
      ( processChildren<br />
        ( none                           -- (1)<br />
          `when`<br />
          ( isElem >>> hasName "base" )<br />
        )<br />
        `when`<br />
        ( isElem >>> hasName "head" )<br />
      )<br />
</haskell><br />
<br />
In this function the children of the HEAD element are searched for<br />
a BASE element. This is removed by applying the null arrow <hask>none</hask><br />
to the input, which always returns the empty list.<br />
<hask>none `when` ...</hask> is the pattern for deleting nodes from a tree.<br />
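<br />
The deletion pattern also works with a full traversal. The following sketch removes all <tt>script</tt> elements from a fragment; it assumes the <hask>Text.XML.HXT.Arrow</hask> import used on this page, and the names <hask>stripScripts</hask> and <hask>demo</hask> and the input are made up.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- delete every script element, wherever it occurs in the tree<br />
stripScripts :: ArrowXml a => a XmlTree XmlTree<br />
stripScripts<br />
    = processTopDown ( none `when` ( isElem >>> hasName "script" ) )<br />
<br />
demo :: [String]<br />
demo<br />
    = runLA ( xread<br />
              >>><br />
              stripScripts<br />
              >>><br />
              xshow this<br />
            ) "<div><script>x</script><p>keep</p></div>"<br />
<br />
-- demo gives ["<div><p>keep</p></div>"]<br />
</haskell><br />
<br />
Since <hask>none</hask> delivers no result for a matching node, the node simply disappears from the list of children of its parent.<br />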
<br />
The <hask>computeBaseRef</hask> function defined above contains an arrow pattern<br />
for selecting the right subtree that is rather common in HXT applications:<br />
<br />
<haskell><br />
isElem >>> hasName n1<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName n2<br />
...<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName nm<br />
</haskell><br />
<br />
For this pattern we will define a convenience function creating the<br />
arrow for selection:<br />
<br />
<haskell><br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)   -- (1)<br />
      .<br />
      map (\ n -> isElem >>> hasName n)           -- (2)<br />
</haskell><br />
<br />
The name list is mapped to the element checking arrow (2),<br />
the resulting list of arrows is folded with <hask>getChildren</hask><br />
into a single arrow. <hask>computeBaseRef</hask> can then be simplified<br />
and becomes more readable:<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( getDescendents ["html","head","base"]   -- (1)<br />
            >>><br />
            getAttrValue "href"                     -- (2)<br />
          )<br />
          ...<br />
    ...<br />
</haskell><br />
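<br />
The path selection can be tested without IO on a small document. This sketch repeats the definition of <hask>getDescendents</hask> to be self-contained and assumes the <hask>Text.XML.HXT.Arrow</hask> import used on this page; the name <hask>baseHref</hask>, the sample document and its URL are made up.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- note: foldl1 requires a non-empty list of names<br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)<br />
      .<br />
      map (\ n -> isElem >>> hasName n)<br />
<br />
baseHref :: [String]<br />
baseHref<br />
    = runLA ( xread<br />
              >>><br />
              getDescendents ["html", "head", "base"]<br />
              >>><br />
              getAttrValue "href"<br />
            )<br />
      "<html><head><base href=\"http://example.org/docs/\"/></head><body/></html>"<br />
<br />
-- baseHref gives ["http://example.org/docs/"]<br />
</haskell><br />
<br />
The first name in the list is matched against the input node itself; every further name is matched one level deeper.<br />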
<br />
An even more general and flexible technique is the use of XPath expressions,<br />
available for selection of document parts defined in the module<br />
<hask>Text.XML.HXT.Arrow.XmlNodeSet</hask>.<br />
<br />
With XPath <hask>computeBaseRef</hask> can be simplified to<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( ( getXPathTrees "/html/head/base"   -- (1)<br />
            >>><br />
            getAttrValue "href"               -- (2)<br />
          )<br />
          ...<br />
</haskell><br />
<br />
Even the attribute selection can be expressed by XPath,<br />
so (1) and (2) can be combined into<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( xshow (getXPathTrees "/html/head/base/@href")<br />
          ...<br />
</haskell><br />
<br />
The extra <hask>xshow</hask> is here required to convert the<br />
XPath result, an XmlTree, into a string.<br />
<br />
XPath defines a<br />
full language for selecting parts of an XML document.<br />
Sometimes it's rather convenient to make selections of this<br />
type, but XPath evaluation is in general more expensive<br />
in time and space than a simple combination of arrows, as we've<br />
seen in <hask>getDescendents</hask>.<br />
<br />
=== Transform external references into absolute references: Refinement ===<br />
<br />
In the above example only A-HREF URLs are edited. Now we extend this<br />
to other element-attribute combinations.<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown ( editRef "a" "href"        -- (2)<br />
                       >>><br />
                       editRef "img" "src"       -- (3)<br />
                       >>><br />
                       editRef "link" "href"     -- (4)<br />
                       >>><br />
                       editRef "script" "src"    -- (5)<br />
                     )<br />
    where<br />
    editRef en an                                -- (1)<br />
        = processAttrl ( changeAttrValue (absHRef base)<br />
                         `when`<br />
                         hasName an<br />
                       )<br />
          `when`<br />
          ( isElem >>> hasName en )<br />
        where<br />
        absHRef :: String -> String -> String<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
<hask>editRef</hask> is parameterized by the element and attribute names.<br />
The arrow applied to every element is extended to a sequence of<br />
<hask>editRef</hask>'s ((2)-(5)). Notice that the document is still traversed only once.<br />
To process all possible HTML elements,<br />
this sequence should be extended by further element-attribute pairs.<br />
<br />
This can further be simplified into<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = foldl (>>>) this<br />
          .<br />
          map (\ (en, an) -> editRef en an)<br />
          $<br />
          [ ("a", "href")<br />
          , ("img", "src")<br />
          , ("link", "href")<br />
          , ("script", "src") -- and more<br />
          ]<br />
    editRef<br />
        = ...<br />
</haskell><br />
<br />
The <hask>foldl (>>>) this</hask> is defined in HXT as <hask>seqA</hask>,<br />
so the above code can be simplified to<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = seqA . map (uncurry editRef)<br />
          $<br />
          ...<br />
</haskell><br />
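<br />
<hask>seqA</hask> is not restricted to XML arrows. The following sketch, assuming the <hask>Text.XML.HXT.Arrow</hask> import used on this page, shows the order of application with plain numbers; the name <hask>steps</hask> is made up.<br />
<br />
<haskell><br />
import Text.XML.HXT.Arrow   -- Text.XML.HXT.Core in HXT 9 and later<br />
<br />
-- the arrows in the list are applied from left to right<br />
steps :: LA Int Int<br />
steps = seqA [ arr (+ 1), arr (* 2) ]<br />
<br />
-- runLA steps 3 gives [8]: first 3 + 1, then 4 * 2<br />
</haskell><br />
<br />
For the empty list <hask>seqA</hask> behaves like <hask>this</hask>, so an empty table of element-attribute pairs would leave the document unchanged.<br />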
<br />
== More complex examples ==<br />
<br />
''' to be done '''<br />
<br />
=== Automatic read/writing between xml and Haskell data types ===<br />
<br />
'''Question''': is there any way to write/read Haskell types to/from XML in HXT? HaXml has readXml and showXml, but I can't find any similar mechanism in HXT. Help! -- AlsonKemp<br />
<br />
==== Serializing to Xml ====<br />
<br />
We can create an HXT tree from a single-layer data class as follows:<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
import System.IO<br />
import Data.Char<br />
import Data.Maybe (fromMaybe)<br />
import Text.XML.HXT.Arrow<br />
import Data.Generics<br />
<br />
-- our data class we'll convert into xml<br />
data Config = <br />
    Config { username :: String,<br />
             logNumDays :: Int,<br />
             oleDbString :: String }<br />
    deriving (Show, Typeable, Data)<br />
<br />
-- helper function adapted from http://www.defmacro.org/ramblings/haskell-web.html<br />
-- (gshow replaced by gshow')<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++ <br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- function to create HXT tree arrow from single-layer Haskell data type:<br />
createHxtArrow object = runLA( constA ( xmlSerialize object ) >>> xread)<br />
<br />
-- create a config object to serialize:<br />
<br />
createConfig = Config { username = "test", logNumDays = 3, oleDbString = "qsdf" }<br />
<br />
-- test function, using our Config data type<br />
testConversion = createHxtArrow( createConfig ) ()<br />
</haskell><br />
<br />
-- hughperkins<br />
<br />
==== Deserializing from Xml ====<br />
<br />
Here's a solution to deserialize a simple haskell data type containing Strings and Ints.<br />
<br />
It's not really pretty, but it works.<br />
<br />
Basically, we just convert the incoming xml into gread-compatible format, then use gread :-D<br />
<br />
Currently it works for a simple single-layer Haskell data type containing Ints and Strings. You can add new child data types by adding to the case statement in xmlToGShowFormat.<br />
<br />
If someone has a more elegant solution, please let me know ( hughperkins@gmail.com )<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
module ParseXml<br />
where<br />
<br />
import System.IO<br />
import Data.Char<br />
import Data.List<br />
import Data.Maybe<br />
import Data.Generics hiding (Unit)<br />
import Text.XML.HXT.Arrow hiding (when)<br />
<br />
data Config = Config{ name :: String, age :: Int } <br />
--data Config = Config{ age :: Int } <br />
    deriving( Data, Show, Typeable, Ord, Eq, Read )<br />
<br />
createConfig = Config "qsdfqsdf" 3<br />
--createConfig = Config 3<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- helper function from http://www.defmacro.org/ramblings/haskell-web.html<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++ <br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- parse xml to HXT tree, and obtain the value of node "fieldname"<br />
-- returns a string<br />
getValue xml fieldname | length(resultlist) > 0 = Just (head resultlist)<br />
                       | otherwise = Nothing<br />
    where resultlist = (runLA ( constA xml >>> xread >>> deep ( hasName fieldname ) >>> getChildren >>> getText ))[]<br />
<br />
-- parse templateobject to get list of field names<br />
-- apply these to xml to get list of values<br />
-- return (fieldnames list, value list)<br />
xmlToGShowFormat :: Data a => String -> a -> String<br />
xmlToGShowFormat xml templateobject = <br />
    go<br />
    where mainconstructorname = (showConstr $ toConstr templateobject)<br />
          fields = constrFields $ toConstr templateobject<br />
          values = map ( \fieldname -> getValue xml fieldname ) fields<br />
          datatypes = gmapQ (dataTypeOf) templateobject<br />
          constrs = gmapQ (toConstr) templateobject<br />
          datatypereps = gmapQ (dataTypeRep . dataTypeOf) templateobject<br />
          fieldtogshowformat (value,datatyperep) = case datatyperep of<br />
              IntRep -> "(" ++ fromJust value ++ ")"<br />
              _ -> show(fromJust value)<br />
          formattedfieldlist = map (fieldtogshowformat) (zip values datatypereps)<br />
          go = "(" ++ mainconstructorname ++ " " ++ (concat $ intersperse " " formattedfieldlist ) ++ ")"<br />
<br />
xmlDeserialize xml templateobject = fst $ head $ gread( xmlToGShowFormat xml templateobject)<br />
<br />
dotest = xmlDeserialize (xmlSerialize createConfig) createConfig :: Config<br />
dotest' = xmlDeserialize ("<Config><age>12</age><name>test name!</name></Config>") createConfig :: Config<br />
</haskell><br />
<br />
-- hughperkins</div>Joehttps://wiki.haskell.org/index.php?title=HXT&diff=16958HXT2007-11-22T21:22:01Z<p>Joe: fixed typo</p>
<hr />
<div>[[Category:Tools]] [[Category:Tutorials]]<br />
<br />
== A gentle introduction to the Haskell XML Toolbox ==<br />
<br />
The [http://www.fh-wedel.de/~si/HXmlToolbox/index.html Haskell XML Toolbox (HXT)] is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain-specific language consisting of a set of combinators for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace-aware XML parser that supports the XML 1.0 standard almost completely. Extensions are a validator for RelaxNG and an XPath evaluator.<br />
<br />
__TOC__<br />
<br />
== Background ==<br />
<br />
The Haskell XML Toolbox is based on the ideas of [http://www.cs.york.ac.uk/fp/HaXml/ HaXml] and [http://www.flightlab.com/~joe/hxml/ HXML] but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CData parts and processing instructions. This data model makes it possible to use tree transformation functions as a uniform design for all XML processing steps: parsing, DTD processing, entity processing, validation, namespace propagation, content processing and output.<br />
<br />
== Resources ==<br />
<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/index.html HXT Home] :<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/HXT-7.0.tar.gz HXT-7.0.tar.gz] : latest release<br />
; [http://darcs.fh-wedel.de/hxt/ darcs.fh-wedel.de/hxt] : darcs repository with head revision of HXT<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc_arrow/ Arrow API] : Haddock documentation of head revision with links to source files<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc/ Complete API] : Haddock documentation with arrows and old API based on filters<br />
<br />
== The basic concepts ==<br />
<br />
=== The basic data structures ===<br />
<br />
Processing of XML is a task of processing tree structures. This can be done in Haskell in a very elegant way by defining an appropriate tree data type, a Haskell DOM (document object model) structure. The tree structure in HXT is a rose tree with a special XNode data type for storing the XML node information.<br />
<br />
The generally useful tree structure (NTree) is separated from the node type (XNode). This allows for reusing the tree structure and the tree traversal and manipulation functions in other applications.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a]    -- rose tree<br />
<br />
data XNode   = XText String         -- plain text node<br />
             | ...<br />
             | XTag QName XmlTrees  -- element name and list of attributes<br />
             | XAttr QName          -- attribute name<br />
             | ...<br />
<br />
type QName   = ...                  -- qualified name<br />
<br />
type XmlTree = NTree XNode<br />
<br />
type XmlTrees = [XmlTree]<br />
</haskell><br />
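<br />
As a rough illustration, the following sketch builds the tree for a document consisting of a single <tt>hello</tt> element with the text <tt>world</tt>. It uses a cut-down model of the types above: element names are plain strings instead of <hask>QName</hask> and attributes are left out. The names <hask>helloTree</hask> and <hask>countNodes</hask> are made up for this example and are not part of HXT.<br />
<br />
<haskell><br />
-- simplified stand-ins for the HXT types (assumption: names are plain Strings)<br />
data NTree a = NTree a [NTree a]<br />
               deriving (Show, Eq)<br />
<br />
data XNode   = XText String         -- plain text node<br />
             | XTag String          -- element node, attributes omitted in this sketch<br />
               deriving (Show, Eq)<br />
<br />
type XmlTree = NTree XNode<br />
<br />
-- the tree for an element "hello" with a single text child "world"<br />
helloTree :: XmlTree<br />
helloTree = NTree (XTag "hello") [NTree (XText "world") []]<br />
<br />
-- counts all nodes of a rose tree; independent of the node type<br />
countNodes :: NTree a -> Int<br />
countNodes (NTree _ cs) = 1 + sum (map countNodes cs)<br />
</haskell><br />
<br />
Because <hask>countNodes</hask> never inspects the node value, it works for any <hask>NTree a</hask>, which is the kind of reuse the separation of tree and node type is meant to allow.<br />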
<br />
=== The concept of filters ===<br />
<br />
Selecting, transforming and generating trees often requires routines, which compute not only a single result tree, but a (possibly empty) list of (sub-)trees. This leads to the idea of XML filters like in HaXml. Filters are functions, which take an XML tree as input and compute a list of result trees.<br />
<br />
<haskell><br />
type XmlFilter = XmlTree -> [XmlTree]<br />
</haskell><br />
<br />
More generally we can define a filter as<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
</haskell><br />
<br />
We will do this abstraction later, when introducing arrows. Many of the functions in the following motivating examples can be generalised this way. But for getting the idea, the <hask>XmlFilter</hask> is sufficient.<br />
<br />
The filter functions are used so frequently that the idea of defining a domain-specific language with filters as the basic processing units comes up. In such a DSL the basic filters are predicates, selectors, constructors and transformers, all working on the HXT DOM tree structure. For a DSL it becomes necessary to define an appropriate set of combinators for building more complex functions from simpler ones. Of course, filter composition, like (.) for functions, becomes one of the most frequently used combinators. There are also more complex filters for the traversal of a whole tree and the selection or transformation of several nodes. We will see a few first examples in the following part.<br />
<br />
The first task is to build filters from pure functions, to define a lift operator. Pure functions are lifted to filters in the following way:<br />
<br />
Predicates are lifted by mapping False to the empty list and True to the single element list, containing the input tree.<br />
<br />
<haskell><br />
p :: XmlTree -> Bool -- pure function<br />
p t = ...<br />
<br />
pf :: XmlTree -> [XmlTree] -- or XmlFilter<br />
pf t<br />
| p t = [t]<br />
| otherwise = []<br />
</haskell><br />
<br />
The combinator for this type of lifting is called <hask>isA</hask>. It works on any type and is defined as<br />
<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
| p x = [x]<br />
| otherwise = []<br />
</haskell><br />
<br />
A predicate for filtering text nodes looks like this<br />
<br />
<haskell><br />
isXText :: XmlFilter                    -- XmlTree -> [XmlTree]<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _                     = []<br />
</haskell><br />
<br />
Transformers -- functions that map a tree into another tree -- are lifted in a trivial way:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t = expr(t)<br />
<br />
ff :: XmlTree -> [XmlTree]<br />
ff t = [expr(t)]<br />
</haskell><br />
<br />
This basic function is called <hask>arr</hask>. It comes from the <hask>Control.Arrow</hask> module of the base library shipped with GHC.<br />
<br />
Partial functions, functions that can't always compute a result, are usually lifted to totally defined filters:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t<br />
| p t = expr(t)<br />
| otherwise = error "f not defined"<br />
<br />
ff :: XmlFilter<br />
ff t<br />
| p t = [expr(t)]<br />
| otherwise = []<br />
</haskell><br />
<br />
This is a rather comfortable situation, with these filters we don't have to deal with illegal argument errors. Illegal arguments are just mapped to the empty list.<br />
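<br />
A minimal sketch of this kind of lifting on plain values, using the general <hask>Filter a b</hask> type instead of <hask>XmlFilter</hask>. The function names <hask>div100</hask> and <hask>div100F</hask> and the integer inputs are hypothetical and only serve as an illustration.<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
<br />
-- partial pure function: not defined for 0<br />
div100 :: Int -> Int<br />
div100 0 = error "div100 not defined for 0"<br />
div100 n = 100 `div` n<br />
<br />
-- total filter: the illegal argument is mapped to the empty list<br />
div100F :: Filter Int Int<br />
div100F 0 = []<br />
div100F n = [100 `div` n]<br />
<br />
-- applying the filter to several inputs silently drops the failing case<br />
results :: [Int]<br />
results = concatMap div100F [5, 0, 20]<br />
</haskell><br />
<br />
Here <hask>results</hask> contains only the two values computed for 5 and 20. The input 0 contributes nothing instead of raising an error.<br />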
<br />
When processing trees, it is often the case that no, exactly one, or more than one result is possible. Functions returning a set of results are often, a bit imprecisely, called ''nondeterministic'' functions. These functions, e.g. selecting all children of a node or all grandchildren, are exactly our filters. In this context lists instead of sets of values are the appropriate result type, because the ordering in XML is important and duplicates are possible.<br />
<br />
Working with filters is rather similar to working with binary relations, and working with relations is rather natural and comfortable, as database people know very well.<br />
<br />
Two first examples for working with ''nondeterministic'' functions are selecting the children and the grandchildren of an XmlTree which can be implemented by<br />
<br />
<haskell><br />
getChildren :: XmlFilter<br />
getChildren (NTree n cs)<br />
= cs<br />
<br />
getGrandChildren :: XmlFilter<br />
getGrandChildren (NTree n cs)<br />
= concat [ getChildren c | c <- cs ]<br />
</haskell><br />
<br />
=== Filter combinators ===<br />
<br />
Composition of filters (like function composition) is the most important combinator. We will use the infix operator <hask>(>>>)</hask> for filter composition and reverse the arguments, so we can read composition sequences from left to right, like with pipes in Unix. Composition is defined as follows:<br />
<br />
<haskell><br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
</haskell><br />
<br />
This definition corresponds 1-1 to the composition of binary relations. With help of the <hask>(>>>)</hask> operator the definition of <hask>getGrandChildren</hask> becomes rather simple:<br />
<br />
<haskell><br />
getGrandChildren :: XmlFilter<br />
getGrandChildren = getChildren >>> getChildren<br />
</haskell><br />
<br />
Selecting all text nodes of the children of an element can also be formulated very easily with the help of <hask>(>>>)</hask><br />
<br />
<haskell><br />
getTextChildren :: XmlFilter<br />
getTextChildren = getChildren >>> isXText<br />
</haskell><br />
<br />
When used to combine predicate filters, the <hask>(>>>)</hask> serves as a logical "and" operator or, from the relational view, as an intersection operator: <hask>isA p1 >>> isA p2</hask> selects all values for which p1 and p2 both hold.<br />
<br />
The dual operator to <hask>(>>>)</hask> is the logical "or" (thinking in sets: the union operator). For this we define a sum operator <hask>(<+>)</hask>. The sum of two filters is defined as follows:<br />
<br />
<haskell><br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f <+> g) t = f t ++ g t<br />
</haskell><br />
<br />
Example: <hask>isA p1 <+> isA p2</hask> is the logical "or" for filters. Because the results are lists, a value satisfying both predicates appears twice in the result.<br />
<br />
Combining elementary filters with (>>>) and (<+>) leads to more complex functionality. For example, selecting all text nodes within two levels of depth (in left to right order) can be formulated with:<br />
<br />
<haskell><br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
</haskell><br />
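<br />
The combinators defined so far can be tried out on a small hand-built tree. The following sketch repeats the definitions from this section for a simplified node type (element names as plain strings, no attributes). The sample tree and the names <hask>text</hask>, <hask>sample</hask> and <hask>texts</hask> are assumptions made for this example.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a] deriving (Show, Eq)<br />
data XNode   = XText String | XTag String deriving (Show, Eq)<br />
<br />
type XmlTree   = NTree XNode<br />
type XmlFilter = XmlTree -> [XmlTree]<br />
<br />
getChildren :: XmlFilter<br />
getChildren (NTree _ cs) = cs<br />
<br />
isXText :: XmlFilter<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _                     = []<br />
<br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
<br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
(f <+> g) t = f t ++ g t<br />
<br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
-- helper for building text leaves<br />
text :: String -> XmlTree<br />
text s = NTree (XText s) []<br />
<br />
-- element p with children: text "a", element b (text "b", element i (text "c")), text "d"<br />
sample :: XmlTree<br />
sample = NTree (XTag "p")<br />
         [ text "a"<br />
         , NTree (XTag "b") [ text "b", NTree (XTag "i") [ text "c" ] ]<br />
         , text "d"<br />
         ]<br />
<br />
-- the strings of all text nodes found within two levels, in document order<br />
texts :: [String]<br />
texts = [ s | NTree (XText s) _ <- getTextChildren2 sample ]<br />
</haskell><br />
<br />
The text <tt>c</tt> sits three levels below the top element and is therefore not selected, so <hask>texts</hask> holds the strings <tt>a</tt>, <tt>b</tt> and <tt>d</tt>.<br />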
<br />
'''Exercise:''' Are these filters equivalent or what's the difference between the two filters?<br />
<br />
<haskell><br />
getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )<br />
</haskell><br />
<br />
Of course we need choice combinators. The first idea is an if-then-else filter, <br />
built up from three simpler filters. But often it's easier and more elegant to work with simpler binary combinators for choice. So we will introduce the simpler ones first.<br />
<br />
One of these choice combinators is called <hask>orElse</hask> and is defined as<br />
follows:<br />
<br />
<haskell><br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
| null res1 = g t<br />
| otherwise = res1<br />
where<br />
res1 = f t<br />
</haskell><br />
<br />
The meaning is the following: If f computes a non-empty list as result, f succeeds and this list is the result, else g is applied to the input and this yields the result. There are two other simple choice combinators usually written in infix notation, <hask> g `guards` f</hask> and <hask>f `when` g</hask>:<br />
<br />
<haskell><br />
guards :: XmlFilter -> XmlFilter -> XmlFilter<br />
guards g f t<br />
| null (g t) = []<br />
| otherwise = f t<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
| null (g t) = [t]<br />
| otherwise = f t<br />
</haskell><br />
<br />
These choice operators become useful when transforming and manipulating trees.<br />
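<br />
The difference between the three choice combinators is easiest to see on simple values. The sketch below generalises the definitions above from <hask>XmlFilter</hask> to a type <hask>Filter a</hask>, which is an assumption made for this example. The filter <hask>double</hask> and the three result names are hypothetical as well.<br />
<br />
<haskell><br />
type Filter a = a -> [a]<br />
<br />
isA :: (a -> Bool) -> Filter a<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
<br />
orElse :: Filter a -> Filter a -> Filter a<br />
orElse f g t<br />
    | null res1 = g t<br />
    | otherwise = res1<br />
    where<br />
    res1 = f t<br />
<br />
guards :: Filter a -> Filter a -> Filter a<br />
guards g f t<br />
    | null (g t) = []<br />
    | otherwise  = f t<br />
<br />
when :: Filter a -> Filter a -> Filter a<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
<br />
double :: Filter Int<br />
double x = [2 * x]<br />
<br />
-- keep even numbers, double the others<br />
viaOrElse :: [Int]<br />
viaOrElse = concatMap (isA even `orElse` double) [1 .. 4]<br />
<br />
-- double the even numbers, drop the others<br />
viaGuards :: [Int]<br />
viaGuards = concatMap (isA even `guards` double) [1 .. 4]<br />
<br />
-- double the even numbers, keep the others unchanged<br />
viaWhen :: [Int]<br />
viaWhen = concatMap (double `when` isA even) [1 .. 4]<br />
</haskell><br />
<br />
Note that <hask>guards</hask> discards inputs that fail the test, while <hask>when</hask> passes them through unchanged. This is why <hask>when</hask> is the usual choice for editing only some nodes of a tree.<br />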
<br />
=== Tree traversal filter ===<br />
<br />
A very basic operation on tree structures is the traversal of all nodes and the selection and/or transformation of nodes. These traversal filters serve as control structures for processing whole trees. They correspond to the map and fold combinators for lists.<br />
<br />
The simplest traversal filter does a top down search of all nodes with a special feature. This filter, called <hask>deep</hask>, is defined as follows:<br />
<br />
<haskell><br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
</haskell><br />
<br />
When <hask>deep</hask> is applied to a predicate filter, a top down search is done and all subtrees satisfying the predicate are collected. Because of the use of <hask>orElse</hask>, the descent into the tree stops as soon as a matching subtree is found.<br />
<br />
'''Example:''' Selecting all plain text nodes of a document can be formulated with:<br />
<br />
<haskell><br />
deep isXText<br />
</haskell><br />
<br />
'''Example:''' Selecting all "top level" tables in an HTML document looks like<br />
this:<br />
<br />
<haskell><br />
deep (isElem >>> hasName "table")<br />
</haskell><br />
<br />
A variant of <hask>deep</hask>, called <hask>multi</hask>, performs a complete search, where the tree traversal does not stop, when a node is found.<br />
<br />
<haskell><br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
</haskell><br />
<br />
'''Example:''' For selecting all tables in an HTML document, even nested ones, <hask>multi</hask> has to be used instead of <hask>deep</hask>:<br />
<br />
<hask>multi (isElem >>> hasName "table")</hask><br />
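<br />
The following sketch contrasts the two traversals on a tree with a nested table. To keep it short, a node is modelled by its element name only, which is an assumption of this example and not the HXT node type. The sample tree and the names <hask>topLevelTables</hask> and <hask>allTables</hask> are hypothetical.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a]<br />
<br />
type XmlTree   = NTree String           -- assumption: a node is just its element name<br />
type XmlFilter = XmlTree -> [XmlTree]<br />
<br />
getChildren :: XmlFilter<br />
getChildren (NTree _ cs) = cs<br />
<br />
hasName :: String -> XmlFilter<br />
hasName n t@(NTree m _)<br />
    | n == m    = [t]<br />
    | otherwise = []<br />
<br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
<br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
(f <+> g) t = f t ++ g t<br />
<br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
    | null res1 = g t<br />
    | otherwise = res1<br />
    where<br />
    res1 = f t<br />
<br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
<br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
<br />
-- a body with a table containing a nested table, a paragraph and a second table<br />
sample :: XmlTree<br />
sample = NTree "body"<br />
         [ NTree "table" [ NTree "tr" [ NTree "td" [ NTree "table" [] ] ] ]<br />
         , NTree "p" []<br />
         , NTree "table" []<br />
         ]<br />
<br />
topLevelTables :: Int<br />
topLevelTables = length (deep (hasName "table") sample)<br />
<br />
allTables :: Int<br />
allTables = length (multi (hasName "table") sample)<br />
</haskell><br />
<br />
<hask>deep</hask> finds the two outer tables and does not look inside them, while <hask>multi</hask> also reports the nested one.<br />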
<br />
=== Arrows ===<br />
<br />
We've already seen that the filters <hask>a -> [b]</hask> are a very<br />
powerful and sometimes a more elegant way to process XML than pure<br />
functions. This is the good news. The bad news is that filters are not<br />
general enough. Of course we sometimes want to do some I/O and we want<br />
to stay on the filter level. So we need something like<br />
<br />
<haskell><br />
type XmlIOFilter = XmlTree -> IO [XmlTree]<br />
</haskell><br />
<br />
for working in the IO monad.<br />
<br />
Sometimes it's appropriate to thread some state through the computation<br />
like in state monads. This leads to a type like<br />
<br />
<haskell><br />
type XmlStateFilter state = state -> XmlTree -> (state, [XmlTree])<br />
</haskell><br />
<br />
And in real world applications we need both extensions at the same<br />
time. Of course I/O is necessary but usually there are also some<br />
global options and variables for controlling the computations. In HXT,<br />
for instance there are variables for controlling trace output, options<br />
for setting the default encoding scheme for input data and a base URI<br />
for accessing documents, which are addressed in a content or in a DTD<br />
part by relative URIs. So we need something like<br />
<br />
<haskell><br />
type XmlIOStateFilter state = state -> XmlTree -> IO (state, [XmlTree])<br />
</haskell><br />
<br />
We want to work with all four filter variants, and in the future<br />
perhaps with even more general filters, but of course not with four<br />
sets of filter names, e.g. <hask>deep, deepST, deepIO, deepIOST</hask>.<br />
<br />
This is the point where <hask>newtype</hask>s and <hask>class</hask>es<br />
come in. Classes are needed for overloading names and<br />
<hask>newtype</hask>s are needed to declare instances. Furthermore, the<br />
restriction of <hask>XmlTree</hask> as argument and result type is<br />
not necessary and hinders reuse in many cases.<br />
<br />
A filter discussed above has all features of an arrow. Arrows are<br />
introduced for generalising the concept of functions and function<br />
combination to more general kinds of computation than pure functions.<br />
<br />
A basic set of combinators for arrows is defined in the classes in the<br />
<hask>Control.Arrow</hask> module, containing the above mentioned <hask>(>>>), (<+>), arr</hask>.<br />
<br />
In HXT the additional classes for filters working with lists as result type are<br />
defined in <hask>Control.Arrow.ArrowList</hask>. The choice operators are<br />
in <hask>Control.Arrow.ArrowIf</hask>, tree filters, like <hask>getChildren, deep, multi, ...</hask> in<br />
<hask>Control.Arrow.ArrowTree</hask> and the elementary XML specific<br />
filters in <hask>Text.XML.HXT.XmlArrow</hask>.<br />
<br />
In HXT there are four types instantiated with these classes for<br />
pure list arrows, list arrows with a state, list arrows with IO<br />
and list arrows with a state and IO.<br />
<br />
<haskell><br />
newtype LA a b = LA { runLA :: (a -> [b]) }<br />
<br />
newtype SLA s a b = SLA { runSLA :: (s -> a -> (s, [b])) }<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: (a -> IO [b]) }<br />
<br />
newtype IOSLA s a b = IOSLA { runIOSLA :: (s -> a -> IO (s, [b])) }<br />
</haskell><br />
<br />
The first one and the last one are those used most frequently in the<br />
toolbox, and of course there are lifting functions for converting the<br />
special arrows into the more general ones.<br />
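<br />
How a <hask>newtype</hask> and class instances make the overloaded names available can be sketched with the standard classes from the base library alone. The instances below are an illustration written for this page, assuming a GHC where <hask>Arrow</hask> has <hask>Category</hask> as superclass. They are not copied from the HXT sources, and the name <hask>demo</hask> is hypothetical.<br />
<br />
<haskell><br />
import Prelude hiding (id, (.))<br />
import Control.Category<br />
import Control.Arrow<br />
<br />
newtype LA a b = LA { runLA :: a -> [b] }<br />
<br />
-- identity and composition of list arrows<br />
instance Category LA where<br />
    id          = LA (\ x -> [x])<br />
    LA g . LA f = LA (\ x -> concatMap g (f x))<br />
<br />
-- lifting of pure functions and processing of the first pair component<br />
instance Arrow LA where<br />
    arr f        = LA (\ x -> [f x])<br />
    first (LA f) = LA (\ (x, y) -> [ (x', y) | x' <- f x ])<br />
<br />
-- the always failing arrow<br />
instance ArrowZero LA where<br />
    zeroArrow = LA (const [])<br />
<br />
-- the sum of two arrows<br />
instance ArrowPlus LA where<br />
    LA f <+> LA g = LA (\ x -> f x ++ g x)<br />
<br />
-- (>>>) and (<+>) now work for LA without any new names<br />
demo :: [Int]<br />
demo = runLA (arr (+ 1) >>> (arr (* 2) <+> arr negate)) 3<br />
</haskell><br />
<br />
With these instances <hask>(>>>)</hask>, <hask>(<+>)</hask> and <hask>arr</hask> come from the classes, so code written against the classes can be run with any of the four arrow types.<br />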
<br />
Don't worry about all these conceptual details. Let's have a look at some<br />
''Hello world'' examples.<br />
<br />
== Getting started: Hello world examples ==<br />
<br />
=== copyXML ===<br />
<br />
The first complete example is a program for<br />
copying an XML document<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
import System.Environment<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      [src, dst] <- getArgs<br />
      runX ( readDocument [(a_validate, v_0)] src<br />
             >>><br />
             writeDocument [] dst<br />
           )<br />
      return ()<br />
</haskell><br />
<br />
The interesting part of this example is<br />
the call of <hask>runX</hask>. <hask>runX</hask> executes an<br />
arrow. This arrow is one of the more powerful list arrows with IO and<br />
a HXT system state.<br />
<br />
The arrow itself is a composition of <hask>readDocument</hask> and<br />
<hask>writeDocument</hask>.<br />
<hask>readDocument</hask> is an arrow for reading, DTD processing and<br />
validation of documents. Its behaviour can be controlled by a list of<br />
options. Here we turn off the validation step. The <hask>src</hask>, a file<br />
name or a URI, is read and parsed and a document tree is built. This<br />
tree is ''piped'' into the output arrow. This one also is<br />
controlled by a set of options. Here all the defaults are used.<br />
<hask>writeDocument</hask> converts the tree into a string and writes<br />
it to the <hask>dst</hask>.<br />
<br />
We've omitted here the boring stuff of option parsing and error<br />
handling.<br />
<br />
Compilation and a test run look like this:<br />
<br />
<pre><br />
hobel > ghc -o copyXml -package hxt CopyXML.hs<br />
hobel > cat hello.xml<br />
<hello>world</hello><br />
hobel > copyXml hello.xml -<br />
<?xml version="1.0" encoding="UTF-8"?><br />
<hello>world</hello><br />
hobel ><br />
</pre><br />
<br />
The mini XML document in file <tt>hello.xml</tt> is read and<br />
a document tree is built. Then this tree is converted into a string<br />
and written to standard output (filename: <tt>-</tt>). It is decorated<br />
with an XML declaration containing the version and the output<br />
encoding.<br />
<br />
For processing HTML documents there is an HTML parser, which tries to<br />
parse and interpret almost anything as HTML. The HTML parser can be<br />
selected by calling<br />
<br />
<hask>readDocument [(a_parse_html, v_1), ...]</hask><br />
<br />
with the appropriate option.<br />
<br />
=== Pattern for a main program ===<br />
<br />
A more realistic pattern for a simple Unix-filter-like program has<br />
the following structure:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
<br />
import System.IO<br />
import System.Environment<br />
import System.Console.GetOpt<br />
import System.Exit<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      argv <- getArgs<br />
      (al, src, dst) <- cmdlineOpts argv<br />
      [rc]  <- runX (application al src dst)<br />
      if rc >= c_err<br />
         then exitWith (ExitFailure (0-1))<br />
         else exitWith ExitSuccess<br />
<br />
-- | the dummy for the boring stuff of option evaluation,<br />
-- usually done with 'System.Console.GetOpt'<br />
<br />
cmdlineOpts :: [String] -> IO (Attributes, String, String)<br />
cmdlineOpts argv<br />
    = return ([(a_validate, v_0)], argv!!0, argv!!1)<br />
<br />
-- | the main arrow<br />
<br />
application :: Attributes -> String -> String -> IOSArrow b Int<br />
application al src dst<br />
    = readDocument al src<br />
      >>><br />
      processChildren (processDocumentRootElement `when` isElem)  -- (1)<br />
      >>><br />
      writeDocument al dst<br />
      >>><br />
      getErrStatus<br />
<br />
<br />
-- | the dummy for the real processing: the identity filter<br />
<br />
processDocumentRootElement :: IOSArrow XmlTree XmlTree<br />
processDocumentRootElement<br />
    = this         -- substitute this by the real application<br />
</haskell><br />
<br />
This program has the same functionality as our first example,<br />
but it separates the arrow from the boring option evaluation and<br />
return code computation.<br />
<br />
The interesting line is (1).<br />
<hask>readDocument</hask> generates a tree structure with a so-called extra<br />
root node. This root node is a node above the XML document root<br />
element. It is necessary<br />
because of possible other nodes on the same tree level as the XML<br />
root, for instance comments, processing instructions or whitespace.<br />
<br />
Furthermore the artificial root node serves for storing meta<br />
information about the document in the attribute list, like the<br />
document name, the encoding scheme, the HTTP transfer headers and<br />
other information.<br />
<br />
To process the real XML root element, we have to take the children of<br />
the root node, select the XML root element and process it, but<br />
leave all other children unchanged. This is done with<br />
<hask>processChildren</hask> and the <hask>when</hask> choice<br />
operator. <hask>processChildren</hask> applies a filter elementwise to<br />
all children of a node. All results from processing the list of children form<br />
the children of the result node.<br />
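<br />
The interplay of <hask>processChildren</hask> and <hask>when</hask> can be sketched on the simple filter model from the first part of this page. The node type, the definition of <hask>processChildren</hask> and the filter <hask>rename</hask> below are simplified assumptions made for this example, not the HXT implementations.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a] deriving (Show, Eq)<br />
<br />
data XNode   = XTag String              -- element, attributes omitted<br />
             | XCmt String              -- comment<br />
             | XText String             -- plain text<br />
               deriving (Show, Eq)<br />
<br />
type XmlTree   = NTree XNode<br />
type XmlFilter = XmlTree -> [XmlTree]<br />
<br />
isElem :: XmlFilter<br />
isElem t@(NTree (XTag _) _) = [t]<br />
isElem _                    = []<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
<br />
-- applies a filter to every child, all results become the new children<br />
processChildren :: XmlFilter -> XmlFilter<br />
processChildren f (NTree n cs) = [NTree n (concatMap f cs)]<br />
<br />
-- hypothetical "real processing": rename an element<br />
rename :: String -> XmlFilter<br />
rename new (NTree (XTag _) cs) = [NTree (XTag new) cs]<br />
rename _   t                   = [t]<br />
<br />
-- an extra root node with a comment and the document root element as children<br />
doc :: XmlTree<br />
doc = NTree (XTag "/")<br />
      [ NTree (XCmt " generated ") []<br />
      , NTree (XTag "hello") [ NTree (XText "world") [] ]<br />
      ]<br />
<br />
result :: [XmlTree]<br />
result = processChildren (rename "greeting" `when` isElem) doc<br />
</haskell><br />
<br />
In <hask>result</hask> the comment next to the root element is passed through untouched, and only the element node is handed to <hask>rename</hask>.<br />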
<br />
The structure of the internal document tree can be made visible<br />
e.g. by adding the option pair <hask>(a_show_tree, v_1)</hask> to the<br />
<hask>writeDocument</hask> arrow. This will emit the tree in a readable<br />
text representation instead of the real document.<br />
<br />
In the next section we will give examples for the<br />
<hask>processDocumentRootElement</hask> arrow.<br />
<br />
== Selection examples ==<br />
<br />
=== Selecting text from an HTML document ===<br />
<br />
Selecting all the plain text of an XML/HTML document<br />
can be formulated with<br />
<br />
<haskell><br />
selectAllText :: ArrowXml a => a XmlTree XmlTree<br />
selectAllText<br />
= deep isXText<br />
</haskell><br />
<br />
<hask>deep</hask> traverses the whole tree, stops the traversal when<br />
a node is a text node (<hask>isXText</hask>) and returns all the text nodes.<br />
There are two other traversal operators, <hask>deepest</hask> and <hask>multi</hask>.<br />
In this case, where the selected nodes are all leaves, these would give the same result.<br />
<br />
=== Selecting text and ALT attribute values ===<br />
<br />
Let's take a slightly more complex task: We want to select all text, but also the values of the <tt>alt</tt> attributes<br />
of image tags.<br />
<br />
<haskell><br />
selectAllTextAndAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndAltValues<br />
= deep<br />
( isXText -- (1)<br />
<+><br />
( isElem >>> hasName "img" -- (2)<br />
>>><br />
getAttrValue "alt" -- (3)<br />
>>><br />
mkText -- (4)<br />
)<br />
)<br />
</haskell><br />
<br />
The whole tree is searched for text nodes (1) and for image elements (2), from the image elements<br />
the alt attribute values are selected as plain text (3), this text is transformed into a text node (4).<br />
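<br />
For quick experiments such a selection arrow can be run without any I/O on a string parsed with <hask>xread</hask>. The sketch below assumes a more recent HXT release (9.x), in which the umbrella module is <hask>Text.XML.HXT.Core</hask> and the text predicate is named <hask>isText</hask>. With the release described on this page the import and the names used above apply instead. The names <hask>textAndAlt</hask> and <hask>demo</hask> and the input fragment are made up for this example.<br />
<br />
<haskell><br />
import Text.XML.HXT.Core<br />
<br />
-- same structure as selectAllTextAndAltValues, written with isText<br />
textAndAlt :: ArrowXml a => a XmlTree XmlTree<br />
textAndAlt<br />
    = deep<br />
      ( isText<br />
        <+><br />
        ( isElem >>> hasName "img"<br />
          >>><br />
          getAttrValue "alt"<br />
          >>><br />
          mkText<br />
        )<br />
      )<br />
<br />
-- parse a fragment with xread, run the pure list arrow and extract the strings<br />
demo :: [String]<br />
demo = runLA (xread >>> textAndAlt >>> getText)<br />
             "<p>see <img alt=\"logo\" src=\"l.png\"/> here</p>"<br />
</haskell><br />
<br />
<hask>runLA</hask> runs the arrow as a pure list arrow, so <hask>demo</hask> is an ordinary list of strings that can be inspected in GHCi.<br />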
<br />
=== Selecting text and ALT attribute values (2) ===<br />
<br />
Let's refine the above filter one step further. The text from the alt attributes shall be marked in the output<br />
by surrounding double square brackets. Empty alt values shall be ignored.<br />
<br />
<haskell><br />
selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndRealAltValues<br />
= deep<br />
( isXText<br />
<+><br />
( isElem >>> hasName "img"<br />
>>><br />
getAttrValue "alt"<br />
>>><br />
isA significant -- (1)<br />
>>><br />
arr addBrackets -- (2)<br />
>>><br />
mkText<br />
)<br />
)<br />
where<br />
significant :: String -> Bool<br />
significant = not . all (`elem` " \n\r\t")<br />
<br />
addBrackets :: String -> String<br />
addBrackets s<br />
= " [[ " ++ s ++ " ]] "<br />
</haskell><br />
<br />
This example shows two combinators for building arrows from pure functions.<br />
The first one <hask>isA</hask> removes all empty or whitespace values from alt attributes (1),<br />
the other <hask>arr</hask> lifts the editing function to the arrow level (2).<br />
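<br />
The effect of the two lifting combinators can be checked on plain strings, independently of any XML. The following sketch uses the simple list filter model from the first part of this page. The name <hask>arrL</hask> (a list version of <hask>arr</hask>), the filter <hask>markAlt</hask> and the sample values are assumptions made for this example.<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
<br />
-- lifting of a predicate<br />
isA :: (a -> Bool) -> Filter a a<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
<br />
-- lifting of a pure function, the list filter counterpart of arr<br />
arrL :: (a -> b) -> Filter a b<br />
arrL f x = [f x]<br />
<br />
(>>>) :: Filter a b -> Filter b c -> Filter a c<br />
(f >>> g) x = concatMap g (f x)<br />
<br />
significant :: String -> Bool<br />
significant = not . all (`elem` " \n\r\t")<br />
<br />
addBrackets :: String -> String<br />
addBrackets s<br />
    = " [[ " ++ s ++ " ]] "<br />
<br />
-- drop empty or whitespace-only values, mark the remaining ones<br />
markAlt :: Filter String String<br />
markAlt = isA significant >>> arrL addBrackets<br />
<br />
marked :: [String]<br />
marked = concatMap markAlt ["Haskell logo", "", " \n", "photo"]<br />
</haskell><br />
<br />
Only the first and the last sample value survive the <hask>isA significant</hask> step, and both come out wrapped in double square brackets.<br />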
<br />
== Document construction examples ==<br />
<br />
=== The ''Hello World'' document ===<br />
<br />
The first document, of course, is a ''Hello World'' document:<br />
<br />
<haskell><br />
helloWorld :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld<br />
= mkelem "html" [] -- (1)<br />
[ mkelem "head" []<br />
[ mkelem "title" []<br />
[ txt "Hello World" ] -- (2)<br />
]<br />
, mkelem "body"<br />
[ sattr "class" "haskell" ] -- (3)<br />
[ mkelem "h1" []<br />
[ txt "Hello World" ] -- (4)<br />
]<br />
]<br />
</haskell><br />
<br />
The main arrows for document construction are <hask>mkelem</hask><br />
and its variants (<hask>selem, aelem, eelem</hask>) for element creation, <hask>attr</hask> and <hask>sattr</hask> for attributes and <hask>mkText</hask><br />
and <hask>txt</hask> for text nodes. <hask>mkelem</hask> takes three arguments: the element name (or tag name), a list of arrows for the construction of attributes, not empty in (3), and a list of arrows for the contents. Text content is generated in (2) and (4).<br />
<br />
To write this document to a file use the following arrow<br />
<br />
<haskell><br />
root [] [helloWorld] -- (1)<br />
>>><br />
writeDocument [(a_indent, v_1)] "hello.xml" -- (2)<br />
</haskell><br />
<br />
When this arrow is executed, the <hask>helloWorld</hask><br />
document is wrapped into a so-called root node (1). This complete<br />
document is written to "hello.xml" (2).<br />
<hask>writeDocument</hask> and its variants always expect<br />
a whole document tree with such a root node. Before writing, the document is<br />
indented (<hask>(a_indent, v_1)</hask>) by inserting extra whitespace<br />
text nodes, and an XML declaration with version and encoding is added. If the indent option is not given, the whole document appears on a single line. With indentation the result looks like this:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<html><br />
<head><br />
<title>Hello World</title><br />
</head><br />
<body class="haskell"><br />
<h1>Hello World</h1><br />
</body><br />
</html><br />
</pre><br />
<br />
The code can be shortened a bit by using some of the<br />
convenience functions:<br />
<br />
<haskell><br />
helloWorld2 :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld2<br />
= selem "html"<br />
[ selem "head"<br />
[ selem "title"<br />
[ txt "Hello World" ]<br />
]<br />
, mkelem "body"<br />
[ sattr "class" "haskell" ]<br />
[ selem "h1"<br />
[ txt "Hello World" ]<br />
]<br />
]<br />
</haskell><br />
<br />
In the above two examples the arrow input is totally ignored, because<br />
of the use of the constant arrow <hask>txt "..."</hask>.<br />
<br />
=== A page about all images within a HTML page ===<br />
<br />
A slightly more interesting task is the construction of a page<br />
containing a table of all images within a page, including image URLs, geometry and ALT attributes.<br />
<br />
The program for this has a frame similar to the <hask>helloWorld</hask> program,<br />
but the rows of the table must be filled in from the input document.<br />
In the first step we will generate a table with a single column containing<br />
the URL of the image.<br />
<br />
<haskell><br />
imageTable :: ArrowXml a => a XmlTree XmlTree<br />
imageTable<br />
= selem "html"<br />
[ selem "head"<br />
[ selem "title"<br />
[ txt "Images in Page" ]<br />
]<br />
, selem "body"<br />
[ selem "h1"<br />
[ txt "Images in Page" ]<br />
, selem "table"<br />
[ collectImages -- (1)<br />
>>><br />
genTableRows -- (2)<br />
]<br />
]<br />
]<br />
where<br />
collectImages -- (1)<br />
= deep ( isElem<br />
>>><br />
hasName "img"<br />
)<br />
genTableRows -- (2)<br />
= selem "tr"<br />
[ selem "td"<br />
[ getAttrValue "src" >>> mkText ]<br />
]<br />
</haskell><br />
<br />
With (1) the image elements are collected, and with (2)<br />
the HTML code for an image element is built.<br />
<br />
Applied to <tt>http://www.haskell.org/</tt> we get the following result<br />
(at the time of writing this page):<br />
<br />
<pre><br />
<html><br />
<head><br />
<title>Images in Page</title><br />
</head><br />
<body><br />
<h1>Images in Page</h1><br />
<table><br />
<tr><br />
<td>/haskellwiki_logo.png</td><br />
</tr><br />
<tr><br />
<td>/sitewiki/images/1/10/Haskelllogo-small.jpg</td><br />
</tr><br />
<tr><br />
<td>/haskellwiki_logo_small.png</td><br />
</tr><br />
</table><br />
</body><br />
</html><br />
</pre><br />
<br />
When generating HTML, often there are constant parts within the page,<br />
in the example e.g. the page header. It's possible to write these<br />
parts as a string containing plain HTML and then read this with<br />
a simple XML contents parser called <hask>xread</hask>.<br />
<br />
The example above could then be rewritten as<br />
<br />
<haskell><br />
imageTable<br />
= selem "html"<br />
[ pageHeader<br />
, ...<br />
]<br />
where<br />
pageHeader<br />
= constA "<head><title>Images in Page</title></head>"<br />
>>><br />
xread<br />
...<br />
</haskell><br />
<br />
<hask>xread</hask> is a very primitive arrow. It does not run in the<br />
IO monad, so it can be used in any context, but therefore the error handling<br />
is very limited. <hask>xread</hask> parses an XML element content.<br />
<br />
=== A page about all images within a HTML page: 1. Refinement ===<br />
<br />
The next refinement step is the extension of the table such that<br />
it contains four columns, one for the image itself, one for the URL,<br />
the geometry and the ALT text. The extended <hask>genTableRows</hask><br />
has the following form:<br />
<br />
<haskell><br />
genTableRows<br />
= selem "tr"<br />
[ selem "td" -- (1)<br />
[ this -- (1.1)<br />
]<br />
, selem "td" -- (2)<br />
[ getAttrValue "src"<br />
>>><br />
mkText<br />
>>><br />
mkelem "a" -- (2.1)<br />
[ attr "href" this ]<br />
[ this ]<br />
]<br />
, selem "td" -- (3)<br />
[ ( getAttrValue "width"<br />
&&& -- (3.1)<br />
getAttrValue "height"<br />
)<br />
>>><br />
arr2 geometry -- (3.2)<br />
>>><br />
mkText<br />
]<br />
, selem "td" -- (4)<br />
[ getAttrValue "alt"<br />
>>><br />
mkText<br />
]<br />
]<br />
where<br />
geometry :: String -> String -> String<br />
geometry "" ""<br />
= ""<br />
geometry w h<br />
= w ++ "x" ++ h<br />
</haskell><br />
<br />
In (1) the identity arrow <hask>this</hask> is used for<br />
inserting the whole image element (the <hask>this</hask> value) into the first column.<br />
(2) is the column from the previous example, but the URL has been made active<br />
by embedding it in an A-element (2.1). In (3) there are two<br />
new combinators. <hask>(&&&)</hask> (3.1) is an arrow combinator that applies two<br />
arrows to the same input and combines the results into a pair. <hask>arr2</hask> (3.2)<br />
works like <hask>arr</hask>, but it lifts a binary function into an arrow<br />
accepting a pair of values: <hask>arr2 f</hask> is a shortcut for<br />
<hask>arr (uncurry f)</hask>. So width and height are combined into an X11-like<br />
geometry spec. (4) adds the ALT text.<br />
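<br />
The interplay of <hask>(&&&)</hask> and <hask>arr2</hask> can be tried out without HXT, because ordinary functions are arrows too. The following is only a sketch under that assumption: <hask>Attrs</hask>, <hask>attrValue</hask>, <hask>arr2'</hask> and <hask>imgGeometry</hask> are made-up names that stand in for an image element, <hask>getAttrValue</hask> and <hask>arr2</hask>.<br />
<br />
<haskell><br />
import Control.Arrow (arr, (&&&), (>>>))<br />
<br />
-- hypothetical stand-in for an element: just its attribute list<br />
type Attrs = [(String, String)]<br />
<br />
-- plays the role of getAttrValue in this model: "" if the attribute is missing<br />
attrValue :: String -> Attrs -> String<br />
attrValue n = maybe "" id . lookup n<br />
<br />
-- arr2 for the function arrow: lift a binary function to a function on pairs<br />
arr2' :: (a -> b -> c) -> ((a, b) -> c)<br />
arr2' f = arr (uncurry f)<br />
<br />
geometry :: String -> String -> String<br />
geometry "" "" = ""<br />
geometry w  h  = w ++ "x" ++ h<br />
<br />
-- both selectors see the same input, the pair of results is fed into geometry<br />
imgGeometry :: Attrs -> String<br />
imgGeometry<br />
    = ( attrValue "width" &&& attrValue "height" )<br />
      >>><br />
      arr2' geometry<br />
<br />
main :: IO ()<br />
main = do<br />
    putStrLn (imgGeometry [("src", "a.png"), ("width", "640"), ("height", "480")])  -- 640x480<br />
    putStrLn (imgGeometry [("src", "b.png")])                                       -- empty line<br />
</haskell><br />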
<br />
=== A page about all images within a HTML page: 2. Refinement ===<br />
<br />
The generated HTML page is not yet very useful, because it usually<br />
contains relative HREFs to the images, so the links do not work.<br />
We have to transform the SRC attribute values into absolute URLs.<br />
This can be done with the following code:<br />
<br />
<haskell><br />
imageTable2 :: IOStateArrow s XmlTree XmlTree<br />
imageTable2<br />
    = ...<br />
      ...<br />
      , selem "table"<br />
        [ collectImages<br />
          >>><br />
          mkAbsImageRef                     -- (1)<br />
          >>><br />
          genTableRows<br />
        ]<br />
      ...<br />
<br />
mkAbsImageRef :: IOStateArrow s XmlTree XmlTree -- (1)<br />
mkAbsImageRef<br />
    = processAttrl ( mkAbsRef               -- (2)<br />
                     `when`<br />
                     hasName "src"          -- (3)<br />
                   )<br />
    where<br />
    mkAbsRef                                -- (4)<br />
        = replaceChildren<br />
          ( xshow getChildren               -- (5)<br />
            >>><br />
            ( mkAbsURI `orElse` this )      -- (6)<br />
            >>><br />
            mkText                          -- (7)<br />
          )<br />
</haskell><br />
<br />
The <hask>imageTable2</hask> is extended by an arrow <hask>mkAbsImageRef</hask><br />
(1). This arrow uses the global system state of HXT, in which the base URL<br />
of a document is stored. For editing the SRC attribute value, the attribute list<br />
of the image elements is processed with <hask>processAttrl</hask> (2).<br />
With <hask>`when` hasName "src"</hask> only SRC attributes are manipulated (3). The real work is done in (4): the URL, a text node, is selected with <hask>getChildren</hask> and converted into a string with <hask>xshow</hask> (5), then this string is transformed into an absolute URL<br />
with <hask>mkAbsURI</hask> (6). This arrow may fail, e.g. in case of illegal<br />
URLs. In this case the URL remains unchanged (<hask>`orElse` this</hask>).<br />
The resulting String value is converted into a text node forming the new<br />
attribute value node (7).<br />
<br />
Because of the use of the global HXT state in <hask>mkAbsURI</hask>,<br />
<hask>mkAbsRef</hask> and <hask>imageTable2</hask> need to have the more specialized signature <hask>IOStateArrow s XmlTree XmlTree</hask>.<br />
<br />
== Transformation examples ==<br />
<br />
=== Decorating external references of an HTML document ===<br />
<br />
In the following examples, we want to decorate the external references<br />
in an HTML page with a small icon, as is done in many wikis.<br />
For this task the document tree has to be traversed; all parts<br />
except the interesting A-elements remain unchanged. At the end of the list of children of an A-element we add an image element.<br />
<br />
Here is the first version:<br />
<br />
<haskell><br />
addRefIcon :: ArrowXml a => a XmlTree XmlTree<br />
addRefIcon<br />
    = processTopDown                        -- (1)<br />
      ( addImg                              -- (2)<br />
        `when`<br />
        isExternalRef                       -- (3)<br />
      )<br />
    where<br />
    isExternalRef                           -- (4)<br />
        = isElem<br />
          >>><br />
          hasName "a"<br />
          >>><br />
          hasAttr "href"<br />
          >>><br />
          getAttrValue "href"<br />
          >>><br />
          isA isExtRef<br />
        where<br />
        isExtRef                            -- (4.1)<br />
            = isPrefixOf "http:"            -- or something more precise<br />
<br />
    addImg<br />
        = replaceChildren                   -- (5)<br />
          ( getChildren                     -- (6)<br />
            <+><br />
            imgElement                      -- (7)<br />
          )<br />
<br />
    imgElement<br />
        = mkelem "img"                      -- (8)<br />
          [ sattr "src" "/icons/ref.png"    -- (9)<br />
          , sattr "alt" "external ref"<br />
          ] []                              -- (10)<br />
</haskell><br />
<br />
The traversal is done with <hask>processTopDown</hask> (1).<br />
This arrow applies an arrow to all nodes of the whole document tree.<br />
The transformation arrow applies <hask>addImg</hask> (2) to<br />
all A-elements with an external reference (3), (4). The test for external URLs (4.1)<br />
is somewhat simplified.<br />
<hask>addImg</hask> replaces the children (5) of the A-elements by<br />
selecting the current children (6) and adding an image element (7).<br />
The image element is constructed with <hask>mkelem</hask> (8). This takes<br />
an element name, a list of arrows for computing the attributes and a<br />
list of arrows for computing the contents. The content of the image element is<br />
empty (10). The attributes are constructed with <hask>sattr</hask> (9).<br />
<hask>sattr</hask> ignores the arrow input and builds an attribute from<br />
the name-value pair given as arguments.<br />
<br />
=== Transform external references into absolute references ===<br />
<br />
In the following example we will develop a program for<br />
editing an HTML page such that all references to external documents<br />
(images, hypertext refs, style refs, ...) become absolute references.<br />
We will see some new, but very useful combinators in the solution.<br />
<br />
The task seems to be rather trivial: in a tree traversal<br />
all references are edited with respect to the document base.<br />
But in HTML there is a BASE element, allowed in the content of HEAD,<br />
with a HREF attribute that defines the document base. This<br />
href can itself be a relative URL.<br />
<br />
We start the development with the editing arrow. This gets<br />
the real document base as argument.<br />
<br />
<haskell><br />
mkAbsHRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsHRefs base<br />
    = processTopDown editHRef               -- (1)<br />
    where<br />
    editHRef<br />
        = processAttrl                      -- (3)<br />
          ( changeAttrValue (absHRef base)  -- (5)<br />
            `when`<br />
            hasName "href"                  -- (4)<br />
          )<br />
          `when`<br />
          ( isElem >>> hasName "a" )        -- (2)<br />
        where<br />
<br />
        absHRef :: String -> String -> String -- (5)<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
The tree is traversed (1) and for every A element (2) the attribute<br />
list is processed (3). All HREF attribute values (4) are manipulated<br />
by <hask>changeAttrValue</hask> called with a string function (5).<br />
<hask>expandURIString</hask> is a pure function defined in HXT for computing<br />
an absolute URI.<br />
In this first step we only edit A-HREF attribute values. We will refine this<br />
later.<br />
<br />
The second step is the complete computation of the base URL.<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( isElem >>> hasName "html"       -- (0)<br />
            >>><br />
            getChildren                     -- (1)<br />
            >>><br />
            isElem >>> hasName "head"       -- (2)<br />
            >>><br />
            getChildren                     -- (3)<br />
            >>><br />
            isElem >>> hasName "base"       -- (4)<br />
            >>><br />
            getAttrValue "href"             -- (5)<br />
          )<br />
          &&&<br />
          getBaseURI                        -- (6)<br />
        )<br />
        >>> expandURI                       -- (7)<br />
      )<br />
      `orElse` getBaseURI                   -- (8)<br />
</haskell><br />
<br />
Input to this arrow is the HTML element. (0) to (5) is the arrow for selecting<br />
the BASE element's HREF value; parallel to this the system base URL is read<br />
with <hask>getBaseURI</hask> (6) like in the examples above. The resulting<br />
pair of strings is piped into <hask>expandURI</hask> (7), the arrow version of<br />
<hask>expandURIString</hask>. This arrow ((0) to (7)) fails in the absence<br />
of a BASE element. In this case we take the plain document base (8).<br />
The selection of the BASE element is not yet very handy. We will define<br />
a more general and elegant function later, allowing an element path as selection argument.<br />
<br />
In the third step, we will combine the two arrows. For this we will use<br />
a new combinator <hask>($<)</hask>. The need for this new combinator<br />
arises as follows: we need the arrow input (the document) twice,<br />
once for computing the document base and once for editing the<br />
whole document, and of course we want to compute the extra string parameter<br />
for editing with the arrow defined above.<br />
<br />
The combined arrow, our main arrow, looks like this<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = mkAbsHRefs $< computeBaseRef          -- (1)<br />
</haskell><br />
<br />
In (1) the arrow input is first piped into <hask>computeBaseRef</hask>;<br />
the result is then used in <hask>mkAbsHRefs</hask> as the extra string parameter<br />
when processing the document. Internally the <hask>($<)</hask> combinator<br />
is defined by the basic combinators <hask>(&&&), (>>>)</hask> and <hask>app</hask>, but in somewhat more complex computations<br />
this pattern occurs rather frequently, so <hask>($<)</hask> becomes very useful.<br />
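<br />
The behaviour of <hask>($<)</hask> can be illustrated with plain list functions of type <hask>a -> [b]</hask>. The following sketch is a model only and not the HXT definition, which is built from <hask>(&&&)</hask>, <hask>(>>>)</hask> and <hask>app</hask> as described above; <hask>Filter</hask>, <hask>firstWord</hask> and <hask>tagWords</hask> are names invented for this illustration.<br />
<br />
<haskell><br />
-- a filter in the sense of this tutorial: zero or more results per input<br />
type Filter a b = a -> [b]<br />
<br />
-- model of ($<) for list filters: run f on the input, then for every<br />
-- result c run the parameterised filter (g c) on the same input again<br />
($<) :: (c -> Filter a b) -> Filter a c -> Filter a b<br />
(g $< f) x = concat [ g c x | c <- f x ]<br />
<br />
-- hypothetical example: compute a "base" from the input ...<br />
firstWord :: Filter String String<br />
firstWord s = take 1 (words s)<br />
<br />
-- ... and use it as extra parameter when processing the whole input<br />
tagWords :: String -> Filter String String<br />
tagWords base s = [ base ++ "/" ++ w | w <- words s ]<br />
<br />
main :: IO ()<br />
main = do<br />
    print ((tagWords $< firstWord) "docs a b")  -- ["docs/docs","docs/a","docs/b"]<br />
    print ((tagWords $< firstWord) "")          -- [] because firstWord fails<br />
</haskell><br />
<br />
As in <hask>toAbsRefs</hask>, the input is used twice: once for computing the parameter and once as the input of the parameterised filter. If the parameter computation fails, the whole filter fails.<br />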
<br />
Programming with arrows is one style of point-free programming. Point-free<br />
programming often becomes unwieldy when values are used more than once.<br />
One solution is the special arrow syntax supported by GHC and others, similar to the do notation for monads. But for many simple cases the <hask>($<)</hask> combinator and its variants <hask>($<<), ($<<<), ($<<<<), ($<$)</hask><br />
are sufficient.<br />
<br />
To complete the development of the example, a last step is necessary:<br />
the removal of the redundant BASE element.<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = ( mkAbsHRefs $< computeBaseRef )<br />
      >>><br />
      removeBaseElement<br />
<br />
removeBaseElement :: ArrowXml a => a XmlTree XmlTree<br />
removeBaseElement<br />
    = processChildren<br />
      ( processChildren<br />
        ( none                              -- (1)<br />
          `when`<br />
          ( isElem >>> hasName "base" )<br />
        )<br />
        `when`<br />
        ( isElem >>> hasName "head" )<br />
      )<br />
</haskell><br />
<br />
In this function the children of the HEAD element are searched for<br />
a BASE element. This is removed by applying the null arrow <hask>none</hask> (1)<br />
to the input, which always returns the empty list.<br />
<hask>none `when` ...</hask> is the pattern for deleting nodes from a tree.<br />
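<br />
The deletion pattern can be tried out on a small model. The sketch below does not use HXT: it assumes a rose tree with plain strings as node labels and re-implements <hask>none</hask>, <hask>hasName</hask>, <hask>when</hask> and <hask>processChildren</hask> as simple list functions, so the definitions are illustrations and not the library code.<br />
<br />
<haskell><br />
-- rose tree with plain strings as node labels instead of XNode<br />
data NTree a = NTree a [NTree a] deriving (Show, Eq)<br />
<br />
type Filter = NTree String -> [NTree String]<br />
<br />
-- the null filter: never returns a result<br />
none :: Filter<br />
none _ = []<br />
<br />
hasName :: String -> Filter<br />
hasName n t@(NTree l _)<br />
    | l == n    = [t]<br />
    | otherwise = []<br />
<br />
-- f `when` g: apply f where g succeeds, keep the node unchanged elsewhere<br />
when :: Filter -> Filter -> Filter<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
<br />
-- apply a filter to every child and collect the results as the new children<br />
processChildren :: Filter -> Filter<br />
processChildren f (NTree l cs) = [NTree l (concatMap f cs)]<br />
<br />
removeBase :: Filter<br />
removeBase<br />
    = processChildren<br />
      ( processChildren (none `when` hasName "base")<br />
        `when`<br />
        hasName "head"<br />
      )<br />
<br />
main :: IO ()<br />
main = print (removeBase doc)<br />
  where<br />
    doc = NTree "html"<br />
          [ NTree "head" [NTree "title" [], NTree "base" []]<br />
          , NTree "body" [NTree "base" []]<br />
          ]<br />
</haskell><br />
<br />
In the result the "base" node below "head" is gone, while the one below "body" is kept, because the inner <hask>processChildren</hask> is only applied to "head" nodes.<br />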
<br />
The <hask>computeBaseRef</hask> function defined above contains an arrow pattern<br />
for selecting the right subtree that is rather common in HXT applications:<br />
<br />
<haskell><br />
isElem >>> hasName n1<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName n2<br />
...<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName nm<br />
</haskell><br />
<br />
For this pattern we will define a convenient function creating the<br />
arrow for selection<br />
<br />
<haskell><br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)     -- (1)<br />
      .<br />
      map (\ n -> isElem >>> hasName n)             -- (2)<br />
</haskell><br />
<br />
The name list is mapped to the element checking arrows (2), and<br />
the resulting list of arrows is folded with <hask>getChildren</hask><br />
into a single arrow (1). <hask>computeBaseRef</hask> can then be simplified<br />
and becomes more readable:<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( getDescendents ["html","head","base"]   -- (1)<br />
            >>><br />
            getAttrValue "href"                     -- (2)<br />
          )<br />
          ...<br />
      ...<br />
</haskell><br />
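<br />
The fold in <hask>getDescendents</hask> can be checked on a small model without HXT. The sketch below assumes a rose tree with strings as node labels and plain list functions as filters; <hask>isElem</hask> is dropped because this model only has element-like nodes. All definitions are simplified stand-ins for the HXT arrows of the same name.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a] deriving (Show, Eq)<br />
<br />
type Filter = NTree String -> [NTree String]<br />
<br />
-- composition of list filters, read from left to right<br />
(>>>) :: Filter -> Filter -> Filter<br />
(f >>> g) t = concat [ g t' | t' <- f t ]<br />
<br />
getChildren :: Filter<br />
getChildren (NTree _ cs) = cs<br />
<br />
hasName :: String -> Filter<br />
hasName n t@(NTree l _)<br />
    | l == n    = [t]<br />
    | otherwise = []<br />
<br />
-- same fold as above: name checks glued together with getChildren<br />
getDescendents :: [String] -> Filter<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)<br />
      .<br />
      map hasName<br />
<br />
main :: IO ()<br />
main = do<br />
    print (getDescendents ["html", "head", "base"] doc)  -- the one base node<br />
    print (getDescendents ["html", "body", "base"] doc)  -- no result<br />
  where<br />
    doc = NTree "html"<br />
          [ NTree "head" [NTree "title" [], NTree "base" []]<br />
          , NTree "body" []<br />
          ]<br />
</haskell><br />
<br />
Note that <hask>foldl1</hask> is not defined for the empty list, so the path must contain at least one name.<br />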
<br />
An even more general and flexible technique is the use of XPath expressions<br />
for the selection of document parts. The XPath arrows are defined in the module<br />
<hask>Text.XML.HXT.Arrow.XmlNodeSet</hask>.<br />
<br />
With XPath <hask>computeBaseRef</hask> can be simplified to<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( ( getXPathTrees "/html/head/base" -- (1)<br />
            >>><br />
            getAttrValue "href"             -- (2)<br />
          )<br />
          ...<br />
</haskell><br />
<br />
Even the attribute selection can be expressed by XPath,<br />
so (1) and (2) can be combined into<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( xshow (getXPathTrees "/html/head/base/@href")<br />
          ...<br />
</haskell><br />
<br />
The extra <hask>xshow</hask> is required here to convert the<br />
XPath result, an XmlTree, into a string.<br />
<br />
XPath defines a<br />
full language for selecting parts of an XML document.<br />
Sometimes it's rather comfortable to make selections of this<br />
type, but XPath evaluation is in general more expensive<br />
in time and space than a simple combination of arrows, as we have<br />
seen it in <hask>getDescendents</hask>.<br />
<br />
=== Transform external references into absolute references: Refinement ===<br />
<br />
In the above example only A-HREF URLs are edited. Now we extend this<br />
to other element-attribute combinations.<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown ( editRef "a" "href"           -- (2)<br />
                       >>><br />
                       editRef "img" "src"          -- (3)<br />
                       >>><br />
                       editRef "link" "href"        -- (4)<br />
                       >>><br />
                       editRef "script" "src"       -- (5)<br />
                     )<br />
    where<br />
    editRef en an                                   -- (1)<br />
        = processAttrl ( changeAttrValue (absHRef base)<br />
                         `when`<br />
                         hasName an<br />
                       )<br />
          `when`<br />
          ( isElem >>> hasName en )<br />
        where<br />
        absHRef :: String -> String -> String<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
<hask>editRef</hask> is parameterized by the element and attribute names.<br />
The arrow applied to every element is extended to a sequence of<br />
<hask>editRef</hask>'s ((2)-(5)). Notice that the document is still traversed only once.<br />
To process all possible HTML elements,<br />
this sequence should be extended by further element-attribute pairs.<br />
<br />
This can further be simplified into<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = foldl (>>>) this<br />
          .<br />
          map (\ (en, an) -> editRef en an)<br />
          $<br />
          [ ("a", "href")<br />
          , ("img", "src")<br />
          , ("link", "href")<br />
          , ("script", "src")       -- and more<br />
          ]<br />
    editRef<br />
        = ...<br />
</haskell><br />
<br />
The <hask>foldl (>>>) this</hask> is defined in HXT as <hask>seqA</hask>,<br />
so the above code can be simplified to<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = seqA . map (uncurry editRef)<br />
          $<br />
          ...<br />
</haskell><br />
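<br />
The step from the explicit sequence to a fold over a table can be reproduced with ordinary functions, which are arrows too. The following sketch is a model under simplifying assumptions: an element is just a name with an attribute list, <hask>seqF</hask> plays the role of <hask>seqA</hask>, and prefixing the base string stands in for the real URI resolution done by <hask>expandURIString</hask>. The names <hask>Elem</hask>, <hask>seqF</hask> and this <hask>editRef</hask> are invented for the illustration.<br />
<br />
<haskell><br />
import Control.Arrow ((>>>))<br />
<br />
-- hypothetical stand-in for an element: its name and attribute list<br />
type Elem = (String, [(String, String)])<br />
<br />
-- model of editRef: rewrite attribute an of elements named en,<br />
-- simply prefixing the base instead of resolving a URI<br />
editRef :: String -> String -> String -> Elem -> Elem<br />
editRef base en an e@(name, attrs)<br />
    | name == en = (name, [ (k, if k == an then base ++ v else v) | (k, v) <- attrs ])<br />
    | otherwise  = e<br />
<br />
-- foldl (>>>) id plays the role of seqA for plain functions<br />
seqF :: [a -> a] -> (a -> a)<br />
seqF = foldl (>>>) id<br />
<br />
editRefs :: String -> Elem -> Elem<br />
editRefs base<br />
    = seqF . map (uncurry (editRef base))<br />
      $<br />
      [ ("a",      "href")<br />
      , ("img",    "src")<br />
      , ("link",   "href")<br />
      , ("script", "src")<br />
      ]<br />
<br />
main :: IO ()<br />
main = do<br />
    print (editRefs "http://example.org/" ("img", [("src", "pic.png"), ("alt", "x")]))<br />
    print (editRefs "http://example.org/" ("p",   [("src", "pic.png")]))<br />
</haskell><br />
<br />
Only the SRC attribute of the "img" element is changed; the "p" element passes through all four edit steps unchanged.<br />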
<br />
== More complex examples ==<br />
<br />
''' to be done '''<br />
<br />
=== Automatic read/writing between xml and Haskell data types ===<br />
<br />
'''Question''': is there any way to write/read Haskell types to/from XML in HXT? HaXml has readXml and showXml, but I can't find any similar mechanism in HXT. Help! -- AlsonKemp<br />
<br />
==== Serializing to Xml ====<br />
<br />
We can create an HXT tree from a single-layer data type as follows:<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
<br />
import Data.Generics<br />
import Data.Maybe (fromMaybe)<br />
import Text.XML.HXT.Arrow -- newer HXT releases export these arrows from Text.XML.HXT.Core<br />
<br />
-- our data type we'll convert into xml<br />
data Config =<br />
    Config { username    :: String,<br />
             logNumDays  :: Int,<br />
             oleDbString :: String }<br />
    deriving (Show, Typeable, Data)<br />
<br />
-- helper function adapted from http://www.defmacro.org/ramblings/haskell-web.html<br />
-- (gshow replaced by gshow')<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- show a field value: strings are taken as they are, other values via their constructor<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr (toConstr t)) (cast t)<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
-- (the foldr appends to the accumulator, so the fields are emitted in reverse order)<br />
xmlSerialize object = "<" ++ show (toConstr object) ++ ">" ++<br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show (toConstr object) ++ ">"<br />
<br />
-- function to create HXT tree arrow from single-layer Haskell data type:<br />
createHxtArrow object = runLA ( constA ( xmlSerialize object ) >>> xread )<br />
<br />
-- create a config object to serialize:<br />
createConfig = Config { username = "test", logNumDays = 3, oleDbString = "qsdf" }<br />
<br />
-- test function, using our Config data type<br />
testConversion = createHxtArrow createConfig ()<br />
</haskell><br />
<br />
-- hughperkins<br />
<br />
==== Deserializing from Xml ====<br />
<br />
Here's a solution to deserialize a simple Haskell data type containing Strings and Ints.<br />
<br />
It's not really pretty, but it works.<br />
<br />
Basically, we just convert the incoming xml into gread-compatible format, then use gread :-D<br />
<br />
Currently it works for a simple single-layer Haskell data type containing Ints and Strings. You can add new child data types by adding to the case expression in xmlToGShowFormat.<br />
<br />
If someone has a more elegant solution, please let me know ( hughperkins@gmail.com )<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
module ParseXml<br />
where<br />
<br />
import Data.List (intersperse)<br />
import Data.Maybe (fromJust, fromMaybe)<br />
import Data.Generics hiding (Unit)<br />
import Text.XML.HXT.Arrow hiding (when)<br />
<br />
data Config = Config{ name :: String, age :: Int }<br />
--data Config = Config{ age :: Int }<br />
    deriving( Data, Show, Typeable, Ord, Eq, Read )<br />
<br />
createConfig = Config "qsdfqsdf" 3<br />
--createConfig = Config 3<br />
<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr (toConstr t)) (cast t)<br />
<br />
-- helper function from http://www.defmacro.org/ramblings/haskell-web.html<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show (toConstr object) ++ ">" ++<br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show (toConstr object) ++ ">"<br />
<br />
-- parse xml to HXT tree, and obtain the value of node "fieldname"<br />
-- returns a string<br />
getValue xml fieldname<br />
    | length resultlist > 0 = Just (head resultlist)<br />
    | otherwise             = Nothing<br />
    where resultlist = runLA ( constA xml >>> xread >>> deep ( hasName fieldname ) >>> getChildren >>> getText ) []<br />
<br />
-- parse templateobject to get the list of field names,<br />
-- apply these to xml to get the list of values,<br />
-- return a string in gread-compatible format<br />
xmlToGShowFormat :: Data a => String -> a -> String<br />
xmlToGShowFormat xml templateobject =<br />
    go<br />
    where mainconstructorname = showConstr $ toConstr templateobject<br />
          fields = constrFields $ toConstr templateobject<br />
          values = map ( \fieldname -> getValue xml fieldname ) fields<br />
          datatypes = gmapQ (dataTypeOf) templateobject<br />
          constrs = gmapQ (toConstr) templateobject<br />
          datatypereps = gmapQ (dataTypeRep . dataTypeOf) templateobject<br />
          fieldtogshowformat (value,datatyperep) = case datatyperep of<br />
              IntRep -> "(" ++ fromJust value ++ ")"<br />
              _      -> show (fromJust value)<br />
          formattedfieldlist = map (fieldtogshowformat) (zip values datatypereps)<br />
          go = "(" ++ mainconstructorname ++ " " ++ (concat $ intersperse " " formattedfieldlist ) ++ ")"<br />
<br />
xmlDeserialize xml templateobject = fst $ head $ gread ( xmlToGShowFormat xml templateobject )<br />
<br />
dotest = xmlDeserialize (xmlSerialize createConfig) createConfig :: Config<br />
dotest' = xmlDeserialize ("<Config><age>12</age><name>test name!</name></Config>") createConfig :: Config<br />
</haskell><br />
<br />
-- hughperkins</div>Joehttps://wiki.haskell.org/index.php?title=HXT&diff=16957HXT2007-11-22T21:18:54Z<p>Joe: grammar and clarity</p>
<hr />
<div>[[Category:Tools]] [[Category:Tutorials]]<br />
<br />
== A gentle introduction to the Haskell XML Toolbox ==<br />
<br />
The [http://www.fh-wedel.de/~si/HXmlToolbox/index.html Haskell XML Toolbox (HXT)] is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain specific language consisting of a set of combinators for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace aware XML parser that supports the XML 1.0 standard almost completely. Extensions are a validator for RelaxNG and an XPath evaluator.<br />
<br />
__TOC__<br />
<br />
== Background ==<br />
<br />
The Haskell XML Toolbox is based on the ideas of [http://www.cs.york.ac.uk/fp/HaXml/ HaXml] and [http://www.flightlab.com/~joe/hxml/ HXML] but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CData parts and processing instructions. This data model makes it possible to use tree transformation functions as a uniform design for all XML processing steps: parsing, DTD processing, entity processing, validation, namespace propagation, content processing and output.<br />
<br />
== Resources ==<br />
<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/index.html HXT Home] :<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/HXT-7.0.tar.gz HXT-7.0.tar.gz] : latest release<br />
; [http://darcs.fh-wedel.de/hxt/ darcs.fh-wedel.de/hxt] : darcs repository with head revision of HXT<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc_arrow/ Arrow API] : Haddock documentation of head revision with links to source files<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc/ Complete API] : Haddock documentation with arrows and old API based on filters<br />
<br />
== The basic concepts ==<br />
<br />
=== The basic data structures ===<br />
<br />
Processing of XML is a task of processing tree structures. This can be done in Haskell in a very elegant way by defining an appropriate tree data type, a Haskell DOM (document object model) structure. The tree structure in HXT is a rose tree with a special XNode data type for storing the XML node information.<br />
<br />
The generally useful tree structure (NTree) is separated from the node type (XNode). This allows for reusing the tree structure and the tree traversal and manipulation functions in other applications.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a]    -- rose tree<br />
<br />
data XNode = XText String           -- plain text node<br />
           | ...<br />
           | XTag QName XmlTrees    -- element name and list of attributes<br />
           | XAttr QName            -- attribute name<br />
           | ...<br />
<br />
type QName = ...                    -- qualified name<br />
<br />
type XmlTree = NTree XNode<br />
<br />
type XmlTrees = [XmlTree]<br />
</haskell><br />
<br />
=== The concept of filters ===<br />
<br />
Selecting, transforming and generating trees often requires routines, which compute not only a single result tree, but a (possibly empty) list of (sub-)trees. This leads to the idea of XML filters like in HaXml. Filters are functions, which take an XML tree as input and compute a list of result trees.<br />
<br />
<haskell><br />
type XmlFilter = XmlTree -> [XmlTree]<br />
</haskell><br />
<br />
More generally we can define a filter as<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
</haskell><br />
<br />
We will do this abstraction later, when introducing arrows. Many of the functions in the following motivating examples can be generalised this way. But for getting the idea, the <hask>XmlFilter</hask> is sufficient.<br />
<br />
Filter functions are used so frequently that the idea of defining a domain specific language with filters as the basic processing units suggests itself. In such a DSL the basic filters are predicates, selectors, constructors and transformers, all working on the HXT DOM tree structure. For a DSL it becomes necessary to define an appropriate set of combinators for building more complex functions from simpler ones. Of course filter composition, similar to function composition with (.), becomes one of the most frequently used combinators. There are also more complex filters for the traversal of a whole tree and for the selection or transformation of several nodes. We will see a few first examples in the following part.<br />
<br />
The first task is to build filters from pure functions, to define a lift operator. Pure functions are lifted to filters in the following way:<br />
<br />
Predicates are lifted by mapping False to the empty list and True to the single element list, containing the input tree.<br />
<br />
<haskell><br />
p :: XmlTree -> Bool            -- pure function<br />
p t = ...<br />
<br />
pf :: XmlTree -> [XmlTree]      -- or XmlFilter<br />
pf t<br />
    | p t       = [t]<br />
    | otherwise = []<br />
</haskell><br />
<br />
The combinator for this type of lifting is called <hask>isA</hask>, it works on any type and is defined as<br />
<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
</haskell><br />
<br />
A predicate for filtering text nodes looks like this<br />
<br />
<haskell><br />
isXText :: XmlFilter                    -- XmlTree -> [XmlTree]<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _                     = []<br />
</haskell><br />
<br />
Transformers -- functions that map a tree into another tree -- are lifted in a trivial way:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t = expr(t)<br />
<br />
ff :: XmlTree -> [XmlTree]<br />
ff t = [expr(t)]<br />
</haskell><br />
<br />
This basic lifting function is called <hask>arr</hask>. It comes from the <hask>Control.Arrow</hask> module of the base library shipped with GHC.<br />
<br />
Partial functions, functions that can't always compute a result, are usually lifted to totally defined filters:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t<br />
    | p t       = expr(t)<br />
    | otherwise = error "f not defined"<br />
<br />
ff :: XmlFilter<br />
ff t<br />
    | p t       = [expr(t)]<br />
    | otherwise = []<br />
</haskell><br />
<br />
This is a rather comfortable situation, with these filters we don't have to deal with illegal argument errors. Illegal arguments are just mapped to the empty list.<br />
<br />
When processing trees, it is often the case that no result, exactly one result, or more than one result is possible. Functions returning a set of results are often, a bit imprecisely, called ''nondeterministic'' functions. These functions, e.g. selecting all children of a node or all grandchildren, are exactly our filters. In this context lists instead of sets of values are the appropriate result type, because the ordering in XML is important and duplicates are possible.<br />
<br />
Working with filters is rather similar to working with binary relations, and working with relations is rather natural and comfortable, database people do know this very well.<br />
<br />
Two first examples for working with ''nondeterministic'' functions are selecting the children and the grandchildren of an XmlTree which can be implemented by<br />
<br />
<haskell><br />
getChildren :: XmlFilter<br />
getChildren (NTree n cs)<br />
    = cs<br />
<br />
getGrandChildren :: XmlFilter<br />
getGrandChildren (NTree n cs)<br />
    = concat [ getChildren c | c <- cs ]<br />
</haskell><br />
<br />
=== Filter combinators ===<br />
<br />
Composition of filters (like function composition) is the most important combinator. We will use the infix operator <hask>(>>>)</hask> for filter composition and reverse the arguments, so we can read composition sequences from left to right, like with pipes in Unix. Composition is defined as follows:<br />
<br />
<haskell><br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
</haskell><br />
<br />
This definition corresponds 1-1 to the composition of binary relations. With help of the <hask>(>>>)</hask> operator the definition of <hask>getGrandChildren</hask> becomes rather simple:<br />
<br />
<haskell><br />
getGrandChildren :: XmlFilter<br />
getGrandChildren = getChildren >>> getChildren<br />
</haskell><br />
<br />
Selecting all text nodes of the children of an element can also be formulated very easily with the help of <hask>(>>>)</hask><br />
<br />
<haskell><br />
getTextChildren :: XmlFilter<br />
getTextChildren = getChildren >>> isXText<br />
</haskell><br />
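<br />
The filters defined so far fit into a few lines of self-contained Haskell. The following sketch assumes a simplified node type <hask>Node</hask> with just text and element nodes instead of the real <hask>XNode</hask>; the names <hask>Node</hask>, <hask>Tag</hask> and <hask>Tree</hask> are chosen for this illustration only.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a] deriving (Show, Eq)<br />
<br />
-- simplified node type: just text and element nodes<br />
data Node = Text String | Tag String deriving (Show, Eq)<br />
<br />
type Tree   = NTree Node<br />
type Filter = Tree -> [Tree]<br />
<br />
-- filter composition, read from left to right<br />
(>>>) :: Filter -> Filter -> Filter<br />
(f >>> g) t = concat [ g t' | t' <- f t ]<br />
<br />
getChildren :: Filter<br />
getChildren (NTree _ cs) = cs<br />
<br />
isXText :: Filter<br />
isXText t@(NTree (Text _) _) = [t]<br />
isXText _                    = []<br />
<br />
getTextChildren :: Filter<br />
getTextChildren = getChildren >>> isXText<br />
<br />
getGrandChildren :: Filter<br />
getGrandChildren = getChildren >>> getChildren<br />
<br />
main :: IO ()<br />
main = do<br />
    print (length (getTextChildren  doc))   -- 2: "hello " and "world"<br />
    print (length (getGrandChildren doc))   -- 2: the two text nodes below b<br />
  where<br />
    txt s = NTree (Text s) []<br />
    doc   = NTree (Tag "p")<br />
            [ txt "hello "<br />
            , NTree (Tag "b") [txt "big", txt " wide"]<br />
            , txt "world"<br />
            ]<br />
</haskell><br />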
<br />
When used to combine predicate filters, the <hask>(>>>)</hask> serves as a logical "and" operator or, from the relational view, as an intersection operator: <hask>isA p1 >>> isA p2</hask> selects all values for which p1 and p2 both hold.<br />
<br />
The dual operator to <hask>(>>>)</hask> is the logical or, (thinking in sets: The union operator). For this we define a sum operator <hask>(<+>)</hask>. The sum of two filters is defined as follows:<br />
<br />
<haskell><br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f <+> g) t = f t ++ g t<br />
</haskell><br />
<br />
Example: <hask>isA p1 <+> isA p2</hask> is the logical or for filters.<br />
<br />
Combining elementary filters with (>>>) and (<+>) leads to more complex functionality. For example, selecting all text nodes within two levels of depth (in left to right order) can be formulated with:<br />
<br />
<haskell><br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
</haskell><br />
<br />
'''Exercise:''' Are these filters equivalent or what's the difference between the two filters?<br />
<br />
<haskell><br />
getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )<br />
</haskell><br />
<br />
Of course we need choice combinators. The first idea is an if-then-else filter, <br />
built up from three simpler filters. But often it's easier and more elegant to work with simpler binary combinators for choice. So we will introduce the simpler ones first.<br />
<br />
One of these choice combinators is called <hask>orElse</hask> and is defined as<br />
follows:<br />
<br />
<haskell><br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
    | null res1 = g t<br />
    | otherwise = res1<br />
    where<br />
    res1 = f t<br />
</haskell><br />
<br />
The meaning is the following: If f computes a non-empty list as result, f succeeds and this list is the result, else g is applied to the input and this yields the result. There are two other simple choice combinators usually written in infix notation, <hask> g `guards` f</hask> and <hask>f `when` g</hask>:<br />
<br />
<haskell><br />
guards :: XmlFilter -> XmlFilter -> XmlFilter<br />
guards g f t<br />
    | null (g t) = []<br />
    | otherwise  = f t<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
</haskell><br />
<br />
These choice operators become useful when transforming and manipulating trees.<br />
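<br />
The difference between the three choice combinators is easiest to see on numbers instead of trees. The sketch below generalises the definitions from <hask>XmlFilter</hask> to an assumed type <hask>Filter a = a -> [a]</hask>; <hask>double</hask> is a made-up example filter.<br />
<br />
<haskell><br />
type Filter a = a -> [a]<br />
<br />
isA :: (a -> Bool) -> Filter a<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
<br />
orElse :: Filter a -> Filter a -> Filter a<br />
orElse f g t<br />
    | null res1 = g t<br />
    | otherwise = res1<br />
  where<br />
    res1 = f t<br />
<br />
guards :: Filter a -> Filter a -> Filter a<br />
guards g f t<br />
    | null (g t) = []<br />
    | otherwise  = f t<br />
<br />
when :: Filter a -> Filter a -> Filter a<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
<br />
-- hypothetical transformer used as example<br />
double :: Filter Int<br />
double x = [2 * x]<br />
<br />
main :: IO ()<br />
main = do<br />
    print (concatMap (double `when` isA even)   [1, 2, 3])  -- [1,4,3]: odd numbers are kept<br />
    print (concatMap (isA even `guards` double) [1, 2, 3])  -- [4]:     odd numbers are dropped<br />
    print (concatMap (isA even `orElse` double) [1, 2, 3])  -- [2,2,6]: double is the fallback<br />
</haskell><br />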
<br />
=== Tree traversal filter ===<br />
<br />
A very basic operation on tree structures is the traversal of all nodes and the selection and/or transformation of nodes. These traversal filters serve as control structures for processing whole trees. They correspond to the map and fold combinators for lists.<br />
<br />
The simplest traversal filter does a top down search of all nodes with a special feature. This filter, called <hask>deep</hask>, is defined as follows:<br />
<br />
<haskell><br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
</haskell><br />
<br />
When a predicate filter is applied to <hask>deep</hask>, a top down search is done and all subtrees satisfying the predicate are collected. Because of the use of <hask>orElse</hask>, the descent into a subtree stops as soon as a matching node is found.<br />
<br />
'''Example:''' Selecting all plain text nodes of a document can be formulated with:<br />
<br />
<haskell><br />
deep isXText<br />
</haskell><br />
<br />
'''Example:''' Selecting all "top level" tables in an HTML document looks like<br />
this:<br />
<br />
<haskell><br />
deep (isElem >>> hasName "table")<br />
</haskell><br />
<br />
A variant of <hask>deep</hask>, called <hask>multi</hask>, performs a complete search, where the tree traversal does not stop, when a node is found.<br />
<br />
<haskell><br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
</haskell><br />
<br />
'''Example:''' To select all tables in an HTML document, even nested ones, <hask>multi</hask> has to be used instead of <hask>deep</hask>:<br />
<br />
<hask>multi (isElem >>> hasName "table")</hask><br />
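<br />
The difference between the two traversals can be made visible on a small tree. This is a sketch only: the <hask>Tree</hask> type, the locally defined operators and the sample document are assumptions standing in for HXT's <hask>XmlTree</hask> and its real combinators:<br />
<br />
<haskell><br />
module Main where<br />
<br />
-- Toy rose tree standing in for HXT's XmlTree (an assumption for this sketch).<br />
data Tree = Tree String [Tree]<br />
    deriving (Eq, Show)<br />
<br />
type Filter = Tree -> [Tree]<br />
<br />
infixr 5 <+><br />
infixr 1 >>><br />
<br />
-- sequential composition: feed every result of f into g<br />
(>>>) :: Filter -> Filter -> Filter<br />
(f >>> g) t = concatMap g (f t)<br />
<br />
-- union of the results of both filters<br />
(<+>) :: Filter -> Filter -> Filter<br />
(f <+> g) t = f t ++ g t<br />
<br />
orElse :: Filter -> Filter -> Filter<br />
orElse f g t<br />
    | null (f t) = g t<br />
    | otherwise  = f t<br />
<br />
getChildren :: Filter<br />
getChildren (Tree _ cs) = cs<br />
<br />
hasName :: String -> Filter<br />
hasName n t@(Tree m _) = [t | n == m]<br />
<br />
deep, multi :: Filter -> Filter<br />
deep  f = f `orElse` (getChildren >>> deep f)<br />
multi f = f <+> (getChildren >>> multi f)<br />
<br />
-- one table nested inside another, plus a sibling table<br />
doc :: Tree<br />
doc = Tree "html"<br />
        [ Tree "table" [ Tree "td" [ Tree "table" [] ] ]<br />
        , Tree "table" []<br />
        ]<br />
<br />
main :: IO ()<br />
main = do<br />
    print (length (deep  (hasName "table") doc))   -- 2: the nested table is not visited<br />
    print (length (multi (hasName "table") doc))   -- 3: the nested table is found too<br />
</haskell><br />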
<br />
=== Arrows ===<br />
<br />
We've already seen that filters of type <hask>a -> [b]</hask> are a very<br />
powerful and sometimes more elegant way to process XML than pure<br />
functions. This is the good news. The bad news is that filters are not<br />
general enough. Of course we sometimes want to do some I/O and we want<br />
to stay on the filter level. So we need something like<br />
<br />
<haskell><br />
type XmlIOFilter = XmlTree -> IO [XmlTree]<br />
</haskell><br />
<br />
for working in the IO monad.<br />
<br />
Sometimes it's appropriate to thread some state through the computation<br />
like in state monads. This leads to a type like<br />
<br />
<haskell><br />
type XmlStateFilter state = state -> XmlTree -> (state, [XmlTree])<br />
</haskell><br />
<br />
And in real world applications we need both extensions at the same<br />
time. Of course I/O is necessary but usually there are also some<br />
global options and variables for controlling the computations. In HXT,<br />
for instance there are variables for controlling trace output, options<br />
for setting the default encoding scheme for input data and a base URI<br />
for accessing documents, which are addressed in a content or in a DTD<br />
part by relative URIs. So we need something like<br />
<br />
<haskell><br />
type XmlIOStateFilter state = state -> XmlTree -> IO (state, [XmlTree])<br />
</haskell><br />
<br />
We want to work with all four filter variants, and in the future<br />
perhaps with even more general filters, but of course not with four<br />
sets of filter names, e.g. <hask>deep, deepST, deepIO, deepIOST</hask>.<br />
<br />
This is the point where <hask>newtype</hask>s and <hask>class</hask>es<br />
come in. Classes are needed for overloading names and<br />
<hask>newtype</hask>s are needed to declare instances. Furthermore, the<br />
restriction to <hask>XmlTree</hask> as argument and result type is<br />
not necessary and hinders reuse in many cases.<br />
<br />
A filter discussed above has all features of an arrow. Arrows are<br />
introduced for generalising the concept of functions and function<br />
combination to more general kinds of computation than pure functions.<br />
<br />
A basic set of combinators for arrows is defined in the classes in the<br />
<hask>Control.Arrow</hask> module, containing the above mentioned <hask>(>>>), (<+>), arr</hask>.<br />
<br />
In HXT the additional classes for filters working with lists as result type are<br />
defined in <hask>Control.Arrow.ArrowList</hask>. The choice operators are<br />
in <hask>Control.Arrow.ArrowIf</hask>, tree filters, like <hask>getChildren, deep, multi, ...</hask> in<br />
<hask>Control.Arrow.ArrowTree</hask> and the elementary XML specific<br />
filters in <hask>Text.XML.HXT.XmlArrow</hask>.<br />
<br />
In HXT there are four types instantiated with these classes for<br />
pure list arrows, list arrows with a state, list arrows with IO<br />
and list arrows with a state and IO.<br />
<br />
<haskell><br />
newtype LA a b = LA { runLA :: (a -> [b]) }<br />
<br />
newtype SLA s a b = SLA { runSLA :: (s -> a -> (s, [b])) }<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: (a -> IO [b]) }<br />
<br />
newtype IOSLA s a b = IOSLA { runIOSLA :: (s -> a -> IO (s, [b])) }<br />
</haskell><br />
<br />
The first one and the last one are those used most frequently in the<br />
toolbox, and of course there are lifting functions for converting the<br />
special arrows into the more general ones.<br />
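<br />
To give an impression of what such an instantiation involves, here is a minimal sketch for the pure list arrow. The instance bodies and the lifting function <hask>liftLA</hask> are written for illustration under the assumption of a current <hask>base</hask> library; they are not copied from the HXT sources:<br />
<br />
<haskell><br />
module Main where<br />
<br />
import Prelude hiding (id, (.))<br />
import Control.Category<br />
import Control.Arrow<br />
<br />
newtype LA a b = LA { runLA :: a -> [b] }<br />
<br />
instance Category LA where<br />
    id          = LA (\x -> [x])<br />
    -- every result of f is fed into g, all results are concatenated<br />
    LA g . LA f = LA (\x -> concatMap g (f x))<br />
<br />
instance Arrow LA where<br />
    arr f        = LA (\x -> [f x])<br />
    first (LA f) = LA (\(x, y) -> [(x', y) | x' <- f x])<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: a -> IO [b] }<br />
<br />
-- hypothetical lifting function: run a pure list arrow inside IO<br />
liftLA :: LA a b -> IOLA a b<br />
liftLA (LA f) = IOLA (\x -> return (f x))<br />
<br />
example :: LA Integer Integer<br />
example = arr (+ 1) >>> LA (\n -> [n, n * 10])<br />
<br />
main :: IO ()<br />
main = do<br />
    print (runLA example 1)          -- [2,20]<br />
    xs <- runIOLA (liftLA example) 1<br />
    print xs                         -- [2,20]<br />
</haskell><br />
<br />
Once <hask>arr</hask> and <hask>first</hask> are defined, the generic combinators from <hask>Control.Arrow</hask>, such as <hask>(>>>)</hask> and <hask>(&&&)</hask>, are available for the new type without further work.<br />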
<br />
Don't worry about all these conceptual details. Let's have a look at some<br />
''Hello world'' examples.<br />
<br />
== Getting started: Hello world examples ==<br />
<br />
=== copyXML ===<br />
<br />
The first complete example is a program for<br />
copying an XML document:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
import System.Environment<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      [src, dst] <- getArgs<br />
      runX ( readDocument [(a_validate, v_0)] src<br />
             >>><br />
             writeDocument [] dst<br />
           )<br />
      return ()<br />
</haskell><br />
<br />
The interesting part of this example is<br />
the call of <hask>runX</hask>. <hask>runX</hask> executes an<br />
arrow. This arrow is one of the more powerful list arrows with IO and<br />
a HXT system state.<br />
<br />
The arrow itself is a composition of <hask>readDocument</hask> and<br />
<hask>writeDocument</hask>.<br />
<hask>readDocument</hask> is an arrow for reading, DTD processing and<br />
validation of documents. Its behaviour can be controlled by a list of<br />
options. Here we turn off the validation step. The <hask>src</hask>, a file<br />
name or a URI, is read and parsed and a document tree is built. This<br />
tree is ''piped'' into the output arrow. This one also is<br />
controlled by a set of options. Here all the defaults are used.<br />
<hask>writeDocument</hask> converts the tree into a string and writes<br />
it to the <hask>dst</hask>.<br />
<br />
We've omitted here the boring stuff of option parsing and error<br />
handling.<br />
<br />
Compilation and a test run look like this:<br />
<br />
<pre><br />
hobel > ghc -o copyXml -package hxt CopyXML.hs<br />
hobel > cat hello.xml<br />
<hello>world</hello><br />
hobel > copyXml hello.xml -<br />
<?xml version="1.0" encoding="UTF-8"?><br />
<hello>world</hello><br />
hobel ><br />
</pre><br />
<br />
The mini XML document in file <tt>hello.xml</tt> is read and<br />
a document tree is built. Then this tree is converted into a string<br />
and written to standard output (filename: <tt>-</tt>). It is decorated<br />
with an XML declaration containing the version and the output<br />
encoding.<br />
<br />
For processing HTML documents there is an HTML parser, which tries to<br />
parse and interpret nearly anything as HTML. The HTML parser can be<br />
selected by calling<br />
<br />
<hask>readDocument [(a_parse_html, v_1), ...]</hask><br />
<br />
with the appropriate option.<br />
<br />
=== Pattern for a main program ===<br />
<br />
A more realistic pattern for a simple Unix filter like program has<br />
the following structure:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
<br />
import System.IO<br />
import System.Environment<br />
import System.Console.GetOpt<br />
import System.Exit<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      argv <- getArgs<br />
      (al, src, dst) <- cmdlineOpts argv<br />
      [rc] <- runX (application al src dst)<br />
      if rc >= c_err<br />
         then exitWith (ExitFailure (0-1))<br />
         else exitWith ExitSuccess<br />
<br />
-- | the dummy for the boring stuff of option evaluation,<br />
-- usually done with 'System.Console.GetOpt'<br />
<br />
cmdlineOpts :: [String] -> IO (Attributes, String, String)<br />
cmdlineOpts argv<br />
    = return ([(a_validate, v_0)], argv!!0, argv!!1)<br />
<br />
-- | the main arrow<br />
<br />
application :: Attributes -> String -> String -> IOSArrow b Int<br />
application al src dst<br />
    = readDocument al src<br />
      >>><br />
      processChildren (processDocumentRootElement `when` isElem) -- (1)<br />
      >>><br />
      writeDocument al dst<br />
      >>><br />
      getErrStatus<br />
<br />
<br />
-- | the dummy for the real processing: the identity filter<br />
<br />
processDocumentRootElement :: IOSArrow XmlTree XmlTree<br />
processDocumentRootElement<br />
    = this -- substitute this by the real application<br />
</haskell><br />
<br />
This program has the same functionality as our first example,<br />
but it separates the arrow from the boring option evaluation and<br />
return code computation.<br />
<br />
The interesting line is (1).<br />
<hask>readDocument</hask> generates a tree structure with a so called extra<br />
root node. This root node is a node above the XML document root<br />
element. It is necessary<br />
because other nodes may appear on the same tree level as the XML<br />
root, for instance comments, processing instructions or whitespace.<br />
<br />
Furthermore the artificial root node serves for storing meta<br />
information about the document in the attribute list, like the<br />
document name, the encoding scheme, the HTTP transfer headers and<br />
other information.<br />
<br />
To process the real XML root element, we have to take the children of<br />
the root node, select the XML root element and process it, but<br />
leave all other children unchanged. This is done with<br />
<hask>processChildren</hask> and the <hask>when</hask> choice<br />
operator. <hask>processChildren</hask> applies a filter elementwise to<br />
all children of a node. All results from processing the list of children form<br />
the children of the result node.<br />
<br />
The structure of the internal document tree can be made visible<br />
e.g. by adding the option pair <hask>(a_show_tree, v_1)</hask> to the<br />
<hask>writeDocument</hask> arrow. This will emit the tree in a readable<br />
text representation instead of the real document.<br />
<br />
In the next section we will give examples for the<br />
<hask>processDocumentRootElement</hask> arrow.<br />
<br />
== Selection examples ==<br />
<br />
=== Selecting text from an HTML document ===<br />
<br />
Selecting all the plain text of an XML/HTML document<br />
can be formulated with<br />
<br />
<haskell><br />
selectAllText :: ArrowXml a => a XmlTree XmlTree<br />
selectAllText<br />
    = deep isXText<br />
</haskell><br />
<br />
<hask>deep</hask> traverses the whole tree, stops the traversal when<br />
a node is a text node (<hask>isXText</hask>) and returns all the text nodes.<br />
There are two other traversal operators, <hask>deepest</hask> and <hask>multi</hask>.<br />
In this case, where the selected nodes are all leaves, these would give the same result.<br />
<br />
=== Selecting text and ALT attribute values ===<br />
<br />
Let's take a slightly more complex task: We want to select all text, but also the values of the <tt>alt</tt> attributes<br />
of image tags.<br />
<br />
<haskell><br />
selectAllTextAndAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndAltValues<br />
    = deep<br />
      ( isXText                        -- (1)<br />
        <+><br />
        ( isElem >>> hasName "img"     -- (2)<br />
          >>><br />
          getAttrValue "alt"           -- (3)<br />
          >>><br />
          mkText                       -- (4)<br />
        )<br />
      )<br />
</haskell><br />
<br />
The whole tree is searched for text nodes (1) and for image elements (2), from the image elements<br />
the alt attribute values are selected as plain text (3), this text is transformed into a text node (4).<br />
<br />
=== Selecting text and ALT attributes values (2) ===<br />
<br />
Let's refine the above filter one step further. The text from the alt attributes shall be marked in the output<br />
by surrounding double square brackets. Empty alt values shall be ignored.<br />
<br />
<haskell><br />
selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndRealAltValues<br />
    = deep<br />
      ( isXText<br />
        <+><br />
        ( isElem >>> hasName "img"<br />
          >>><br />
          getAttrValue "alt"<br />
          >>><br />
          isA significant              -- (1)<br />
          >>><br />
          arr addBrackets              -- (2)<br />
          >>><br />
          mkText<br />
        )<br />
      )<br />
    where<br />
    significant :: String -> Bool<br />
    significant = not . all (`elem` " \n\r\t")<br />
<br />
    addBrackets :: String -> String<br />
    addBrackets s<br />
        = " [[ " ++ s ++ " ]] "<br />
</haskell><br />
<br />
This example shows two combinators for building arrows from pure functions.<br />
The first one <hask>isA</hask> removes all empty or whitespace values from alt attributes (1),<br />
the other <hask>arr</hask> lifts the editing function to the arrow level (2).<br />
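<br />
For plain list filters both combinators are one-liners. The following sketch shows plausible definitions for a toy filter type; the type <hask>Filter</hask>, the local <hask>(>>>)</hask> and the name <hask>altText</hask> are assumptions made for illustration, the real HXT versions are class methods working for all arrow types:<br />
<br />
<haskell><br />
module Main where<br />
<br />
-- Hypothetical list filter type, standing in for an HXT arrow.<br />
type Filter a b = a -> [b]<br />
<br />
infixr 1 >>><br />
<br />
(>>>) :: Filter a b -> Filter b c -> Filter a c<br />
(f >>> g) x = concatMap g (f x)<br />
<br />
-- predicate to filter: succeed with the input, or fail with []<br />
isA :: (a -> Bool) -> Filter a a<br />
isA p x = [x | p x]<br />
<br />
-- pure function to filter: always exactly one result<br />
arr :: (a -> b) -> Filter a b<br />
arr f x = [f x]<br />
<br />
significant :: String -> Bool<br />
significant = not . all (`elem` " \n\r\t")<br />
<br />
addBrackets :: String -> String<br />
addBrackets s = " [[ " ++ s ++ " ]] "<br />
<br />
altText :: Filter String String<br />
altText = isA significant >>> arr addBrackets<br />
<br />
main :: IO ()<br />
main = print (concatMap altText ["logo", " \t", ""])   -- [" [[ logo ]] "]<br />
</haskell><br />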
<br />
== Document construction examples ==<br />
<br />
=== The ''Hello World'' document ===<br />
<br />
The first document, of course, is a ''Hello World'' document:<br />
<br />
<haskell><br />
helloWorld :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld<br />
    = mkelem "html" []                     -- (1)<br />
      [ mkelem "head" []<br />
        [ mkelem "title" []<br />
          [ txt "Hello World" ]            -- (2)<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]        -- (3)<br />
        [ mkelem "h1" []<br />
          [ txt "Hello World" ]            -- (4)<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
The main arrows for document construction are <hask>mkelem</hask><br />
and its variants (<hask>selem, aelem, eelem</hask>) for element creation, <hask>attr</hask> and <hask>sattr</hask> for attributes and <hask>mkText</hask><br />
and <hask>txt</hask> for text nodes. <hask>mkelem</hask> takes three arguments: the element name (or tag name), a list of arrows for the construction of attributes, not empty in (3), and a list of arrows for the contents. Text content is generated in (2) and (4).<br />
<br />
To write this document to a file use the following arrow<br />
<br />
<haskell><br />
root [] [helloWorld] -- (1)<br />
>>><br />
writeDocument [(a_indent, v_1)] "hello.xml" -- (2)<br />
</haskell><br />
<br />
When this arrow is executed, the <hask>helloWorld</hask><br />
document is wrapped into a so called root node (1). This complete<br />
document is written to "hello.xml" (2).<br />
<hask>writeDocument</hask> and its variants always expect<br />
a whole document tree with such a root node. Before writing, the document is<br />
indented (<hask>(a_indent, v_1)</hask>) by inserting extra whitespace<br />
text nodes, and an XML declaration with version and encoding is added. Without the indent option, the whole document would appear on a single line. With it, the output looks like this:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<html><br />
<head><br />
<title>Hello World</title><br />
</head><br />
<body class="haskell"><br />
<h1>Hello World</h1><br />
</body><br />
</html><br />
</pre><br />
<br />
The code can be shortened a bit by using some of the<br />
convenience functions:<br />
<br />
<haskell><br />
helloWorld2 :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld2<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]<br />
        [ selem "h1"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
In the above two examples the arrow input is totally ignored, because<br />
of the use of the constant arrow <hask>txt "..."</hask>.<br />
<br />
=== A page about all images within a HTML page ===<br />
<br />
A slightly more interesting task is the construction of a page<br />
containing a table of all images within a page, including image URLs, geometry and ALT attributes.<br />
<br />
The program for this has a frame similar to the <hask>helloWorld</hask> program,<br />
but the rows of the table must be filled in from the input document.<br />
In the first step we will generate a table with a single column containing<br />
the URL of the image.<br />
<br />
<haskell><br />
imageTable :: ArrowXml a => a XmlTree XmlTree<br />
imageTable<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Images in Page" ]<br />
        ]<br />
      , selem "body"<br />
        [ selem "h1"<br />
          [ txt "Images in Page" ]<br />
        , selem "table"<br />
          [ collectImages           -- (1)<br />
            >>><br />
            genTableRows            -- (2)<br />
          ]<br />
        ]<br />
      ]<br />
    where<br />
    collectImages                   -- (1)<br />
        = deep ( isElem<br />
                 >>><br />
                 hasName "img"<br />
               )<br />
    genTableRows                    -- (2)<br />
        = selem "tr"<br />
          [ selem "td"<br />
            [ getAttrValue "src" >>> mkText ]<br />
          ]<br />
</haskell><br />
<br />
With (1) the image elements are collected, and with (2)<br />
the HTML code for an image element is built.<br />
<br />
Applied to <tt>http://www.haskell.org/</tt> we get the following result<br />
(at the time of writing this page):<br />
<br />
<pre><br />
<html><br />
<head><br />
<title>Images in Page</title><br />
</head><br />
<body><br />
<h1>Images in Page</h1><br />
<table><br />
<tr><br />
<td>/haskellwiki_logo.png</td><br />
</tr><br />
<tr><br />
<td>/sitewiki/images/1/10/Haskelllogo-small.jpg</td><br />
</tr><br />
<tr><br />
<td>/haskellwiki_logo_small.png</td><br />
</tr><br />
</table><br />
</body><br />
</html><br />
</pre><br />
<br />
When generating HTML, often there are constant parts within the page,<br />
in the example e.g. the page header. It's possible to write these<br />
parts as a string containing plain HTML and then read this with<br />
a simple XML contents parser called <hask>xread</hask>.<br />
<br />
The example above could then be rewritten as<br />
<br />
<haskell><br />
imageTable<br />
    = selem "html"<br />
      [ pageHeader<br />
      , ...<br />
      ]<br />
    where<br />
    pageHeader<br />
        = constA "<head><title>Images in Page</title></head>"<br />
          >>><br />
          xread<br />
    ...<br />
</haskell><br />
<br />
<hask>xread</hask> is a very primitive arrow. It does not run in the<br />
IO monad, so it can be used in any context, but therefore the error handling<br />
is very limited. <hask>xread</hask> parses an XML element content.<br />
<br />
=== A page about all images within a HTML page: 1. Refinement ===<br />
<br />
The next refinement step is the extension of the table such that<br />
it contains four columns, one for the image itself, one for the URL,<br />
the geometry and the ALT text. The extended <hask>genTableRows</hask><br />
has the following form:<br />
<br />
<haskell><br />
genTableRows<br />
    = selem "tr"<br />
      [ selem "td"                          -- (1)<br />
        [ this                              -- (1.1)<br />
        ]<br />
      , selem "td"                          -- (2)<br />
        [ getAttrValue "src"<br />
          >>><br />
          mkText<br />
          >>><br />
          mkelem "a"                        -- (2.1)<br />
          [ attr "href" this ]<br />
          [ this ]<br />
        ]<br />
      , selem "td"                          -- (3)<br />
        [ ( getAttrValue "width"<br />
            &&&                             -- (3.1)<br />
            getAttrValue "height"<br />
          )<br />
          >>><br />
          arr2 geometry                     -- (3.2)<br />
          >>><br />
          mkText<br />
        ]<br />
      , selem "td"                          -- (4)<br />
        [ getAttrValue "alt"<br />
          >>><br />
          mkText<br />
        ]<br />
      ]<br />
    where<br />
    geometry :: String -> String -> String<br />
    geometry "" ""<br />
        = ""<br />
    geometry w h<br />
        = w ++ "x" ++ h<br />
</haskell><br />
<br />
In (1) the identity arrow <hask>this</hask> is used for<br />
inserting the whole image element (<hask>this</hask> value) into the first column.<br />
(2) is the column from the previous example but the URL has been made active<br />
by embedding the URL in an A-element (2.1). In (3) there are two<br />
new combinators. <hask>(&&&)</hask> (3.1) is an arrow for applying two<br />
arrows to the same input and combining the results into a pair. <hask>arr2</hask> (3.2)<br />
works like <hask>arr</hask> but it lifts a binary function into an arrow<br />
accepting a pair of values. <hask>arr2 f</hask> is a shortcut for<br />
<hask>arr (uncurry f)</hask>. So width and height are combined into an X11 like<br />
geometry spec. (4) adds the ALT-text.<br />
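<br />
The interplay of <hask>(&&&)</hask> and <hask>arr2</hask> can be tried out with ordinary functions, which are also an instance of <hask>Arrow</hask>. In this sketch the attribute list <hask>Attrs</hask> and the helpers <hask>getAttr</hask> and <hask>geometryOf</hask> are hypothetical stand-ins for an image element and <hask>getAttrValue</hask>; <hask>arr2</hask> is defined locally as described above:<br />
<br />
<haskell><br />
module Main where<br />
<br />
import Control.Arrow (Arrow, arr, (&&&), (>>>))<br />
import Data.Maybe (fromMaybe)<br />
<br />
-- lift a binary function into an arrow on pairs<br />
arr2 :: Arrow a => (b -> c -> d) -> a (b, c) d<br />
arr2 f = arr (uncurry f)<br />
<br />
-- hypothetical stand-in for the attributes of an image element<br />
type Attrs = [(String, String)]<br />
<br />
-- missing attributes yield the empty string<br />
getAttr :: String -> Attrs -> String<br />
getAttr n = fromMaybe "" . lookup n<br />
<br />
geometry :: String -> String -> String<br />
geometry "" "" = ""<br />
geometry w  h  = w ++ "x" ++ h<br />
<br />
-- both selectors see the same input, the pair is merged by geometry<br />
geometryOf :: Attrs -> String<br />
geometryOf = (getAttr "width" &&& getAttr "height") >>> arr2 geometry<br />
<br />
main :: IO ()<br />
main = do<br />
    putStrLn (geometryOf [("width", "16"), ("height", "9")])   -- 16x9<br />
    print    (geometryOf [("alt", "logo")])                    -- ""<br />
</haskell><br />
<br />
With list arrows the same expression works unchanged, but <hask>(&&&)</hask> then pairs every result of the first arrow with every result of the second.<br />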
<br />
=== A page about all images within a HTML page: 2. Refinement ===<br />
<br />
The generated HTML page is not yet very useful, because it usually<br />
contains relative HREFs to the images, so the links do not work.<br />
We have to transform the SRC attribute values into absolute URLs.<br />
This can be done with the following code:<br />
<br />
<haskell><br />
imageTable2 :: IOStateArrow s XmlTree XmlTree<br />
imageTable2<br />
    = ...<br />
      ...<br />
      , selem "table"<br />
        [ collectImages<br />
          >>><br />
          mkAbsImageRef                          -- (1)<br />
          >>><br />
          genTableRows<br />
        ]<br />
      ...<br />
<br />
mkAbsImageRef :: IOStateArrow s XmlTree XmlTree  -- (1)<br />
mkAbsImageRef<br />
    = processAttrl ( mkAbsRef                    -- (2)<br />
                     `when`<br />
                     hasName "src"               -- (3)<br />
                   )<br />
    where<br />
    mkAbsRef                                     -- (4)<br />
        = replaceChildren<br />
          ( xshow getChildren                    -- (5)<br />
            >>><br />
            ( mkAbsURI `orElse` this )           -- (6)<br />
            >>><br />
            mkText                               -- (7)<br />
          )<br />
</haskell><br />
<br />
The <hask>imageTable2</hask> is extended by an arrow <hask>mkAbsImageRef</hask><br />
(1). This arrow uses the global system state of HXT, in which the base URL<br />
of a document is stored. For editing the SRC attribute value, the attribute list<br />
of the image elements is processed with <hask>processAttrl</hask>.<br />
With the <hask>`when` hasName "src"</hask> only SRC attributes are manipulated (3). The real work is done in (4): The URL, a text node, is selected with <hask>getChildren</hask> and converted into a string with <hask>xshow</hask> (5), then the URL is transformed into an absolute URL<br />
with <hask>mkAbsURI</hask> (6). This arrow may fail, e.g. in case of illegal<br />
URLs. In this case the URL remains unchanged (<hask>`orElse` this</hask>).<br />
The resulting String value is converted into a text node forming the new<br />
attribute value node (7).<br />
<br />
Because of the use of the global HXT state in <hask>mkAbsURI</hask>,<br />
<hask>mkAbsImageRef</hask> and <hask>imageTable2</hask> need to have the more specialized signature <hask>IOStateArrow s XmlTree XmlTree</hask>.<br />
<br />
== Transformation examples ==<br />
<br />
=== Decorating external references of an HTML document ===<br />
<br />
In the following examples, we want to decorate the external references<br />
in an HTML page with a small icon, as is done in many wikis.<br />
For this task the document tree has to be traversed; all parts<br />
except the interesting A-Elements remain unchanged. At the end of the list of children of an A-Element we add an image element.<br />
<br />
Here is the first version:<br />
<br />
<haskell><br />
addRefIcon :: ArrowXml a => a XmlTree XmlTree<br />
addRefIcon<br />
    = processTopDown                        -- (1)<br />
      ( addImg                              -- (2)<br />
        `when`<br />
        isExternalRef                       -- (3)<br />
      )<br />
    where<br />
    isExternalRef                           -- (4)<br />
        = isElem<br />
          >>><br />
          hasName "a"<br />
          >>><br />
          hasAttr "href"<br />
          >>><br />
          getAttrValue "href"<br />
          >>><br />
          isA isExtRef<br />
        where<br />
        isExtRef                            -- (4.1)<br />
            = isPrefixOf "http:"            -- or something more precise<br />
<br />
    addImg<br />
        = replaceChildren                   -- (5)<br />
          ( getChildren                     -- (6)<br />
            <+><br />
            imgElement                      -- (7)<br />
          )<br />
<br />
    imgElement<br />
        = mkelem "img"                      -- (8)<br />
          [ sattr "src" "/icons/ref.png"    -- (9)<br />
          , sattr "alt" "external ref"<br />
          ] []                              -- (10)<br />
</haskell><br />
<br />
The traversal is done with <hask>processTopDown</hask> (1).<br />
This arrow applies an arrow to all nodes of the whole document tree.<br />
The transformation arrow applies the <hask>addImg</hask> (2) to<br />
all A-elements (3),(4). This arrow uses a somewhat simplified test (4.1)<br />
for external URLs.<br />
<hask>addImg</hask> manipulates all children (5) of the A-elements by<br />
selecting the current children (6) and adding an image element (7).<br />
The image element is constructed with <hask>mkelem</hask> (8). This takes<br />
an element name, a list of arrows for computing the attributes and a<br />
list of arrows for computing the contents. The content of the image element is<br />
empty (10). The attributes are constructed with <hask>sattr</hask> (9).<br />
<hask>sattr</hask> ignores the arrow input and builds an attribute from<br />
the name value pair of arguments.<br />
<br />
=== Transform external references into absolute references ===<br />
<br />
In the following example we will develop a program for<br />
editing a HTML page such that all references to external documents<br />
(images, hypertext refs, style refs, ...) become absolute references.<br />
We will see some new, but very useful combinators in the solution.<br />
<br />
The task seems to be rather trivial. In a tree traversal<br />
all references are edited with respect to the document base.<br />
But in HTML there is a BASE element, allowed in the content of HEAD<br />
with a HREF attribute, which defines the document base. Again this<br />
href can be a relative URL.<br />
<br />
We start the development with the editing arrow. This gets<br />
the real document base as argument.<br />
<br />
<haskell><br />
mkAbsHRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsHRefs base<br />
    = processTopDown editHRef                     -- (1)<br />
    where<br />
    editHRef<br />
        = processAttrl                            -- (3)<br />
            ( changeAttrValue (absHRef base)      -- (5)<br />
              `when`<br />
              hasName "href"                      -- (4)<br />
            )<br />
          `when`<br />
          ( isElem >>> hasName "a" )              -- (2)<br />
        where<br />
<br />
        absHRef :: String -> String -> String     -- (5)<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
The tree is traversed (1) and for every A element (2) the attribute<br />
list is processed (3). All HREF attribute values (4) are manipulated<br />
by <hask>changeAttrValue</hask> called with a string function (5).<br />
<hask>expandURIString</hask> is a pure function defined in HXT for computing<br />
an absolute URI.<br />
In this first step we only edit A-HREF attribute values. We will refine this<br />
later.<br />
<br />
The second step is the complete computation of the base URL.<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( isElem >>> hasName "html"       -- (0)<br />
            >>><br />
            getChildren                     -- (1)<br />
            >>><br />
            isElem >>> hasName "head"       -- (2)<br />
            >>><br />
            getChildren                     -- (3)<br />
            >>><br />
            isElem >>> hasName "base"       -- (4)<br />
            >>><br />
            getAttrValue "href"             -- (5)<br />
          )<br />
          &&&<br />
          getBaseURI                        -- (6)<br />
        )<br />
        >>> expandURI                       -- (7)<br />
      )<br />
      `orElse` getBaseURI                   -- (8)<br />
</haskell><br />
<br />
Input to this arrow is the HTML element. (0) to (5) is the arrow for selecting<br />
the BASE element's HREF value. Parallel to this, the system base URL is read<br />
with <hask>getBaseURI</hask> (6), like in the examples above. The resulting<br />
pair of strings is piped into <hask>expandURI</hask> (7), the arrow version of<br />
<hask>expandURIString</hask>. This arrow ((0) to (7)) fails in the absence<br />
of a BASE element. In this case we take the plain document base (8).<br />
The selection of the BASE elements is not yet very handy. We will define<br />
a more general and elegant function later, allowing an element path as selection argument.<br />
<br />
In the third step, we will combine the two arrows. For this we will use<br />
a new combinator <hask>($<)</hask>. The need for this new combinator<br />
is the following: We need the arrow input (the document) two times,<br />
once for computing the document base, and second for editing the<br />
whole document, and we want to compute the extra string parameter<br />
for editing of course with the above defined arrow.<br />
<br />
The combined arrow, our main arrow, looks like this<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = mkAbsHRefs $< computeBaseRef -- (1)<br />
</haskell><br />
<br />
In (1) first the arrow input is piped into <hask>computeBaseRef</hask>,<br />
this result is used in <hask>mkAbsHRefs</hask> as extra string parameter<br />
when processing the document. Internally the <hask>($<)</hask> combinator<br />
is defined by the basic combinators <hask>(&&&), (>>>)</hask> and <hask>app</hask>, but in a bit more complex computations,<br />
this pattern occurs rather frequently, so ($<) becomes very useful.<br />
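<br />
For plain list filters the behaviour of <hask>($<)</hask> can be written down directly. The following sketch assumes a toy filter type and a toy document consisting of a base URL and a list of references; <hask>Doc</hask>, <hask>computeBase</hask>, <hask>mkAbs</hask> and <hask>toAbs</hask> are hypothetical names, and the definition is meant to mirror the behaviour described above, not the HXT source:<br />
<br />
<haskell><br />
module Main where<br />
<br />
-- Hypothetical list filter type, standing in for an HXT arrow.<br />
type LA a b = a -> [b]<br />
<br />
infixl 2 $<<br />
<br />
-- run f on the input, then run (g c) on the SAME input for every result c<br />
($<) :: (c -> LA a b) -> LA a c -> LA a b<br />
(g $< f) x = concat [g c x | c <- f x]<br />
<br />
-- toy document: (base URL, relative references)<br />
type Doc = (String, [String])<br />
<br />
computeBase :: LA Doc String<br />
computeBase (b, _) = [b]<br />
<br />
-- editing filter with an extra string parameter<br />
mkAbs :: String -> LA Doc Doc<br />
mkAbs base (b, refs) = [(b, map (base ++) refs)]<br />
<br />
toAbs :: LA Doc Doc<br />
toAbs = mkAbs $< computeBase<br />
<br />
main :: IO ()<br />
main = print (toAbs ("http://example.org/", ["a.png", "b.png"]))<br />
</haskell><br />
<br />
The document is used twice here, once by <hask>computeBase</hask> and once by <hask>mkAbs</hask>, without ever being named in the definition of <hask>toAbs</hask>.<br />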
<br />
Programming with arrows is one style of point free programming. Point free<br />
programming often becomes unwieldy when values are used more than once.<br />
One solution is the special arrow syntax supported by ghc and others, similar to the do notation for monads. But for many simple cases the <hask>($<)</hask> combinator and its variants <hask>($<<), ($<<<), ($<<<<), ($<$)</hask><br />
are sufficient.<br />
<br />
To complete the development of the example, a last step is necessary:<br />
the removal of the redundant BASE element.<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = ( mkAbsHRefs $< computeBaseRef )<br />
      >>><br />
      removeBaseElement<br />
<br />
removeBaseElement :: ArrowXml a => a XmlTree XmlTree<br />
removeBaseElement<br />
    = processChildren<br />
      ( processChildren<br />
        ( none                              -- (1)<br />
          `when`<br />
          ( isElem >>> hasName "base" )<br />
        )<br />
        `when`<br />
        ( isElem >>> hasName "head" )<br />
      )<br />
</haskell><br />
<br />
In this function the children of the HEAD element are searched for<br />
a BASE element. This is removed by applying the null arrow <hask>none</hask><br />
to the input, which always returns the empty list.<br />
<hask>none `when` ...</hask> is the pattern for deleting nodes from a tree.<br />
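<br />
The deletion pattern can be checked on a toy tree. In this sketch the <hask>Tree</hask> type and the simplified combinators are assumptions standing in for <hask>XmlTree</hask> and the HXT arrows; the structure of <hask>removeBase</hask> follows <hask>removeBaseElement</hask> above:<br />
<br />
<haskell><br />
module Main where<br />
<br />
-- Toy rose tree standing in for HXT's XmlTree (an assumption for this sketch).<br />
data Tree = Tree String [Tree]<br />
    deriving (Eq, Show)<br />
<br />
type Filter = Tree -> [Tree]<br />
<br />
-- the null filter: always fails<br />
none :: Filter<br />
none _ = []<br />
<br />
hasName :: String -> Filter<br />
hasName n t@(Tree m _) = [t | n == m]<br />
<br />
when :: Filter -> Filter -> Filter<br />
when f g t<br />
    | null (g t) = [t]<br />
    | otherwise  = f t<br />
<br />
-- apply a filter to every child and rebuild the node from all results<br />
processChildren :: Filter -> Filter<br />
processChildren f (Tree n cs) = [Tree n (concatMap f cs)]<br />
<br />
removeBase :: Filter<br />
removeBase<br />
    = processChildren<br />
      ( processChildren (none `when` hasName "base")<br />
        `when`<br />
        hasName "head"<br />
      )<br />
<br />
doc :: Tree<br />
doc = Tree "html"<br />
        [ Tree "head" [ Tree "title" [], Tree "base" [] ]<br />
        , Tree "body" []<br />
        ]<br />
<br />
main :: IO ()<br />
main = print (removeBase doc)<br />
-- [Tree "html" [Tree "head" [Tree "title" []],Tree "body" []]]<br />
</haskell><br />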
<br />
The <hask>computeBaseRef</hask> function defined above contains an arrow pattern<br />
for selecting the right subtree that is rather common in HXT applications<br />
<br />
<haskell><br />
isElem >>> hasName n1<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName n2<br />
...<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName nm<br />
</haskell><br />
<br />
For this pattern we will define a convenient function creating the<br />
arrow for selection<br />
<br />
<haskell><br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)   -- (1)<br />
      .<br />
      map (\ n -> isElem >>> hasName n)           -- (2)<br />
</haskell><br />
<br />
The name list is mapped to the element checking arrow (2),<br />
the resulting list of arrows is folded with <hask>getChildren</hask> (1)<br />
into a single arrow. <hask>computeBaseRef</hask> can then be simplified<br />
and becomes more readable:<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( getDescendents ["html","head","base"]   -- (1)<br />
            >>><br />
            getAttrValue "href"                     -- (2)<br />
          )<br />
          ...<br />
    ...<br />
</haskell><br />
<br />
An even more general and flexible technique is the use of XPath expressions,<br />
available for the selection of document parts and defined in the module<br />
<hask>Text.XML.HXT.Arrow.XmlNodeSet</hask>.<br />
<br />
With XPath <hask>computeBaseRef</hask> can be simplified to<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( ( getXPathTrees "/html/head/base"   -- (1)<br />
            >>><br />
            getAttrValue "href"               -- (2)<br />
          )<br />
          ...<br />
</haskell><br />
<br />
Even the attribute selection can be expressed by XPath,<br />
so (1) and (2) can be combined into<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( xshow (getXPathTrees "/html/head/base/@href")<br />
          ...<br />
</haskell><br />
<br />
The extra <hask>xshow</hask> is required here to convert the<br />
XPath result, an <hask>XmlTree</hask>, into a string.<br />
<br />
XPath defines a<br />
full language for selecting parts of an XML document.<br />
Sometimes it's rather comfortable to make selections of this<br />
type, but the XPath evaluation in general is more expensive<br />
in time and space than a simple combination of arrows, as we've<br />
seen in <hask>getDescendents</hask>.<br />
<br />
=== Transform external references into absolute references: Refinement ===<br />
<br />
In the above example only A-HREF URLs are edited. Now we extend this<br />
to other element-attribute combinations.<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown ( editRef "a" "href"        -- (2)<br />
                       >>><br />
                       editRef "img" "src"       -- (3)<br />
                       >>><br />
                       editRef "link" "href"     -- (4)<br />
                       >>><br />
                       editRef "script" "src"    -- (5)<br />
                     )<br />
    where<br />
    editRef en an                                -- (1)<br />
        = processAttrl ( changeAttrValue (absHRef base)<br />
                         `when`<br />
                         hasName an<br />
                       )<br />
          `when`<br />
          ( isElem >>> hasName en )<br />
        where<br />
        absHRef :: String -> String -> String<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
<hask>editRef</hask> is parameterized by the element and attribute names.<br />
The arrow applied to every element is extended to a sequence of<br />
<hask>editRef</hask>'s ((2)-(5)). Notice that the document is still traversed only once.<br />
To process all possible HTML elements,<br />
this sequence should be extended by further element-attribute pairs.<br />
<br />
This can further be simplified into<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
= processTopDown editRefs<br />
where<br />
editRefs<br />
= foldl (>>>) this<br />
.<br />
map (\ (en, an) -> editRef en an)<br />
$<br />
[ ("a", "href")<br />
, ("img", "src")<br />
, ("link", "href")<br />
, ("script", "src") -- and more<br />
]<br />
editRef<br />
= ...<br />
</haskell><br />
<br />
The <hask>foldl (>>>) this</hask> is defined in HXT as <hask>seqA</hask>,<br />
so the above code can be simplified to<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
= processTopDown editRefs<br />
where<br />
editRefs<br />
= seqA . map (uncurry editRef)<br />
$<br />
...<br />
</haskell><br />
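<br />
The step from the explicit fold to <hask>seqA</hask> can be tried out in a small model that needs only the Prelude. This is a sketch under simplifying assumptions: filters are plain functions <hask>a -> [b]</hask> instead of HXT arrows, a reference is modelled as a triple of element name, attribute name and value, and the names <hask>seqF</hask>, <hask>Ref</hask> and the fixed prefix <tt>http://example.org/</tt> are made up for this illustration. None of them is part of HXT.<br />
<br />
```haskell
-- Simplified model: a filter maps one input to a list of results.
type Filter a b = a -> [b]

-- Left-to-right composition, as defined for XmlFilter in this tutorial.
(>>>) :: Filter a b -> Filter b c -> Filter a c
(f >>> g) x = concat [g y | y <- f x]

-- The identity filter.
this :: Filter a a
this x = [x]

-- Stand-in for seqA: run a list of filters one after another.
seqF :: [Filter a a] -> Filter a a
seqF = foldl (>>>) this

-- Hypothetical model of a reference: (element name, attribute name, value).
type Ref = (String, String, String)

-- Prefix the value if element and attribute names match, else leave it alone.
editRef :: String -> String -> Filter Ref Ref
editRef en an (e, a, v)
  | e == en && a == an = [(e, a, "http://example.org/" ++ v)]
  | otherwise          = [(e, a, v)]

-- Same shape as editRefs in the text above.
editRefs :: Filter Ref Ref
editRefs = seqF . map (uncurry editRef) $ [("a", "href"), ("img", "src")]
```
<br />
In this model every input still passes through the whole sequence exactly once, and triples that match none of the pairs come out unchanged.<br />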
<br />
== More complex examples ==<br />
<br />
''' to be done '''<br />
<br />
=== Automatic read/writing between xml and Haskell data types ===<br />
<br />
'''Question''': is there any way to write/read Haskell types to/from XML in HXT? HaXml has readXml and showXml, but I can't find any similar mechanism in HXT. Help! -- AlsonKemp<br />
<br />
==== Serializing to Xml ====<br />
<br />
We can create an HXT tree from a single-layer data type as follows:<br />
<br />
<haskell><br />
import System.IO<br />
import Data.Char<br />
import Data.Maybe (fromMaybe)<br />
import Text.XML.HXT.Arrow<br />
import Data.Generics<br />
<br />
-- our data class we'll convert into xml<br />
data Config = <br />
Config { username :: String,<br />
logNumDays :: Int,<br />
oleDbString :: String }<br />
deriving (Show, Typeable,Data)<br />
<br />
-- helper function adapted from http://www.defmacro.org/ramblings/haskell-web.html<br />
-- (gshow replaced by gshow')<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
where fields = constrFields $ toConstr a<br />
<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++ <br />
foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- function to create HXT tree arrow from single-layer Haskell data type:<br />
createHxtArrow object = runLA( constA ( xmlSerialize object ) >>> xread)<br />
<br />
-- create a config object to serialize:<br />
<br />
createConfig = Config { username = "test", logNumDays = 3, oleDbString = "qsdf" }<br />
<br />
-- test function, using our Config data type<br />
testConversion = createHxtArrow( createConfig ) ()<br />
</haskell><br />
<br />
-- hughperkins<br />
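<br />
The introspection idea used above can also be tried without HXT. The following is a minimal sketch, not the code from the post: it uses only <hask>Data.Data</hask> from the base library, emits the fields in declaration order, and escapes the three characters <tt>&lt;</tt>, <tt>&gt;</tt> and <tt>&amp;</tt>. The names <hask>fieldText</hask>, <hask>introspect</hask>, <hask>toXml</hask> and <hask>escape</hask> are made up for this illustration, and like the code above it only handles a single-layer record.<br />
<br />
```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data
import Data.Maybe (fromMaybe)

-- A single-layer record to serialize (shortened version of Config).
data Config = Config
  { username   :: String
  , logNumDays :: Int
  } deriving (Show, Data)

-- Text of one field: the String itself, or the constructor name otherwise
-- (for an Int this is its decimal representation).
fieldText :: Data a => a -> String
fieldText t = fromMaybe (showConstr (toConstr t)) (cast t)

-- Pair every record field name with the text of its value.
introspect :: Data a => a -> [(String, String)]
introspect a = zip (constrFields (toConstr a)) (gmapQ fieldText a)

-- Minimal escaping of character data.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- XML text with the constructor name as root element, fields in declaration order.
toXml :: Data a => a -> String
toXml a = "<" ++ tag ++ ">" ++ concatMap field (introspect a) ++ "</" ++ tag ++ ">"
  where
    tag          = showConstr (toConstr a)
    field (n, v) = "<" ++ n ++ ">" ++ escape v ++ "</" ++ n ++ ">"
```
<br />
The resulting string could then be fed to <hask>xread</hask> in the same way as the output of <hask>xmlSerialize</hask>.<br />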
<br />
==== Deserializing from Xml ====<br />
<br />
Here's a solution to deserialize a simple Haskell data type containing Strings and Ints.<br />
<br />
It's not really pretty, but it works.<br />
<br />
Basically, we just convert the incoming xml into gread-compatible format, then use gread :-D<br />
<br />
Currently it works for a simple single-layer Haskell data type containing Ints and Strings. You can add new child data types by adding to the case statement in xmlToGShowFormat.<br />
<br />
If someone has a more elegant solution, please let me know ( hughperkins@gmail.com )<br />
<br />
<haskell><br />
module ParseXml<br />
where<br />
<br />
import System.IO<br />
import Data.Char<br />
import Data.List<br />
import Data.Maybe<br />
import Data.Generics hiding (Unit)<br />
import Text.XML.HXT.Arrow hiding (when)<br />
<br />
data Config = Config{ name :: String, age :: Int } <br />
--data Config = Config{ age :: Int } <br />
deriving( Data, Show, Typeable, Ord, Eq, Read )<br />
<br />
createConfig = Config "qsdfqsdf" 3<br />
--createConfig = Config 3<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- helper function from http://www.defmacro.org/ramblings/haskell-web.html<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
where fields = constrFields $ toConstr a<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++ <br />
foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- parse xml to HXT tree, and obtain the value of node "fieldname"<br />
-- returns a string<br />
getValue xml fieldname | length(resultlist) > 0 = Just (head resultlist)<br />
| otherwise = Nothing<br />
where resultlist = (runLA ( constA xml >>> xread >>> deep ( hasName fieldname ) >>> getChildren >>> getText ))[]<br />
<br />
-- parse templateobject to get list of field names<br />
-- apply these to xml to get list of values<br />
-- return (fieldnames list, value list)<br />
xmlToGShowFormat :: Data a => String -> a -> String<br />
xmlToGShowFormat xml templateobject = <br />
go<br />
where mainconstructorname = (showConstr $ toConstr templateobject)<br />
fields = constrFields $ toConstr templateobject<br />
values = map ( \fieldname -> getValue xml fieldname ) fields<br />
datatypes = gmapQ (dataTypeOf) templateobject<br />
constrs = gmapQ (toConstr) templateobject<br />
datatypereps = gmapQ (dataTypeRep . dataTypeOf) templateobject<br />
fieldtogshowformat (value,datatyperep) = case datatyperep of<br />
IntRep -> "(" ++ fromJust value ++ ")"<br />
_ -> show(fromJust value)<br />
formattedfieldlist = map (fieldtogshowformat) (zip values datatypereps)<br />
go = "(" ++ mainconstructorname ++ " " ++ (concat $ intersperse " " formattedfieldlist ) ++ ")"<br />
<br />
xmlDeserialize xml templateobject = fst $ head $ gread( xmlToGShowFormat xml templateobject)<br />
<br />
dotest = xmlDeserialize (xmlSerialize createConfig) createConfig :: Config<br />
dotest' = xmlDeserialize ("<Config><age>12</age><name>test name!</name></Config>") createConfig :: Config<br />
</haskell><br />
<br />
-- hughperkins</div>Joehttps://wiki.haskell.org/index.php?title=User:Joe&diff=16955User:Joe2007-11-22T21:14:27Z<p>Joe: identify myself</p>
<hr />
<div>http://joeedmonds.com/</div>Joehttps://wiki.haskell.org/index.php?title=HXT&diff=16954HXT2007-11-22T21:10:53Z<p>Joe: grammar and clarity</p>
<hr />
<div>[[Category:Tools]] [[Category:Tutorials]]<br />
<br />
== A gentle introduction to the Haskell XML Toolbox ==<br />
<br />
The [http://www.fh-wedel.de/~si/HXmlToolbox/index.html Haskell XML Toolbox (HXT)] is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain-specific language consisting of a set of combinators for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace-aware XML parser that supports the XML 1.0 standard almost completely. Extensions are a validator for RelaxNG and an XPath evaluator.<br />
<br />
__TOC__<br />
<br />
== Background ==<br />
<br />
The Haskell XML Toolbox is based on the ideas of [http://www.cs.york.ac.uk/fp/HaXml/ HaXml] and [http://www.flightlab.com/~joe/hxml/ HXML] but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CDATA sections and processing instructions. This data model makes it possible to use tree transformation functions as a uniform design for all XML processing steps, from parsing, DTD processing, entity processing, validation and namespace propagation to content processing and output.<br />
<br />
== Resources ==<br />
<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/index.html HXT Home] :<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/HXT-7.0.tar.gz HXT-7.0.tar.gz] : latest release<br />
; [http://darcs.fh-wedel.de/hxt/ darcs.fh-wedel.de/hxt] : darcs repository with head revision of HXT<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc_arrow/ Arrow API] : Haddock documentation of head revision with links to source files<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc/ Complete API] : Haddock documentation with arrows and old API based on filters<br />
<br />
== The basic concepts ==<br />
<br />
=== The basic data structures ===<br />
<br />
Processing of XML is a task of processing tree structures. This can be done in Haskell in a very elegant way by defining an appropriate tree data type, a Haskell DOM (document object model) structure. The tree structure in HXT is a rose tree with a special XNode data type for storing the XML node information.<br />
<br />
The generally useful tree structure (NTree) is separated from the node type (XNode). This allows for reusing the tree structure and the tree traversal and manipulation functions in other applications.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a] -- rose tree<br />
<br />
data XNode = XText String -- plain text node<br />
| ...<br />
| XTag QName XmlTrees -- element name and list of attributes<br />
| XAttr QName -- attribute name<br />
| ...<br />
<br />
type QName = ... -- qualified name<br />
<br />
type XmlTree = NTree XNode<br />
<br />
type XmlTrees = [XmlTree]<br />
</haskell><br />
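<br />
The separation of tree structure and node type can be illustrated with a reduced, self-contained version of these declarations. This is only a sketch: the node type <hask>Node</hask> is a simplified stand-in for <hask>XNode</hask> (plain Strings instead of <hask>QName</hask>, no attributes), and <hask>labels</hask> and <hask>depth</hask> are hypothetical helper functions written for this example, not HXT functions.<br />
<br />
```haskell
-- Rose tree: a node label and a list of subtrees.
data NTree a = NTree a [NTree a]
  deriving (Eq, Show)

-- Simplified stand-in for XNode.
data Node
  = Text String   -- plain text node
  | Tag String    -- element node, name only
  deriving (Eq, Show)

type Tree = NTree Node

-- The document fragment <p>Hello <b>world</b></p>.
sample :: Tree
sample =
  NTree (Tag "p")
    [ NTree (Text "Hello ") []
    , NTree (Tag "b") [NTree (Text "world") []]
    ]

-- Generic traversal that knows nothing about XML: all labels in document order.
labels :: NTree a -> [a]
labels (NTree n cs) = n : concatMap labels cs

-- Another generic function: the number of levels in a tree.
depth :: NTree a -> Int
depth (NTree _ cs) = 1 + maximum (0 : map depth cs)
```
<br />
Both helper functions work for any <hask>NTree a</hask>, which is the kind of reuse the separation is meant to allow.<br />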
<br />
=== The concept of filters ===<br />
<br />
Selecting, transforming and generating trees often requires routines that compute not just a single result tree but a (possibly empty) list of (sub-)trees. This leads to the idea of XML filters, as in HaXml. Filters are functions that take an XML tree as input and compute a list of result trees.<br />
<br />
<haskell><br />
type XmlFilter = XmlTree -> [XmlTree]<br />
</haskell><br />
<br />
More generally we can define a filter as<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
</haskell><br />
<br />
We will do this abstraction later, when introducing arrows. Many of the functions in the following motivating examples can be generalised this way. But for getting the idea, the <hask>XmlFilter</hask> is sufficient.<br />
<br />
The filter functions are used so frequently that the idea of defining a domain-specific language with filters as the basic processing units comes up. In such a DSL the basic filters are predicates, selectors, constructors and transformers, all working on the HXT DOM tree structure. For a DSL it becomes necessary to define an appropriate set of combinators for building more complex functions from simpler ones. Of course filter composition, analogous to function composition with (.), becomes one of the most frequently used combinators. There are also more complex filters for traversing a whole tree and for selecting or transforming several nodes. We will see a few first examples in the following part.<br />
<br />
The first task is to build filters from pure functions, to define a lift operator. Pure functions are lifted to filters in the following way:<br />
<br />
Predicates are lifted by mapping False to the empty list and True to the single element list, containing the input tree.<br />
<br />
<haskell><br />
p :: XmlTree -> Bool -- pure function<br />
p t = ...<br />
<br />
pf :: XmlTree -> [XmlTree] -- or XmlFilter<br />
pf t<br />
| p t = [t]<br />
| otherwise = []<br />
</haskell><br />
<br />
The combinator for this type of lifting is called <hask>isA</hask>, it works on any type and is defined as<br />
<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
| p x = [x]<br />
| otherwise = []<br />
</haskell><br />
<br />
A predicate for filtering text nodes looks like this<br />
<br />
<haskell><br />
isXText :: XmlFilter -- XmlTree -> [XmlTree]<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _ = []<br />
</haskell><br />
<br />
Transformers -- functions that map a tree into another tree -- are lifted in a trivial way:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t = expr(t)<br />
<br />
ff :: XmlTree -> [XmlTree]<br />
ff t = [expr(t)]<br />
</haskell><br />
<br />
This basic lifting function is called <hask>arr</hask>; it comes from the <hask>Control.Arrow</hask> module of the base library shipped with GHC.<br />
<br />
Partial functions, functions that can't always compute a result, are usually lifted to totally defined filters:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t<br />
| p t = expr(t)<br />
| otherwise = error "f not defined"<br />
<br />
ff :: XmlFilter<br />
ff t<br />
| p t = [expr(t)]<br />
| otherwise = []<br />
</haskell><br />
<br />
This is a rather comfortable situation: with these filters we don't have to deal with illegal argument errors. Illegal arguments are just mapped to the empty list.<br />
<br />
When processing trees, it is often the case that no result, exactly one result, or more than one result is possible. Functions returning such a set of results are often, a bit imprecisely, called ''nondeterministic'' functions. These functions, e.g. selecting all children of a node or all grandchildren, are exactly our filters. In this context lists instead of sets of values are the appropriate result type, because the ordering in XML is important and duplicates are possible.<br />
<br />
Working with filters is rather similar to working with binary relations, and working with relations is natural and comfortable, as database people know very well.<br />
<br />
Two first examples for working with ''nondeterministic'' functions are selecting the children and the grandchildren of an XmlTree which can be implemented by<br />
<br />
<haskell><br />
getChildren :: XmlFilter<br />
getChildren (NTree n cs)<br />
= cs<br />
<br />
getGrandChildren :: XmlFilter<br />
getGrandChildren (NTree n cs)<br />
= concat [ getChildren c | c <- cs ]<br />
</haskell><br />
<br />
=== Filter combinators ===<br />
<br />
Composition of filters (like function composition) is the most important combinator. We will use the infix operator <hask>(>>>)</hask> for filter composition and reverse the arguments, so we can read composition sequences from left to right, like with pipes in Unix. Composition is defined as follows:<br />
<br />
<haskell><br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
</haskell><br />
<br />
This definition corresponds 1-1 to the composition of binary relations. With help of the <hask>(>>>)</hask> operator the definition of <hask>getGrandChildren</hask> becomes rather simple:<br />
<br />
<haskell><br />
getGrandChildren :: XmlFilter<br />
getGrandChildren = getChildren >>> getChildren<br />
</haskell><br />
<br />
Selecting all text nodes of the children of an element can also be formulated very easily with the help of <hask>(>>>)</hask><br />
<br />
<haskell><br />
getTextChildren :: XmlFilter<br />
getTextChildren = getChildren >>> isXText<br />
</haskell><br />
<br />
When used to combine predicate filters, the <hask>(>>>)</hask> serves as a logical and operator, or from the relational view as an intersection operator: <hask>isA p1 >>> isA p2</hask> selects all values for which p1 and p2 both hold.<br />
<br />
The dual operator to <hask>(>>>)</hask> is the logical or (thinking in sets: the union operator). For this we define a sum operator <hask>(<+>)</hask>. The sum of two filters is defined as follows:<br />
<br />
<haskell><br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f <+> g) t = f t ++ g t<br />
</haskell><br />
<br />
Example: <hask>isA p1 <+> isA p2</hask> is the logical or for filters.<br />
<br />
Combining elementary filters with (>>>) and (<+>) leads to more complex functionality. For example, selecting all text nodes within two levels of depth (in left to right order) can be formulated with:<br />
<br />
<haskell><br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
</haskell><br />
<br />
'''Exercise:''' Are these filters equivalent or what's the difference between the two filters?<br />
<br />
<haskell><br />
getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )<br />
</haskell><br />
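<br />
One way to explore this exercise is to run both variants on a small tree. The sketch below is self-contained and uses a simplified tree type instead of the HXT data structures; <hask>Node</hask>, <hask>variant1</hask>, <hask>variant2</hask>, <hask>texts</hask> and <hask>sample</hask> are names made up for this illustration.<br />
<br />
```haskell
data NTree a = NTree a [NTree a] deriving (Eq, Show)

-- Simplified stand-in for XNode.
data Node = Text String | Tag String deriving (Eq, Show)

type Tree   = NTree Node
type Filter = Tree -> [Tree]

(>>>) :: Filter -> Filter -> Filter
(f >>> g) t = concat [g t' | t' <- f t]

(<+>) :: Filter -> Filter -> Filter
(f <+> g) t = f t ++ g t

getChildren :: Filter
getChildren (NTree _ cs) = cs

isXText :: Filter
isXText t@(NTree (Text _) _) = [t]
isXText _                    = []

-- The two filters from the exercise.
variant1, variant2 :: Filter
variant1 = getChildren >>> (isXText <+> (getChildren >>> isXText))
variant2 = (getChildren >>> isXText) <+> (getChildren >>> getChildren >>> isXText)

-- Extract the strings of all text nodes in a result list.
texts :: [Tree] -> [String]
texts ts = [s | NTree (Text s) _ <- ts]

-- The fragment <r><a>1</a>2</r>.
sample :: Tree
sample =
  NTree (Tag "r")
    [ NTree (Tag "a") [NTree (Text "1") []]
    , NTree (Text "2") []
    ]
```
<br />
Comparing <hask>texts (variant1 sample)</hask> with <hask>texts (variant2 sample)</hask> shows that both variants select the same text nodes, but not in the same order.<br />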
<br />
Of course we need choice combinators. The first idea is an if-then-else filter, <br />
built up from three simpler filters. But often it's easier and more elegant to work with simpler binary combinators for choice. So we will introduce the simpler ones first.<br />
<br />
One of these choice combinators is called <hask>orElse</hask> and is defined as<br />
follows:<br />
<br />
<haskell><br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
| null res1 = g t<br />
| otherwise = res1<br />
where<br />
res1 = f t<br />
</haskell><br />
<br />
The meaning is the following: If f computes a non-empty list as result, f succeeds and this list is the result, else g is applied to the input and this yields the result. There are two other simple choice combinators usually written in infix notation, <hask> g `guards` f</hask> and <hask>f `when` g</hask>:<br />
<br />
<haskell><br />
guards :: XmlFilter -> XmlFilter -> XmlFilter<br />
guards g f t<br />
| null (g t) = []<br />
| otherwise = f t<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
| null (g t) = [t]<br />
| otherwise = f t<br />
</haskell><br />
<br />
These choice operators become useful when transforming and manipulating trees.<br />
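<br />
The behaviour of the three choice combinators is easy to check on plain values. The following sketch generalises the definitions above from <hask>XmlFilter</hask> to an arbitrary type and applies them to integers; the type synonym <hask>Filter</hask> and the filter <hask>double</hask> are assumptions made for this example only.<br />
<br />
```haskell
-- Generalised filter type for this example: any value type, not only XmlTree.
type Filter a = a -> [a]

-- Lift a predicate to a filter.
isA :: (a -> Bool) -> Filter a
isA p x
  | p x       = [x]
  | otherwise = []

-- Use the result of f if there is one, else fall back to g.
orElse :: Filter a -> Filter a -> Filter a
orElse f g x
  | null res1 = g x
  | otherwise = res1
  where
    res1 = f x

-- Apply f only if the guard g succeeds, else produce nothing.
guards :: Filter a -> Filter a -> Filter a
guards g f x
  | null (g x) = []
  | otherwise  = f x

-- Apply f only if g succeeds, else pass the input through unchanged.
when :: Filter a -> Filter a -> Filter a
when f g x
  | null (g x) = [x]
  | otherwise  = f x

-- A simple transformer used as the "then" part.
double :: Filter Int
double x = [2 * x]
```
<br />
For an odd input such as 3, <hask>isA even `orElse` double</hask> falls back to <hask>double</hask>, <hask>isA even `guards` double</hask> yields the empty list, and <hask>double `when` isA even</hask> returns the input unchanged.<br />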
<br />
=== Tree traversal filter ===<br />
<br />
A very basic operation on tree structures is the traversal of all nodes and the selection and/or transformation of nodes. These traversal filters serve as control structures for processing whole trees. They correspond to the map and fold combinators for lists.<br />
<br />
The simplest traversal filter does a top down search of all nodes with a special feature. This filter, called <hask>deep</hask>, is defined as follows:<br />
<br />
<haskell><br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
</haskell><br />
<br />
When <hask>deep</hask> is applied to a predicate filter, a top down search is done and all subtrees satisfying the predicate are collected. Because of the use of <hask>orElse</hask>, the descent into the tree stops as soon as a matching subtree is found.<br />
<br />
'''Example:''' Selecting all plain text nodes of a document can be formulated with:<br />
<br />
<haskell><br />
deep isXText<br />
</haskell><br />
<br />
'''Example:''' Selecting all "top level" tables in an HTML document looks like<br />
this:<br />
<br />
<haskell><br />
deep (isElem >>> hasName "table")<br />
</haskell><br />
<br />
A variant of <hask>deep</hask>, called <hask>multi</hask>, performs a complete search, where the tree traversal does not stop, when a node is found.<br />
<br />
<haskell><br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
</haskell><br />
<br />
'''Example:''' To select all tables in an HTML document, even nested ones, <hask>multi</hask> has to be used instead of <hask>deep</hask>:<br />
<br />
<hask>multi (isElem >>> hasName "table")</hask><br />
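<br />
The difference between <hask>deep</hask> and <hask>multi</hask> can be observed on a document with a nested table. The sketch below is a self-contained model with a simplified tree type; <hask>hasTag</hask> is a made-up stand-in for <hask>isElem >>> hasName</hask>, and <hask>sample</hask> is an invented test document.<br />
<br />
```haskell
data NTree a = NTree a [NTree a] deriving (Eq, Show)

-- Simplified stand-in for XNode.
data Node = Text String | Tag String deriving (Eq, Show)

type Tree   = NTree Node
type Filter = Tree -> [Tree]

(>>>), (<+>), orElse :: Filter -> Filter -> Filter
(f >>> g) t = concat [g t' | t' <- f t]
(f <+> g) t = f t ++ g t
orElse f g t
  | null res1 = g t
  | otherwise = res1
  where
    res1 = f t

getChildren :: Filter
getChildren (NTree _ cs) = cs

-- Stand-in for (isElem >>> hasName n) on the simplified node type.
hasTag :: String -> Filter
hasTag n t@(NTree (Tag m) _) | n == m = [t]
hasTag _ _                            = []

deep, multi :: Filter -> Filter
deep f  = f `orElse` (getChildren >>> deep f)
multi f = f <+> (getChildren >>> multi f)

-- <body><table><tr><table/></tr></table><table/></body>
sample :: Tree
sample =
  NTree (Tag "body")
    [ NTree (Tag "table") [NTree (Tag "tr") [NTree (Tag "table") []]]
    , NTree (Tag "table") []
    ]
```
<br />
On this document <hask>deep (hasTag "table")</hask> finds only the two top level tables, while <hask>multi (hasTag "table")</hask> also finds the nested one.<br />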
<br />
=== Arrows ===<br />
<br />
We've already seen that the filters <hask>a -> [b]</hask> are a very<br />
powerful and sometimes more elegant way to process XML than pure<br />
functions. This is the good news. The bad news is that filters are not<br />
general enough. Of course we sometimes want to do some I/O and we want<br />
to stay at the filter level. So we need something like<br />
<br />
<haskell><br />
type XmlIOFilter = XmlTree -> IO [XmlTree]<br />
</haskell><br />
<br />
for working in the IO monad.<br />
<br />
Sometimes it's appropriate to thread some state through the computation<br />
like in state monads. This leads to a type like<br />
<br />
<haskell><br />
type XmlStateFilter state = state -> XmlTree -> (state, [XmlTree])<br />
</haskell><br />
<br />
And in real world applications we need both extensions at the same<br />
time. Of course I/O is necessary but usually there are also some<br />
global options and variables for controlling the computations. In HXT,<br />
for instance there are variables for controlling trace output, options<br />
for setting the default encoding scheme for input data and a base URI<br />
for accessing documents, which are addressed in a content or in a DTD<br />
part by relative URIs. So we need something like<br />
<br />
<haskell><br />
type XmlIOStateFilter state = state -> XmlTree -> IO (state, [XmlTree])<br />
</haskell><br />
<br />
We want to work with all four filter variants, and in the future<br />
perhaps with even more general filters, but of course not with four<br />
sets of filter names, e.g. <hask>deep, deepST, deepIO, deepIOST</hask>.<br />
<br />
This is the point where <hask>newtype</hask>s and <hask>class</hask>es<br />
come in. Classes are needed for overloading names and<br />
<hask>newtype</hask>s are needed to declare instances. Furthermore, the<br />
restriction to <hask>XmlTree</hask> as argument and result type is<br />
not necessary and hinders reuse in many cases.<br />
<br />
The filters discussed above have all the features of an arrow. Arrows were<br />
introduced for generalising the concept of functions and function<br />
combination to more general kinds of computation than pure functions.<br />
<br />
A basic set of combinators for arrows is defined in the classes in the<br />
<hask>Control.Arrow</hask> module, containing the above mentioned <hask>(>>>), (<+>), arr</hask>.<br />
<br />
In HXT the additional classes for filters working with lists as result type are<br />
defined in <hask>Control.Arrow.ArrowList</hask>. The choice operators are<br />
in <hask>Control.Arrow.ArrowIf</hask>, tree filters like <hask>getChildren, deep, multi, ...</hask> are in<br />
<hask>Control.Arrow.ArrowTree</hask>, and the elementary XML specific<br />
filters are in <hask>Text.XML.HXT.Arrow.XmlArrow</hask>.<br />
<br />
In HXT there are four types instantiated with these classes for<br />
pure list arrows, list arrows with a state, list arrows with IO<br />
and list arrows with a state and IO.<br />
<br />
<haskell><br />
newtype LA a b = LA { runLA :: (a -> [b]) }<br />
<br />
newtype SLA s a b = SLA { runSLA :: (s -> a -> (s, [b])) }<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: (a -> IO [b]) }<br />
<br />
newtype IOSLA s a b = IOSLA { runIOSLA :: (s -> a -> IO (s, [b])) }<br />
</haskell><br />
<br />
The first one and the last one are those used most frequently in the<br />
toolbox, and of course there are lifting functions for converting the<br />
special arrows into the more general ones.<br />
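<br />
To make the connection between list filters and the arrow classes concrete, here is a minimal sketch of how a newtype like <hask>LA</hask> can be made an instance of the classes from <hask>Control.Category</hask> and <hask>Control.Arrow</hask> in the base library. This is an illustration written for this page, not the HXT source: the real instances cover more classes and are likely to differ in detail, and <hask>example</hask> is a made-up arrow.<br />
<br />
```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow (Arrow (..), ArrowPlus (..), ArrowZero (..), (>>>))

-- Pure list arrow, as declared above.
newtype LA a b = LA { runLA :: a -> [b] }

-- Identity and composition: composition corresponds to the filter (>>>).
instance Category LA where
  id = LA (\x -> [x])
  LA g . LA f = LA (\x -> concatMap g (f x))

-- Lifting of pure functions and processing of the first pair component.
instance Arrow LA where
  arr f = LA (\x -> [f x])
  first (LA f) = LA (\(x, y) -> [(x', y) | x' <- f x])

-- The filter that always fails.
instance ArrowZero LA where
  zeroArrow = LA (const [])

-- The sum of two filters.
instance ArrowPlus LA where
  LA f <+> LA g = LA (\x -> f x ++ g x)

-- The overloaded combinators now work for LA.
example :: LA Int Int
example = arr (+ 1) >>> (arr (* 2) <+> arr negate)
```
<br />
With these instances, <hask>runLA example 3</hask> first increments the input and then collects the results of both alternatives in one list. Instances for the other three newtypes would follow the same pattern, with the state and IO threaded through.<br />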
<br />
Don't worry about all these conceptual details. Let's have a look into some<br />
''Hello world'' examples.<br />
<br />
== Getting started: Hello world examples ==<br />
<br />
=== copyXML ===<br />
<br />
The first complete example is a program for<br />
copying an XML document:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
import System.Environment<br />
<br />
main :: IO ()<br />
main<br />
= do<br />
[src, dst] <- getArgs<br />
runX ( readDocument [(a_validate, v_0)] src<br />
>>><br />
writeDocument [] dst<br />
)<br />
return ()<br />
</haskell><br />
<br />
The interesting part of this example is<br />
the call of <hask>runX</hask>. <hask>runX</hask> executes an<br />
arrow. This arrow is one of the more powerful list arrows with IO and<br />
an HXT system state.<br />
<br />
The arrow itself is a composition of <hask>readDocument</hask> and<br />
<hask>writeDocument</hask>.<br />
<hask>readDocument</hask> is an arrow for reading, DTD processing and<br />
validation of documents. Its behaviour can be controlled by a list of<br />
options. Here we turn off the validation step. The <hask>src</hask>, a file<br />
name or a URI, is read and parsed and a document tree is built. This<br />
tree is ''piped'' into the output arrow. This one also is<br />
controlled by a set of options. Here all the defaults are used.<br />
<hask>writeDocument</hask> converts the tree into a string and writes<br />
it to the <hask>dst</hask>.<br />
<br />
We've omitted here the boring stuff of option parsing and error<br />
handling.<br />
<br />
Compilation and a test run look like this:<br />
<br />
<pre><br />
hobel > ghc -o copyXml -package hxt CopyXML.hs<br />
hobel > cat hello.xml<br />
<hello>world</hello><br />
hobel > copyXml hello.xml -<br />
<?xml version="1.0" encoding="UTF-8"?><br />
<hello>world</hello><br />
hobel ><br />
</pre><br />
<br />
The mini XML document in file <tt>hello.xml</tt> is read and<br />
a document tree is built. Then this tree is converted into a string<br />
and written to standard output (filename: <tt>-</tt>). It is decorated<br />
with an XML declaration containing the version and the output<br />
encoding.<br />
<br />
For processing HTML documents there is an HTML parser, which tries to<br />
parse and interpret almost anything as HTML. The HTML parser can be<br />
selected by calling<br />
<br />
<hask>readDocument [(a_parse_html, v_1), ...]</hask><br />
<br />
with the appropriate option.<br />
<br />
=== Pattern for a main program ===<br />
<br />
A more realistic pattern for a simple Unix filter-like program has<br />
the following structure:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
<br />
import System.IO<br />
import System.Environment<br />
import System.Console.GetOpt<br />
import System.Exit<br />
<br />
main :: IO ()<br />
main<br />
= do<br />
argv <- getArgs<br />
(al, src, dst) <- cmdlineOpts argv<br />
[rc] <- runX (application al src dst)<br />
if rc >= c_err<br />
then exitWith (ExitFailure (0-1))<br />
else exitWith ExitSuccess<br />
<br />
-- | the dummy for the boring stuff of option evaluation,<br />
-- usually done with 'System.Console.GetOpt'<br />
<br />
cmdlineOpts :: [String] -> IO (Attributes, String, String)<br />
cmdlineOpts argv<br />
= return ([(a_validate, v_0)], argv!!0, argv!!1)<br />
<br />
-- | the main arrow<br />
<br />
application :: Attributes -> String -> String -> IOSArrow b Int<br />
application al src dst<br />
= readDocument al src<br />
>>><br />
processChildren (processDocumentRootElement `when` isElem) -- (1)<br />
>>><br />
writeDocument al dst<br />
>>><br />
getErrStatus<br />
<br />
<br />
-- | the dummy for the real processing: the identity filter<br />
<br />
processDocumentRootElement :: IOSArrow XmlTree XmlTree<br />
processDocumentRootElement<br />
= this -- substitute this by the real application<br />
</haskell><br />
<br />
This program has the same functionality as our first example,<br />
but it separates the arrow from the boring option evaluation and<br />
return code computation.<br />
<br />
The interesting line is (1).<br />
<hask>readDocument</hask> generates a tree structure with a so-called extra<br />
root node. This root node is a node above the XML document root<br />
element. It is necessary<br />
because other nodes may occur on the same tree level as the XML<br />
root, for instance comments, processing instructions or whitespace.<br />
<br />
Furthermore the artificial root node serves for storing meta<br />
information about the document in the attribute list, like the<br />
document name, the encoding scheme, the HTTP transfer headers and<br />
other information.<br />
<br />
To process the real XML root element, we have to take the children of<br />
the root node, select the XML root element and process it, but<br />
leave all other children unchanged. This is done with<br />
<hask>processChildren</hask> and the <hask>when</hask> choice<br />
operator. <hask>processChildren</hask> applies a filter elementwise to<br />
all children of a node. The results of processing the list of children form<br />
the children of the result node.<br />
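<br />
The interplay of <hask>processChildren</hask> and <hask>when</hask> can be sketched in the simple filter model from the beginning of this tutorial. The code below is a self-contained illustration, not the HXT implementation: the tree type is simplified, the root node is modelled as an element named <tt>/</tt>, and <hask>wrapIn</hask> is a hypothetical processing step that stands for the real application.<br />
<br />
```haskell
data NTree a = NTree a [NTree a] deriving (Eq, Show)

-- Simplified stand-in for XNode with elements, text and comments.
data Node = Tag String | Text String | Cmt String deriving (Eq, Show)

type Tree   = NTree Node
type Filter = Tree -> [Tree]

isElem :: Filter
isElem t@(NTree (Tag _) _) = [t]
isElem _                   = []

-- Apply f only where g succeeds, else pass the node through unchanged.
when :: Filter -> Filter -> Filter
when f g t
  | null (g t) = [t]
  | otherwise  = f t

-- Apply a filter to every child; the results become the new children.
processChildren :: Filter -> Filter
processChildren f (NTree n cs) = [NTree n (concatMap f cs)]

-- Hypothetical processing step: wrap a node into a new element.
wrapIn :: String -> Filter
wrapIn name t = [NTree (Tag name) [t]]

-- Root node with a comment and the document root element <hello>world</hello>.
doc :: Tree
doc =
  NTree (Tag "/")
    [ NTree (Cmt " header ") []
    , NTree (Tag "hello") [NTree (Text "world") []]
    ]

processed :: [Tree]
processed = processChildren (wrapIn "box" `when` isElem) doc
```
<br />
In <hask>processed</hask> the comment next to the root element is left untouched, while the root element itself is wrapped. Without the <hask>when</hask> guard the comment would be wrapped as well.<br />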
<br />
The structure of the internal document tree can be made visible<br />
e.g. by adding the option pair <hask>(a_show_tree, v_1)</hask> to the<br />
<hask>writeDocument</hask> arrow. This will emit the tree in a readable<br />
text representation instead of the real document.<br />
<br />
In the next section we will give examples for the<br />
<hask>processDocumentRootElement</hask> arrow.<br />
<br />
== Selection examples ==<br />
<br />
=== Selecting text from an HTML document ===<br />
<br />
Selecting all the plain text of an XML/HTML document<br />
can be formulated with<br />
<br />
<haskell><br />
selectAllText :: ArrowXml a => a XmlTree XmlTree<br />
selectAllText<br />
= deep isXText<br />
</haskell><br />
<br />
<hask>deep</hask> traverses the whole tree, stops the traversal when<br />
a node is a text node (<hask>isXText</hask>) and returns all the text nodes.<br />
There are two other traversal operators, <hask>deepest</hask> and <hask>multi</hask>.<br />
In this case, where the selected nodes are all leaves, these would give the same result.<br />
<br />
=== Selecting text and ALT attribute values ===<br />
<br />
Let's take a slightly more complex task: We want to select all text, but also the values of the <tt>alt</tt> attributes<br />
of image tags.<br />
<br />
<haskell><br />
selectAllTextAndAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndAltValues<br />
= deep<br />
( isXText -- (1)<br />
<+><br />
( isElem >>> hasName "img" -- (2)<br />
>>><br />
getAttrValue "alt" -- (3)<br />
>>><br />
mkText -- (4)<br />
)<br />
)<br />
</haskell><br />
<br />
The whole tree is searched for text nodes (1) and for image elements (2), from the image elements<br />
the alt attribute values are selected as plain text (3), this text is transformed into a text node (4).<br />
<br />
=== Selecting text and ALT attributes values (2) ===<br />
<br />
Let's refine the above filter one step further. The text from the alt attributes shall be marked in the output<br />
by surrounding double square brackets. Empty alt values shall be ignored.<br />
<br />
<haskell><br />
selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndRealAltValues<br />
= deep<br />
( isXText<br />
<+><br />
( isElem >>> hasName "img"<br />
>>><br />
getAttrValue "alt"<br />
>>><br />
isA significant -- (1)<br />
>>><br />
arr addBrackets -- (2)<br />
>>><br />
mkText<br />
)<br />
)<br />
where<br />
significant :: String -> Bool<br />
significant = not . all (`elem` " \n\r\t")<br />
<br />
addBrackets :: String -> String<br />
addBrackets s<br />
= " [[ " ++ s ++ " ]] "<br />
</haskell><br />
<br />
This example shows two combinators for building arrows from pure functions.<br />
The first one <hask>isA</hask> removes all empty or whitespace values from alt attributes (1),<br />
the other <hask>arr</hask> lifts the editing function to the arrow level (2).<br />
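<br />
The effect of the two lifting combinators on attribute values can be checked in isolation. The following sketch uses the plain list model of filters instead of HXT arrows, so <hask>(>>>)</hask>, <hask>isA</hask> and <hask>arr</hask> are redefined locally; <hask>markAlt</hask> is a name made up for the pipeline from (1) to (2).<br />
<br />
```haskell
-- Simplified model: a filter maps one input to a list of results.
type Filter a b = a -> [b]

(>>>) :: Filter a b -> Filter b c -> Filter a c
(f >>> g) x = concat [g y | y <- f x]

-- Lift a predicate: keep the value or drop it.
isA :: (a -> Bool) -> Filter a a
isA p x
  | p x       = [x]
  | otherwise = []

-- Lift a pure function: always exactly one result.
arr :: (a -> b) -> Filter a b
arr f x = [f x]

-- True if the string contains something other than whitespace.
significant :: String -> Bool
significant = not . all (`elem` " \n\r\t")

addBrackets :: String -> String
addBrackets s = " [[ " ++ s ++ " ]] "

-- Steps (1) and (2) of the example, applied to an alt value.
markAlt :: Filter String String
markAlt = isA significant >>> arr addBrackets
```
<br />
Empty and whitespace-only values are dropped by <hask>isA significant</hask>, so <hask>arr addBrackets</hask> never sees them.<br />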
<br />
== Document construction examples ==<br />
<br />
=== The ''Hello World'' document ===<br />
<br />
The first document, of course, is a ''Hello World'' document:<br />
<br />
<haskell><br />
helloWorld :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld<br />
= mkelem "html" [] -- (1)<br />
[ mkelem "head" []<br />
[ mkelem "title" []<br />
[ txt "Hello World" ] -- (2)<br />
]<br />
, mkelem "body"<br />
[ sattr "class" "haskell" ] -- (3)<br />
[ mkelem "h1" []<br />
[ txt "Hello World" ] -- (4)<br />
]<br />
]<br />
</haskell><br />
<br />
The main arrows for document construction are <hask>mkelem</hask><br />
and its variants (<hask>selem, aelem, eelem</hask>) for element creation, <hask>attr</hask> and <hask>sattr</hask> for attributes and <hask>mkText</hask><br />
and <hask>txt</hask> for text nodes. <hask>mkelem</hask> takes three arguments: the element name (or tag name), a list of arrows for the construction of attributes, not empty in (3), and a list of arrows for the contents. Text content is generated in (2) and (4).<br />
<br />
To write this document to a file, use the following arrow:<br />
<br />
<haskell><br />
root [] [helloWorld] -- (1)<br />
>>><br />
writeDocument [(a_indent, v_1)] "hello.xml" -- (2)<br />
</haskell><br />
<br />
When this arrow is executed, the <hask>helloWorld</hask><br />
document is wrapped into a so-called root node (1). This complete<br />
document is written to "hello.xml" (2).<br />
<hask>writeDocument</hask> and its variants always expect<br />
a whole document tree with such a root node. Before writing, the document is<br />
indented (<hask>(a_indent, v_1)</hask>) by inserting extra whitespace<br />
text nodes, and an XML declaration with version and encoding is added. Without the indent option, the whole document would appear on a single line. With it, the output looks like this:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<html><br />
<head><br />
<title>Hello World</title><br />
</head><br />
<body class="haskell"><br />
<h1>Hello World</h1><br />
</body><br />
</html><br />
</pre><br />
<br />
The code can be shortened a bit by using some of the<br />
convenience functions:<br />
<br />
<haskell><br />
helloWorld2 :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld2<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]<br />
        [ selem "h1"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
In the above two examples the arrow input is completely ignored, because<br />
all content is built with the constant arrow <hask>txt "..."</hask>.<br />
<br />
=== A page about all images within a HTML page ===<br />
<br />
A somewhat more interesting task is the construction of a page<br />
containing a table of all images within a page, including image URLs, geometry and ALT attributes.<br />
<br />
The program for this has a frame similar to the <hask>helloWorld</hask> program,<br />
but the rows of the table must be filled in from the input document.<br />
In the first step we will generate a table with a single column containing<br />
the URL of the image.<br />
<br />
<haskell><br />
imageTable :: ArrowXml a => a XmlTree XmlTree<br />
imageTable<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Images in Page" ]<br />
        ]<br />
      , selem "body"<br />
        [ selem "h1"<br />
          [ txt "Images in Page" ]<br />
        , selem "table"<br />
          [ collectImages           -- (1)<br />
            >>><br />
            genTableRows            -- (2)<br />
          ]<br />
        ]<br />
      ]<br />
    where<br />
    collectImages                   -- (1)<br />
        = deep ( isElem<br />
                 >>><br />
                 hasName "img"<br />
               )<br />
    genTableRows                    -- (2)<br />
        = selem "tr"<br />
          [ selem "td"<br />
            [ getAttrValue "src" >>> mkText ]<br />
          ]<br />
</haskell><br />
<br />
With (1) the image elements are collected, and with (2)<br />
the HTML code for an image element is built.<br />
<br />
Applied to <tt>http://www.haskell.org/</tt> we get the following result<br />
(at the time of writing this page):<br />
<br />
<pre><br />
<html><br />
<head><br />
<title>Images in Page</title><br />
</head><br />
<body><br />
<h1>Images in Page</h1><br />
<table><br />
<tr><br />
<td>/haskellwiki_logo.png</td><br />
</tr><br />
<tr><br />
<td>/sitewiki/images/1/10/Haskelllogo-small.jpg</td><br />
</tr><br />
<tr><br />
<td>/haskellwiki_logo_small.png</td><br />
</tr><br />
</table><br />
</body><br />
</html><br />
</pre><br />
<br />
When generating HTML, there are often constant parts within the page,<br />
in this example the page header. It's possible to write these<br />
parts as a string containing plain HTML and then read this with<br />
a simple XML contents parser called <hask>xread</hask>.<br />
<br />
The example above could then be rewritten as<br />
<br />
<haskell><br />
imageTable<br />
= selem "html"<br />
[ pageHeader<br />
, ...<br />
]<br />
where<br />
pageHeader<br />
= constA "<head><title>Images in Page</title></head>"<br />
>>><br />
xread<br />
...<br />
</haskell><br />
<br />
<hask>xread</hask> is a very primitive arrow. It does not run in the<br />
IO monad, so it can be used in any context, but as a consequence its error handling<br />
is very limited. <hask>xread</hask> parses the content of an XML element.<br />
<br />
=== A page about all images within a HTML page: 1. Refinement ===<br />
<br />
The next refinement step is the extension of the table such that<br />
it contains four columns: one for the image itself, one for the URL,<br />
one for the geometry and one for the ALT text. The extended <hask>genTableRows</hask><br />
has the following form:<br />
<br />
<haskell><br />
genTableRows<br />
    = selem "tr"<br />
      [ selem "td"                          -- (1)<br />
        [ this                              -- (1.1)<br />
        ]<br />
      , selem "td"                          -- (2)<br />
        [ getAttrValue "src"<br />
          >>><br />
          mkText<br />
          >>><br />
          mkelem "a"                        -- (2.1)<br />
          [ attr "href" this ]<br />
          [ this ]<br />
        ]<br />
      , selem "td"                          -- (3)<br />
        [ ( getAttrValue "width"<br />
            &&&                             -- (3.1)<br />
            getAttrValue "height"<br />
          )<br />
          >>><br />
          arr2 geometry                     -- (3.2)<br />
          >>><br />
          mkText<br />
        ]<br />
      , selem "td"                          -- (4)<br />
        [ getAttrValue "alt"<br />
          >>><br />
          mkText<br />
        ]<br />
      ]<br />
    where<br />
    geometry :: String -> String -> String<br />
    geometry "" ""<br />
        = ""<br />
    geometry w h<br />
        = w ++ "x" ++ h<br />
</haskell><br />
<br />
In (1) the identity arrow <hask>this</hask> is used for<br />
inserting the whole image element (the <hask>this</hask> value) into the first column.<br />
(2) is the column from the previous example, but the URL has been made active<br />
by embedding it in an A-element (2.1). In (3) there are two<br />
new combinators. <hask>(&&&)</hask> (3.1) applies two<br />
arrows to the same input and combines the results into a pair. <hask>arr2</hask> (3.2)<br />
works like <hask>arr</hask>, but it lifts a binary function into an arrow<br />
accepting a pair of values. <hask>arr2 f</hask> is a shortcut for<br />
<hask>arr (uncurry f)</hask>. So width and height are combined into an X11-like<br />
geometry spec. (4) adds the ALT text.<br />
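<br />
How the pairing in (3) works can be sketched without HXT. The following assumes the simplified model of an arrow as a function returning a list of results. <hask>Filter</hask>, <hask>Attrs</hask>, <hask>getAttr</hask> and <hask>geometryOf</hask> are hypothetical names for this illustration. <hask>getAttr</hask> returns the empty string for a missing attribute, which is the behaviour the <hask>geometry "" ""</hask> case relies on.<br />
<br />
<haskell><br />
-- Simplified model (assumption): a filter maps one input to a list of results.<br />
type Filter a b = a -> [b]<br />
<br />
-- Apply two filters to the same input and pair up all results.<br />
(&&&) :: Filter a b -> Filter a c -> Filter a (b, c)<br />
(f &&& g) x = [ (y, z) | y <- f x, z <- g x ]<br />
<br />
-- Lift a binary function to a filter on pairs: arr2 f = arr (uncurry f).<br />
arr2 :: (b -> c -> d) -> Filter (b, c) d<br />
arr2 f p = [uncurry f p]<br />
<br />
(>>>) :: Filter a b -> Filter b c -> Filter a c<br />
(f >>> g) x = concatMap g (f x)<br />
<br />
-- Hypothetical stand-in for an element: just its attribute list.<br />
type Attrs = [(String, String)]<br />
<br />
-- Look up an attribute value, with "" as default for a missing attribute.<br />
getAttr :: String -> Filter Attrs String<br />
getAttr n as = [maybe "" id (lookup n as)]<br />
<br />
geometry :: String -> String -> String<br />
geometry "" "" = ""<br />
geometry w  h  = w ++ "x" ++ h<br />
<br />
geometryOf :: Filter Attrs String<br />
geometryOf = (getAttr "width" &&& getAttr "height") >>> arr2 geometry<br />
</haskell><br />
<br />
For an attribute list with width 88 and height 31 this gives <hask>["88x31"]</hask>; for an image without geometry attributes it gives <hask>[""]</hask>.<br />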
<br />
=== A page about all images within a HTML page: 2. Refinement ===<br />
<br />
The generated HTML page is not yet very useful, because it usually<br />
contains relative HREFs to the images, so the links do not work.<br />
We have to transform the SRC attribute values into absolute URLs.<br />
This can be done with the following code:<br />
<br />
<haskell><br />
imageTable2 :: IOStateArrow s XmlTree XmlTree<br />
imageTable2<br />
    = ...<br />
      ...<br />
      , selem "table"<br />
        [ collectImages<br />
          >>><br />
          mkAbsImageRef                     -- (1)<br />
          >>><br />
          genTableRows<br />
        ]<br />
      ...<br />
<br />
mkAbsImageRef :: IOStateArrow s XmlTree XmlTree   -- (1)<br />
mkAbsImageRef<br />
    = processAttrl ( mkAbsRef               -- (2)<br />
                     `when`<br />
                     hasName "src"          -- (3)<br />
                   )<br />
    where<br />
    mkAbsRef                                -- (4)<br />
        = replaceChildren<br />
          ( xshow getChildren               -- (5)<br />
            >>><br />
            ( mkAbsURI `orElse` this )      -- (6)<br />
            >>><br />
            mkText                          -- (7)<br />
          )<br />
</haskell><br />
<br />
<hask>imageTable2</hask> is extended by an arrow <hask>mkAbsImageRef</hask><br />
(1). This arrow uses the global system state of HXT, in which the base URL<br />
of a document is stored. For editing the SRC attribute value, the attribute list<br />
of the image elements is processed with <hask>processAttrl</hask>.<br />
With <hask>`when` hasName "src"</hask> only SRC attributes are manipulated (3). The real work is done in (4): the URL, a text node, is selected with <hask>getChildren</hask> and converted into a string with <hask>xshow</hask> (5), then it is transformed into an absolute URL<br />
with <hask>mkAbsURI</hask> (6). This arrow may fail, e.g. in case of illegal<br />
URLs. In this case the URL remains unchanged (<hask>`orElse` this</hask>).<br />
The resulting String value is converted into a text node forming the new<br />
attribute value node (7).<br />
<br />
Because of the use of the global HXT state in <hask>mkAbsURI</hask>,<br />
<hask>mkAbsRef</hask>, <hask>mkAbsImageRef</hask> and <hask>imageTable2</hask> need to have the more specialized signature <hask>IOStateArrow s XmlTree XmlTree</hask>.<br />
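<br />
The fallback behaviour of <hask>`orElse` this</hask> can be sketched in the simplified model of an arrow as a function returning a list of results. This is an illustration only: <hask>Filter</hask> and <hask>absOrSame</hask> are hypothetical names, and <hask>toAbs</hask> is a deliberately naive stand-in for <hask>mkAbsURI</hask> that takes the base URL as an explicit argument instead of reading it from the HXT state.<br />
<br />
<haskell><br />
-- Simplified model (assumption): a filter maps one input to a list of results.<br />
type Filter a b = a -> [b]<br />
<br />
-- Directional choice: use the results of f, or those of g if f has none.<br />
orElse :: Filter a b -> Filter a b -> Filter a b<br />
orElse f g x<br />
    | null ys   = g x<br />
    | otherwise = ys<br />
    where ys = f x<br />
<br />
-- The identity filter.<br />
this :: Filter a a<br />
this x = [x]<br />
<br />
-- Naive stand-in for mkAbsURI: only handles paths starting with '/'<br />
-- and fails (empty list) for everything else.<br />
toAbs :: String -> Filter String String<br />
toAbs base url<br />
    | take 1 url == "/" = [base ++ url]<br />
    | otherwise         = []<br />
<br />
absOrSame :: String -> Filter String String<br />
absOrSame base = toAbs base `orElse` this<br />
</haskell><br />
<br />
With the base <hask>"http://example.org"</hask>, the input <hask>"/a.png"</hask> is expanded, while an input the stand-in cannot handle, such as <hask>"mailto:x"</hask>, is passed through unchanged.<br />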
<br />
== Transformation examples ==<br />
<br />
=== Decorating external references of an HTML document ===<br />
<br />
In the following examples, we want to decorate the external references<br />
in an HTML page with a small icon, as is done in many wikis.<br />
For this task the document tree has to be traversed; all parts<br />
except the interesting A-elements remain unchanged. At the end of the list of children of an A-element we add an image element.<br />
<br />
Here is the first version:<br />
<br />
<haskell><br />
addRefIcon :: ArrowXml a => a XmlTree XmlTree<br />
addRefIcon<br />
    = processTopDown                -- (1)<br />
      ( addImg                      -- (2)<br />
        `when`<br />
        isExternalRef               -- (3)<br />
      )<br />
    where<br />
    isExternalRef                   -- (4)<br />
        = isElem<br />
          >>><br />
          hasName "a"<br />
          >>><br />
          hasAttr "href"<br />
          >>><br />
          getAttrValue "href"<br />
          >>><br />
          isA isExtRef<br />
        where<br />
        isExtRef                    -- (4.1)<br />
            = isPrefixOf "http:"    -- or something more precise<br />
<br />
    addImg<br />
        = replaceChildren           -- (5)<br />
          ( getChildren             -- (6)<br />
            <+><br />
            imgElement              -- (7)<br />
          )<br />
<br />
    imgElement<br />
        = mkelem "img"              -- (8)<br />
          [ sattr "src" "/icons/ref.png"  -- (9)<br />
          , sattr "alt" "external ref"<br />
          ] []                      -- (10)<br />
</haskell><br />
<br />
The traversal is done with <hask>processTopDown</hask> (1).<br />
This arrow applies an arrow to all nodes of the whole document tree.<br />
The transformation arrow applies <hask>addImg</hask> (2) to<br />
all A-elements (3),(4). This arrow uses a somewhat simplified test (4.1)<br />
for external URLs.<br />
<hask>addImg</hask> manipulates all children (5) of the A-elements by<br />
selecting the current children (6) and adding an image element (7).<br />
The image element is constructed with <hask>mkelem</hask> (8). This takes<br />
an element name, a list of arrows for computing the attributes and a<br />
list of arrows for computing the contents. The content of the image element is<br />
empty (10). The attributes are constructed with <hask>sattr</hask> (9).<br />
<hask>sattr</hask> ignores the arrow input and builds an attribute from<br />
the name-value pair of arguments.<br />
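<br />
The comment at (4.1) asks for "something more precise". One possible refinement, given only as a sketch, accepts the <tt>http</tt> and <tt>https</tt> schemes case-insensitively and requires the <tt>//</tt> after the scheme. It is an assumption about what "external" should mean here. Other schemes and references to the page's own host are still not handled.<br />
<br />
<haskell><br />
import Data.Char (toLower)<br />
import Data.List (isPrefixOf)<br />
<br />
-- A reference counts as external if it starts with an http or https scheme.<br />
isExtRef :: String -> Bool<br />
isExtRef url = any (`isPrefixOf` lowered) ["http://", "https://"]<br />
    where lowered = map toLower url<br />
</haskell><br />
<br />
This version can be used in place of the <hask>isPrefixOf "http:"</hask> test, since it has the same type <hask>String -> Bool</hask>.<br />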
<br />
=== Transform external references into absolute references ===<br />
<br />
In the following example we will develop a program for<br />
editing an HTML page such that all references to external documents<br />
(images, hypertext refs, style refs, ...) become absolute references.<br />
We will see some new, but very useful combinators in the solution.<br />
<br />
The task seems to be rather trivial: in a tree traversal<br />
all references are edited with respect to the document base.<br />
But in HTML there is a BASE element, allowed in the content of HEAD,<br />
with an HREF attribute that defines the document base. This<br />
href can itself be a relative URL.<br />
<br />
We start the development with the editing arrow. This gets<br />
the real document base as argument.<br />
<br />
<haskell><br />
mkAbsHRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsHRefs base<br />
    = processTopDown editHRef                       -- (1)<br />
    where<br />
    editHRef<br />
        = processAttrl                              -- (3)<br />
          ( changeAttrValue (absHRef base)          -- (5)<br />
            `when`<br />
            hasName "href"                          -- (4)<br />
          )<br />
          `when`<br />
          ( isElem >>> hasName "a" )                -- (2)<br />
        where<br />
<br />
        absHRef :: String -> String -> String       -- (5)<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
The tree is traversed (1), and for every A element (2) the attribute<br />
list is processed (3). All HREF attribute values (4) are manipulated<br />
by <hask>changeAttrValue</hask>, called with a string function (5).<br />
<hask>expandURIString</hask> is a pure function defined in HXT for computing<br />
an absolute URI.<br />
In this first step we only edit A-HREF attribute values. We will refine this<br />
later.<br />
<br />
The second step is the complete computation of the base URL.<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( isElem >>> hasName "html"       -- (0)<br />
            >>><br />
            getChildren                     -- (1)<br />
            >>><br />
            isElem >>> hasName "head"       -- (2)<br />
            >>><br />
            getChildren                     -- (3)<br />
            >>><br />
            isElem >>> hasName "base"       -- (4)<br />
            >>><br />
            getAttrValue "href"             -- (5)<br />
          )<br />
          &&&<br />
          getBaseURI                        -- (6)<br />
        )<br />
        >>> expandURI                       -- (7)<br />
      )<br />
      `orElse` getBaseURI                   -- (8)<br />
</haskell><br />
<br />
Input to this arrow is the HTML element. (0) to (5) is the arrow for selecting<br />
the BASE element's HREF value; in parallel, the system base URL is read<br />
with <hask>getBaseURI</hask> (6), as in the examples above. The resulting<br />
pair of strings is piped into <hask>expandURI</hask> (7), the arrow version of<br />
<hask>expandURIString</hask>. This arrow ((0) to (7)) fails in the absence<br />
of a BASE element. In this case we take the plain document base (8).<br />
The selection of the BASE element is not yet very handy. We will define<br />
a more general and elegant function later, allowing an element path as selection argument.<br />
<br />
In the third step, we will combine the two arrows. For this we will use<br />
a new combinator <hask>($<)</hask>. The need for this new combinator<br />
arises as follows: we need the arrow input (the document) twice,<br />
once for computing the document base and once for editing the<br />
whole document, and we want to compute the extra string parameter<br />
for editing with the arrow defined above.<br />
<br />
The combined arrow, our main arrow, looks like this:<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = mkAbsHRefs $< computeBaseRef      -- (1)<br />
</haskell><br />
<br />
In (1) the arrow input is first piped into <hask>computeBaseRef</hask>;<br />
its result is used in <hask>mkAbsHRefs</hask> as extra string parameter<br />
when processing the document. Internally the <hask>($<)</hask> combinator<br />
is defined by the basic combinators <hask>(&&&), (>>>)</hask> and <hask>app</hask>, but in slightly more complex computations<br />
this pattern occurs rather frequently, so <hask>($<)</hask> becomes very useful.<br />
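<br />
The meaning of <hask>($<)</hask> can be sketched in the simplified model of an arrow as a function returning a list of results. This shows the intended semantics only, not the HXT definition via <hask>app</hask>. <hask>Filter</hask>, <hask>Page</hask>, <hask>computeBase</hask>, <hask>mkAbs</hask> and <hask>toAbs</hask> are hypothetical names for this illustration.<br />
<br />
<haskell><br />
-- Simplified model (assumption): a filter maps one input to a list of results.<br />
type Filter a b = a -> [b]<br />
<br />
-- Compute a parameter from the input with g, then run the<br />
-- parameterized filter f on the same input for every parameter value.<br />
($<) :: (c -> Filter a b) -> Filter a c -> Filter a b<br />
(f $< g) x = concat [ f c x | c <- g x ]<br />
<br />
-- Hypothetical stand-in for a document: a base and a list of links.<br />
data Page = Page { pageBase :: String, pageLinks :: [String] }<br />
    deriving (Eq, Show)<br />
<br />
computeBase :: Filter Page String<br />
computeBase p = [pageBase p]<br />
<br />
-- Naive editing step: prefix every link with the given base.<br />
mkAbs :: String -> Filter Page Page<br />
mkAbs base p = [p { pageLinks = map (base ++) (pageLinks p) }]<br />
<br />
toAbs :: Filter Page Page<br />
toAbs = mkAbs $< computeBase<br />
</haskell><br />
<br />
The input page is used twice here, once by <hask>computeBase</hask> and once by <hask>mkAbs</hask>, without being named in the definition of <hask>toAbs</hask>.<br />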
<br />
Programming with arrows is one style of point-free programming. Point-free<br />
programming often becomes unwieldy when values are used more than once.<br />
One solution is the special arrow syntax supported by ghc and others, similar to the do notation for monads. But for many simple cases the <hask>($<)</hask> combinator and its variants <hask>($<<), ($<<<), ($<<<<), ($<$)</hask><br />
are sufficient.<br />
<br />
To complete the development of the example, a last step is necessary:<br />
the removal of the redundant BASE element.<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = ( mkAbsHRefs $< computeBaseRef )<br />
      >>><br />
      removeBaseElement<br />
<br />
removeBaseElement :: ArrowXml a => a XmlTree XmlTree<br />
removeBaseElement<br />
    = processChildren<br />
      ( processChildren<br />
        ( none                              -- (1)<br />
          `when`<br />
          ( isElem >>> hasName "base" )<br />
        )<br />
        `when`<br />
        ( isElem >>> hasName "head" )<br />
      )<br />
</haskell><br />
<br />
In this function the children of the HEAD element are searched for<br />
a BASE element. This is removed by applying the null arrow <hask>none</hask><br />
to the input, which always returns the empty list.<br />
<hask>none `when` ...</hask> is the pattern for deleting nodes from a tree.<br />
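<br />
The deletion pattern can be sketched in the simplified model of an arrow as a function returning a list of results. <hask>Filter</hask> and <hask>removeBase</hask> are hypothetical names, and a list of element names stands in for the children of a node. Applying a filter to every entry and concatenating the results plays the role of <hask>processChildren</hask>.<br />
<br />
<haskell><br />
-- Simplified model (assumption): a filter maps one input to a list of results.<br />
type Filter a b = a -> [b]<br />
<br />
-- The null filter: never returns a result.<br />
none :: Filter a b<br />
none _ = []<br />
<br />
-- Apply f only where the condition g succeeds, else keep the input.<br />
when :: Filter a a -> Filter a c -> Filter a a<br />
when f g x<br />
    | null (g x) = [x]<br />
    | otherwise  = f x<br />
<br />
isA :: (a -> Bool) -> Filter a a<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
<br />
-- Delete all "base" entries from a list of element names.<br />
removeBase :: [String] -> [String]<br />
removeBase = concatMap (none `when` isA (== "base"))<br />
</haskell><br />
<br />
Applied to <hask>["title", "base", "meta"]</hask>, only the "base" entry disappears; the other entries are kept in their original order.<br />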
<br />
The <hask>computeBaseRef</hask> function defined above contains an arrow pattern<br />
for selecting the right subtree that is rather common in HXT applications:<br />
<br />
<haskell><br />
isElem >>> hasName n1<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName n2<br />
...<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName nm<br />
</haskell><br />
<br />
For this pattern we will define a convenience function creating the<br />
arrow for selection:<br />
<br />
<haskell><br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)     -- (1)<br />
      .<br />
      map (\ n -> isElem >>> hasName n)             -- (2)<br />
</haskell><br />
<br />
The name list is mapped to the element checking arrow (2), and<br />
the resulting list of arrows is folded with <hask>getChildren</hask><br />
into a single arrow (1). <hask>computeBaseRef</hask> can then be simplified<br />
and becomes more readable:<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
= ( ( ( getDescendents ["html","head","base"] -- (1)<br />
>>><br />
getAttrValue "href" -- (2)<br />
)<br />
...<br />
...<br />
</haskell><br />
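<br />
The fold used in <hask>getDescendents</hask> can be tried out on a toy tree. The following is a sketch under the assumption that an arrow is modelled as a function returning a list of results. <hask>Tree</hask>, <hask>Filter</hask> and <hask>doc</hask> are hypothetical stand-ins, and <hask>hasName</hask> here combines the element test and the name test.<br />
<br />
<haskell><br />
-- Simplified model (assumption): a filter maps one input to a list of results.<br />
type Filter a b = a -> [b]<br />
<br />
-- Toy element tree: a name and a list of children.<br />
data Tree = Tree String [Tree]<br />
    deriving (Eq, Show)<br />
<br />
(>>>) :: Filter a b -> Filter b c -> Filter a c<br />
(f >>> g) x = concatMap g (f x)<br />
<br />
getChildren :: Filter Tree Tree<br />
getChildren (Tree _ cs) = cs<br />
<br />
hasName :: String -> Filter Tree Tree<br />
hasName n t@(Tree m _)<br />
    | n == m    = [t]<br />
    | otherwise = []<br />
<br />
-- Same fold as above; like foldl1 itself, it is undefined for an empty path.<br />
getDescendents :: [String] -> Filter Tree Tree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y)<br />
      .<br />
      map hasName<br />
<br />
doc :: Tree<br />
doc = Tree "html"<br />
      [ Tree "head" [ Tree "title" [], Tree "base" [] ]<br />
      , Tree "body" []<br />
      ]<br />
</haskell><br />
<br />
Here <hask>getDescendents ["html","head","base"] doc</hask> selects the single BASE node, while a path that does not exist in the tree, such as <hask>["html","body","base"]</hask>, gives the empty list.<br />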
<br />
An even more general and flexible technique is the use of XPath expressions,<br />
available for the selection of document parts and defined in the module<br />
<hask>Text.XML.HXT.Arrow.XmlNodeSet</hask>.<br />
<br />
With XPath, <hask>computeBaseRef</hask> can be simplified to<br />
<br />
<haskell><br />
computeBaseRef<br />
= ( ( ( getXPathTrees "/html/head/base" -- (1)<br />
>>><br />
getAttrValue "href" -- (2)<br />
)<br />
...<br />
</haskell><br />
<br />
Even the attribute selection can be expressed by XPath,<br />
so (1) and (2) can be combined into<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( xshow (getXPathTrees "/html/head/base/@href")<br />
          ...<br />
</haskell><br />
<br />
The extra <hask>xshow</hask> is required here to convert the<br />
XPath result, an XmlTree, into a string.<br />
<br />
XPath defines a<br />
full language for selecting parts of an XML document.<br />
Sometimes it's rather convenient to make selections of this<br />
type, but XPath evaluation is in general more expensive<br />
in time and space than a simple combination of arrows, as we've<br />
seen in <hask>getDescendents</hask>.<br />
<br />
=== Transform external references into absolute references: Refinement ===<br />
<br />
In the above example only A-HREF URLs are edited. Now we extend this<br />
to other element-attribute combinations.<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown ( editRef "a" "href"           -- (2)<br />
                       >>><br />
                       editRef "img" "src"          -- (3)<br />
                       >>><br />
                       editRef "link" "href"        -- (4)<br />
                       >>><br />
                       editRef "script" "src"       -- (5)<br />
                     )<br />
    where<br />
    editRef en an                                   -- (1)<br />
        = processAttrl ( changeAttrValue (absHRef base)<br />
                         `when`<br />
                         hasName an<br />
                       )<br />
          `when`<br />
          ( isElem >>> hasName en )<br />
        where<br />
        absHRef :: String -> String -> String<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
<hask>editRef</hask> is parameterized by the element and attribute names.<br />
The arrow applied to every element is extended to a sequence of<br />
<hask>editRef</hask>'s ((2)-(5)). Notice that the document is still traversed only once.<br />
To process all possible HTML elements,<br />
this sequence should be extended by further element-attribute pairs.<br />
<br />
This can further be simplified into<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = foldl (>>>) this<br />
          .<br />
          map (\ (en, an) -> editRef en an)<br />
          $<br />
          [ ("a", "href")<br />
          , ("img", "src")<br />
          , ("link", "href")<br />
          , ("script", "src")       -- and more<br />
          ]<br />
    editRef<br />
        = ...<br />
</haskell><br />
<br />
The <hask>foldl (>>>) this</hask> is defined in HXT as <hask>seqA</hask>,<br />
so the above code can be simplified to<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
= processTopDown editRefs<br />
where<br />
editRefs<br />
= seqA . map (uncurry editRef)<br />
$<br />
...<br />
</haskell><br />
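<br />
The sequencing fold can be sketched in the simplified model of an arrow as a function returning a list of results. The definition of <hask>seqA</hask> below follows the <hask>foldl (>>>) this</hask> form given in the text. <hask>Filter</hask>, <hask>replaceChar</hask> and <hask>edits</hask> are hypothetical names, with simple character replacements standing in for the <hask>editRef</hask> steps.<br />
<br />
<haskell><br />
-- Simplified model (assumption): a filter maps one input to a list of results.<br />
type Filter a b = a -> [b]<br />
<br />
this :: Filter a a<br />
this x = [x]<br />
<br />
(>>>) :: Filter a b -> Filter b c -> Filter a c<br />
(f >>> g) x = concatMap g (f x)<br />
<br />
-- Run a list of filters one after the other; the empty list is the identity.<br />
seqA :: [Filter a a] -> Filter a a<br />
seqA = foldl (>>>) this<br />
<br />
-- Hypothetical edit step: replace one character by another.<br />
replaceChar :: Char -> Char -> Filter String String<br />
replaceChar from to s = [ map (\ c -> if c == from then to else c) s ]<br />
<br />
edits :: Filter String String<br />
edits = seqA . map (uncurry replaceChar) $ [ ('a', 'A'), ('b', 'B') ]<br />
</haskell><br />
<br />
Each step sees the result of the previous one, so <hask>edits "abc"</hask> applies both replacements to the same string in a single pass over the list of steps.<br />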
<br />
== More complex examples ==<br />
<br />
''' to be done '''<br />
<br />
=== Automatic read/writing between xml and Haskell data types ===<br />
<br />
'''Question''': is there any way to write/read Haskell types to/from XML in HXT? HaXml has readXml and showXml, but I can't find any similar mechanism in HXT. Help! -- AlsonKemp<br />
<br />
==== Serializing to Xml ====<br />
<br />
We can create an HXT tree from a single-layer data class as follows:<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
import Data.Maybe (fromMaybe)<br />
import Text.XML.HXT.Arrow<br />
import Data.Generics<br />
<br />
-- our data class we'll convert into xml<br />
data Config =<br />
    Config { username :: String,<br />
             logNumDays :: Int,<br />
             oleDbString :: String }<br />
    deriving (Show, Typeable, Data)<br />
<br />
-- helper function adapted from http://www.defmacro.org/ramblings/haskell-web.html<br />
-- (gshow replaced by gshow')<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- show a field value: Strings are used as they are, other values via their constructor<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++<br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- function to create HXT tree arrow from single-layer Haskell data type:<br />
createHxtArrow object = runLA( constA ( xmlSerialize object ) >>> xread)<br />
<br />
-- create a config object to serialize:<br />
<br />
createConfig = Config { username = "test", logNumDays = 3, oleDbString = "qsdf" }<br />
<br />
-- test function, using our Config data type<br />
testConversion = createHxtArrow( createConfig ) ()<br />
</haskell><br />
<br />
-- hughperkins<br />
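<br />
One detail of <hask>xmlSerialize</hask> is worth noting: the <hask>foldr</hask> appends each field element to the right of the accumulator, so the fields come out in reverse declaration order. The following sketch uses only plain strings and hypothetical helper names (<hask>element</hask>, <hask>fieldsReversed</hask>, <hask>fieldsInOrder</hask>) to show the difference to a <hask>concatMap</hask>, which keeps the order. Whether the order matters depends on the consumer of the XML.<br />
<br />
<haskell><br />
-- One field as an XML element; values are not escaped in this sketch.<br />
element :: (String, String) -> String<br />
element (n, v) = "<" ++ n ++ ">" ++ v ++ "</" ++ n ++ ">"<br />
<br />
-- Same folding scheme as in xmlSerialize: the last field comes first.<br />
fieldsReversed :: [(String, String)] -> String<br />
fieldsReversed = foldr (\ f acc -> acc ++ element f) ""<br />
<br />
-- Alternative that keeps the declaration order.<br />
fieldsInOrder :: [(String, String)] -> String<br />
fieldsInOrder = concatMap element<br />
</haskell><br />
<br />
For the fields <hask>[("a","1"),("b","2")]</hask>, the first version puts the <tt>b</tt> element before the <tt>a</tt> element, the second one keeps <tt>a</tt> first.<br />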
<br />
==== Deserializing from Xml ====<br />
<br />
Here's a solution to deserialize a simple Haskell data type containing Strings and Ints.<br />
<br />
It's not really pretty, but it works.<br />
<br />
Basically, we just convert the incoming xml into gread-compatible format, then use gread :-D<br />
<br />
Currently it works for a simple single-layer Haskell data type containing Ints and Strings. You can add new child data types by adding to the case statement in xmlToGShowFormat.<br />
<br />
If someone has a more elegant solution, please let me know ( hughperkins@gmail.com )<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
module ParseXml<br />
where<br />
<br />
import System.IO<br />
import Data.Char<br />
import Data.List<br />
import Data.Maybe<br />
import Data.Generics hiding (Unit)<br />
import Text.XML.HXT.Arrow hiding (when)<br />
<br />
data Config = Config{ name :: String, age :: Int }<br />
--data Config = Config{ age :: Int }<br />
    deriving( Data, Show, Typeable, Ord, Eq, Read )<br />
<br />
createConfig = Config "qsdfqsdf" 3<br />
--createConfig = Config 3<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- helper function from http://www.defmacro.org/ramblings/haskell-web.html<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++<br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- parse xml to HXT tree, and obtain the value of node "fieldname"<br />
-- returns a string<br />
getValue xml fieldname<br />
    | length(resultlist) > 0 = Just (head resultlist)<br />
    | otherwise = Nothing<br />
    where resultlist = (runLA ( constA xml >>> xread >>> deep ( hasName fieldname ) >>> getChildren >>> getText ))[]<br />
<br />
-- parse templateobject to get list of field names<br />
-- apply these to xml to get list of values<br />
-- return (fieldnames list, value list)<br />
xmlToGShowFormat :: Data a => String -> a -> String<br />
xmlToGShowFormat xml templateobject =<br />
    go<br />
    where mainconstructorname = (showConstr $ toConstr templateobject)<br />
          fields = constrFields $ toConstr templateobject<br />
          values = map ( \fieldname -> getValue xml fieldname ) fields<br />
          datatypes = gmapQ (dataTypeOf) templateobject<br />
          constrs = gmapQ (toConstr) templateobject<br />
          datatypereps = gmapQ (dataTypeRep . dataTypeOf) templateobject<br />
          fieldtogshowformat (value,datatyperep) = case datatyperep of<br />
              IntRep -> "(" ++ fromJust value ++ ")"<br />
              _ -> show(fromJust value)<br />
          formattedfieldlist = map (fieldtogshowformat) (zip values datatypereps)<br />
          go = "(" ++ mainconstructorname ++ " " ++ (concat $ intersperse " " formattedfieldlist ) ++ ")"<br />
<br />
xmlDeserialize xml templateobject = fst $ head $ gread( xmlToGShowFormat xml templateobject)<br />
<br />
dotest = xmlDeserialize (xmlSerialize createConfig) createConfig :: Config<br />
dotest' = xmlDeserialize ("<Config><age>12</age><name>test name!</name></Config>") createConfig :: Config<br />
</haskell><br />
<br />
-- hughperkins</div>Joehttps://wiki.haskell.org/index.php?title=HXT&diff=16950HXT2007-11-22T16:46:06Z<p>Joe: grammar improvements</p>
<hr />
<div>[[Category:Tools]] [[Category:Tutorials]]<br />
<br />
== A gentle introduction to the Haskell XML Toolbox ==<br />
<br />
The [http://www.fh-wedel.de/~si/HXmlToolbox/index.html Haskell XML Toolbox (HXT)] is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain specific language consisting of a set of combinators for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace-aware XML parser that almost fully supports the XML 1.0 standard. Extensions are a validator for RelaxNG and an XPath evaluator.<br />
<br />
__TOC__<br />
<br />
== Background ==<br />
<br />
The Haskell XML Toolbox is based on the ideas of [http://www.cs.york.ac.uk/fp/HaXml/ HaXml] and [http://www.flightlab.com/~joe/hxml/ HXML] but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CData parts and processing instructions. This data model makes it possible to use tree transformation functions as a uniform design for all XML processing steps, from parsing, DTD processing, entity processing, validation and namespace propagation to content processing and output.<br />
<br />
== Resources ==<br />
<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/index.html HXT Home] :<br />
; [http://www.fh-wedel.de/~si/HXmlToolbox/HXT-7.0.tar.gz HXT-7.0.tar.gz] : latest release<br />
; [http://darcs.fh-wedel.de/hxt/ darcs.fh-wedel.de/hxt] : darcs repository with head revision of HXT<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc_arrow/ Arrow API] : Haddock documentation of head revision with links to source files<br />
; [http://darcs.fh-wedel.de/hxt/doc/hdoc/ Complete API] : Haddock documentation with arrows and old API based on filters<br />
<br />
== The basic concepts ==<br />
<br />
=== The basic data structures ===<br />
<br />
Processing of XML is a task of processing tree structures. This can be done in Haskell in a very elegant way by defining an appropriate tree data type, a Haskell DOM (document object model) structure. The tree structure in HXT is a rose tree with a special XNode data type for storing the XML node information.<br />
<br />
The generally useful tree structure (NTree) is separated from the node type (XNode). This allows for reusing the tree structure and the tree traversal and manipulation functions in other applications.<br />
<br />
<haskell><br />
data NTree a = NTree a [NTree a]    -- rose tree<br />
<br />
data XNode = XText String           -- plain text node<br />
           | ...<br />
           | XTag QName XmlTrees    -- element name and list of attributes<br />
           | XAttr QName            -- attribute name<br />
           | ...<br />
<br />
type QName = ...                    -- qualified name<br />
<br />
type XmlTree = NTree XNode<br />
<br />
type XmlTrees = [XmlTree]<br />
</haskell><br />
<br />
=== The concept of filters ===<br />
<br />
Selecting, transforming and generating trees often requires routines that compute not only a single result tree, but a (possibly empty) list of (sub-)trees. This leads to the idea of XML filters as in HaXml. Filters are functions that take an XML tree as input and compute a list of result trees.<br />
<br />
<haskell><br />
type XmlFilter = XmlTree -> [XmlTree]<br />
</haskell><br />
<br />
More generally we can define a filter as<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
</haskell><br />
<br />
We will do this abstraction later, when introducing arrows. Many of the functions in the following motivating examples can be generalised this way. But for getting the idea, the <hask>XmlFilter</hask> is sufficient.<br />
<br />
The filter functions are used so frequently that the idea of defining a domain specific language with filters as the basic processing units comes up. In such a DSL the basic filters are predicates, selectors, constructors and transformers, all working on the HXT DOM tree structure. For a DSL it becomes necessary to define an appropriate set of combinators for building more complex functions from simpler ones. Of course filter composition, the analogue of <hask>(.)</hask>, becomes one of the most frequently used combinators. There are more complex filters for the traversal of a whole tree and the selection or transformation of several nodes. We will see a few first examples in the following part.<br />
<br />
The first task is to build filters from pure functions, to define a lift operator. Pure functions are lifted to filters in the following way:<br />
<br />
Predicates are lifted by mapping False to the empty list and True to the single element list, containing the input tree.<br />
<br />
<haskell><br />
p :: XmlTree -> Bool         -- pure function<br />
p t = ...<br />
<br />
pf :: XmlTree -> [XmlTree]   -- or XmlFilter<br />
pf t<br />
    | p t       = [t]<br />
    | otherwise = []<br />
</haskell><br />
<br />
The combinator for this type of lifting is called <hask>isA</hask>; it works on any type and is defined as<br />
<br />
<haskell><br />
isA :: (a -> Bool) -> (a -> [a])<br />
isA p x<br />
    | p x       = [x]<br />
    | otherwise = []<br />
</haskell><br />
<br />
A predicate for filtering text nodes looks like this<br />
<br />
<haskell><br />
isXText :: XmlFilter                    -- XmlTree -> [XmlTree]<br />
isXText t@(NTree (XText _) _) = [t]<br />
isXText _                     = []<br />
</haskell><br />
<br />
Transformers -- functions that map a tree into another tree -- are lifted in a trivial way:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t = expr(t)<br />
<br />
ff :: XmlTree -> [XmlTree]<br />
ff t = [expr(t)]<br />
</haskell><br />
<br />
This basic function is called <hask>arr</hask>; it comes from the Control.Arrow module of the base library package of ghc.<br />
<br />
Partial functions, functions that can't always compute a result, are usually lifted to totally defined filters:<br />
<br />
<haskell><br />
f :: XmlTree -> XmlTree<br />
f t<br />
    | p t       = expr(t)<br />
    | otherwise = error "f not defined"<br />
<br />
ff :: XmlFilter<br />
ff t<br />
    | p t       = [expr(t)]<br />
    | otherwise = []<br />
</haskell><br />
<br />
This is a rather comfortable situation: with these filters we don't have to deal with illegal argument errors. Illegal arguments are just mapped to the empty list.<br />
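<br />
A small sketch on plain integers shows the same lifting of a partial function. <hask>Filter</hask> is the general filter type mentioned earlier, and <hask>halve</hask> is a hypothetical example, defined only for even numbers.<br />
<br />
<haskell><br />
type Filter a b = a -> [b]<br />
<br />
-- Partial operation lifted to a total filter:<br />
-- odd numbers are mapped to the empty list instead of raising an error.<br />
halve :: Filter Int Int<br />
halve n<br />
    | even n    = [n `div` 2]<br />
    | otherwise = []<br />
</haskell><br />
<br />
Applying the filter to every element of a list with <hask>concatMap halve</hask> silently drops the arguments for which it is not defined.<br />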
<br />
When processing trees, it is often the case that no result, exactly one result, or more than one result is possible. Functions returning a set of results are often, a bit imprecisely, called ''nondeterministic'' functions. Such functions, e.g. selecting all children of a node or all grandchildren, are exactly our filters. In this context lists rather than sets of values are the appropriate result type, because the ordering in XML is important and duplicates are possible.<br />
<br />
Working with filters is rather similar to working with binary relations, and working with relations is rather natural and comfortable, database people do know this very well.<br />
<br />
Two first examples for working with ''nondeterministic'' functions are selecting the children and the grandchildren of an XmlTree which can be implemented by<br />
<br />
<haskell><br />
getChildren :: XmlFilter<br />
getChildren (NTree n cs)<br />
  = cs<br />
<br />
getGrandChildren :: XmlFilter<br />
getGrandChildren (NTree n cs)<br />
  = concat [ getChildren c | c <- cs ]<br />
</haskell><br />
<br />
=== Filter combinators ===<br />
<br />
Composition of filters (like function composition) is the most important combinator. We will use the infix operator <hask>(>>>)</hask> for filter composition and reverse the arguments, so we can read composition sequences from left to right, like with pipes in Unix. Composition is defined as follows:<br />
<br />
<haskell><br />
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f >>> g) t = concat [g t' | t' <- f t]<br />
</haskell><br />
<br />
This definition corresponds one-to-one to the composition of binary relations. With the help of the <hask>(>>>)</hask> operator the definition of <hask>getGrandChildren</hask> becomes rather simple:<br />
<br />
<haskell><br />
getGrandChildren :: XmlFilter<br />
getGrandChildren = getChildren >>> getChildren<br />
</haskell><br />
<br />
Selecting all text nodes among the children of an element can also be formulated very easily with the help of <hask>(>>>)</hask>:<br />
<br />
<haskell><br />
getTextChildren :: XmlFilter<br />
getTextChildren = getChildren >>> isXText<br />
</haskell><br />
<br />
For predicate filters, <hask>(>>>)</hask> serves as a logical "and" operator, or, from the relational view, as an intersection operator: <hask>isA p1 >>> isA p2</hask> selects all values for which p1 and p2 both hold.<br />
<br />
The dual operator to <hask>(>>>)</hask> is the logical "or" (thinking in sets: the union operator). For this we define a sum operator <hask>(<+>)</hask>. The sum of two filters is defined as follows:<br />
<br />
<haskell><br />
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter<br />
<br />
(f <+> g) t = f t ++ g t<br />
</haskell><br />
<br />
Example: <hask>isA p1 <+> isA p2</hask> is the logical "or" for filters. Note that a value satisfying both predicates appears twice in the result.<br />
<br />
Combining elementary filters with <hask>(>>>)</hask> and <hask>(<+>)</hask> leads to more complex functionality. For example, selecting all text nodes within two levels of depth (in left to right order) can be formulated with:<br />
<br />
<haskell><br />
getTextChildren2 :: XmlFilter<br />
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
</haskell><br />
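<br />
The combinators introduced so far can be tried out without HXT. The following sketch assumes a heavily simplified tree type<br />
(only text and element nodes, no attributes). <hask>XNode</hask>, <hask>XTag</hask>, <hask>sample</hask> and <hask>texts</hask> are illustrative names,<br />
and the fixity declarations are copied from <hask>Control.Arrow</hask>; none of this is HXT's real data model.<br />
<br />
```haskell
-- Simplified stand-ins for HXT's tree types (assumption: no attributes,
-- no comments, no processing instructions).
data XNode   = XText String | XTag String deriving (Eq, Show)
data NTree a = NTree a [NTree a]          deriving (Eq, Show)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

getChildren :: XmlFilter
getChildren (NTree _ cs) = cs

isXText :: XmlFilter
isXText t@(NTree (XText _) _) = [t]
isXText _                     = []

infixr 1 >>>
infixr 5 <+>

-- Composition: feed every result of f into g and collect all results.
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter
(f >>> g) t = concat [ g t' | t' <- f t ]

-- Sum: results of f followed by results of g, both for the same input.
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter
(f <+> g) t = f t ++ g t

getTextChildren2 :: XmlFilter
getTextChildren2 = getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )

-- Models the fragment  <p>one<b>two</b>three</p>
sample :: XmlTree
sample = NTree (XTag "p") [ leaf "one", NTree (XTag "b") [ leaf "two" ], leaf "three" ]
  where
    leaf s = NTree (XText s) []

-- The text found within two levels below the root, in document order.
texts :: [String]
texts = [ s | NTree (XText s) _ <- getTextChildren2 sample ]
```
<br />
For this sample tree, <hask>texts</hask> evaluates to <hask>["one","two","three"]</hask>.<br />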
<br />
'''Exercise:''' Are these filters equivalent or what's the difference between the two filters?<br />
<br />
<haskell><br />
getChildren >>> ( isXText <+> ( getChildren >>> isXText ) )<br />
<br />
( getChildren >>> isXText ) <+> ( getChildren >>> getChildren >>> isXText )<br />
</haskell><br />
<br />
Of course we need choice combinators. The first idea is an if-then-else filter, <br />
built up from three simpler filters. But often it's easier and more elegant to work with simpler binary combinators for choice. So we will introduce the simpler ones first.<br />
<br />
One of these choice combinators is called <hask>orElse</hask> and is defined as<br />
follows:<br />
<br />
<haskell><br />
orElse :: XmlFilter -> XmlFilter -> XmlFilter<br />
orElse f g t<br />
  | null res1 = g t<br />
  | otherwise = res1<br />
  where<br />
  res1 = f t<br />
</haskell><br />
<br />
The meaning is the following: if f computes a non-empty list as result, f succeeds and this list is the result; otherwise g is applied to the input and this yields the result. There are two other simple choice combinators, usually written in infix notation: <hask> g `guards` f</hask> and <hask>f `when` g</hask>:<br />
<br />
<haskell><br />
guards :: XmlFilter -> XmlFilter -> XmlFilter<br />
guards g f t<br />
  | null (g t) = []<br />
  | otherwise  = f t<br />
<br />
when :: XmlFilter -> XmlFilter -> XmlFilter<br />
when f g t<br />
  | null (g t) = [t]<br />
  | otherwise  = f t<br />
</haskell><br />
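<br />
To see how the three choice combinators differ, the following sketch runs them on integers instead of XML trees.<br />
Generalising the type to <hask>Filter a</hask> is an assumption made for this illustration, and <hask>double</hask>, <hask>r1</hask>, <hask>r2</hask> and <hask>r3</hask> are hypothetical names.<br />
<br />
```haskell
-- A filter over an arbitrary type (assumption: generalised from XmlFilter).
type Filter a = a -> [a]

orElse :: Filter a -> Filter a -> Filter a
orElse f g t
  | null res1 = g t
  | otherwise = res1
  where
    res1 = f t

guards :: Filter a -> Filter a -> Filter a
guards g f t
  | null (g t) = []
  | otherwise  = f t

when :: Filter a -> Filter a -> Filter a
when f g t
  | null (g t) = [t]
  | otherwise  = f t

isA :: (a -> Bool) -> Filter a
isA p x = [ x | p x ]

-- A transformer lifted to a filter.
double :: Filter Int
double x = [2 * x]

r1, r2, r3 :: [Int]
r1 = concatMap (isA even `orElse` double) [1, 2]  -- 1 fails the test, so it is doubled; 2 passes unchanged
r2 = concatMap (isA even `guards` double) [1, 2]  -- 1 is dropped; 2 is doubled
r3 = concatMap (double `when` isA even)   [1, 2]  -- 1 is kept as is; 2 is doubled
```
<br />
The results are <hask>[2,2]</hask>, <hask>[4]</hask> and <hask>[1,4]</hask>: <hask>guards</hask> drops inputs that fail the test, while <hask>when</hask> passes them through unchanged.<br />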
<br />
These choice operators become useful when transforming and manipulating trees.<br />
<br />
=== Tree traversal filter ===<br />
<br />
A very basic operation on tree structures is the traversal of all nodes and the selection and/or transformation of nodes. These traversal filters serve as control structures for processing whole trees. They correspond to the map and fold combinators for lists.<br />
<br />
The simplest traversal filter does a top-down search for all nodes with a given property. This filter, called <hask>deep</hask>, is defined as follows:<br />
<br />
<haskell><br />
deep :: XmlFilter -> XmlFilter<br />
deep f = f `orElse` (getChildren >>> deep f)<br />
</haskell><br />
<br />
When <hask>deep</hask> is applied to a predicate filter, a top-down search is done and all subtrees satisfying the predicate are collected. Because of the use of <hask>orElse</hask>, the descent into the tree stops as soon as a matching subtree is found.<br />
<br />
'''Example:''' Selecting all plain text nodes of a document can be formulated with:<br />
<br />
<haskell><br />
deep isXText<br />
</haskell><br />
<br />
'''Example:''' Selecting all "top level" tables in an HTML document looks like<br />
this:<br />
<br />
<haskell><br />
deep (isElem >>> hasName "table")<br />
</haskell><br />
<br />
A variant of <hask>deep</hask>, called <hask>multi</hask>, performs a complete search, where the tree traversal does not stop when a node is found.<br />
<br />
<haskell><br />
multi :: XmlFilter -> XmlFilter<br />
multi f = f <+> (getChildren >>> multi f)<br />
</haskell><br />
<br />
'''Example:''' To select all tables in an HTML document, even nested ones, <hask>multi</hask> has to be used instead of <hask>deep</hask>:<br />
<br />
<hask>multi (isElem >>> hasName "table")</hask><br />
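<br />
The difference between the two traversals shows up as soon as matches are nested. The following sketch uses a toy tree whose nodes are plain strings;<br />
<hask>hasLabel</hask> is a hypothetical stand-in for <hask>isElem >>> hasName "..."</hask>, and <hask>deep</hask> and <hask>multi</hask> are written out directly rather than with the combinators above.<br />
<br />
```haskell
-- Toy rose tree (assumption: node labels only, no XML node kinds).
data NTree a = NTree a [NTree a] deriving (Eq, Show)

type Filter a = NTree a -> [NTree a]

getChildren :: Filter a
getChildren (NTree _ cs) = cs

-- Hypothetical predicate filter: succeeds if the root carries the given label.
hasLabel :: Eq a => a -> Filter a
hasLabel l t@(NTree x _) = [ t | x == l ]

-- Top-down search that stops descending below a match.
deep :: Filter a -> Filter a
deep f t = case f t of
  [] -> concatMap (deep f) (getChildren t)
  rs -> rs

-- Complete search: keeps descending even below a match.
multi :: Filter a -> Filter a
multi f t = f t ++ concatMap (multi f) (getChildren t)

-- html with two top-level tables, the first one containing a nested table.
page :: NTree String
page =
  NTree "html"
    [ NTree "table" [ NTree "tr" [ NTree "table" [] ] ]
    , NTree "table" []
    ]

topLevelTables, allTables :: Int
topLevelTables = length (deep  (hasLabel "table") page)
allTables      = length (multi (hasLabel "table") page)
```
<br />
On this tree <hask>topLevelTables</hask> is 2 and <hask>allTables</hask> is 3, because only <hask>multi</hask> reaches the nested table.<br />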
<br />
=== Arrows ===<br />
<br />
We've already seen that filters of type <hask>a -> [b]</hask> are a very<br />
powerful and sometimes more elegant way to process XML than pure<br />
functions. This is the good news. The bad news is that filters are not<br />
general enough. Of course we sometimes want to do some I/O and still<br />
stay on the filter level. So we need something like<br />
<br />
<haskell><br />
type XmlIOFilter = XmlTree -> IO [XmlTree]<br />
</haskell><br />
<br />
for working in the IO monad.<br />
<br />
Sometimes it's appropriate to thread some state through the computation<br />
like in state monads. This leads to a type like<br />
<br />
<haskell><br />
type XmlStateFilter state = state -> XmlTree -> (state, [XmlTree])<br />
</haskell><br />
<br />
And in real world applications we need both extensions at the same<br />
time. Of course I/O is necessary but usually there are also some<br />
global options and variables for controlling the computations. In HXT,<br />
for instance there are variables for controlling trace output, options<br />
for setting the default encoding scheme for input data and a base URI<br />
for accessing documents, which are addressed in a content or in a DTD<br />
part by relative URIs. So we need something like<br />
<br />
<haskell><br />
type XmlIOStateFilter state = state -> XmlTree -> IO (state, [XmlTree])<br />
</haskell><br />
<br />
We want to work with all four filter variants, and in the future<br />
perhaps with even more general filters, but of course not with four<br />
sets of filter names, e.g. <hask>deep, deepST, deepIO, deepIOST</hask>.<br />
<br />
This is the point where <hask>newtype</hask>s and <hask>class</hask>es<br />
come in. Classes are needed for overloading names and<br />
<hask>newtype</hask>s are needed to declare instances. Furthermore, the<br />
restriction to <hask>XmlTree</hask> as argument and result type is<br />
not necessary and hinders reuse in many cases.<br />
<br />
The filters discussed above have all the features of an arrow. Arrows were<br />
introduced for generalising the concept of functions and function<br />
combination to more general kinds of computation than pure functions.<br />
<br />
A basic set of combinators for arrows is defined in the classes in the<br />
<hask>Control.Arrow</hask> module, containing the above mentioned <hask>(>>>), (<+>), arr</hask>.<br />
<br />
In HXT the additional classes for filters working with lists as result type are<br />
defined in <hask>Control.Arrow.ArrowList</hask>. The choice operators are<br />
in <hask>Control.Arrow.ArrowIf</hask>, tree filters, like <hask>getChildren, deep, multi, ...</hask> in<br />
<hask>Control.Arrow.ArrowTree</hask> and the elementary XML specific<br />
filters in <hask>Text.XML.HXT.Arrow.XmlArrow</hask>.<br />
<br />
In HXT there are four types instantiated with these classes for<br />
pure list arrows, list arrows with a state, list arrows with IO<br />
and list arrows with a state and IO.<br />
<br />
<haskell><br />
newtype LA a b = LA { runLA :: (a -> [b]) }<br />
<br />
newtype SLA s a b = SLA { runSLA :: (s -> a -> (s, [b])) }<br />
<br />
newtype IOLA a b = IOLA { runIOLA :: (a -> IO [b]) }<br />
<br />
newtype IOSLA s a b = IOSLA { runIOSLA :: (s -> a -> IO (s, [b])) }<br />
</haskell><br />
<br />
The first one and the last one are those used most frequently in the<br />
toolbox, and of course there are lifting functions for converting the<br />
special arrows into the more general ones.<br />
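<br />
To make the connection between <hask>newtype</hask>s and arrow classes concrete, the following sketch gives a list arrow type its<br />
<hask>Category</hask> and <hask>Arrow</hask> instances using only the base library. It is meant as an illustration of the idea and may differ from<br />
the instance declarations in the HXT sources; <hask>divisors</hask> and <hask>example</hask> are hypothetical names.<br />
<br />
```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow (Arrow (..), (>>>))

-- A pure list arrow: a function returning a list of results.
newtype LA a b = LA { runLA :: a -> [b] }

instance Category LA where
  -- The identity arrow returns its input as the single result.
  id          = LA (\x -> [x])
  -- Composition applies g to every result of f and concatenates.
  LA g . LA f = LA (\x -> concatMap g (f x))

instance Arrow LA where
  -- Lifting a pure function gives exactly one result.
  arr f        = LA (\x -> [f x])
  -- 'first' transforms the first component and keeps the second.
  first (LA f) = LA (\(x, y) -> [ (x', y) | x' <- f x ])

-- Hypothetical nondeterministic arrow: all divisors of a number.
divisors :: LA Int Int
divisors = LA (\n -> [ d | d <- [1 .. n], n `mod` d == 0 ])

example :: [Int]
example = runLA (divisors >>> arr (* 10)) 6
```
<br />
With these instances the generic <hask>(>>>)</hask> from <hask>Control.Arrow</hask> behaves like the filter composition defined earlier; <hask>example</hask> evaluates to <hask>[10,20,30,60]</hask>.<br />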
<br />
Don't worry about all these conceptual details. Let's have a look at some<br />
''Hello world'' examples.<br />
<br />
== Getting started: Hello world examples ==<br />
<br />
=== copyXML ===<br />
<br />
The first complete example is a program for<br />
copying an XML document:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
import System.Environment<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      [src, dst] <- getArgs<br />
      runX ( readDocument [(a_validate, v_0)] src<br />
             >>><br />
             writeDocument [] dst<br />
           )<br />
      return ()<br />
</haskell><br />
<br />
The interesting part of this example is<br />
the call of <hask>runX</hask>. <hask>runX</hask> executes an<br />
arrow. This arrow is one of the more powerful list arrows with IO and<br />
a HXT system state.<br />
<br />
The arrow itself is a composition of <hask>readDocument</hask> and<br />
<hask>writeDocument</hask>.<br />
<hask>readDocument</hask> is an arrow for reading, DTD processing and<br />
validation of documents. Its behaviour can be controlled by a list of<br />
options. Here we turn off the validation step. The <hask>src</hask>, a file<br />
name or a URI, is read and parsed, and a document tree is built. This<br />
tree is ''piped'' into the output arrow, which is also<br />
controlled by a set of options. Here all the defaults are used.<br />
<hask>writeDocument</hask> converts the tree into a string and writes<br />
it to the <hask>dst</hask>.<br />
<br />
We've omitted here the boring stuff of option parsing and error<br />
handling.<br />
<br />
Compilation and a test run look like this:<br />
<br />
<pre><br />
hobel > ghc -o copyXml -package hxt CopyXML.hs<br />
hobel > cat hello.xml<br />
<hello>world</hello><br />
hobel > copyXml hello.xml -<br />
<?xml version="1.0" encoding="UTF-8"?><br />
<hello>world</hello><br />
hobel ><br />
</pre><br />
<br />
The mini XML document in file <tt>hello.xml</tt> is read and<br />
a document tree is built. Then this tree is converted into a string<br />
and written to standard output (filename: <tt>-</tt>). It is decorated<br />
with an XML declaration containing the version and the output<br />
encoding.<br />
<br />
For processing HTML documents there is an HTML parser, which tries to<br />
parse and interpret almost anything as HTML. The HTML parser can be<br />
selected by calling<br />
<br />
<hask>readDocument [(a_parse_html, v_1), ...]</hask><br />
<br />
with the appropriate option.<br />
<br />
=== Pattern for a main program ===<br />
<br />
A more realistic pattern for a simple Unix-filter-like program has<br />
the following structure:<br />
<br />
<haskell><br />
module Main<br />
where<br />
<br />
import Text.XML.HXT.Arrow<br />
<br />
import System.IO<br />
import System.Environment<br />
import System.Console.GetOpt<br />
import System.Exit<br />
<br />
main :: IO ()<br />
main<br />
    = do<br />
      argv <- getArgs<br />
      (al, src, dst) <- cmdlineOpts argv<br />
      [rc] <- runX (application al src dst)<br />
      if rc >= c_err<br />
         then exitWith (ExitFailure (0-1))<br />
         else exitWith ExitSuccess<br />
<br />
-- | the dummy for the boring stuff of option evaluation,<br />
-- usually done with 'System.Console.GetOpt'<br />
<br />
cmdlineOpts :: [String] -> IO (Attributes, String, String)<br />
cmdlineOpts argv<br />
    = return ([(a_validate, v_0)], argv!!0, argv!!1)<br />
<br />
-- | the main arrow<br />
<br />
application :: Attributes -> String -> String -> IOSArrow b Int<br />
application al src dst<br />
    = readDocument al src<br />
      >>><br />
      processChildren (processDocumentRootElement `when` isElem)  -- (1)<br />
      >>><br />
      writeDocument al dst<br />
      >>><br />
      getErrStatus<br />
<br />
<br />
-- | the dummy for the real processing: the identity filter<br />
<br />
processDocumentRootElement :: IOSArrow XmlTree XmlTree<br />
processDocumentRootElement<br />
    = this         -- substitute this by the real application<br />
</haskell><br />
<br />
This program has the same functionality as our first example,<br />
but it separates the arrow from the boring option evaluation and<br />
return code computation.<br />
<br />
The interesting line is (1).<br />
<hask>readDocument</hask> generates a tree structure with a so-called extra<br />
root node. This root node is a node above the XML document root<br />
element. It is necessary<br />
because other nodes may occur on the same tree level as the XML<br />
root, for instance comments, processing instructions or whitespace.<br />
<br />
Furthermore the artificial root node serves for storing meta<br />
information about the document in the attribute list, like the<br />
document name, the encoding scheme, the HTTP transfer headers and<br />
other information.<br />
<br />
To process the real XML root element, we have to take the children of<br />
the root node, select the XML root element and process it, but<br />
leave all other children unchanged. This is done with<br />
<hask>processChildren</hask> and the <hask>when</hask> choice<br />
operator. <hask>processChildren</hask> applies a filter elementwise to<br />
all children of a node. The results from processing the list of children form<br />
the children of the result node.<br />
<br />
The structure of the internal document tree can be made visible,<br />
e.g. by adding the option pair <hask>(a_show_tree, v_1)</hask> to the<br />
<hask>writeDocument</hask> arrow. This will emit the tree in a readable<br />
text representation instead of the real document.<br />
<br />
In the next section we will give examples for the<br />
<hask>processDocumentRootElement</hask> arrow.<br />
<br />
== Selection examples ==<br />
<br />
=== Selecting text from an HTML document ===<br />
<br />
Selecting all the plain text of an XML/HTML document<br />
can be formulated with<br />
<br />
<haskell><br />
selectAllText :: ArrowXml a => a XmlTree XmlTree<br />
selectAllText<br />
    = deep isXText<br />
</haskell><br />
<br />
<hask>deep</hask> traverses the whole tree, stops the traversal when<br />
a node is a text node (<hask>isXText</hask>) and returns all the text nodes.<br />
There are two other traversal operators, <hask>deepest</hask> and <hask>multi</hask>.<br />
In this case, where the selected nodes are all leaves, these would give the same result.<br />
<br />
=== Selecting text and ALT attribute values ===<br />
<br />
Let's take a slightly more complex task: we want to select all text, but also the values of the <tt>alt</tt> attributes<br />
of image tags.<br />
<br />
<haskell><br />
selectAllTextAndAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndAltValues<br />
    = deep<br />
      ( isXText                       -- (1)<br />
        <+><br />
        ( isElem >>> hasName "img"    -- (2)<br />
          >>><br />
          getAttrValue "alt"          -- (3)<br />
          >>><br />
          mkText                      -- (4)<br />
        )<br />
      )<br />
</haskell><br />
<br />
The whole tree is searched for text nodes (1) and for image elements (2). From the image elements<br />
the alt attribute values are selected as plain text (3), and this text is transformed into a text node (4).<br />
<br />
=== Selecting text and ALT attributes values (2) ===<br />
<br />
Let's refine the above filter one step further. The text from the alt attributes shall be marked in the output<br />
by surrounding double square brackets. Empty alt values shall be ignored.<br />
<br />
<haskell><br />
selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree<br />
selectAllTextAndRealAltValues<br />
    = deep<br />
      ( isXText<br />
        <+><br />
        ( isElem >>> hasName "img"<br />
          >>><br />
          getAttrValue "alt"<br />
          >>><br />
          isA significant             -- (1)<br />
          >>><br />
          arr addBrackets             -- (2)<br />
          >>><br />
          mkText<br />
        )<br />
      )<br />
    where<br />
    significant :: String -> Bool<br />
    significant = not . all (`elem` " \n\r\t")<br />
<br />
    addBrackets :: String -> String<br />
    addBrackets s<br />
        = " [[ " ++ s ++ " ]] "<br />
</haskell><br />
<br />
This example shows two combinators for building arrows from pure functions.<br />
The first one, <hask>isA</hask>, removes all empty or whitespace-only values of alt attributes (1);<br />
the other, <hask>arr</hask>, lifts the editing function to the arrow level (2).<br />
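<br />
The effect of these two steps can be checked on plain strings, without building a document. In the following sketch<br />
<hask>markAlt</hask> is a hypothetical list-function counterpart of <hask>isA significant >>> arr addBrackets</hask>, and the sample alt values are invented.<br />
<br />
```haskell
isA :: (a -> Bool) -> (a -> [a])
isA p x
  | p x       = [x]
  | otherwise = []

-- True if the string contains at least one non-whitespace character.
significant :: String -> Bool
significant = not . all (`elem` " \n\r\t")

addBrackets :: String -> String
addBrackets s = " [[ " ++ s ++ " ]] "

-- Hypothetical helper: filter first, then edit every surviving value.
markAlt :: String -> [String]
markAlt s = [ addBrackets s' | s' <- isA significant s ]

-- Invented alt values: one real text, one empty, one whitespace only.
marked :: [String]
marked = concatMap markAlt ["HXT logo", "", " \t\n"]
```
<br />
Only the first value survives the filter, so <hask>marked</hask> is <hask>[" [[ HXT logo ]] "]</hask>.<br />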
<br />
== Document construction examples ==<br />
<br />
=== The ''Hello World'' document ===<br />
<br />
The first document, of course, is a ''Hello World'' document:<br />
<br />
<haskell><br />
helloWorld :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld<br />
    = mkelem "html" []                     -- (1)<br />
      [ mkelem "head" []<br />
        [ mkelem "title" []<br />
          [ txt "Hello World" ]            -- (2)<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]        -- (3)<br />
        [ mkelem "h1" []<br />
          [ txt "Hello World" ]            -- (4)<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
The main arrows for document construction are <hask>mkelem</hask><br />
and its variants (<hask>selem, aelem, eelem</hask>) for element creation, <hask>attr</hask> and <hask>sattr</hask> for attributes, and <hask>mkText</hask><br />
and <hask>txt</hask> for text nodes. <hask>mkelem</hask> takes three arguments: the element name (or tag name), a list of arrows for the construction of attributes, not empty in (3), and a list of arrows for the contents. Text content is generated in (2) and (4).<br />
<br />
To write this document to a file use the following arrow<br />
<br />
<haskell><br />
root [] [helloWorld] -- (1)<br />
>>><br />
writeDocument [(a_indent, v_1)] "hello.xml" -- (2)<br />
</haskell><br />
<br />
When this arrow is executed, the <hask>helloWorld</hask><br />
document is wrapped into a so-called root node (1). This complete<br />
document is written to "hello.xml" (2).<br />
<hask>writeDocument</hask> and its variants always expect<br />
a whole document tree with such a root node. Before writing, the document is<br />
indented (<hask>(a_indent, v_1)</hask>) by inserting extra whitespace<br />
text nodes, and an XML declaration with version and encoding is added. If the indent option were not given, the whole document would appear on a single line. With the option set, the result looks like this:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="UTF-8"?><br />
<html><br />
<head><br />
<title>Hello World</title><br />
</head><br />
<body class="haskell"><br />
<h1>Hello World</h1><br />
</body><br />
</html><br />
</pre><br />
<br />
The code can be shortened a bit by using some of the<br />
convenience functions:<br />
<br />
<haskell><br />
helloWorld2 :: ArrowXml a => a XmlTree XmlTree<br />
helloWorld2<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      , mkelem "body"<br />
        [ sattr "class" "haskell" ]<br />
        [ selem "h1"<br />
          [ txt "Hello World" ]<br />
        ]<br />
      ]<br />
</haskell><br />
<br />
In the above two examples the arrow input is totally ignored, because<br />
of the use of the constant arrow <hask>txt "..."</hask>.<br />
<br />
=== A page about all images within a HTML page ===<br />
<br />
A slightly more interesting task is the construction of a page<br />
containing a table of all images within a page, including image URLs, geometry and ALT attributes.<br />
<br />
The program for this has a frame similar to the <hask>helloWorld</hask> program,<br />
but the rows of the table must be filled in from the input document.<br />
In the first step we will generate a table with a single column containing<br />
the URL of the image.<br />
<br />
<haskell><br />
imageTable :: ArrowXml a => a XmlTree XmlTree<br />
imageTable<br />
    = selem "html"<br />
      [ selem "head"<br />
        [ selem "title"<br />
          [ txt "Images in Page" ]<br />
        ]<br />
      , selem "body"<br />
        [ selem "h1"<br />
          [ txt "Images in Page" ]<br />
        , selem "table"<br />
          [ collectImages          -- (1)<br />
            >>><br />
            genTableRows           -- (2)<br />
          ]<br />
        ]<br />
      ]<br />
    where<br />
    collectImages                  -- (1)<br />
        = deep ( isElem<br />
                 >>><br />
                 hasName "img"<br />
               )<br />
    genTableRows                   -- (2)<br />
        = selem "tr"<br />
          [ selem "td"<br />
            [ getAttrValue "src" >>> mkText ]<br />
          ]<br />
</haskell><br />
<br />
With (1) the image elements are collected, and with (2)<br />
the HTML code for an image element is built.<br />
<br />
Applied to <tt>http://www.haskell.org/</tt> we get the following result<br />
(at the time of writing this page):<br />
<br />
<pre><br />
<html><br />
<head><br />
<title>Images in Page</title><br />
</head><br />
<body><br />
<h1>Images in Page</h1><br />
<table><br />
<tr><br />
<td>/haskellwiki_logo.png</td><br />
</tr><br />
<tr><br />
<td>/sitewiki/images/1/10/Haskelllogo-small.jpg</td><br />
</tr><br />
<tr><br />
<td>/haskellwiki_logo_small.png</td><br />
</tr><br />
</table><br />
</body><br />
</html><br />
</pre><br />
<br />
When generating HTML, often there are constant parts within the page,<br />
in the example e.g. the page header. It's possible to write these<br />
parts as a string containing plain HTML and then read this with<br />
a simple XML contents parser called <hask>xread</hask>.<br />
<br />
The example above could then be rewritten as<br />
<br />
<haskell><br />
imageTable<br />
    = selem "html"<br />
      [ pageHeader<br />
      , ...<br />
      ]<br />
    where<br />
    pageHeader<br />
        = constA "<head><title>Images in Page</title></head>"<br />
          >>><br />
          xread<br />
    ...<br />
</haskell><br />
<br />
<hask>xread</hask> is a very primitive arrow. It does not run in the<br />
IO monad, so it can be used in any context, but as a consequence its error handling<br />
is very limited. <hask>xread</hask> parses the content of an XML element.<br />
<br />
=== A page about all images within a HTML page: 1. Refinement ===<br />
<br />
The next refinement step is the extension of the table such that<br />
it contains four columns: one for the image itself, one for the URL,<br />
one for the geometry and one for the ALT text. The extended <hask>genTableRows</hask><br />
has the following form:<br />
<br />
<haskell><br />
genTableRows<br />
    = selem "tr"<br />
      [ selem "td"                      -- (1)<br />
        [ this                          -- (1.1)<br />
        ]<br />
      , selem "td"                      -- (2)<br />
        [ getAttrValue "src"<br />
          >>><br />
          mkText<br />
          >>><br />
          mkelem "a"                    -- (2.1)<br />
          [ attr "href" this ]<br />
          [ this ]<br />
        ]<br />
      , selem "td"                      -- (3)<br />
        [ ( getAttrValue "width"<br />
            &&&                         -- (3.1)<br />
            getAttrValue "height"<br />
          )<br />
          >>><br />
          arr2 geometry                 -- (3.2)<br />
          >>><br />
          mkText<br />
        ]<br />
      , selem "td"                      -- (4)<br />
        [ getAttrValue "alt"<br />
          >>><br />
          mkText<br />
        ]<br />
      ]<br />
    where<br />
    geometry :: String -> String -> String<br />
    geometry "" ""<br />
        = ""<br />
    geometry w h<br />
        = w ++ "x" ++ h<br />
</haskell><br />
<br />
In (1) the identity arrow <hask>this</hask> is used for<br />
inserting the whole image element (the <hask>this</hask> value) into the first column.<br />
(2) is the column from the previous example, but the URL has been made active<br />
by embedding it in an A-element (2.1). In (3) there are two<br />
new combinators. <hask>(&&&)</hask> (3.1) is an arrow for applying two<br />
arrows to the same input and combining the results into a pair. <hask>arr2</hask> (3.2)<br />
works like <hask>arr</hask>, but it lifts a binary function into an arrow<br />
accepting a pair of values. <hask>arr2 f</hask> is a shortcut for<br />
<hask>arr (uncurry f)</hask>. So width and height are combined into an X11-like<br />
geometry spec. (4) adds the ALT text.<br />
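<br />
Since ordinary functions are arrows too, the interplay of <hask>(&&&)</hask> and <hask>arr2</hask> can be tried with the base library alone.<br />
In the following sketch an image element is modelled as a plain attribute list; <hask>Attrs</hask>, <hask>attrValue</hask> and <hask>imageGeometry</hask> are<br />
hypothetical names, <hask>arr2</hask> is defined locally as described above, and returning the empty string for a missing attribute is an assumption of this model.<br />
<br />
```haskell
import Control.Arrow (Arrow, arr, (&&&), (>>>))
import Data.Maybe (fromMaybe)

-- Lift a binary function to an arrow on pairs, as described in the text.
arr2 :: Arrow a => (b -> c -> d) -> a (b, c) d
arr2 f = arr (uncurry f)

geometry :: String -> String -> String
geometry "" "" = ""
geometry w  h  = w ++ "x" ++ h

-- Hypothetical model of an element: just its attribute list.
type Attrs = [(String, String)]

-- Assumption of this sketch: a missing attribute yields the empty string.
attrValue :: String -> Attrs -> String
attrValue name = fromMaybe "" . lookup name

-- Both selectors see the same input; their results are paired and combined.
imageGeometry :: Attrs -> String
imageGeometry = (attrValue "width" &&& attrValue "height") >>> arr2 geometry

sized, unsized :: String
sized   = imageGeometry [("src", "a.png"), ("width", "120"), ("height", "80")]
unsized = imageGeometry [("src", "b.png")]
```
<br />
Here <hask>sized</hask> is <hask>"120x80"</hask> and <hask>unsized</hask> is the empty string.<br />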
<br />
=== A page about all images within a HTML page: 2. Refinement ===<br />
<br />
The generated HTML page is not yet very useful, because it usually<br />
contains relative HREFs to the images, so the links do not work.<br />
We have to transform the SRC attribute values into absolute URLs.<br />
This can be done with the following code:<br />
<br />
<haskell><br />
imageTable2 :: IOStateArrow s XmlTree XmlTree<br />
imageTable2<br />
    = ...<br />
      ...<br />
      , selem "table"<br />
        [ collectImages<br />
          >>><br />
          mkAbsImageRef                 -- (1)<br />
          >>><br />
          genTableRows<br />
        ]<br />
      ...<br />
<br />
mkAbsImageRef :: IOStateArrow s XmlTree XmlTree     -- (1)<br />
mkAbsImageRef<br />
    = processAttrl ( mkAbsRef           -- (2)<br />
                     `when`<br />
                     hasName "src"      -- (3)<br />
                   )<br />
    where<br />
    mkAbsRef                            -- (4)<br />
        = replaceChildren<br />
          ( xshow getChildren           -- (5)<br />
            >>><br />
            ( mkAbsURI `orElse` this )  -- (6)<br />
            >>><br />
            mkText                      -- (7)<br />
          )<br />
</haskell><br />
<br />
<hask>imageTable2</hask> is extended by the arrow <hask>mkAbsImageRef</hask><br />
(1). This arrow uses the global system state of HXT, in which the base URL<br />
of a document is stored. To edit the SRC attribute value, the attribute list<br />
of the image elements is processed with <hask>processAttrl</hask>.<br />
Because of the <hask>`when` hasName "src"</hask>, only SRC attributes are manipulated (3). The real work is done in (4): the URL, stored as a text node, is selected with <hask>getChildren</hask> and converted into a string with <hask>xshow</hask> (5), and this string is transformed into an absolute URL<br />
with <hask>mkAbsURI</hask> (6). This arrow may fail, e.g. in case of illegal<br />
URLs. In this case the URL remains unchanged (<hask>`orElse` this</hask>).<br />
The resulting String value is converted into a text node forming the new<br />
attribute value node (7).<br />
<br />
Because of the use of the global HXT state in <hask>mkAbsURI</hask>,<br />
<hask>mkAbsRef</hask> and <hask>imageTable2</hask> need to have the more specialized signature <hask>IOStateArrow s XmlTree XmlTree</hask>.<br />
<br />
== Transformation examples ==<br />
<br />
=== Decorating external references of an HTML document ===<br />
<br />
In the following examples, we want to decorate the external references<br />
in an HTML page with a small icon, as is done in many wikis.<br />
For this task the document tree has to be traversed; all parts<br />
except the interesting A-elements remain unchanged. At the end of the list of children of an A-element we add an image element.<br />
<br />
Here is the first version:<br />
<br />
<haskell><br />
addRefIcon :: ArrowXml a => a XmlTree XmlTree<br />
addRefIcon<br />
    = processTopDown                            -- (1)<br />
      ( addImg                                  -- (2)<br />
        `when`<br />
        isExternalRef                           -- (3)<br />
      )<br />
    where<br />
    isExternalRef                               -- (4)<br />
        = isElem<br />
          >>><br />
          hasName "a"<br />
          >>><br />
          hasAttr "href"<br />
          >>><br />
          getAttrValue "href"<br />
          >>><br />
          isA isExtRef<br />
        where<br />
        isExtRef                                -- (4.1)<br />
            = isPrefixOf "http:"                -- or something more precise<br />
<br />
    addImg<br />
        = replaceChildren                       -- (5)<br />
          ( getChildren                         -- (6)<br />
            <+><br />
            imgElement                          -- (7)<br />
          )<br />
<br />
    imgElement<br />
        = mkelem "img"                          -- (8)<br />
          [ sattr "src" "/icons/ref.png"        -- (9)<br />
          , sattr "alt" "external ref"<br />
          ] []                                  -- (10)<br />
</haskell><br />
<br />
The traversal is done with <hask>processTopDown</hask> (1).<br />
This arrow applies an arrow to all nodes of the whole document tree.<br />
The transformation arrow applies <hask>addImg</hask> (2) to<br />
all A-elements with an external reference (3),(4). This arrow uses a somewhat simplified test (4.1)<br />
for external URLs.<br />
<hask>addImg</hask> manipulates all children (5) of the A-elements by<br />
selecting the current children (6) and adding an image element (7).<br />
The image element is constructed with <hask>mkelem</hask> (8). This takes<br />
an element name, a list of arrows for computing the attributes and a<br />
list of arrows for computing the contents. The content of the image element is<br />
empty (10). The attributes are constructed with <hask>sattr</hask> (9).<br />
<hask>sattr</hask> ignores the arrow input and builds an attribute from<br />
the name-value pair of arguments.<br />
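<br />
The comment at (4.1) asks for something more precise than a bare <hask>"http:"</hask> prefix. One possible refinement, given here only as a sketch,<br />
is to accept both <tt>http://</tt> and <tt>https://</tt> and to ignore the case of the scheme. This is still a heuristic:<br />
it does not recognise other schemes or protocol-relative URLs, and it is not taken from HXT.<br />
<br />
```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Heuristic test for external references (an assumption of this sketch):
-- the URL starts with "http://" or "https://", compared case-insensitively.
isExtRef :: String -> Bool
isExtRef url = any (`isPrefixOf` lowered) ["http://", "https://"]
  where
    lowered = map toLower url
```
<br />
Because the type is still <hask>String -> Bool</hask>, this version can be used with <hask>isA isExtRef</hask> exactly like the original test.<br />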
<br />
=== Transform external references into absolute references ===<br />
<br />
In the following example we will develop a program for<br />
editing an HTML page such that all references to external documents<br />
(images, hypertext refs, style refs, ...) become absolute references.<br />
We will see some new, but very useful combinators in the solution.<br />
<br />
The task seems to be rather trivial: in a tree traversal<br />
all references are edited with respect to the document base.<br />
But in HTML there is a BASE element, allowed in the content of HEAD,<br />
with a HREF attribute, which defines the document base. Again this<br />
href can be a relative URL.<br />
<br />
We start the development with the editing arrow. This gets<br />
the real document base as argument.<br />
<br />
<haskell><br />
mkAbsHRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsHRefs base<br />
    = processTopDown editHRef                   -- (1)<br />
    where<br />
    editHRef<br />
        = processAttrl                          -- (3)<br />
            ( changeAttrValue (absHRef base)    -- (5)<br />
              `when`<br />
              hasName "href"                    -- (4)<br />
            )<br />
          `when`<br />
          ( isElem >>> hasName "a" )            -- (2)<br />
        where<br />
<br />
        absHRef :: String -> String -> String   -- (5)<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
The tree is traversed (1) and for every A element (2) the attribute<br />
list is processed (3). All HREF attribute values (4) are manipulated<br />
by <hask>changeAttrValue</hask> called with a string function (5).<br />
<hask>expandURIString</hask> is a pure function defined in HXT for computing<br />
an absolute URI.<br />
In this first step we only edit A-HREF attribute values. We will refine this<br />
later.<br />
<br />
The second step is the complete computation of the base URL.<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( isElem >>> hasName "html"           -- (0)<br />
            >>><br />
            getChildren                         -- (1)<br />
            >>><br />
            isElem >>> hasName "head"           -- (2)<br />
            >>><br />
            getChildren                         -- (3)<br />
            >>><br />
            isElem >>> hasName "base"           -- (4)<br />
            >>><br />
            getAttrValue "href"                 -- (5)<br />
          )<br />
          &&&<br />
          getBaseURI                            -- (6)<br />
        )<br />
        >>> expandURI                           -- (7)<br />
      )<br />
      `orElse` getBaseURI                       -- (8)<br />
</haskell><br />
<br />
The input to this arrow is the HTML element. (0) to (5) is the arrow for selecting<br />
the BASE element's HREF value. Parallel to this, the system base URL is read<br />
with <hask>getBaseURI</hask> (6) as in the examples above. The resulting<br />
pair of strings is piped into <hask>expandURI</hask> (7), the arrow version of<br />
<hask>expandURIString</hask>. This arrow ((0) to (7)) fails in the absence<br />
of a BASE element. In this case we take the plain document base (8).<br />
The selection of the BASE element is not yet very handy. We will define<br />
a more general and elegant function later, allowing an element path as selection argument.<br />
<br />
In the third step, we will combine the two arrows. For this we will use<br />
a new combinator <hask>($<)</hask>. The need for this new combinator<br />
is the following: we need the arrow input (the document) twice,<br />
once for computing the document base and once for editing the<br />
whole document, and of course we want to compute the extra string parameter<br />
for editing with the arrow defined above.<br />
<br />
The combined arrow, our main arrow, looks like this:<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = mkAbsHRefs $< computeBaseRef -- (1)<br />
</haskell><br />
<br />
In (1) the arrow input is first piped into <hask>computeBaseRef</hask>;<br />
the result is then used in <hask>mkAbsHRefs</hask> as the extra string parameter<br />
when processing the document. Internally the <hask>($<)</hask> combinator<br />
is defined in terms of the basic combinators <hask>(&&&), (>>>)</hask> and <hask>app</hask>. In more complex computations<br />
this pattern occurs rather frequently, so <hask>($<)</hask> becomes very useful.<br />
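<br />
To make this more concrete, here is a minimal sketch of how such a combinator can be<br />
assembled from <hask>(&&&), (>>>)</hask> and <hask>app</hask> for any <hask>ArrowApply</hask> instance.<br />
It is an illustration only and not HXT's actual source. The example arrow <hask>addLength</hask> is<br />
made up, and it runs in the plain function arrow from <hask>Control.Arrow</hask> rather than in an HXT arrow.<br />
<br />
<haskell><br />
import Control.Arrow<br />
<br />
-- Sketch only: compute a parameter from the input with g,<br />
-- then run the arrow (f parameter) on the very same input.<br />
($<) :: ArrowApply a => (c -> a b d) -> a b c -> a b d<br />
f $< g = (g &&& arr id) >>> arr (\ (c, b) -> (f c, b)) >>> app<br />
<br />
-- Hypothetical example in the function arrow (->):<br />
-- the list length is computed first, then added to every element.<br />
addLength :: [Int] -> [Int]<br />
addLength = (\ n -> map (+ n)) $< length<br />
</haskell><br />
<br />
With these definitions <hask>addLength [1,2,3]</hask> yields <hask>[4,5,6]</hask>: the input list is used twice,<br />
once to compute its length and once as the list being edited.<br />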
<br />
Programming with arrows is one style of point-free programming. Point-free<br />
programming often becomes awkward when values are used more than once.<br />
One solution is the special arrow syntax supported by GHC and others, similar to the do notation for monads. But for many simple cases the <hask>($<)</hask> combinator and its variants <hask>($<<), ($<<<), ($<<<<), ($<$)</hask><br />
are sufficient.<br />
<br />
To complete the development of the example, a last step is necessary:<br />
the removal of the redundant BASE element.<br />
<br />
<haskell><br />
toAbsRefs :: IOStateArrow s XmlTree XmlTree<br />
toAbsRefs<br />
    = ( mkAbsHRefs $< computeBaseRef )<br />
      >>><br />
      removeBaseElement<br />
<br />
removeBaseElement :: ArrowXml a => a XmlTree XmlTree<br />
removeBaseElement<br />
    = processChildren<br />
      ( processChildren<br />
        ( none -- (1)<br />
          `when`<br />
          ( isElem >>> hasName "base" )<br />
        )<br />
        `when`<br />
        ( isElem >>> hasName "head" )<br />
      )<br />
</haskell><br />
<br />
In this function the children of the HEAD element are searched for<br />
a BASE element. This element is removed by applying the null arrow <hask>none</hask><br />
to the input, which always returns the empty list.<br />
<hask>none `when` ...</hask> is the pattern for deleting nodes from a tree.<br />
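<br />
The deletion pattern can be tried out without HXT on a toy model. In the sketch below,<br />
<hask>Tree</hask>, <hask>Filter</hask> and all functions are simplified stand-ins written for this<br />
illustration (plain list-returning functions, no attributes, no text nodes); they only mimic<br />
the behaviour of the HXT combinators with the same names.<br />
<br />
<haskell><br />
-- A toy model of XML trees and filters; these are NOT the HXT types.<br />
data Tree = Node String [Tree] deriving (Eq, Show)<br />
type Filter = Tree -> [Tree]<br />
<br />
-- The null filter: always returns the empty list.<br />
none :: Filter<br />
none _ = []<br />
<br />
hasName :: String -> Filter<br />
hasName n t@(Node m _) = [t | n == m]<br />
<br />
-- f `when` p: apply f where the predicate p succeeds, else keep the node.<br />
when :: Filter -> Filter -> Filter<br />
when f p t = if null (p t) then [t] else f t<br />
<br />
-- Apply a filter to all children and collect the results.<br />
processChildren :: Filter -> Filter<br />
processChildren f (Node n cs) = [Node n (concatMap f cs)]<br />
<br />
removeBaseElement :: Filter<br />
removeBaseElement<br />
    = processChildren<br />
      ( processChildren (none `when` hasName "base")<br />
        `when`<br />
        hasName "head"<br />
      )<br />
</haskell><br />
<br />
Applied to <hask>Node "html" [Node "head" [Node "title" [], Node "base" []], Node "body" []]</hask>,<br />
this returns the same tree without the <hask>"base"</hask> node; all other nodes are kept unchanged.<br />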
<br />
The <hask>computeBaseRef</hask> function defined above contains an arrow pattern<br />
for selecting the right subtree that is rather common in HXT applications:<br />
<br />
<haskell><br />
isElem >>> hasName n1<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName n2<br />
...<br />
>>><br />
getChildren<br />
>>><br />
isElem >>> hasName nm<br />
</haskell><br />
<br />
For this pattern we will define a convenience function that creates the<br />
selection arrow:<br />
<br />
<haskell><br />
getDescendents :: ArrowXml a => [String] -> a XmlTree XmlTree<br />
getDescendents<br />
    = foldl1 (\ x y -> x >>> getChildren >>> y) -- (1)<br />
      .<br />
      map (\ n -> isElem >>> hasName n) -- (2)<br />
</haskell><br />
<br />
The name list is mapped to the element checking arrow (2), and<br />
the resulting list of arrows is folded with <hask>getChildren</hask><br />
into a single arrow (1). <hask>computeBaseRef</hask> can then be simplified<br />
and becomes more readable:<br />
<br />
<haskell><br />
computeBaseRef :: IOStateArrow s XmlTree String<br />
computeBaseRef<br />
    = ( ( ( getDescendents ["html","head","base"] -- (1)<br />
            >>><br />
            getAttrValue "href" -- (2)<br />
          )<br />
          ...<br />
          ...<br />
</haskell><br />
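<br />
The folding step in <hask>getDescendents</hask> can also be tried without HXT. The following sketch<br />
uses a toy tree type and plain list-returning functions composed with <hask>(>=>)</hask> from<br />
<hask>Control.Monad</hask>. <hask>Tree</hask>, <hask>Filter</hask> and the helper functions are assumptions made for<br />
this illustration and are not the HXT definitions.<br />
<br />
<haskell><br />
import Control.Monad ((>=>))<br />
<br />
-- A toy model of XML trees and filters; these are NOT the HXT types.<br />
data Tree = Node String [Tree] deriving (Eq, Show)<br />
type Filter = Tree -> [Tree]<br />
<br />
hasName :: String -> Filter<br />
hasName n t@(Node m _) = [t | n == m]<br />
<br />
getChildren :: Filter<br />
getChildren (Node _ cs) = cs<br />
<br />
-- Same shape as the arrow version: one name test per path step,<br />
-- glued together with getChildren. Like foldl1 itself, this is<br />
-- undefined for an empty name list.<br />
getDescendents :: [String] -> Filter<br />
getDescendents<br />
    = foldl1 (\ x y -> x >=> getChildren >=> y)<br />
      .<br />
      map hasName<br />
</haskell><br />
<br />
For a tree with root <hask>"html"</hask> containing a <hask>"head"</hask> with a <hask>"base"</hask> child,<br />
<hask>getDescendents ["html","head","base"]</hask> returns exactly that <hask>"base"</hask> node, while a path<br />
that does not exist in the tree returns the empty list.<br />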
<br />
An even more general and flexible technique is to use the XPath expressions<br />
for selecting document parts, available in the module<br />
<hask>Text.XML.HXT.Arrow.XmlNodeSet</hask>.<br />
<br />
With XPath <hask>computeBaseRef</hask> can be simplified to:<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( ( getXPathTrees "/html/head/base" -- (1)<br />
            >>><br />
            getAttrValue "href" -- (2)<br />
          )<br />
          ...<br />
</haskell><br />
<br />
Even the attribute selection can be expressed by XPath,<br />
so (1) and (2) can be combined into:<br />
<br />
<haskell><br />
computeBaseRef<br />
    = ( ( xshow (getXPathTrees "/html/head/base/@href")<br />
          ...<br />
</haskell><br />
<br />
The extra <hask>xshow</hask> is required here to convert the<br />
XPath result, an XmlTree, into a string.<br />
<br />
XPath defines a<br />
full language for selecting parts of an XML document.<br />
Selections of this type are sometimes very convenient,<br />
but XPath evaluation is in general more expensive<br />
in time and space than a simple combination of arrows such as the one we've<br />
seen in <hask>getDescendents</hask>.<br />
<br />
=== Transform external references into absolute references: Refinement ===<br />
<br />
In the above example only A-HREF URLs are edited. Now we extend this<br />
to other element-attribute combinations.<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown ( editRef "a" "href" -- (2)<br />
                       >>><br />
                       editRef "img" "src" -- (3)<br />
                       >>><br />
                       editRef "link" "href" -- (4)<br />
                       >>><br />
                       editRef "script" "src" -- (5)<br />
                     )<br />
    where<br />
    editRef en an -- (1)<br />
        = processAttrl ( changeAttrValue (absHRef base)<br />
                         `when`<br />
                         hasName an<br />
                       )<br />
          `when`<br />
          ( isElem >>> hasName en )<br />
        where<br />
        absHRef :: String -> String -> String<br />
        absHRef base url<br />
            = fromMaybe url . expandURIString url $ base<br />
</haskell><br />
<br />
<hask>editRef</hask> is parameterized by the element and attribute names.<br />
The arrow applied to every element is extended to a sequence of<br />
<hask>editRef</hask>'s ((2)-(5)). Notice that the document is still traversed only once.<br />
To process all possible HTML elements,<br />
this sequence should be extended by further element-attribute pairs.<br />
<br />
This can further be simplified into<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = foldl (>>>) this<br />
          .<br />
          map (\ (en, an) -> editRef en an)<br />
          $<br />
          [ ("a", "href")<br />
          , ("img", "src")<br />
          , ("link", "href")<br />
          , ("script", "src") -- and more<br />
          ]<br />
    editRef<br />
        = ...<br />
</haskell><br />
<br />
The <hask>foldl (>>>) this</hask> is defined in HXT as <hask>seqA</hask>,<br />
so the above code can be simplified to<br />
<br />
<haskell><br />
mkAbsRefs :: ArrowXml a => String -> a XmlTree XmlTree<br />
mkAbsRefs base<br />
    = processTopDown editRefs<br />
    where<br />
    editRefs<br />
        = seqA . map (uncurry editRef)<br />
          $<br />
          ...<br />
</haskell><br />
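<br />
The effect of folding a list of arrows with <hask>(>>>)</hask> can be seen in isolation with ordinary<br />
functions. The sketch below is a stand-in for <hask>seqA</hask>, not the HXT definition: it is<br />
specialised to the function arrow, uses <hask>id</hask> in place of <hask>this</hask>, and the name<br />
<hask>seqA'</hask> is made up for this illustration.<br />
<br />
<haskell><br />
import Control.Arrow ((>>>))<br />
<br />
-- Run a list of functions one after the other, left to right.<br />
-- The empty list gives the identity.<br />
seqA' :: [a -> a] -> a -> a<br />
seqA' = foldl (>>>) id<br />
<br />
-- ((5 + 1) * 2) - 3<br />
example :: Int<br />
example = seqA' [(+ 1), (* 2), subtract 3] 5<br />
</haskell><br />
<br />
Here <hask>example</hask> evaluates to <hask>9</hask>: the first function in the list is applied first,<br />
just as the first <hask>editRef</hask> in the list is the first to see each node.<br />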
<br />
== More complex examples ==<br />
<br />
''' to be done '''<br />
<br />
=== Automatic read/writing between xml and Haskell data types ===<br />
<br />
'''Question''': is there any way to write/read Haskell types to/from XML in HXT? HaXml has readXml and showXml, but I can't find any similar mechanism in HXT. Help! -- AlsonKemp<br />
<br />
==== Serializing to Xml ====<br />
<br />
We can create an HXT tree from a single-layer data class as follows:<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
import System.IO<br />
import Data.Char<br />
import Data.Maybe (fromMaybe)<br />
import Text.XML.HXT.Arrow<br />
import Data.Generics<br />
<br />
-- our data type we'll convert into xml<br />
data Config = <br />
    Config { username :: String,<br />
             logNumDays :: Int,<br />
             oleDbString :: String }<br />
    deriving (Show, Typeable, Data)<br />
<br />
-- helper function adapted from http://www.defmacro.org/ramblings/haskell-web.html<br />
-- (gshow replaced by gshow')<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- strings are returned as they are, other values are shown via their constructor<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
-- (the fields are emitted in reverse order)<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++ <br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- function to create HXT tree arrow from single-layer Haskell data type:<br />
createHxtArrow object = runLA( constA ( xmlSerialize object ) >>> xread)<br />
<br />
-- create a config object to serialize:<br />
<br />
createConfig = Config { username = "test", logNumDays = 3, oleDbString = "qsdf" }<br />
<br />
-- test function, using our Config data type<br />
testConversion = createHxtArrow( createConfig ) ()<br />
</haskell><br />
<br />
-- hughperkins<br />
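<br />
The introspection part of this approach does not depend on HXT and can be tried on its own.<br />
The following sketch assumes only <hask>Data.Data</hask> from base; the two-field <hask>Config</hask><br />
type and the function <hask>toXml</hask> are made up for this illustration. Unlike the <hask>foldr</hask><br />
above, <hask>toXml</hask> keeps the fields in declaration order, and like the code above it does no<br />
escaping of the field values.<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
import Data.Data<br />
import Data.Maybe (fromMaybe)<br />
<br />
-- hypothetical record type, used only for this example<br />
data Config = Config { username :: String, logNumDays :: Int }<br />
    deriving (Show, Data, Typeable)<br />
<br />
-- strings are returned unchanged, other values via their constructor name<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr (toConstr t)) (cast t)<br />
<br />
-- pair every field name with the string form of its value<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip (constrFields (toConstr a)) (gmapQ gshow' a)<br />
<br />
-- one element per field, wrapped in an element named after the constructor<br />
toXml :: Data a => a -> String<br />
toXml object = element (showConstr (toConstr object))<br />
                       (concatMap field (introspectData object))<br />
  where<br />
    element name body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"<br />
    field (name, value) = element name value<br />
</haskell><br />
<br />
For <hask>Config "test" 3</hask>, <hask>introspectData</hask> gives <hask>[("username","test"),("logNumDays","3")]</hask><br />
and <hask>toXml</hask> gives <hask>"<Config><username>test</username><logNumDays>3</logNumDays></Config>"</hask>.<br />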
<br />
==== Deserializing from Xml ====<br />
<br />
Here's a solution to deserialize a simple Haskell data type containing Strings and Ints.<br />
<br />
It's not really pretty, but it works.<br />
<br />
Basically, we just convert the incoming xml into gread-compatible format, then use gread :-D<br />
<br />
Currently it works for a simple single-layer Haskell data type containing Ints and Strings. You can add new child data types by adding to the case statement in xmlToGShowFormat.<br />
<br />
If someone has a more elegant solution, please let me know ( hughperkins@gmail.com )<br />
<br />
<haskell><br />
{-# LANGUAGE DeriveDataTypeable #-}<br />
module ParseXml<br />
where<br />
<br />
import System.IO<br />
import Data.Char<br />
import Data.List<br />
import Data.Maybe<br />
import Data.Generics hiding (Unit)<br />
import Text.XML.HXT.Arrow hiding (when)<br />
<br />
data Config = Config{ name :: String, age :: Int } <br />
--data Config = Config{ age :: Int } <br />
    deriving( Data, Show, Typeable, Ord, Eq, Read )<br />
<br />
createConfig = Config "qsdfqsdf" 3<br />
--createConfig = Config 3<br />
gshow' :: Data a => a -> String<br />
gshow' t = fromMaybe (showConstr(toConstr t)) (cast t)<br />
<br />
-- helper function from http://www.defmacro.org/ramblings/haskell-web.html<br />
introspectData :: Data a => a -> [(String, String)]<br />
introspectData a = zip fields (gmapQ gshow' a)<br />
    where fields = constrFields $ toConstr a<br />
<br />
-- function to create xml string from single-layer Haskell data type<br />
xmlSerialize object = "<" ++ show(toConstr object) ++ ">" ++ <br />
    foldr (\(a,b) x -> x ++ "<" ++ a ++ ">" ++ b ++ "</" ++ a ++ ">") "" ( introspectData object )<br />
    ++ "</" ++ show(toConstr object) ++ ">"<br />
<br />
-- parse xml to HXT tree, and obtain the value of node "fieldname"<br />
-- returns a string<br />
getValue xml fieldname | length(resultlist) > 0 = Just (head resultlist)<br />
                       | otherwise = Nothing<br />
    where resultlist = (runLA ( constA xml >>> xread >>> deep ( hasName fieldname ) >>> getChildren >>> getText ))[]<br />
<br />
-- parse templateobject to get list of field names<br />
-- apply these to xml to get list of values<br />
-- return the object as a string in gread-compatible format<br />
xmlToGShowFormat :: Data a => String -> a -> String<br />
xmlToGShowFormat xml templateobject = <br />
    go<br />
    where mainconstructorname = (showConstr $ toConstr templateobject)<br />
          fields = constrFields $ toConstr templateobject<br />
          values = map ( \fieldname -> getValue xml fieldname ) fields<br />
          datatypes = gmapQ (dataTypeOf) templateobject<br />
          constrs = gmapQ (toConstr) templateobject<br />
          datatypereps = gmapQ (dataTypeRep . dataTypeOf) templateobject<br />
          fieldtogshowformat (value,datatyperep) = case datatyperep of<br />
              IntRep -> "(" ++ fromJust value ++ ")"<br />
              _ -> show(fromJust value)<br />
          formattedfieldlist = map (fieldtogshowformat) (zip values datatypereps)<br />
          go = "(" ++ mainconstructorname ++ " " ++ (concat $ intersperse " " formattedfieldlist ) ++ ")"<br />
<br />
xmlDeserialize xml templateobject = fst $ head $ gread( xmlToGShowFormat xml templateobject)<br />
<br />
dotest = xmlDeserialize (xmlSerialize createConfig) createConfig :: Config<br />
dotest' = xmlDeserialize ("<Config><age>12</age><name>test name!</name></Config>") createConfig :: Config<br />
</haskell><br />
<br />
-- hughperkins</div>Joehttps://wiki.haskell.org/index.php?title=Performance/Laziness&diff=12966Performance/Laziness2007-05-07T00:47:24Z<p>Joe: fixed typos</p>
<hr />
<div>{{Performance infobox}}<br />
[[Category:Performance|Laziness]]<br />
== Laziness: Procrastinating for Profit ==<br />
<br />
To look at how laziness works in Haskell, and how to make it do efficient work, we'll implement a merge sort function. It will have the type:<br />
<br />
merge_sort :: (Ord a) => [a] -> [a]<br />
<br />
We'll also need a function to split the list in two. I'll call this cleaving, and it will look like this:<br />
<br />
cleave :: [a] -> ([a],[a])<br />
<br />
Let's start by implementing the cleaving function. The conventional way to split a list in merge sort is to take the first N/2 elements off the front and to use the remaining elements as the second half. The problem is that finding the length of a list in Haskell is expensive, because the whole list has to be traversed. So instead, we'll take elements off the front in pairs, putting the first element of each pair into one list and the second into the other. Define two functions:<br />
<br />
evens [] = []<br />
evens [x] = [x]<br />
evens (x:_:xs) = x : evens xs<br />
<br />
odds [] = []<br />
odds [x] = []<br />
odds (_:x:xs) = x : odds xs<br />
<br />
and use them to implement cleave:<br />
<br />
cleave xs = (evens xs, odds xs)<br />
<br />
Experience in a strictly evaluated language like SML or Objective Caml may lead you to write an alternate version using an [[Performance/Accumulating_Parameters | accumulating parameter]]. Assuming that reversing the order of the elements doesn't matter, you could split the list into even and odd elements and implement the cleave function as follows:<br />
<br />
cleave = cleave' ([],[]) where<br />
    cleave' (eacc,oacc) [] = (eacc,oacc)<br />
    cleave' (eacc,oacc) [x] = (x:eacc,oacc)<br />
    cleave' (eacc,oacc) (x:x':xs) = cleave' (x:eacc,x':oacc) xs<br />
<br />
This appears to be a better implementation. It's tail recursive, and by either strictness analysis or explicitly making the accumulating parameters strict, it won't blow the stack up. Believe it or not, our first implementation was better.<br />
<br />
With the accumulating version, producing the first element of either list requires processing the entire input list. In a non-strict language we could encounter an infinite list, and we'd like our function to work nicely on those too. Consider the effect of:<br />
<br />
head $ fst $ cleave [0..10000000]<br />
<br />
With our first definition, we'll get 0 in constant time. With our second, we'll get it in O(N) time, and our calculation will diverge on an infinite list like [0..].<br />
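<br />
The difference can be checked directly. The sketch below repeats both versions side by side;<br />
the names <hask>cleaveAcc</hask> and <hask>firstEvens</hask> are introduced here only to keep the two<br />
versions apart, and the type signatures are added for clarity.<br />
<br />
<haskell><br />
evens, odds :: [a] -> [a]<br />
evens [] = []<br />
evens [x] = [x]<br />
evens (x:_:xs) = x : evens xs<br />
<br />
odds [] = []<br />
odds [_] = []<br />
odds (_:x:xs) = x : odds xs<br />
<br />
-- lazy version: every result element is produced on demand<br />
cleave :: [a] -> ([a], [a])<br />
cleave xs = (evens xs, odds xs)<br />
<br />
-- accumulating version: nothing is returned before the input ends<br />
cleaveAcc :: [a] -> ([a], [a])<br />
cleaveAcc = go ([], [])<br />
  where<br />
    go acc [] = acc<br />
    go (eacc, oacc) [x] = (x : eacc, oacc)<br />
    go (eacc, oacc) (x:x':xs) = go (x : eacc, x' : oacc) xs<br />
<br />
-- works although the input list is infinite<br />
firstEvens :: [Integer]<br />
firstEvens = take 3 (fst (cleave [0 ..]))<br />
</haskell><br />
<br />
<hask>firstEvens</hask> evaluates to <hask>[0,2,4]</hask>. The accumulating version works on finite input,<br />
for example <hask>cleaveAcc [1,2,3,4,5]</hask> gives <hask>([5,3,1],[4,2])</hask>, but<br />
<hask>take 3 (fst (cleaveAcc [0 ..]))</hask> never returns.<br />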
<br />
Why is our first version better? Let's look at how evens works and how lists are represented in Haskell. Lists are represented as either an empty list, or a "cons" cell that consists of an element and the remaining list. In pseudo-Haskell, we might write:<br />
<br />
data [a] = [] | a : [a]<br />
<br />
In a lazy language, an expression is only evaluated when needed. The machinery used to implement this is called a thunk. It's essentially a value with two possible states: either a computed value, or the process to compute that value. When we bind a name to an expression in Haskell, we create a thunk with the instructions to compute the value of that expression. When this thunk is forced, these instructions are used to compute a value which is stored in the thunk. The next time the value is required, this computed value is retrieved. Laziness can be implemented in languages like SML using this method together with mutable references.<br />
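<br />
The two states of a thunk can be modelled explicitly with a mutable reference, much as one<br />
would do in SML. The following is only a sketch of the idea, written with <hask>Data.IORef</hask>;<br />
it is not how GHC implements thunks, and the names <hask>Thunk</hask>, <hask>delay</hask>, <hask>force</hask><br />
and <hask>demo</hask> are made up for this illustration.<br />
<br />
<haskell><br />
import Data.IORef<br />
<br />
-- A thunk is either a suspended computation or its cached result.<br />
data ThunkState a = Suspended (IO a) | Evaluated a<br />
<br />
newtype Thunk a = Thunk (IORef (ThunkState a))<br />
<br />
-- Store the instructions without running them.<br />
delay :: IO a -> IO (Thunk a)<br />
delay action = Thunk <$> newIORef (Suspended action)<br />
<br />
-- Run the instructions on first use, return the cached value afterwards.<br />
force :: Thunk a -> IO a<br />
force (Thunk ref) = do<br />
    state <- readIORef ref<br />
    case state of<br />
        Evaluated v -> return v<br />
        Suspended action -> do<br />
            v <- action<br />
            writeIORef ref (Evaluated v)<br />
            return v<br />
<br />
-- Force the same thunk twice and count how often the computation ran.<br />
demo :: IO (Int, Int, Int)<br />
demo = do<br />
    counter <- newIORef (0 :: Int)<br />
    t <- delay (modifyIORef counter (+ 1) >> return 42)<br />
    a <- force t<br />
    b <- force t<br />
    n <- readIORef counter<br />
    return (a, b, n)<br />
</haskell><br />
<br />
Running <hask>demo</hask> returns <hask>(42,42,1)</hask>: both calls to <hask>force</hask> see the same value,<br />
but the suspended computation is executed only once.<br />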
<br />
So in the recursive case of evens, we produce a thunk that contains a list cons cell. This cons cell contains two thunks, one for the element value, and one for the rest of the list. The thunk for the element is taken from the list the function is operating on, and the thunk for the rest of the list consists of instructions to compute the rest of the list using evens. We'd say that evens and odds are lazy in their input: they consume only as much of the input as is needed to produce the requested part of the result. As an example of how lazy functions work, consider:<br />
<br />
head ( 5 : undefined )<br />
tail [undefined, 5]<br />
evens (5 : undefined : 3 : undefined : 1 : undefined : [])<br />
take 3 $ evens (5 : undefined : 3 : undefined : 1 : undefined : undefined)<br />
<br />
Although all the inputs contain partially undefined values, all of these function applications yield valid values. A lazy function will only diverge when a value required for its computation diverges. If you're wondering why we have two undefineds at the end of the list in the last example, recall how evens was implemented. We need two undefined cells to make sure the third case is selected: the one with two elements followed by a remainder list. Having only one undefined would mean that after the element 1, the remainder of the list is undefined. Then we couldn't decide between the 2nd and 3rd cases. Now let's look at what happens with lazy evaluation and diverging values. Consider:<br />
<br />
tail (5 : undefined)<br />
head [undefined, 5]<br />
odds (5 : undefined : 3 : undefined : 1 : undefined : [])<br />
take 4 $ evens ( 5 : undefined : 3 : undefined : 1 : undefined : undefined)<br />
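<br />
Both groups of expressions can be checked mechanically. The sketch below evaluates each result<br />
completely and catches the exception raised by <hask>undefined</hask>. The helper <hask>fullyDefined</hask><br />
and the list <hask>checks</hask> are made up for this illustration, and single values are wrapped<br />
in a list so that all eight expressions have the same type.<br />
<br />
<haskell><br />
import Control.Exception (SomeException, evaluate, try)<br />
<br />
evens, odds :: [a] -> [a]<br />
evens [] = []<br />
evens [x] = [x]<br />
evens (x:_:xs) = x : evens xs<br />
<br />
odds [] = []<br />
odds [_] = []<br />
odds (_:x:xs) = x : odds xs<br />
<br />
-- True if the whole list, spine and elements, can be evaluated.<br />
fullyDefined :: [Int] -> IO Bool<br />
fullyDefined xs = do<br />
    r <- try (evaluate (sum xs)) :: IO (Either SomeException Int)<br />
    return (either (const False) (const True) r)<br />
<br />
-- The first four expressions are from the first group above,<br />
-- the last four from the second group.<br />
checks :: IO [Bool]<br />
checks = mapM fullyDefined<br />
    [ [head (5 : undefined)]<br />
    , tail [undefined, 5]<br />
    , evens (5 : undefined : 3 : undefined : 1 : undefined : [])<br />
    , take 3 $ evens (5 : undefined : 3 : undefined : 1 : undefined : undefined)<br />
    , tail (5 : undefined)<br />
    , [head [undefined, 5]]<br />
    , odds (5 : undefined : 3 : undefined : 1 : undefined : [])<br />
    , take 4 $ evens (5 : undefined : 3 : undefined : 1 : undefined : undefined)<br />
    ]<br />
</haskell><br />
<br />
<hask>checks</hask> returns <hask>True</hask> for the first four expressions and <hask>False</hask> for the<br />
last four. Note that the result of <hask>odds</hask> has a well defined length of 3; only its<br />
elements are undefined.<br />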
<br />
Each of these fails once its result is fully evaluated, because the result depends on an undefined part of the input. So the application of evens to a non-trivial list results in a thunk being returned immediately. And when we ask for the first element of the list evens produces, we only evaluate the value thunk. This is why we can apply evens or odds (and cleave, for that matter) to an infinite list. We'll implement merge_sort using cleave:<br />
<br />
merge_sort [] = []<br />
merge_sort [x] = [x]<br />
merge_sort lst = let (e,o) = cleave lst in merge (merge_sort e) (merge_sort o) where<br />
    merge :: (Ord a) => [a] -> [a] -> [a]<br />
    merge xs [] = xs<br />
    merge [] ys = ys<br />
    merge xxs@(x:xs) yys@(y:ys)<br />
        | x <= y = x : merge xs yys<br />
        | otherwise = y : merge xxs ys<br />
<br />
You can see that this function isn't lazy. It begins by cleaving the list recursively until it is left with trivial lists, ones with zero or one elements. These are obviously already sorted. It then uses the nested function merge, which combines two ordered lists and preserves their order. Partitioning the list into trivial lists means the entire list needs to be processed before we can begin merging and assembling the result. In this case, we can't make a lazier solution, one that would work on an infinite list. If this is surprising, think of this: in sorting a list, the first element out should be the least (or greatest). How are we to find this element without examining the entire list? We'd say that merge_sort is strict in the list to be sorted: if an infinite list is supplied, the computation will diverge in the sense that output will never be provided. There are some operations that cannot be done lazily, and sorting a list is one of them.<br />
<br />
We've seen the difference between a lazy function and a strict function. Lazy computing has two major appeals. The first is that only enough work is done to compute a value. The second is that we can operate in the presence of infinite and undefined data structures, as long as we don't examine the undefined parts or try to process the infinity of values.</div>Joehttps://wiki.haskell.org/index.php?title=Yi&diff=6918Yi2006-10-12T15:01:13Z<p>Joe: fixed typo</p>
<hr />
<div>== Yi ideas ==<br />
<br />
This page is meant to gather ideas people have for<br />
[http://www.cse.unsw.edu.au/~dons/yi.html Yi], an extensible editor<br />
written in Haskell.<br />
<br />
Coming from an Emacs background, the current version of Yi lacks a few<br />
things I think are essential, mainly the introspection capabilities<br />
of Emacs. One of the main problems is that Yi is based on purely<br />
compiled code --- there is little or no interaction with the run-time<br />
system.<br />
<br />
Ideally, the next version of Yi would be based on a (modified?)<br />
version of GHCi, maybe taking advantage of package GHC. <br />
<br />
=== Emacs goodness ===<br />
<br />
The following are things I like about Emacs, as an extensible<br />
environment:<br />
; Really good online documentation<br />
: Emacs can tell you a lot about a function or variable with a<br />
: keypress--- the current value, where it is declared, and a hypertext<br />
: information string<br />
; Extensibility<br />
: All (good) apps allow users to extend, through, e.g., hooks --- a<br />
: list of functions that are run before/after some event (like saving<br />
: a file)<br />
; Integration<br />
: It is really easy in Emacs to have one package interact with<br />
: another. Thus, I can, e.g., insert a new appointment from my mail app into<br />
: the diary. <br />
; Everything is Lisp<br />
: Ignoring the actual language, everything is handled in a uniform<br />
: language --- from binding keys to writing apps.<br />
; Easy to start hacking<br />
: I can start playing with the system from the second I start up, and<br />
: things pretty much work as expected. I.e., I can type a bit of code<br />
: in, execute it, and the result is displayed in the minibuffer. The<br />
: good docs help immeasurably.<br />
; Written for the frequent user<br />
: Lots of key shortcuts (and famous for it). There are still menus,<br />
: for those who like em, but you aren't forced to pretend you just<br />
: started using it.<br />
; A tonne of code<br />
: Well, Haskell has this to some degree. Haskell is (IMHO) much<br />
: easier to write than ELisp, so maybe people will be encouraged to contribute.<br />
<br />
=== Emacs badness ===<br />
<br />
So, why replace it?:<br />
; ELisp<br />
: Dynamically scoped, Dynamically typed, ugly, old. 'Nuff said<br />
; What's a Parser?<br />
: A lot of apps in emacs do stuff with text, usually text that is in<br />
: some language. There is no standard parser (like, e.g. parsec), so<br />
: a lot of it is ugly handwritten spaghetti. This also means that<br />
: adding analysis tools isn't really done (or done nicely).<br />
; ELisp again<br />
: Haskell is a lot cleaner to write, especially because of the large<br />
: number of libraries.<br />
<br />
=== Emacs maybeness (?) ===<br />
<br />
Some things that are sometimes bad, sometimes good:<br />
; Everything is a buffer<br />
: Makes some sense, but sometimes doesn't. It is nice to have uniform<br />
: key bindings do the right thing (e.g., C-Space sets the mark, and<br />
: the region can then be used, e.g. to delete a sequence of emails in Wl)<br />
: Sometimes, however, you just want some sort of GUI widget.<br />
:<br />
: OTOH, having the minibuffer be a special kind of buffer is a good idea.<br />
; Properties<br />
: It is possible to associate arbitrary properties with symbols. This<br />
: means you can annotate a symbol and then use that information at a<br />
: later date<br />
<br />
=== Ideas ===<br />
<br />
An extension to GHCi to support documentation of symbols. <br />
<br />
- This seems to be (reasonably) straightforward, as GHCi already has :info. It would mean hacking the type environment (what about values?) to add documentation information. The main problem would seem to be populating this --- maybe hack Haddock to produce something from the library docs? I assume that using package GHC uses the parent RTS (package GHC seems to be the way to go, but more investigation is required --- don?)<br />
<br />
Intermixed compiled/interpreted code (for speed/hacking)<br />
<br />
GUI abstraction --- want it to work on terminals as well as X<br />
<br />
Views on data? Rather than just editing a file, you would open a view<br />
onto the file, i.e. there is no longer a 1-1 correspondence between<br />
buffers and files. Why? Well, for aggregate buffers (i.e.,<br />
editing multiple files in the one view), or for multiple views of a<br />
file (e.g. AST and source-level). There would be some primitive ops<br />
for editing a buffer (insertChar, delete, etc.), which would then<br />
call update functions on anything observing that file.<br />
<br />
Remote attach so I can work from home, but still use a remote machine<br />
<br />
Haddock documentation (no brainer), maybe associate with .hi files for<br />
binaries. <br />
<br />
A class MiniBufferRead (or PromptingRead) which allows the user to<br />
invoke a function similar to M-x in Emacs, but without requiring<br />
(interactive)<br />
<br />
Maybe a class YiShow, which all config items must be a member of? This is to emulate describe-variable<br />
<br />
== Implementation ==<br />
<br />
Considerations:<br />
; Configuration <br />
: Per mode/file/buffer/whatever Monads, or reload/recompile? Or some hybrid? How does this interact with the documentation aspects? Do we want to have separate sorts of symbols ala emacs (describe-function, describe-variable), or is everything a function? I would think that configuration info doesn't change that frequently --- is this globally true though?<br />
; Interface to the runtime<br />
: The scheduler, docs, etc.<br />
; Introspection of e.g. what processes are running.<br />
: There are already libraries in Haskell for processes, but they don't give Yi any extra information --- we really want a layer on top. <br />
<br />
...<br />
<br />
[[User:Sjw|Sjw]] 09:15, 2 June 2006 (UTC)<br />
<br />
[[Category:Applications]]</div>Joe