Difference between revisions of "HXT/Practical/Simple1"

From HaskellWiki
(Use block markup for multiline code)
 

Latest revision as of 16:11, 11 October 2011

The Data

Save this data to "simple1.xml":

<guestbook>
  <guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
  </guest>
  <guest>
    <fname>Henry</fname>
    <lname>Ford</lname>
  </guest>
  <guest>
    <fname>Andrew</fname>
    <lname>Carnegie</lname>
  </guest>
  <guest>
    <fname>Anton</fname>
    <lname>Chekhov</lname>
  </guest>
  <guest>
    <fname>George</fname>
    <lname>Washington</lname>
  </guest>
  <guest>
    <fname>William</fname>
    <lname>Shakespeare</lname>
  </guest>
  <guest>
    <fname>Nathaniel</fname>
    <lname>Hawthorne</lname>
  </guest>
</guestbook>

An unlikely list, but it will suffice for our purposes.

The Code

First, a quick overview of the necessary imports and compiler declarations:

The LANGUAGE pragma enables specific language extensions. The Arrows option provides the special arrow (proc) syntax, and the NoMonomorphismRestriction option removes the need for explicit type signatures on our filters.

{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
import Text.XML.HXT.Core

The XML will be parsed directly into this data-structure:

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

I find it helpful to get a feel for the combinators at the GHCi prompt. At this point, you may want to start GHCi and load the code so far. Then you can use commands like:

Main> runX (readDocument [ withValidate no] "simple1.xml" 
               >>> deep (isElem >>> hasName "guest"))

to see the XML structures inside the guest tags. The deep filter searches the XML tree recursively from the top down and returns every subtree for which the filter in its parameter succeeds; it does not look further inside a subtree that has already matched. withValidate no turns off validation, since the document has no DTD to validate against (it is well formed, but not valid).
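To see how deep chooses its matches, here is a small sketch that compares it with HXT's multi combinator on an in-memory string. The sample document and the names nested, outerOnly and allMatches are made up for illustration; xread and runLA are used here only to avoid reading a file.

```haskell
import Text.XML.HXT.Core

-- Hypothetical sample: a <b> element nested inside another <b>.
nested :: String
nested = "<a><b><b/></b></a>"

-- deep stops descending once the filter matches,
-- so only the outer <b> is returned.
outerOnly :: [XmlTree]
outerOnly = runLA (xread >>> deep (isElem >>> hasName "b")) nested

-- multi keeps searching inside subtrees that already matched,
-- so both <b> elements are returned.
allMatches :: [XmlTree]
allMatches = runLA (xread >>> multi (isElem >>> hasName "b")) nested
```

In GHCi, length outerOnly and length allMatches should come out as 1 and 2. For the guestbook data the two combinators give the same result, because no guest element contains another guest.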

With the "guest" structures in hand, the filter can be refined to pick out the first and last name text and construct a Guest value. The proc syntax is introduced here to show how it can be done conveniently.

getGuest = deep (isElem >>> hasName "guest") >>> 
  proc x -> do
    fname <- getText <<< getChildren <<< deep (hasName "fname") -< x
    lname <- getText <<< getChildren <<< deep (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

If you are familiar with monadic do-syntax, you've probably noticed some similarities already. There is a do, and there is a <-. But there are some new pieces too: the proc keyword, -<, returnA, and the introduced variable x.
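The notation is not specific to HXT. As a minimal sketch, the same pieces (proc, -<, returnA) work on ordinary functions, which are arrows too. The function addAndDouble is a made-up example and uses only Control.Arrow from base.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (returnA)

-- Hypothetical example: x is fed into two functions,
-- and the two results are combined at the end.
addAndDouble :: Int -> Int
addAndDouble = proc x -> do
  y <- (+ 1) -< x   -- send x through (+ 1), bind the result to y
  z <- (* 2) -< x   -- send x through (* 2), bind the result to z
  returnA -< y + z  -- the output of the whole arrow
```

In GHCi, addAndDouble 3 evaluates to 10. The shape is the same as in getGuest: x is the input, each line feeds it through a combinator, and returnA produces the final value.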

If you squint a bit you'll notice that there seems to be a bit of an "arrow" feel to the syntax:

... <- ... <<< ... <<< ... -< ...

That is deliberate: you can think of the XML structures flowing through the combinators in the direction the arrow is pointing. Binding is still done with <-. returnA returns the value to the arrow, much like return does in a monad.

Test it out in GHCi:

Main> runX (readDocument [withValidate no] "simple1.xml" >>> getGuest)

There is some repetition in the above code. Let's factor it out into useful combinators.

atTag tag = deep (isElem >>> hasName tag)
text = getChildren >>> getText
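If you would rather not rely on NoMonomorphismRestriction, the two helpers can be given explicit signatures. The signatures below are a sketch of one reasonable choice, general over any arrow in HXT's ArrowXml class; they are not taken from the original page.

```haskell
import Text.XML.HXT.Core

-- Find elements with the given tag name anywhere below the input node.
atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

-- Extract the text of the text nodes directly under the input element.
text :: ArrowXml a => a XmlTree String
text = getChildren >>> getText
```

A quick check in GHCi, using xread and runLA to parse a string without touching the file system: runLA (xread >>> atTag "fname" >>> text) "<guest><fname>John</fname></guest>" should give ["John"].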

Now rewrite the example; the result is much cleaner.

getGuest2 = atTag "guest" >>>
  proc x -> do
    fname <- text <<< atTag "fname" -< x
    lname <- text <<< atTag "lname" -< x
    returnA -< Guest { firstName = fname, lastName = lname }
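For comparison, the same filter can be sketched without proc notation, using the &&& combinator from Control.Arrow (re-exported by Text.XML.HXT.Core) to run both lookups on the same guest element. The name getGuest3 and the type signatures are assumptions for illustration; on this data it should behave like the proc version.

```haskell
import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

text :: ArrowXml a => a XmlTree String
text = getChildren >>> getText

-- Hypothetical point-free variant: &&& feeds each guest element to both
-- lookups and pairs the results; arr turns the pair into a Guest.
getGuest3 :: ArrowXml a => a XmlTree Guest
getGuest3 =
  atTag "guest"
    >>> ((atTag "fname" >>> text) &&& (atTag "lname" >>> text))
    >>> arr (uncurry Guest)
```

With only two fields this is compact, but the proc form tends to scale better as the record grows, since each field gets its own named binding.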

With more descriptive function names, the code should now be easy to follow. A main function ties it together:

main = do
  guests <- runX (readDocument [withValidate no] "simple1.xml" 
                    >>> getGuest2)
  print guests
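To experiment without a file on disk, the same pipeline can be run on a string with readString instead of readDocument. This is a self-contained sketch: sample and parseGuests are hypothetical names, and the explicit type signatures are added here rather than taken from the page.

```haskell
{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

atTag :: ArrowXml a => String -> a XmlTree XmlTree
atTag tag = deep (isElem >>> hasName tag)

text :: ArrowXml a => a XmlTree String
text = getChildren >>> getText

getGuest2 :: ArrowXml a => a XmlTree Guest
getGuest2 = atTag "guest" >>>
  proc x -> do
    fname <- text <<< atTag "fname" -< x
    lname <- text <<< atTag "lname" -< x
    returnA -< Guest { firstName = fname, lastName = lname }

-- Hypothetical in-memory sample with the first two guests from simple1.xml.
sample :: String
sample = concat
  [ "<guestbook>"
  , "<guest><fname>John</fname><lname>Steinbeck</lname></guest>"
  , "<guest><fname>Henry</fname><lname>Ford</lname></guest>"
  , "</guestbook>"
  ]

-- Parse a string rather than a file; validation stays switched off.
parseGuests :: String -> IO [Guest]
parseGuests xml = runX (readString [withValidate no] xml >>> getGuest2)
```

In GHCi, parseGuests sample >>= mapM_ print prints one Guest per line.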