HXT/Practical/Ebay1

From HaskellWiki
< HXT‎ | Practical
Revision as of 07:31, 20 August 2007 by Paolino (talk | contribs) (Munging from an ebay feedback page)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

{-# LANGUAGE Arrows, NoMonomorphismRestriction, ParallelListComp #-} module Main where import Text.XML.HXT.Arrow hiding (deep) import Data.List (nub,sort,isPrefixOf,transpose,groupBy) deep f = f `orElse` (getChildren >>> deep f) -- deep redefinition to allow a broader signature split = map (dropWhile p) . groupBy (const (not . p)) where p = (=='/') through = (getChildren >>>) . foldr1 (/>). map hasName . split -- contains = (getChildren >>>). foldr1 (</). (map hasName) mkReport = mkelem "p" [] . map constA {- The datas we are munging is unstructured - Every feedback is spanned on two contigous rows of a big table - We cannot catch the all data in a match, so we use listA to have the two single-row lists - and then zip them to rebuild the data. -} getFeedbackAndValue = hasName "table" >>> hasAttrValue "class" (=="fbOuter") /> hasName "tbody" >>> proc table -> do feedbacks <- listA (through "tr/td/img") -< table values <- listA (through "tr/td" /> hasText (isPrefixOf "EUR")) -< table catA (map mkReport $ transpose [values,feedbacks]) -<< () src = "feedback.example.html" dst = "feedback.report.html" unicoding= (a_encoding, unicodeString) nowarnings = (a_issue_warnings,v_0) main = runX ( readDocument [(a_parse_html, v_1),unicoding,nowarnings] src >>> root [] [deep getFeedbackAndValue] >>> writeDocument [(a_indent,v_1),unicoding,nowarnings] dst )