https://wiki.haskell.org/api.php?action=feedcontributions&user=MichalTerepeta&feedformat=atomHaskellWiki - User contributions [en]2020-04-04T22:18:23ZUser contributionsMediaWiki 1.27.4https://wiki.haskell.org/index.php?title=CamHac/PostHackathonReport&diff=41653CamHac/PostHackathonReport2011-08-16T18:41:58Z<p>MichalTerepeta: </p>
<hr />
<div>= Post-Hackathon Report =<br />
<br />
This page is for listing what was done during the Hackathon. Please add a short description of what you worked on, with links to relevant blog posts, hackage packages, commits, etc.<br />
<br />
== fclabels 1.0 release ==<br />
<br />
New release of the '''fclabels''' package. The new package has a lot of code and documentation cleanups, support for partial labels in the case of multi-constructor datatypes and is about 20x as fast for setting and modifying as the previous version. Thanks everyone for helping me out!<br />
<br />
Hackage: http://hackage.haskell.org/package/fclabels-1.0.1<br />
<br />
Github: http://github.com/sebastiaanvisser/fclabels<br />
<br />
== GHC and base library improvements ==<br />
<br />
* [http://hackage.haskell.org/trac/ghc/ticket/5413 Add primops for bit population count]. These primops compile down to `POPCNT` instructions where available and fast fallbacks (implemented in C) otherwise.<br />
<br />
* [http://hackage.haskell.org/trac/ghc/ticket/5414 Add unchecked left and right bit shifts]: The Data.Bits.shift method uses a branch to check if the shift amount is larger than the word size and returns 0 in these cases. This extra safety makes performance worse in bit twiddling code.<br />
<br />
* Discussed unpacking of enums in GHC (not yet implemented).<br />
<br />
* Discussed refactoring compiler/util/Digraph.lhs into a reusable graph combinator library, along with porting over hoopl's control-flow-graph traversals (not yet implemented).<br />
<br />
* [http://hackage.haskell.org/trac/ghc/ticket/2132 Optimise nested comparisons]: started implementing (currently as a plugin) a new Core-to-Core pass.<br />
<br />
== Context synonym families ==<br />
<br />
Started working on context synonym families and indexed context synonym families. We make this work by giving evidence the new kind Fact, and then allowing any type of kind Fact to appear on the left of a => arrow.<br />
<br />
(Max Bolingbroke, Dominic Orchard and Nicolas Wu)<br />
<br />
== Darcs ==<br />
<br />
New contributors:<br />
<br />
* use red text to report when <font color="red">we have a conflict</font> ([http://bugs.darcs.net/issue1681 issue1681],[http://bugs.darcs.net/patch646 patch646], Jeff Foster)<br />
* support 'since' in English dates parser<br />
* filter SSH output ([http://bugs.darcs.net/issue845 issue845], Jeff Foster and Sebastian Korten)<br />
* support arbitrary darcs command in darcs-test (Alexander Njemz)<br />
* output ISO dates in darcs changes? ([http://bugs.darcs.net/issue140 issue140], Alexander N, may be not a good idea)<br />
* add a last regrets prompt to interactive patch selection ([http://bugs.darcs.net/issue1920 issue1920], [http://bugs.darcs.net/patch655 patch655], Johannes Weiß)<br />
* [in-progress] support removing changes in amend-record ([http://bugs.darcs.net/issue1470 issue1470], Johannes Weiß)<br />
<br />
== wxHaskell ==<br />
<br />
* a Windows build fix ([https://sourceforge.net/mailarchive/forum.php?thread_name=CAA5%3D7kb%2BCmJ178tvrxtOnuswWBeBxYpSotiQWRXuKE3m_sByXA%40mail.gmail.com&forum_name=wxhaskell-devel patch])<br />
* a fix for [https://sourceforge.net/tracker/?func=detail&aid=3019730&group_id=73133&atid=536845 an issue with colorDialog] ([https://sourceforge.net/mailarchive/forum.php?thread_name=20110813150933.GA758%40dewdrop.local&forum_name=wxhaskell-devel patch])<br />
<br />
== hCole-server ==<br />
<br />
* A Snap-based web application that interacts with the COLE (see http://portal.acm.org/citation.cfm?id=1356080 and http://portal.acm.org/citation.cfm?id=1772965) framework for exploring compiler optimisation levels. The purpose of the web app is that collaborators can submit optimisation sequences to the COLE backend and retrieve the results when they are available after measuring.<br />
* Git repository of the web application can be found at https://github.com/itkovian/hcole-server<br />
<br />
== GObject Introspection ==<br />
<br />
* Work-in-progress binding generator for GObject-based libraries such as Gtk+ 3.<br />
* Started switching to [http://hackage.haskell.org/package/haskell-src-exts-1.11.1 haskell-src-exts] for code generation.<br />
* Patches currently on the ''camhac'' branch on [https://gitorious.org/haskell-gi/haskell-gi gitorious].<br />
<br />
== Snap Framework ==<br />
<br />
* Snap 0.5.3 and 0.5.3.1 released!<br />
<br />
* Several bugs squashed or nearly-squashed: [https://github.com/snapframework/snap-core/issues/2 #2] (IPv6 support, from Vlad Hanciuta), [https://github.com/snapframework/snap-core/issues/77 #77], [https://github.com/snapframework/snap-core/issues/78 #78], and [https://github.com/snapframework/snap-core/issues/79 #79].<br />
<br />
* File upload code: replaced "openBinaryTempFile" with something less dumb (mkstemp) on unix, and fixed some iteratee cleanup problems using an MVar finalizer<br />
<br />
* Removed the template haskell code from Data.Concurrent.HashMap in snap-server<br />
<br />
* Some work has been done on the authentication Snaplet, including an (incomplete) HDBC backend for it. An early work-in-progress can be found here: https://github.com/norm2782/snap<br />
* An example application which uses Snap 0.6 has been improved to use the authentication Snaplet. Another work-in-progress: https://github.com/norm2782/snap-guestbook<br />
<br />
== Data.Text ==<br />
<br />
* Further benchmarking, bug fixing to support the UTF-8 port<br />
* Progress can be found in the ''utf8'' branch [http://github.com/jaspervdj/text here]<br />
* The [http://jaspervdj.be/files/text.html GSoC project] is basically done, next up is writing a summary report of benchmark results and what advantages and disadvantages come with the port<br />
<br />
== hs-poker ==<br />
<br />
* A "redneck naive" poker hand evaluator. Code is on github (https://github.com/fffej/HS-Poker). Hopefully intend to turn this into a poker bot playground for Haskell (Jeff / Sebastian)<br />
<br />
== haskell-mpi ==<br />
* New version 1.1.0 uploaded to hackage, including support for more MPI implementations, bugfixes and general awesomness<br />
* Upcoming Monad Reader would feature and article about parallel programming with MPI, written during the course of the hackathon (Dmitry Astapov)<br />
<br />
== HTTP ==<br />
<br />
Some work was done on setting up tests for HTTP. Additionally, some bugs were fixed, code was cleaned up, warnings removed and a start was made on improving the Network.Browser module.<br />
<br />
== EchoNest API ==<br />
<br />
A very nascent API for accessing the EchoNest Music API http://developer.echonest.com/docs/v4/index.html . Coming to a GitHub server in the near future, as soon as it stops looking so ugly (Ian Knopke / Jose Calderon).<br />
<br />
== TagSoup/Derive/HLint ==<br />
<br />
All the above packages got upgraded to the latest GHC, along with a few bug fixes (Derive now deals with singleton constructors with no fields, HLint now supports an ANSI CPP flag) (Neil Mitchell)<br />
<br />
== Hoogle ==<br />
<br />
The current Hoogle parser for user queries is old, doesn't parse everything correctly, and in particular doesn't deal well with partial queries (when the user is still typing their search). We discussed lots of edge cases, and started implementing a new version (Jacek Generowicz, with guidance from Neil Mitchell)<br />
<br />
== CmdArgs ==<br />
<br />
The CmdArgs package lets you concisely specific command line arguments. I ported the package to GHC 7.2.1, did some cleanups, and fixed some bugs (you can now use Int64 etc). I then started working on two new features: 1) Given a CmdArgs program (such as Hoogle) you can specify you want to enter the arguments via a GUI hosted in the web browser. Currently the GUI is a simple textbox with realtime validation, but in future it will become a structured command line argument editor based on the options to your tool. 2) Adding automatic bash autocompletion - some of the work has been done, but the details are not yet finished. (Neil Mitchell)<br />
<br />
== Hackage server ==<br />
<br />
Further refactoring work, simplification of HTTP basic/digest authentication code. Started work on serving package changelogs. Improvements to admin pages to make various features more discoverable (Duncan Coutts, Stefan Wehr, Ben Millwood).<br />
<br />
Hackathon branch of the code is at (not all patches have been submitted yet):<br />
<br />
darcs get http://code.haskell.org/~duncan/camhac/hackage-server/<br />
<br />
== Bittorrent DHT ==<br />
<br />
Initial work on implementing [http://www.bittorrent.org/beps/bep_0005.html BEP 0005] in Haskell. Some core data structures seem to be working (although untested) and I'm currently working on the protocol. I will probably merge this into Haskell Torrent when everything is working, but I intend to keep the library available separately as I see potential uses for the network other than Bittorrent. (Alex Horsman)<br />
<br />
https://github.com/aninhumer/haskell-dht<br />
<br />
== LambdaHack ==<br />
<br />
Discussed gameplay and hacks, annotated the code with the discussion results and finally accepted [https://github.com/kosmikus/LambdaHack/pull/11 the pull request] from [http://hackage.haskell.org/package/Allure Allure of the Stars] to [http://hackage.haskell.org/package/LambdaHack LambdaHack].<br />
<br />
== HaskelSSHClient ==<br />
<br />
Hacked around a bit on the Haskel SSH Client. Work was started on making it able to do re-keying after the initial key exchange, but it is unfinished for now... (Bart Coppens)<br />
<br />
https://github.com/bcoppens/HaskellSshClient<br />
<br />
== Haskell Test Framework (HTF) ==<br />
<br />
* New '--quiet' flag: only produces output for failing tests (FINISHED)<br />
* Output a diff if an assertEqual fails (not yet finished)<br />
<br />
== Stack traces in GHC ==<br />
<br />
I (Simon M.) was working on adding better support for source code annotations in GHC's Core language, and unifying the way we handle profiling cost centres, coverage annotations, and breakpoints in the GHCi debugger. I'm working on a new semantics for cost centre stacks, which will allow us to track call stacks at runtime. This will not only give us more accurate profiling, but also let us report stack traces on errors (e.g. head []), perhaps enabled by a compile-time option.<br />
<br />
== TLS ==<br />
<br />
lots of various misc improvements: re-work records for more type safety, support for compression, initial support of version 1.2. (Vincent Hanquez)<br />
<br />
https://github.com/vincenthz/hs-tls<br />
<br />
== Pattern Synonyms ==<br />
Simon PJ, Simon M, and I (Lennart Augustsson) hashed out a design for pattern synonyms. I've barely begun implementation, so don't expect anything just yet.<br />
<br />
http://hackage.haskell.org/trac/ghc/wiki/PatternSynonyms<br />
<br />
== Frequent Subgraph Algorithm ==<br />
I (Stephen Lavelle) implemented the vSiGraM algorithm for searching for frequent vertex-disjoint subgraphs of large graphs. Doesn't really work for large graphs, though - 'tis quite slow right now.<br />
<br />
https://github.com/increpare/vsigram<br />
<br />
== Data.ByteString.Lazy.Builder ==<br />
<br />
Duncan Coutts and Johan Tibell reviewed my (Simon Meier) work on a lazy bytestring builder for the 'bytestring' library; based on blaze-builder. The API and implementation is now complete. Documentation is there, but needs lots of polishing. I also gave a talk on when and how to use the bytestring builder. The slides and example code are available from [https://github.com/meiersi/bytestring/blob/master/bench/CSV.hs github].<br />
<br />
== Error reporting for JsonGrammar ==<br />
<br />
Some progress on better error reporting for the JSON conversion functions in the [http://hackage.haskell.org/package/JsonGrammar JsonGrammar package] was made. Now <hask>fromJson</hask> no longer says <hask>Nothing</hask> when there is an error in the input JSON. Progress can be [https://github.com/MedeaMelana/JsonGrammar/tree/errors tracked on GitHub]. When finished it will be merged back into the master branch and released in a new version on hackage. (Martijn van Steenbergen)</div>MichalTerepetahttps://wiki.haskell.org/index.php?title=CamHac&diff=40327CamHac2011-06-02T07:58:45Z<p>MichalTerepeta: </p>
<hr />
<div>Haskell Hackaton in Cambridge, UK, '''August 12-14, 2011'''<br />
<br />
== About ==<br />
<br />
Come and spend a weekend in Cambridge hacking Haskell code in great surroundings with fantastic company! Haskell Hackathons are a tradition where everyone is welcome; we get together, work on projects with others or just do your own thing, the overall goal being to improve the Haskell ecosystem.<br />
<br />
CamHac will be held from 12-14 August 2011, at [http://www.homertonconference.com/ Homerton College] in Cambridge. As with previous Hackathons, all are welcome -- you do not have to be a Haskell guru. All you need is a basic knowledge of Haskell, a willingness to learn, and a project you're excited to help with (or a project of your own to work on).<br />
<br />
There will be lots of hacking, good food, and, of course, fun! <br />
<br />
* Organiser: [mailto:marlowsd@gmail.com Simon Marlow] (<tt>JaffaCake</tt> on IRC)<br />
* Mailing list: [http://www.haskell.org/mailman/listinfo/hackathon hackathon@haskell.org]<br />
* IRC channel: #ghc on FreeNode<br />
<br />
Many thanks to [http://research.microsoft.com/en-us/labs/cambridge/default.aspx Microsoft Research Cambridge] for agreeing to sponsor the event.<br />
<br />
== Registration ==<br />
<br />
'''Registration deadline''': Friday 15th July 2011<br />
<br />
Registration is free. To register, please email [mailto:msrcevnt@microsoft.com msrcevnt@microsoft.com] stating that you would like to register for the "Haskell Hackathon", with the following information<br />
<br />
Full name:<br />
Which days you are attending on:<br />
day 1: yes/no<br />
day 2: yes/no<br />
day 3: yes/no<br />
Dietary requirements:<br />
<br />
The venue is '''limited to 50 (edit: now 72!) people''', and registration is first-come first-served, so register quickly to reserve your place! (but only register if you definitely intend to come, and please let us know if you find you cannot make it for any reason after you have registered, so we can re-allocate your place).<br />
<br />
Some people will probably want to travel on Friday morning and join us later on that day - that's absolutely fine.<br />
<br />
== Venue ==<br />
<br />
We're in the [http://www.homertonconference.com/Leah-Manning.html Leah Manning Room] of [http://www.homertonconference.com/ Homerton Conference Centre]. It is about [http://www.google.co.uk/maps?f=d&source=s_d&saddr=United+Kingdom+(Cambridge,+Railway+Station+(Stop+B))&daddr=CB2+8PH&hl=en&geocode=FehrHAMdjhUCACHpLU_p7S-CNg%3BFc5LHAMdNhMCACmn-uB8eXrYRzFlrDhff7fJ9A&mra=iwd&dirflg=w&sll=52.190667,0.134583&sspn=0.021547,0.040598&ie=UTF8&z=16 15 minutes walk from the train station], and Cambridge town centre is about 30 minutes walk.<br />
<br />
'''Times''': we have the room booked all day for the three days, and we'll probably start around 10am and finish around 6pm. Exact time details to be confirmed later. <br />
<br />
There will be WiFi access.<br />
<br />
There will be a projector for giving talks/demos. We will probably reserve a part of the time for talks and demos.<br />
<br />
== Food ==<br />
<br />
Tea and coffee will be supplied. We will have to go out to find lunch, but there are various places to eat and buy food at the [http://www.cambridge-x.co.uk Cambridge Leisure Park] a few minutes walk towards Cambridge town centre. In the evening we will probably head towards the town where there are plenty of good restaurants.<br />
<br />
== Local arrangements ==<br />
<br />
=== Getting to Cambridge ===<br />
<br />
==== By Plane ====<br />
<br />
* [http://www.stanstedairport.com/ Stansted Airport]: Stansted is the nearest of the London-area airports to Cambridge. It is mostly served by flights to and from mainland Europe, Ireland, and elsewhere in the UK. <br />
<br />
* [http://www.heathrowairport.com/ Heathrow Airport]: Heathrow is the principal London-area airport and one of the busiest in Europe with a wide range of national, European, and international services. <br />
<br />
* [http://www.gatwickairport.com/ Gatwick Airport]: Gatwick is the second "London" airport with a wide range of national, European and international services. <br />
<br />
* Other airports: [http://www.london-luton.co.uk/ Luton Airport], [http://www.norwichairport.co.uk/ Norwich airport], and [http://www.southendairport.com/ Southend airport] are other regional airports in the East Anglia region. If you use these, car or taxi is the best option for travel to Cambridge. <br />
<br />
==== Trains from London ====<br />
<br />
London has two train lines into Cambridge, London Kings Cross and London Liverpool Street. There is a regular service on both lines and duration is under an hour on the direct trains. Go to [http://www.nationalrail.co.uk National Rail] to check train times<br />
<br />
=== Getting to the venue ===<br />
<br />
[http://www.google.co.uk/maps?f=d&source=s_d&saddr=United+Kingdom+(Cambridge,+Railway+Station+(Stop+B))&daddr=CB2+8PH&hl=en&geocode=FehrHAMdjhUCACHpLU_p7S-CNg%3BFc5LHAMdNhMCACmn-uB8eXrYRzFlrDhff7fJ9A&mra=iwd&dirflg=w&sll=52.190667,0.134583&sspn=0.021547,0.040598&ie=UTF8&z=16 Walk from the train station] (about 15 minutes)<br />
<br />
[http://www.homertonconference.com/How-to-find-us.html How to find the venue]<br />
<br />
'''Local Taxis''': Panther Taxis 01223 715715<br />
<br />
=== Accommodation ===<br />
<br />
[http://www.visitcambridge.org/VisitCambridge/WhereToStay.aspx VisitCambridge: Where to Stay in Cambridge]<br />
<br />
The nearest hotels to the venue seem to be:<br />
<br />
* [http://www2.travelodge.co.uk/ Travelodge] (Cambridge Central) is just a few minutes walk from the venue. It is currently charging £65.80 per night for 11-14 August.<br />
* [http://www.helenhotel.co.uk/index.htm Helen Hotel]<br />
* [http://www.bandbincambridgeshire.co.uk/ Bridge Guest House]<br />
* [http://www.cheapguesthouses.com/ Fairways Guest House]<br />
* [http://www.abbeyfieldguesthouse.com/ Abbeyfield Guest House]<br />
* [http://rockviewguesthouse.co.uk/default.aspx Rock View Guest House]<br />
* [http://alingtonhouse.com/default.aspx Alington House Guest House]<br />
* [http://www.yha.org.uk/find-accommodation/east-of-england/hostels/cambridge/index.aspx Cambridge Youth Hostel]<br />
* [http://www.cambridgerooms.co.uk/ Stay in Cambridge Colleges]<br />
<br />
If you contact any of the above and find they're booked up, please remove them from the list.<br />
<br />
Microsoft Research recommends the following hotels to visitors, these are closer to the city centre but are probably a lot more expensive than those above:<br />
<br />
* [http://www.hilton.co.uk/cambridgegardenhouse Double Tree by Hilton Garden House Cambridge]<br />
* [http://www.ichotelsgroup.com/h/d/cp/1/en/hotel/cbguk Crowne Plaza Cambridge]<br />
* [http://www.devere.co.uk/our-locations/university-arms.html De Vere University Arms]<br />
<br />
== Projects ==<br />
<br />
Use this space to list projects you are interested in working on, and add your name to projects you are interested in helping with.<br />
<br />
* General hacking away at Snap Framework (exact goals TBD), perhaps adding/improving documentation/tutorials at the same time. (Jurriën Stutterheim)<br />
* Darcs<br />
* Something games/3d related? (Stephen L)<br />
<br />
== Attendees ==<br />
<br />
# Simon Marlow<br />
# Jurriën Stutterheim<br />
# Neil Mitchell<br />
# Jasper Van der Jeugt<br />
# Max Bolingbroke<br />
# Ben Millwood<br />
# Roman Leshchinskiy<br />
# Gregory Collins<br />
# Martijn van Steenbergen<br />
# Sjoerd Visscher<br />
# Sebastiaan Visser<br />
# Tom Lokhorst<br />
# Erik Hesselink<br />
# Jeff Foster<br />
# Sebastian Korten<br />
# Alessandro Vermeulen<br />
# Vlad Hanciuta<br />
# Ganesh Sittampalam<br />
# Eric Kow<br />
# Alexander Njemz<br />
# Mikolaj Konarski<br />
# Ian Lynagh<br />
# Andres Löh<br />
# Jeroen Janssen<br />
# Nicolas Wu<br />
# Duncan Coutts<br />
# Dominic Orchard<br />
# Jacek Generowicz<br />
# Owen Stephens<br />
# Benedict Eastaugh<br />
# Stephen Lavelle<br />
# Sam Martin<br />
# Alex Horsman<br />
# Andy Georges<br />
# Niklas Larsson<br />
# Raeez Lorgat<br />
# Maryna Strelchuk<br />
# Vincent Hanquez<br />
# Chris Done<br />
# Tomas Petricek<br />
# Thomas Schilling<br />
# Dragos Ionita<br />
# Simon Meier<br />
# Will Thompson<br />
# Sergii Strelchuk<br />
# Lennart Kolmodin<br />
# Philippa Cowderoy<br />
# Steven Keuchel<br />
# Michal Terepeta<br />
<br />
* Add your name here, once registered...</div>MichalTerepetahttps://wiki.haskell.org/index.php?title=Parsing_a_simple_imperative_language&diff=33160Parsing a simple imperative language2010-01-19T15:40:20Z<p>MichalTerepeta: Remember that Token is a qualified import.</p>
<hr />
<div>This tutorial will present how to parse a subset of a simple imperative<br />
programming language called W<small>HILE</small> (introduced in a book<br />
"Principles of Program Analysis" by Nielson, Nielson and Hankin). It includes<br />
only a few statements and basic boolean/arithmetic expressions, which makes it<br />
a nice material for a tutorial.<br />
<br />
== Imports ==<br />
<br />
First let's specify the name of the module:<br />
<br />
<haskell><br />
<br />
> module ParseWhile where<br />
<br />
</haskell><br />
<br />
And then import the necessary libraries:<br />
<br />
<haskell><br />
<br />
> import System.IO<br />
> import Control.Monad<br />
> import Text.ParserCombinators.Parsec<br />
> import Text.ParserCombinators.Parsec.Expr<br />
> import Text.ParserCombinators.Parsec.Language<br />
> import qualified Text.ParserCombinators.Parsec.Token as Token<br />
<br />
</haskell><br />
<br />
== The language ==<br />
<br />
The grammar for expressions is defined as follows:<br />
<br />
<tt><br />
<br />
''a'' ::= ''x'' | ''n'' | - ''a'' | ''a'' ''opa'' ''a''<br />
<br />
''b'' ::= true | false | not ''b'' | ''b'' ''opb'' ''b'' | ''a'' ''opr'' ''a''<br />
<br />
''opa'' ::= + | - | * | /<br />
<br />
''opb'' ::= and | or<br />
<br />
''opr'' ::= > | <<br />
<br />
</tt><br />
<br />
Note that we have three groups of operators - arithmetic, booloan and<br />
relational ones.<br />
<br />
And now the definition of statements:<br />
<br />
<tt><br />
<br />
''S'' ::= x := ''a'' | skip | ''S1''; ''S2'' | ''( S )'' | if ''b'' then ''S1'' else ''S2'' | while ''b'' do ''S''<br />
<br />
</tt><br />
<br />
We probably want to parse that into some internal representation of the<br />
language (abstract syntax tree). Therefore we need to define the data<br />
structures for the expressions and statements.<br />
<br />
== Data structures ==<br />
<br />
We need to take care of boolean and arithmetic expressions and the<br />
appropriate operators. First let's look at the boolean expressions:<br />
<br />
<haskell><br />
<br />
> data BExpr = BoolConst Bool<br />
> | Not BExpr<br />
> | BBinary BBinOp BExpr BExpr<br />
> | RBinary RBinOp AExpr AExpr<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
Binary booloan operators:<br />
<br />
<haskell><br />
<br />
> data BBinOp = And | Or deriving (Show)<br />
<br />
</haskell><br />
<br />
Relational operators:<br />
<br />
<haskell><br />
<br />
> data RBinOp = Greater | Less deriving (Show)<br />
<br />
</haskell><br />
<br />
Now we define the types for arithmetic expressions:<br />
<br />
<haskell><br />
<br />
> data AExpr = Var String<br />
> | IntConst Integer<br />
> | Neg AExpr<br />
> | ABinary ABinOp AExpr AExpr<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
And arithmetic operators:<br />
<br />
<haskell><br />
<br />
> data ABinOp = Add<br />
> | Subtract<br />
> | Multiply<br />
> | Divide<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
Finally let's take care of the statements:<br />
<br />
<haskell><br />
<br />
> data Stmt = Seq [Stmt]<br />
> | Assign String AExpr<br />
> | If BExpr Stmt Stmt<br />
> | While BExpr Stmt<br />
> | Skip<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
== Lexer ==<br />
<br />
Having all the data structures we can go on with writing the code to do actual<br />
parsing. First of all we create the language definition using Haskell's record<br />
syntax and the constructor <hask>emptyDef</hask> (from<br />
<hask>Text.ParserCombinators.Parsec.Language</hask>):<br />
<br />
<haskell><br />
<br />
> languageDef =<br />
> emptyDef { Token.commentStart = "/*"<br />
> , Token.commentEnd = "*/"<br />
> , Token.commentLine = "//"<br />
> , Token.identStart = letter<br />
> , Token.identLetter = alphaNum<br />
> , Token.reservedNames = [ "if"<br />
> , "then"<br />
> , "else"<br />
> , "while"<br />
> , "do"<br />
> , "skip"<br />
> , "true"<br />
> , "false"<br />
> , "not"<br />
> , "and"<br />
> , "or"<br />
> ]<br />
> , Token.reservedOpNames = ["+", "-", "*", "/", ":="<br />
> , "<", ">", "and", "or", "not"<br />
> ]<br />
> }<br />
<br />
</haskell><br />
<br />
This creates a language definition that accepts the C-style comments, requires<br />
that the identifiers start with a letter, and end with alphanumeric<br />
characters. Moreover there is a number of reserved names, that cannot be used<br />
by the identifiers.<br />
<br />
Having the above definition we can create a lexer:<br />
<br />
<haskell><br />
<br />
> lexer = Token.makeTokenParser languageDef<br />
<br />
</haskell><br />
<br />
<tt>lexer</tt> contains a number of lexical parsers, that we can us to parse<br />
identifiers, reserved words/operations, etc. Now we can select/extract them in<br />
the following way:<br />
<br />
<haskell><br />
<br />
> identifier = Token.identifier lexer -- parses an identifier<br />
> reserved = Token.reserved lexer -- parses a reserved name<br />
> reservedOp = Token.reservedOp lexer -- parses an operator<br />
> parens = Token.parens lexer -- parses surrounding parenthesis:<br />
> -- parens p<br />
> -- takes care of the parenthesis and<br />
> -- uses p to parse what's inside them<br />
> integer = Token.integer lexer -- parses an integer<br />
> semi = Token.semi lexer -- parses a semicolon<br />
> whiteSpace = Token.whiteSpace lexer -- parses whitespace<br />
<br />
</haskell><br />
<br />
This isn't really necessary, but should make the code much more readable (also<br />
this is the reason why we used the qualified import of<br />
<hask>Text.ParserCombinators.Parsec.Token</hask>). Now we can use them to<br />
parse the source code at the token level. One of the nice features of these<br />
parsers is that they take care of all whitespace after the tokens.<br />
<br />
== Main parser ==<br />
<br />
As already mentioned a program in this language is simply a statement, so the<br />
main parser should basically only parse a statement. But remember to take care of<br />
initial whitespace - our parsers only get rid of whitespace after the tokens!<br />
<br />
<haskell><br />
<br />
> whileParser :: Parser Stmt<br />
> whileParser = whiteSpace >> statement<br />
<br />
</haskell><br />
<br />
Now because any statement might be actually a sequence of statements separated<br />
by semicolon, we use <hask>sepBy1</hask> to parse at least one statement. The<br />
result is a list of statements. We also allow grouping statements by the<br />
parenthesis, which is useful, for instance, in the <tt>while</tt> loop.<br />
<br />
<haskell><br />
<br />
> statement :: Parser Stmt<br />
> statement = parens statement<br />
> <|> sequenceOfStmt<br />
<br />
> sequenceOfStmt =<br />
> do list <- (sepBy1 statement' semi)<br />
> -- If there's only one statement return it without using Seq.<br />
> return $ if length list == 1 then head list else Seq list<br />
<br />
</haskell><br />
<br />
Now a single statement is quite simple, it's either an if conditional, a while<br />
loop, an assignment or simply a skip statement. We use <hask><|></hask> to<br />
express choice. So <hask>a <|> b</hask> will first try parser <hask>a</hask><br />
and if it fails (but without actually consuming any input) then parser<br />
<hask>b</hask> will be used. Note: this means that the order is important.<br />
<br />
<haskell><br />
<br />
> statement' :: Parser Stmt<br />
> statement' = ifStmt<br />
> <|> whileStmt<br />
> <|> skipStmt<br />
> <|> assignStmt<br />
<br />
</haskell><br />
<br />
If you have a parser that might fail after consuming some input, and you still<br />
want to try the next parser, you should look into <hask>try</hask> combinator.<br />
For instance <hask>try p <|> q</hask> will try parsing with <hask>p</hask> and<br />
if it fails, even after consuming the input, the <hask>q</hask> parser will be<br />
used as if nothing has been consumed by <hask>p</hask>.<br />
<br />
Now let's define the parsers for all the possible statements. This is quite<br />
straightforward as we just use the parsers from the lexer and then use all the<br />
necessary information to create appropriate data structures.<br />
<br />
<haskell><br />
<br />
> ifStmt :: Parser Stmt<br />
> ifStmt =<br />
> do reserved "if"<br />
> cond <- bExpression<br />
> reserved "then"<br />
> stmt1 <- statement<br />
> reserved "else"<br />
> stmt2 <- statement<br />
> return $ If cond stmt1 stmt2<br />
<br />
> whileStmt :: Parser Stmt<br />
> whileStmt =<br />
> do reserved "while"<br />
> cond <- bExpression<br />
> reserved "do"<br />
> stmt <- statement<br />
> return $ While cond stmt<br />
<br />
> assignStmt :: Parser Stmt<br />
> assignStmt =<br />
> do var <- identifier<br />
> reservedOp ":="<br />
> expr <- aExpression<br />
> return $ Assign var expr<br />
<br />
> skipStmt :: Parser Stmt<br />
> skipStmt = reserved "skip" >> return Skip<br />
<br />
</haskell><br />
<br />
== Expressions ==<br />
<br />
What's left is to parse the expressions. Fortunately Parsec provides a very<br />
easy way to do that. Let's define the arithmetic and boolean expressions:<br />
<br />
<haskell><br />
<br />
> aExpression :: Parser AExpr<br />
> aExpression = buildExpressionParser aOperators aTerm<br />
<br />
> bExpression :: Parser BExpr<br />
> bExpression = buildExpressionParser bOperators bTerm<br />
<br />
</haskell><br />
<br />
Now we have to define the lists with operator precedence, associativity and<br />
what constructors to use in each case.<br />
<br />
<haskell><br />
<br />
> aOperators = [ [Prefix (reservedOp "-" >> return (Neg )) ]<br />
> , [Infix (reservedOp "*" >> return (ABinary Multiply)) AssocLeft]<br />
> , [Infix (reservedOp "/" >> return (ABinary Divide )) AssocLeft]<br />
> , [Infix (reservedOp "+" >> return (ABinary Add )) AssocLeft]<br />
> , [Infix (reservedOp "-" >> return (ABinary Subtract)) AssocLeft]<br />
> ]<br />
<br />
> bOperators = [ [Prefix (reservedOp "not" >> return (Not )) ]<br />
> , [Infix (reservedOp "and" >> return (BBinary And )) AssocLeft]<br />
> , [Infix (reservedOp "or" >> return (BBinary Or )) AssocLeft]<br />
> ]<br />
<br />
</haskell><br />
<br />
In case of Prefix operators it is enough to specify which one should be parsed<br />
and what is the associated data constructor. Infix operators are defined<br />
similarly, but it's necessary to add information about associativity. Note<br />
that the operator precedence depends only on the order of the elements in the<br />
list.<br />
<br />
Finally we have to define the terms. In case of arithmetic expressions, it is<br />
quite simple:<br />
<br />
<haskell><br />
<br />
> aTerm = parens aExpression<br />
> <|> liftM Var identifier<br />
> <|> liftM IntConst integer<br />
<br />
</haskell><br />
<br />
However, the term in a boolean expression is a bit more tricky. In this case,<br />
a term can also be an expression with relational operator consisting of<br />
arithmetic expressions.<br />
<br />
<haskell><br />
<br />
> bTerm = parens bExpression<br />
> <|> (reserved "true" >> return (BoolConst True ))<br />
> <|> (reserved "false" >> return (BoolConst False))<br />
> <|> rExpression<br />
<br />
</haskell><br />
<br />
Therefore we have to define a parser for relational expressions:<br />
<br />
<haskell><br />
<br />
> rExpression =<br />
> do a1 <- aExpression<br />
> op <- relation<br />
> a2 <- aExpression<br />
> return $ RBinary op a1 a2<br />
<br />
> relation = (reservedOp ">" >> return Greater)<br />
> <|> (reservedOp "<" >> return Less)<br />
<br />
</haskell><br />
<br />
And that's it. We have a quite simple parser able to parse a few statements and<br />
arithmetic/boolean expressions.<br />
<br />
== Notes ==<br />
<br />
If you want to experiment with the parser inside ghci, these functions might be<br />
handy:<br />
<br />
<haskell><br />
<br />
> parseString :: String -> Stmt<br />
> parseString str =<br />
> case parse whileParser "" str of<br />
> Left e -> error $ show e<br />
> Right r -> r<br />
<br />
> parseFile :: String -> IO Stmt<br />
> parseFile file =<br />
> do program <- readFile file<br />
> case parse whileParser "" program of<br />
> Left e -> print e >> fail "parse error"<br />
> Right r -> return r<br />
<br />
</haskell><br />
<br />
Now you can simply load the module in ghci and then do<br />
<hask>ast <- parseFile "<filename>"</hask> to parse a file and get the<br />
result if parsing was successful. If you already have a string with<br />
the program, you can use <hask>parseString</hask>.<br />
<br />
[[Category:How to]]</div>MichalTerepetahttps://wiki.haskell.org/index.php?title=Parsing_a_simple_imperative_language&diff=32600Parsing a simple imperative language2009-12-15T21:09:19Z<p>MichalTerepeta: Initial version.</p>
<hr />
<div>This tutorial will present how to parse a subset of a simple imperative<br />
programming language called W<small>HILE</small> (introduced in a book<br />
"Principles of Program Analysis" by Nielson, Nielson and Hankin). It includes<br />
only a few statements and basic boolean/arithmetic expressions, which makes it<br />
a nice material for a tutorial.<br />
<br />
== Imports ==<br />
<br />
First let's specify the name of the module:<br />
<br />
<haskell><br />
<br />
> module ParseWhile where<br />
<br />
</haskell><br />
<br />
And then import the necessary libraries:<br />
<br />
<haskell><br />
<br />
> import System.IO<br />
> import Control.Monad<br />
> import Text.ParserCombinators.Parsec<br />
> import Text.ParserCombinators.Parsec.Expr<br />
> import Text.ParserCombinators.Parsec.Language<br />
> import qualified Text.ParserCombinators.Parsec.Token as Token<br />
<br />
</haskell><br />
<br />
== The language ==<br />
<br />
The grammar for expressions is defined as follows:<br />
<br />
<tt><br />
<br />
''a'' ::= ''x'' | ''n'' | - ''a'' | ''a'' ''opa'' ''a''<br />
<br />
''b'' ::= true | false | not ''b'' | ''b'' ''opb'' ''b'' | ''a'' ''opr'' ''a''<br />
<br />
''opa'' ::= + | - | * | /<br />
<br />
''opb'' ::= and | or<br />
<br />
''opr'' ::= > | <<br />
<br />
</tt><br />
<br />
Note that we have three groups of operators - arithmetic, booloan and<br />
relational ones.<br />
<br />
And now the definition of statements:<br />
<br />
<tt><br />
<br />
''S'' ::= x := ''a'' | skip | ''S1''; ''S2'' | ''( S )'' | if ''b'' then ''S1'' else ''S2'' | while ''b'' do ''S''<br />
<br />
</tt><br />
<br />
We probably want to parse that into some internal representation of the<br />
language (abstract syntax tree). Therefore we need to define the data<br />
structures for the expressions and statements.<br />
<br />
== Data structures ==<br />
<br />
We need to take care of boolean and arithmetic expressions and the<br />
appropriate operators. First let's look at the boolean expressions:<br />
<br />
<haskell><br />
<br />
> data BExpr = BoolConst Bool<br />
> | Not BExpr<br />
> | BBinary BBinOp BExpr BExpr<br />
> | RBinary RBinOp AExpr AExpr<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
Binary booloan operators:<br />
<br />
<haskell><br />
<br />
> data BBinOp = And | Or deriving (Show)<br />
<br />
</haskell><br />
<br />
Relational operators:<br />
<br />
<haskell><br />
<br />
> data RBinOp = Greater | Less deriving (Show)<br />
<br />
</haskell><br />
<br />
Now we define the types for arithmetic expressions:<br />
<br />
<haskell><br />
<br />
> data AExpr = Var String<br />
> | IntConst Integer<br />
> | Neg AExpr<br />
> | ABinary ABinOp AExpr AExpr<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
And arithmetic operators:<br />
<br />
<haskell><br />
<br />
> data ABinOp = Add<br />
> | Subtract<br />
> | Multiply<br />
> | Divide<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
Finally let's take care of the statements:<br />
<br />
<haskell><br />
<br />
> data Stmt = Seq [Stmt]<br />
> | Assign String AExpr<br />
> | If BExpr Stmt Stmt<br />
> | While BExpr Stmt<br />
> | Skip<br />
> deriving (Show)<br />
<br />
</haskell><br />
<br />
== Lexer ==<br />
<br />
Having all the data structures we can go on with writing the code to do actual<br />
parsing. First of all we create the language definition using Haskell's record<br />
syntax and the constructor <hask>emptyDef</hask> (from<br />
<hask>Text.ParserCombinators.Parsec.Language</hask>):<br />
<br />
<haskell><br />
<br />
> languageDef =<br />
> emptyDef { commentStart = "/*"<br />
> , commentEnd = "*/"<br />
> , commentLine = "//"<br />
> , Token.identStart = letter<br />
> , Token.identLetter = alphaNum<br />
> , Token.reservedNames = [ "if"<br />
> , "then"<br />
> , "else"<br />
> , "while"<br />
> , "do"<br />
> , "skip"<br />
> , "true"<br />
> , "false"<br />
> , "not"<br />
> , "and"<br />
> , "or"<br />
> ]<br />
> , Token.reservedOpNames = ["+", "-", "*", "/", ":="<br />
> , "<", ">", "and", "or", "not"<br />
> ]<br />
> }<br />
<br />
</haskell><br />
<br />
This creates a language definition that accepts the C-style comments, requires<br />
that the identifiers start with a letter, and end with alphanumeric<br />
characters. Moreover there is a number of reserved names, that cannot be used<br />
by the identifiers.<br />
<br />
Having the above definition we can create a lexer:<br />
<br />
<haskell><br />
<br />
> lexer = Token.makeTokenParser languageDef<br />
<br />
</haskell><br />
<br />
<tt>lexer</tt> contains a number of lexical parsers, that we can us to parse<br />
identifiers, reserved words/operations, etc. Now we can select/extract them in<br />
the following way:<br />
<br />
<haskell><br />
<br />
> identifier = Token.identifier lexer -- parses an identifier<br />
> reserved = Token.reserved lexer -- parses a reserved name<br />
> reservedOp = Token.reservedOp lexer -- parses an operator<br />
> parens = Token.parens lexer -- parses surrounding parenthesis:<br />
> -- parens p<br />
> -- takes care of the parenthesis and<br />
> -- uses p to parse what's inside them<br />
> integer = Token.integer lexer -- parses an integer<br />
> semi = Token.semi lexer -- parses a semicolon<br />
> whiteSpace = Token.whiteSpace lexer -- parses whitespace<br />
<br />
</haskell><br />
<br />
This isn't really necessary, but should make the code much more readable (also<br />
this is the reason why we used the qualified import of<br />
<hask>Text.ParserCombinators.Parsec.Token</hask>). Now we can use them to<br />
parse the source code at the token level. One of the nice features of these<br />
parsers is that they take care of all whitespace after the tokens.<br />
<br />
== Main parser ==<br />
<br />
As already mentioned a program in this language is simply a statement, so the<br />
main parser should basically only parse a statement. But remember to take care of<br />
initial whitespace - our parsers only get rid of whitespace after the tokens!<br />
<br />
<haskell><br />
<br />
> whileParser :: Parser Stmt<br />
> whileParser = whiteSpace >> statement<br />
<br />
</haskell><br />
<br />
Now because any statement might be actually a sequence of statements separated<br />
by semicolon, we use <hask>sepBy1</hask> to parse at least one statement. The<br />
result is a list of statements. We also allow grouping statements by the<br />
parenthesis, which is useful, for instance, in the <tt>while</tt> loop.<br />
<br />
<haskell><br />
<br />
> statement :: Parser Stmt<br />
> statement = parens statement<br />
> <|> sequenceOfStmt<br />
<br />
> sequenceOfStmt =<br />
> do list <- (sepBy1 statement' semi)<br />
> -- If there's only one statement return it without using Seq.<br />
> return $ if length list == 1 then head list else Seq list<br />
<br />
</haskell><br />
<br />
Now a single statement is quite simple, it's either an if conditional, a while<br />
loop, an assignment or simply a skip statement. We use <hask><|></hask> to<br />
express choice. So <hask>a <|> b</hask> will first try parser <hask>a</hask><br />
and if it fails (but without actually consuming any input) then parser<br />
<hask>b</hask> will be used. Note: this means that the order is important.<br />
<br />
<haskell><br />
<br />
> statement' :: Parser Stmt<br />
> statement' = ifStmt<br />
> <|> whileStmt<br />
> <|> skipStmt<br />
> <|> assignStmt<br />
<br />
</haskell><br />
<br />
If you have a parser that might fail after consuming some input, and you still<br />
want to try the next parser, you should look into <hask>try</hask> combinator.<br />
For instance <hask>try p <|> q</hask> will try parsing with <hask>p</hask> and<br />
if it fails, even after consuming the input, the <hask>q</hask> parser will be<br />
used as if nothing has been consumed by <hask>p</hask>.<br />
<br />
Now let's define the parsers for all the possible statements. This is quite<br />
straightforward as we just use the parsers from the lexer and then use all the<br />
necessary information to create appropriate data structures.<br />
<br />
<haskell><br />
<br />
> ifStmt :: Parser Stmt<br />
> ifStmt =<br />
> do reserved "if"<br />
> cond <- bExpression<br />
> reserved "then"<br />
> stmt1 <- statement<br />
> reserved "else"<br />
> stmt2 <- statement<br />
> return $ If cond stmt1 stmt2<br />
<br />
> whileStmt :: Parser Stmt<br />
> whileStmt =<br />
> do reserved "while"<br />
> cond <- bExpression<br />
> reserved "do"<br />
> stmt <- statement<br />
> return $ While cond stmt<br />
<br />
> assignStmt :: Parser Stmt<br />
> assignStmt =<br />
> do var <- identifier<br />
> reservedOp ":="<br />
> expr <- aExpression<br />
> return $ Assign var expr<br />
<br />
> skipStmt :: Parser Stmt<br />
> skipStmt = reserved "skip" >> return Skip<br />
<br />
</haskell><br />
<br />
== Expressions ==<br />
<br />
What's left is to parse the expressions. Fortunately Parsec provides a very<br />
easy way to do that. Let's define the arithmetic and boolean expressions:<br />
<br />
<haskell><br />
<br />
> aExpression :: Parser AExpr<br />
> aExpression = buildExpressionParser aOperators aTerm<br />
<br />
> bExpression :: Parser BExpr<br />
> bExpression = buildExpressionParser bOperators bTerm<br />
<br />
</haskell><br />
<br />
Now we have to define the lists with operator precedence, associativity and<br />
what constructors to use in each case.<br />
<br />
<haskell><br />
<br />
> aOperators = [ [Prefix (reservedOp "-" >> return (Neg )) ]<br />
> , [Infix (reservedOp "*" >> return (ABinary Multiply)) AssocLeft]<br />
> , [Infix (reservedOp "/" >> return (ABinary Divide )) AssocLeft]<br />
> , [Infix (reservedOp "+" >> return (ABinary Add )) AssocLeft]<br />
> , [Infix (reservedOp "-" >> return (ABinary Subtract)) AssocLeft]<br />
> ]<br />
<br />
> bOperators = [ [Prefix (reservedOp "not" >> return (Not )) ]<br />
> , [Infix (reservedOp "and" >> return (BBinary And )) AssocLeft]<br />
> , [Infix (reservedOp "or" >> return (BBinary Or )) AssocLeft]<br />
> ]<br />
<br />
</haskell><br />
<br />
In case of Prefix operators it is enough to specify which one should be parsed<br />
and what is the associated data constructor. Infix operators are defined<br />
similarly, but it's necessary to add information about associativity. Note<br />
that the operator precedence depends only on the order of the elements in the<br />
list.<br />
<br />
Finally we have to define the terms. In case of arithmetic expressions, it is<br />
quite simple:<br />
<br />
<haskell><br />
<br />
> aTerm = parens aExpression<br />
> <|> liftM Var identifier<br />
> <|> liftM IntConst integer<br />
<br />
</haskell><br />
<br />
However, the term in a boolean expression is a bit more tricky. In this case,<br />
a term can also be an expression with relational operator consisting of<br />
arithmetic expressions.<br />
<br />
<haskell><br />
<br />
> bTerm = parens bExpression<br />
> <|> (reserved "true" >> return (BoolConst True ))<br />
> <|> (reserved "false" >> return (BoolConst False))<br />
> <|> rExpression<br />
<br />
</haskell><br />
<br />
Therefore we have to define a parser for relational expressions:<br />
<br />
<haskell><br />
<br />
> rExpression =<br />
> do a1 <- aExpression<br />
> op <- relation<br />
> a2 <- aExpression<br />
> return $ RBinary op a1 a2<br />
<br />
> relation = (reservedOp ">" >> return Greater)<br />
> <|> (reservedOp "<" >> return Less)<br />
<br />
</haskell><br />
<br />
And that's it. We have a quite simple parser able to parse a few statements and<br />
arithmetic/boolean expressions.<br />
<br />
== Notes ==<br />
<br />
If you want to experiment with the parser inside ghci, these functions might be<br />
handy:<br />
<br />
<haskell><br />
<br />
> parseString :: String -> Stmt<br />
> parseString str =<br />
> case parse whileParser "" str of<br />
> Left e -> error $ show e<br />
> Right r -> r<br />
<br />
> parseFile :: String -> IO Stmt<br />
> parseFile file =<br />
> do program <- readFile file<br />
> case parse whileParser "" program of<br />
> Left e -> print e >> fail "parse error"<br />
> Right r -> return r<br />
<br />
</haskell><br />
<br />
Now you can simply load the module in ghci and then do<br />
<hask>ast <- parseFile "<filename>"</hask> to parse a file and get the<br />
result if parsing was successful. If you already have a string with<br />
the program, you can use <hask>parseString</hask>.<br />
<br />
[[Category:How to]]</div>MichalTerepeta