# Difference between revisions of "Applications and libraries/Linguistics"

EndreyMark (talk | contribs) (Moving some of the contents of Libraries and tools ,,Natural language processing'' section to here, leaving only the directly Haskell-related materials there) |
(fix link to "being lazy with class") |
||

(66 intermediate revisions by 21 users not shown) | |||

Line 1: | Line 1: | ||

__TOC__ |
__TOC__ |
||

− | = Portals and other huge |
+ | == Portals and other huge resources == |

− | Peter Ljunglöf's many [http://www. |
+ | Peter Ljunglöf's many [http://www.cse.chalmers.se/~peb/bibliography.html publications] on natural language processing, parsing, formal semantics. Many of them use Haskell, and there are [http://www.ling.gu.se/~peb/software.html downloadable] Haskell sources too. |

+ | [http://homepages.cwi.nl/~jve/index.html Jan van Eijck's page] contains a huge amount of materials on logic and language: |
||

+ | * computational linguistics |
||

+ | * logics (e.g. dynamic epistemic modelling) |
||

+ | The [http://projects.haskell.org/nlp Haskell NLP projects] provides a mailing list for Haskellers doing NLP work, as well as a community wiki and darcs repository. Come join us! |
||

− | = Natural language processing and combinatory logic = |
||

+ | [https://github.com/nlpwp/book-old Natural Language Processing for The Working Programmer] is a book that provides an introduction to Haskell and NLP. |
||

− | == Applicative universal grammars == |
||

+ | There are many Haskell resources, too. |
||

− | * A Haskell application for natural language parsing, based on ''Applicative Universal Grammar'' (AUG) is described in Mark P. Jones', Paul Hudak's and Sebastian Shaumyan's [http://citeseer.ist.psu.edu/jones95using.html Using Types to Parse Natural Language]. The Haskell source code given by the article is full, it can be run by Gofer, and after a few modification, by GHC too (''transpose'' must be explictly imported from standard library module ''Data.List'', and class ''Text'' renamed to ''Show''). |
||

− | * A more detailed description of the topic of this previous article described in Sebastian Shaumyan and Paul Hudak's [http://citeseer.ist.psu.edu/510871.html Linguistic, Philosophical, and Pragmatic Aspects of Type-Directed Natural Language Parsing] |
||

− | * Bernard Paul Sypniewski's [http://elvis.rowan.edu/~bps/ling/introAUG.pdf An Introduction to Applicative Universal Grammar] (this link seems now broken, I hope only temporarily). As an article describing what AUG is, see also Shaumyan's [http://citeseer.ist.psu.edu/shaumyan98two.html Two Paradigms of Linguistics: The Semiotic versus Non-Semiotic Paradigm]. |
||

+ | == Tools and libraries == |
||

+ | * [http://www.w3.org/wiki/Cypher Cypher] is one of the first software program available which generates the metadata representation of natural language input. Cypher produces RDF graph and SeRQL query representations of sentences, clauses, phrases and questions. The Cypher framework provides a set of robust definition languages, which can be used to extend and create grammars and lexicons. Cypher programming is fun to learn and easy to use, and the specifications are designed to allow a novice to quickly and easily build transcoders for processing highly complex sentences and phrases of any natural language, and to cover any vocabulary |
||

+ | * [http://trac.loria.fr/~geni GenI] is a surface realiser for Tree Adjoining Grammars. Surface realisation can be seen as the last stage in a natural language generation pipeline. GenI in particular takes an FB-LTAG grammar and an input semantics (a conjunction of first order terms), and produces the set of sentences associated to the input semantics by the grammar. See also [http://www.loria.fr/~kow/ Eric Kow]'s recent publications on it. |
||

+ | * [http://grammaticalframework.org/ Grammatical Framework] (GF) is a compiler and grammatical programming environment written entirely in Haskell, with an interactive interpreter and two GUI interfaces, one written in Fudgets and another written in Java. GF grammars are written in a subset of Haskell and compile into an internal GF format that may be used as embedded parsers in Haskell, parsers in Java (with an embedded Java interpreter gfc2java.jar) and subsequently converted to applets ([http://www.cs.chalmers.se/~markus/gramlets/ Gramlets]). (GF-Haskell to Java translation is performed through an Open Agent Architecture--the original .NET, see [http://www.cs.chalmers.se/~bringert/gf/gf-oaa.html GF OAA].) The GF grammatical formalism handles linguistic entities (morphemes, etc.) using type theory: an approach especially suited to machine translation of controlled natural languages. The [http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/index.html Grammar Resource Library], a set of basic grammars for Danish, English, Finnish, French, German, Italian, Norwegian, Russian, Spanish and Swedish, is available as a separate download. GF has been used to translate a fragment of C code to JVM (see [http://www.cs.chalmers.se/~aarne/GF/doc/gfcc.pdf GFCC (PDF document)]). |
||

+ | * [http://www.cs.chalmers.se/~markus/FM/index.html Functional Morphology] - a toolkit for morphology development. Has been used for Swedish, Spanish, Urdu and more. |
||

+ | * '''Saxophone''' is a fun translator from German to the Saxon dialect. It is part of the [https://sourceforge.net/projects/parallelweb ParallelWeb] project which aims at translating Web pages including all of their links. |
||

+ | === Data interfaces=== |
||

+ | * [http://www.umiacs.umd.edu/~hal/HWordNet/ HWordNet] - A Haskell interface to WordNet by Hal Daumé III. |
||

+ | * [https://hackage.haskell.org/package/penn-treebank penn-treebank] parser |
||

− | == |
+ | === Tokenisation === |

+ | * [https://hackage.haskell.org/package/tokenize tokenize] is a simple tokenizer for English text |
||

+ | * [https://hackage.haskell.org/package/fullstop fullstop] is a simple tokenizer for English text |
||

+ | |||

+ | === Evaluation === |
||

+ | * [https://hackage.haskell.org/package/nlp-scores nlp-scores] provides scoring functions commonly used for evaluation in NLP and IR |
||

+ | |||

+ | === Morphology === |
||

+ | * [https://hackage.haskell.org/package/brillig brillig] is an incomplete implementation of Brill's transformation rule tagger (see also [http://www.nlpwp.org/book/chap-tagging.xhtml the chapter in Haskell NLP for the Working Programmer]) |
||

+ | * [https://hackage.haskell.org/package/morfette morfette] is a tool for supervised learning of inflectional morphology (tags and lemmas); see also [https://hackage.haskell.org/package/delta-h delta-h] and watch the prolific author's page at http://grzegorz.chrupala.me/#section3 |
||

+ | |||

+ | * [https://bitbucket.org/gchrupala/hiera hiera] implements an agglomerative/hierarchical word class clustering algorithm |
||

+ | |||

+ | * [http://hackage.haskell.org/package/HaLeX HaLeX is a library of datatypes and functions implemented in Haskell that allows us to model, manipulate and animate regular languages as finite state machines] (originally for educational purposes) |
||

+ | |||

+ | * [https://hackage.haskell.org/package/bytestring-trie bytestring-trie] is an efficient and fast way of storing and looking up strings as keys, assigned to values of some arbitrary type |
||

+ | |||

+ | === Learning algorithms === |
||

+ | * [https://hackage.haskell.org/package/sequor sequor] is a sequence labeler based on Collins's sequence perceptron, useful for Part-of-Speech tagging |
||

+ | * [https://hackage.haskell.org/package/progressive progressive] is a multilabel classification model which learns sequentially (online) |
||

+ | * [https://hackage.haskell.org/package/lda lda] is an Online Gibbs sampler for Latent Dirichlet Allocation, useful for probabilistic soft word class induction, see also [https://hackage.haskell.org/package/colada colada] |
||

+ | * [https://hackage.haskell.org/package/vowpal-utils vowpal-utils] is a wrapper around vw (Utilities for interpreting models produced by Vowpal Wabbit) |
||

+ | * [https://github.com/HuwCampbell/grenade grenade] is a dependently typed, practical, and pretty quick neural network library for concise and precise specifications of complex networks in Haskell (using lapack) |
||

+ | * [https://github.com/tensorflow/haskell tensorflow] has Haskell bindings (Tensorflow is a library by Google for numerical computation using data flow graphs) |
||

+ | |||

+ | ===Machine translation=== |
||

+ | * [https://hackage.haskell.org/package/hs-gizapp hs-gizapp] is a wrapper around GIZA++ |
||

+ | * [http://grammaticalframework.org/ Grammatical Framework] is used for machine translation, among many other things |
||

+ | |||

+ | == Natural language processing and combinatory logic == |
||

+ | |||

+ | [[Combinatory logic]] contributed to develop powerful theories in linguistics.. |
||

+ | |||

+ | === Applicative universal grammar === |
||

+ | |||

+ | Now it has got [[/Applicative universal grammar|its own HaskellWiki page]]. |
||

+ | |||

+ | === Categorial grammar === |
||

A general summary of modern semantic theories developed in the century |
A general summary of modern semantic theories developed in the century |
||

− | is provided by [http://citeseer. |
+ | is provided by [http://citeseer.ist.psu.edu/blackburn97logical.html Logical Aspects of Computational Linguistics: an introduction]. |

[http://www-unix.oit.umass.edu/~gmhwww/ Gary Hardegree]'s portal-rich page provides a lot of materials on logic and linguistics, among them |
[http://www-unix.oit.umass.edu/~gmhwww/ Gary Hardegree]'s portal-rich page provides a lot of materials on logic and linguistics, among them |
||

Line 29: | Line 76: | ||

On natural languages relating to combinatory logic, see also |
On natural languages relating to combinatory logic, see also |
||

− | * |
+ | * Mark Steedman's [http://citeseer.ist.psu.edu/steedman97does.html Does Grammar Make Use of Bound Variables?] |

* Mark Hepple: [http://citeseer.ist.psu.edu/hepple90grammar.html The Grammar and Processing of Order and Dependency: a Categorial Approach] |
* Mark Hepple: [http://citeseer.ist.psu.edu/hepple90grammar.html The Grammar and Processing of Order and Dependency: a Categorial Approach] |
||

+ | === Type-Logical Grammar === |
||

− | = Game theoretic semantics = |
||

+ | |||

+ | Matteo Capelletti's [http://www.let.uu.nl/users/Matteo.Capelletti/personal/Home.html home page] contains a parser based on the Non-associative Lambek calculus. It supports hypothetical reasoning and Montague style semantics. |
||

+ | |||

+ | === Tree Adjoining Grammar === |
||

+ | |||

+ | * See [http://trac.loria.fr/~geni GenI], mentioned above. |
||

+ | |||

+ | == Game theoretic semantics == |
||

Game theoretic semantics presents an interesting concept of ''truth'' -- in another way than that of Tarski. |
Game theoretic semantics presents an interesting concept of ''truth'' -- in another way than that of Tarski. |
||

Line 39: | Line 94: | ||

Chiaki Ohkura's [http://acl.ldc.upenn.edu/W/W03/W03-1408.pdf The Semantics of Metaphor in the Game Theoretic Semantics with at Least Two Coordination Equilibria] article tries to catch the concept of ''metaphor''. |
Chiaki Ohkura's [http://acl.ldc.upenn.edu/W/W03/W03-1408.pdf The Semantics of Metaphor in the Game Theoretic Semantics with at Least Two Coordination Equilibria] article tries to catch the concept of ''metaphor''. |
||

− | == Relatedness to linear logic == |
+ | === Relatedness to linear logic === |

The Wikipedia article mentions also the relatedness of game theoretic semantics to ''linear logic''. |
The Wikipedia article mentions also the relatedness of game theoretic semantics to ''linear logic''. |
||

[http://homepages.inf.ed.ac.uk/wadler/ Philip Wadler]'s page on [http://homepages.inf.ed.ac.uk/wadler/topics/linear-logic.html linear logic] describes the topic and its relatedness to many concepts concerning Haskell. [http://homepages.inf.ed.ac.uk/wadler/topics/linear-logic.html#lineartaste A taste of linear logic] can serve as an introductory article. |
[http://homepages.inf.ed.ac.uk/wadler/ Philip Wadler]'s page on [http://homepages.inf.ed.ac.uk/wadler/topics/linear-logic.html linear logic] describes the topic and its relatedness to many concepts concerning Haskell. [http://homepages.inf.ed.ac.uk/wadler/topics/linear-logic.html#lineartaste A taste of linear logic] can serve as an introductory article. |
||

− | = Parsing natural languages = |
+ | == Parsing natural languages == |

+ | ===Parsing Natural Language with X-SAIGA parser=== |
||

+ | The goal of the [http://www.cs.uwindsor.ca/~hafiz/proHome.html X-SAIGA] project is to create algorithms and implementations which enable the construction of language processors (recognizers, parsers, interpreters, translators, etc.) to be constructed as modular and efficient embedded executable specifications of grammars. The syntax analysis is done with a set of parser combinators by overcoming some long standing limitations - |
||

+ | # the simple implementations of parser combinators require [[exponential]] time and space when parsing an ambiguous context free grammar. |
||

+ | # like any top-down recursive descent parsing, the conventional parser combinators won't terminate while processing a left-recursive grammar (i.e. <code>s ::= s *> s *> term 'x'|empty</code>). |
||

+ | |||

+ | As a part of the X-SAIGA project's syntax analysis, a [http://cs.uwindsor.ca/~hafiz/p46-frost.pdf recognition algorithm] that accommodates ambiguous grammars with direct [[left recursion|left-recursive]] rules is described by Frost and Hafiz in 2006. The algorithm curtails the otherwise ever-growing left-recursive parse by imposing depth restrictions. That algorithm was extended to a complete [http://cs.uwindsor.ca/~hafiz/iwpt-07.pdf parsing algorithm] to accommodate indirect as well as direct left-recursion in [[polynomial]] time, and to generate compact polynomial-size representations of the potentially-exponential number of parse trees for highly-ambiguous grammars by Frost, Hafiz and Callaghan in 2007. This extended algorithm accommodates indirect left-recursion by comparing its 'computed-context' with 'current-context'. The same authors also [http://cs.uwindsor.ca/~hafiz/PADL_PAPER_FINAL.pdf described their implementation of a set of parser combinators] written in the [[Haskell]] programming language based on the same algorithm in [http://www.ist.unomaha.edu/padl2008/ PADL 08]. The [http://www.cs.uwindsor.ca/~hafiz/proHome.html X-SAIGA] site has more about the algorithms, implementation details experimental results. |
||

+ | |||

+ | ===Monadic Compositional Parsing=== |
||

Gordon J. Pace: [http://www.cs.um.edu.mt/~csaw/CSAW04/Proceedings/08.pdf Monadic Compositional Parsing with Context Using Maltese as a Case Study], see its [http://www.cs.um.edu.mt/~csaw/CSAW04/ context] too. |
Gordon J. Pace: [http://www.cs.um.edu.mt/~csaw/CSAW04/Proceedings/08.pdf Monadic Compositional Parsing with Context Using Maltese as a Case Study], see its [http://www.cs.um.edu.mt/~csaw/CSAW04/ context] too. |
||

+ | |||

+ | == Other functional or Haskell-related approaches to linguistics == |
||

+ | |||

+ | * [http://cs.uwindsor.ca/~richard/PUBLICATIONS/NLI_LFP_SURVEY_DRAFT.pdf A Survey on the Use of Haskell in Natural-Language Processing] (Report by Richard A. Frost). It is also a part of Haskell Communities and Activities Report, [http://www.haskell.org/communities/11-2006/html/report.html Eleventh edition – November 30, 2006]. |
||

+ | * [https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/history.pdf A History of Haskell: Being Lazy With Class] (2007) has a section (11.5 on page 40) with material contributed by Paul Callaghan on applications of Haskell to natural-language processing. Perhaps, there are some projects mentioned there which are not (yet) listed here on this page; so take a look. |
||

+ | * From [http://www.cs.chalmers.se/~aarne/ Aarne Ranta's homepage] |
||

+ | ** [http://www.cs.chalmers.se/~aarne/course-langtech/ Natural Language Technology], with (among others) [http://www.cs.chalmers.se/~aarne/course-langtech/lectures/lectures.html online course slides]. They give huge insights, for example, see the slide example which discusses [[Dependent type#Type theory|the concept of dependent type and Curry Howard isomorphism]] in lingustical context. |
||

+ | * The [http://sanskrit.inria.fr/ZEN/ Zen Computational Linguistics Toolkit] has tools for efficiently processing linguistic data structures, like trees and automata. It's written in Literate O'Caml, though a Haskell port shouldn't be very hard to do. |
||

+ | * The [http://nlpers.blogspot.com/ natural language processing blog] written by [http://www.isi.edu/~hdaume/ Hal Daume III]. |
||

+ | |||

+ | == Specific topics == |
||

+ | |||

+ | === Lojban === |
||

+ | |||

+ | Lojban, an artificial language ([[Lojban|see a separate HaskellWiki page on it with references]].) “Lojban was not designed primarily to be an international language, however, but rather as a linguistic tool for studying and understanding language. Its linguistic and computer applications make Lojban unique among international languages...” (NC:WhLoj, page 15 par 1) |
||

+ | |||

+ | === Continuations in natural languages === |
||

+ | |||

+ | Some phenomena in natural languages can be grasped with the notion of [[continuation]]. For details, see |
||

+ | Chris Barker's paper [http://www.cs.bham.ac.uk/~hxt/cw04/barker.pdf Continuations in Natural Language]. It is quite accessible to non-linguists. |
||

+ | == References == |
||

+ | |||

+ | ;barker2004cnl |
||

+ | :Barker, Chris: [http://www.cs.bham.ac.uk/~hxt/cw04/barker.pdf Continuations in Natural Language] (pdf), 2004 |
||

+ | ;nicholas2003wl |
||

+ | :Nicholas, Nick and Cowan, John (ed.): What is Lojban? [http://www.lojban.org/ Logical Language Group], 2003. Available also [http://www.lojban.org/tiki/tiki-index.php?page=What+Is+Lojban%3F%2C+The+Book&bl online]. |
||

+ | ;frost2006rnl |
||

+ | :Frost, Richard: [http://cs.uwindsor.ca/~richard/PUBLICATIONS/NLI_LFP_SURVEY_DRAFT.pdf Realization of natural language interfaces using lazy functional programming] (pdf), 2006 |
||

+ | ;frost2008pal |
||

+ | :Frost, Richard; Hafiz, Rahmatullah and Callaghan, Paul: [http://cs.uwindsor.ca/~hafiz/PADL_PAPER_FINAL.pdf Parser Combinators for Ambiguous Left-Recursive Grammars.] Proceedings of the 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN. January 2008, San Francisco, USA. |
||

+ | ;xsaiga2008exg |
||

+ | :Frost, Richard; Hafiz, Rahmatullah and Callaghan, Paul: [http://cs.uwindsor.ca/~hafiz/proHome.html X-SAIGA] website - eXecutable SpecificAtIons of GrAmmars. |
||

+ | |||

+ | [[Category:Theoretical foundations]] |

## Latest revision as of 16:44, 1 August 2021

## Portals and other huge resources

Peter Ljunglöf's many publications on natural language processing, parsing, formal semantics. Many of them use Haskell, and there are downloadable Haskell sources too.

Jan van Eijck's page contains a huge amount of materials on logic and language:

- computational linguistics
- logics (e.g. dynamic epistemic modelling)

The Haskell NLP projects provides a mailing list for Haskellers doing NLP work, as well as a community wiki and darcs repository. Come join us!

Natural Language Processing for The Working Programmer is a book that provides an introduction to Haskell and NLP.

There are many Haskell resources, too.

## Tools and libraries

- Cypher is one of the first software program available which generates the metadata representation of natural language input. Cypher produces RDF graph and SeRQL query representations of sentences, clauses, phrases and questions. The Cypher framework provides a set of robust definition languages, which can be used to extend and create grammars and lexicons. Cypher programming is fun to learn and easy to use, and the specifications are designed to allow a novice to quickly and easily build transcoders for processing highly complex sentences and phrases of any natural language, and to cover any vocabulary
- GenI is a surface realiser for Tree Adjoining Grammars. Surface realisation can be seen as the last stage in a natural language generation pipeline. GenI in particular takes an FB-LTAG grammar and an input semantics (a conjunction of first order terms), and produces the set of sentences associated to the input semantics by the grammar. See also Eric Kow's recent publications on it.
- Grammatical Framework (GF) is a compiler and grammatical programming environment written entirely in Haskell, with an interactive interpreter and two GUI interfaces, one written in Fudgets and another written in Java. GF grammars are written in a subset of Haskell and compile into an internal GF format that may be used as embedded parsers in Haskell, parsers in Java (with an embedded Java interpreter gfc2java.jar) and subsequently converted to applets (Gramlets). (GF-Haskell to Java translation is performed through an Open Agent Architecture--the original .NET, see GF OAA.) The GF grammatical formalism handles linguistic entities (morphemes, etc.) using type theory: an approach especially suited to machine translation of controlled natural languages. The Grammar Resource Library, a set of basic grammars for Danish, English, Finnish, French, German, Italian, Norwegian, Russian, Spanish and Swedish, is available as a separate download. GF has been used to translate a fragment of C code to JVM (see GFCC (PDF document)).
- Functional Morphology - a toolkit for morphology development. Has been used for Swedish, Spanish, Urdu and more.
**Saxophone**is a fun translator from German to the Saxon dialect. It is part of the ParallelWeb project which aims at translating Web pages including all of their links.

### Data interfaces

- HWordNet - A Haskell interface to WordNet by Hal Daumé III.
- penn-treebank parser

### Tokenisation

### Evaluation

- nlp-scores provides scoring functions commonly used for evaluation in NLP and IR

### Morphology

- brillig is an incomplete implementation of Brill's transformation rule tagger (see also the chapter in Haskell NLP for the Working Programmer)
- morfette is a tool for supervised learning of inflectional morphology (tags and lemmas); see also delta-h and watch the prolific author's page at http://grzegorz.chrupala.me/#section3

- hiera implements an agglomerative/hierarchical word class clustering algorithm

- HaLeX is a library of datatypes and functions implemented in Haskell that allows us to model, manipulate and animate regular languages as finite state machines (originally for educational purposes)

- bytestring-trie is an efficient and fast way of storing and looking up strings as keys, assigned to values of some arbitrary type

### Learning algorithms

- sequor is a sequence labeler based on Collins's sequence perceptron, useful for Part-of-Speech tagging
- progressive is a multilabel classification model which learns sequentially (online)
- lda is an Online Gibbs sampler for Latent Dirichlet Allocation, useful for probabilistic soft word class induction, see also colada
- vowpal-utils is a wrapper around vw (Utilities for interpreting models produced by Vowpal Wabbit)
- grenade is a dependently typed, practical, and pretty quick neural network library for concise and precise specifications of complex networks in Haskell (using lapack)
- tensorflow has Haskell bindings (Tensorflow is a library by Google for numerical computation using data flow graphs)

### Machine translation

- hs-gizapp is a wrapper around GIZA++
- Grammatical Framework is used for machine translation, among many other things

## Natural language processing and combinatory logic

Combinatory logic contributed to develop powerful theories in linguistics..

### Applicative universal grammar

Now it has got its own HaskellWiki page.

### Categorial grammar

A general summary of modern semantic theories developed in the century is provided by Logical Aspects of Computational Linguistics: an introduction.

Gary Hardegree's portal-rich page provides a lot of materials on logic and linguistics, among them

- The Axiomatic Theory of Truth grasping concepts like truth, quotations, paradoxes, liar's paradox
- Courses ranging from the introductory level to developed topics, e.g. Basic Categorial Grammar.

The Combinatory Categorial Grammar Site contains links, papers (both introductory and developed) and software (OpenNLP open source projects, related to natural language processing, and OpenCCG)

On natural languages relating to combinatory logic, see also

- Mark Steedman's Does Grammar Make Use of Bound Variables?
- Mark Hepple: The Grammar and Processing of Order and Dependency: a Categorial Approach

### Type-Logical Grammar

Matteo Capelletti's home page contains a parser based on the Non-associative Lambek calculus. It supports hypothetical reasoning and Montague style semantics.

### Tree Adjoining Grammar

- See GenI, mentioned above.

## Game theoretic semantics

Game theoretic semantics presents an interesting concept of *truth* -- in another way than that of Tarski.
Its connections to computer science and computer languages is described in Wikipedia's Game semantics article. Merlijn Sevenster's Game theoretical semantics and -logic is a good introductory material too.

Chiaki Ohkura's The Semantics of Metaphor in the Game Theoretic Semantics with at Least Two Coordination Equilibria article tries to catch the concept of *metaphor*.

### Relatedness to linear logic

The Wikipedia article mentions also the relatedness of game theoretic semantics to *linear logic*.
Philip Wadler's page on linear logic describes the topic and its relatedness to many concepts concerning Haskell. A taste of linear logic can serve as an introductory article.

## Parsing natural languages

### Parsing Natural Language with X-SAIGA parser

The goal of the X-SAIGA project is to create algorithms and implementations which enable the construction of language processors (recognizers, parsers, interpreters, translators, etc.) to be constructed as modular and efficient embedded executable specifications of grammars. The syntax analysis is done with a set of parser combinators by overcoming some long standing limitations -

- the simple implementations of parser combinators require exponential time and space when parsing an ambiguous context free grammar.
- like any top-down recursive descent parsing, the conventional parser combinators won't terminate while processing a left-recursive grammar (i.e.
`s ::= s *> s *> term 'x'|empty`

).

As a part of the X-SAIGA project's syntax analysis, a recognition algorithm that accommodates ambiguous grammars with direct left-recursive rules is described by Frost and Hafiz in 2006. The algorithm curtails the otherwise ever-growing left-recursive parse by imposing depth restrictions. That algorithm was extended to a complete parsing algorithm to accommodate indirect as well as direct left-recursion in polynomial time, and to generate compact polynomial-size representations of the potentially-exponential number of parse trees for highly-ambiguous grammars by Frost, Hafiz and Callaghan in 2007. This extended algorithm accommodates indirect left-recursion by comparing its 'computed-context' with 'current-context'. The same authors also described their implementation of a set of parser combinators written in the Haskell programming language based on the same algorithm in PADL 08. The X-SAIGA site has more about the algorithms, implementation details experimental results.

### Monadic Compositional Parsing

Gordon J. Pace: Monadic Compositional Parsing with Context Using Maltese as a Case Study, see its context too.

- A Survey on the Use of Haskell in Natural-Language Processing (Report by Richard A. Frost). It is also a part of Haskell Communities and Activities Report, Eleventh edition – November 30, 2006.
- A History of Haskell: Being Lazy With Class (2007) has a section (11.5 on page 40) with material contributed by Paul Callaghan on applications of Haskell to natural-language processing. Perhaps, there are some projects mentioned there which are not (yet) listed here on this page; so take a look.
- From Aarne Ranta's homepage
- Natural Language Technology, with (among others) online course slides. They give huge insights, for example, see the slide example which discusses the concept of dependent type and Curry Howard isomorphism in lingustical context.

- The Zen Computational Linguistics Toolkit has tools for efficiently processing linguistic data structures, like trees and automata. It's written in Literate O'Caml, though a Haskell port shouldn't be very hard to do.
- The natural language processing blog written by Hal Daume III.

## Specific topics

### Lojban

Lojban, an artificial language (see a separate HaskellWiki page on it with references.) “Lojban was not designed primarily to be an international language, however, but rather as a linguistic tool for studying and understanding language. Its linguistic and computer applications make Lojban unique among international languages...” (NC:WhLoj, page 15 par 1)

### Continuations in natural languages

Some phenomena in natural languages can be grasped with the notion of continuation. For details, see Chris Barker's paper Continuations in Natural Language. It is quite accessible to non-linguists.

## References

- barker2004cnl
- Barker, Chris: Continuations in Natural Language (pdf), 2004
- nicholas2003wl
- Nicholas, Nick and Cowan, John (ed.): What is Lojban? Logical Language Group, 2003. Available also online.
- frost2006rnl
- Frost, Richard: Realization of natural language interfaces using lazy functional programming (pdf), 2006
- frost2008pal
- Frost, Richard; Hafiz, Rahmatullah and Callaghan, Paul: Parser Combinators for Ambiguous Left-Recursive Grammars. Proceedings of the 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN. January 2008, San Francisco, USA.
- xsaiga2008exg
- Frost, Richard; Hafiz, Rahmatullah and Callaghan, Paul: X-SAIGA website - eXecutable SpecificAtIons of GrAmmars.