Regular expressions
=== regex-tdfa ===

Chris Kuklewicz has released <code>regex-tdfa</code> (Tagged Deterministic Finite Automata), a library that works with GHC, most recently ghc-6.10.1. It is POSIX compliant and tested against [http://www2.research.att.com/~astopen/testregex/testregex.html the AT&T tests]. It is available on hackage at [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-tdfa regex-tdfa] and via [http://darcs.haskell.org/packages/regex-unstable/regex-tdfa/ darcs]. The [http://darcs.haskell.org/packages/regex-unstable/regex-tdfa/doc/html/regex-tdfa/Text-Regex-TDFA.html haddock documentation] is also on the darcs site.

The library uses a tagged DFA, like the TRE C library, to provide efficient POSIX matching. It defaults to true POSIX submatch capture (including ambiguous *-operator subpatterns), but this extra effort can be disabled.

Versions 0.90 and up use mutable ST arrays to keep track of data during matching and thus have both decent speed and decent memory performance. Previous versions drove memory use too high, overworking the garbage collector.

Versions 1.0.0 and up improve the algorithm. The search time is now O(N) for text of length N in the worst case, while still providing correct POSIX capturing and running in bounded space.

Disabling submatch capture (see the <code>captureGroups</code> field of <code>ExecOption</code>) avoids the extra work and should run faster. This is the "non capture" case, which is also used if there are no parentheses in the regular expression. Running in single line mode (see <code>CompOption</code>) with a leading ^ anchor also avoids extra work and should run faster; this is the "front anchored" case. Combining both optimizations should run faster still. Just testing for a match stops at the shortest match found and should be fast (use <code>matchTest</code>, or <code>match</code>/<code>matchM</code> with a Bool result); this path also tries to optimize for the "front anchored" case.
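
The two optimizations described above can be sketched as follows. This is a minimal illustration, not the library author's recommended setup: the pattern and the names <code>fastRegex</code> and <code>startsWithIdent</code> are made up for the example, and it assumes a <code>regex-tdfa</code> version whose <code>Text.Regex.TDFA</code> module exports <code>CompOption(..)</code>, <code>ExecOption(..)</code> and the <code>regex-base</code> functions.

```haskell
import Text.Regex.TDFA

-- Hypothetical example regex: an identifier-like prefix.
-- multiline = False selects single line mode; together with the leading ^
-- this is the "front anchored" case from the text.
-- captureGroups = False is the "non capture" case.
fastRegex :: Regex
fastRegex = makeRegexOpts comp exec "^[a-z]+[0-9]*"
  where
    comp = defaultCompOpt { multiline = False }
    exec = defaultExecOpt { captureGroups = False }

-- matchTest only asks whether a match exists, so no match text or
-- submatches are returned.
startsWithIdent :: String -> Bool
startsWithIdent = matchTest fastRegex
```

For instance, <code>startsWithIdent "abc123 rest"</code> is expected to be True and <code>startsWithIdent "123abc"</code> False. Whether the options give a measurable speedup depends on the pattern and input, so it is worth benchmarking before relying on them.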

The major advantage over pcre is avoidance of exponential blowup for certain patterns: asymptotically, the time required to match a pattern against a string is always linear in the length of the string. This O(N) scaling is [http://archive.fo/LUPTs now achieved] even in the worst case and when returning the correct POSIX captures.

As of version 1.1.1 the following GNU extensions are recognized, all of which are anchors:
<pre>
\`  at beginning of entire text
\'  at end of entire text
\<  at beginning of word
\>  at end of word
\b  at either beginning or end of word
\B  at neither beginning nor end of word
</pre>
The above are controlled by the 'newSyntax' Bool in 'CompOption'.
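
As a small sketch of the word anchors listed above, the following sets <code>newSyntax</code> explicitly and uses <code>\<</code> and <code>\></code> to match a whole word. The word "cat" and the names <code>wordRegex</code> and <code>hasWholeWord</code> are illustrative assumptions; note that each backslash must be doubled inside a Haskell string literal.

```haskell
import Text.Regex.TDFA

-- Hypothetical example: match "cat" only as a whole word.
-- newSyntax = True enables the GNU anchor extensions; it is set
-- explicitly here so the example does not depend on the default.
wordRegex :: Regex
wordRegex = makeRegexOpts comp defaultExecOpt "\\<cat\\>"
  where
    comp = defaultCompOpt { newSyntax = True }

hasWholeWord :: String -> Bool
hasWholeWord = matchTest wordRegex
```

With this definition, <code>hasWholeWord "a cat sat"</code> should be True, while <code>hasWholeWord "concatenate"</code> should be False because the "cat" inside it does not start at a word boundary.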