Difference between revisions of "BerkeleyDBXML"

From HaskellWiki
== Introduction ==
 
 
If you are using Berkeley DB, and not Berkeley DB XML, then please skip to the Berkeley DB section.
 
 
Berkeley DB XML is a powerful, fully transactional, XML-based database that uses [http://www.w3.org/XML/Query/ XQuery] (a W3C standard) as its query language. (Berkeley DB XML does NOT use SQL.)
 
 
This page is an introduction/tutorial on writing a multi-threaded Berkeley DB XML application in Haskell. It is intended for Haskell programmers who are new to Berkeley DB XML.
 
 
I hope you will consider the advantages of using an XML database over the traditional SQL database in your application. However, note that ''Berkeley DB and DB XML are non-free for commercial use''.
 
 
=== Obtaining and building the packages ===
 
 
Downloads
 
* [http://www.oracle.com/database/berkeley-db/xml/index.html Berkeley DB XML]
 
* [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/BerkeleyDB Haskell binding for Berkeley DB]
 
* [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/BerkeleyDBXML Haskell binding for Berkeley DB XML]
 
 
Berkeley DB XML is easy to build. On Unix, the ./buildall.sh script will build everything for you, including Berkeley DB, and put the resulting image into the 'install' directory. You can then copy this directory's contents to an install location of your choice.

On a GNU/Linux system, you may want to add the 'lib' directory of this install location to a file under /etc/ld.so.conf.d/ and then run "ldconfig". This will allow the system to find the Berkeley DB XML libraries. If you don't do this, you will have to set the environment variable LD_LIBRARY_PATH.

If you are using a Unix system, your system may already have a sufficiently recent version of Berkeley DB. In this case, it is better to use the system version and build only Berkeley DB XML. The commands for this are as follows:
 
 
<pre>
 
./buildall.sh --build-one=xerces
./buildall.sh --build-one=xqilla
./buildall.sh --build-one=dbxml --with-berkeleydb-prefix=/usr
 
</pre>
 
 
To test your installation, see if you can run the 'dbxml' command from the install image's bin directory. This is an interactive utility that allows you to run database queries and view the results.

The Berkeley DB XML binding for Haskell is a standard Cabal package. Its README file gives installation instructions.
 
 
=== The binding ===
 
 
This tutorial uses a Haskell binding for DB XML that sticks closely to Berkeley DB XML's C++ interface, so we are programming at a fairly low level.

DB XML would lend itself to the development of higher-level wrappers. For example, someone could write a drop-in replacement for STM (Software Transactional Memory) that uses DB XML to give persistent storage.
 
 
=== Adventure game example ===
 
 
In the Berkeley DB XML binding distribution, you will find a tiny adventure game under examples/adventure/. This tutorial will refer to the code in this example.

Note that this game is multi-user and the game world, including player locations, is stored persistently, so it survives a re-start of the adventure server.
 
 
Here is an example session:
 
 
<pre>
 
blackh@amentet:~/temp/BerkeleyDBXML-0.5/examples/adventure$ ./adventure
Adventure server - please telnet into port 1888
tidying up 0 cadavers
Creating the game world...
 
</pre>
 
 
<pre>
 
blackh@amentet:~$ telnet localhost 1888
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Welcome to 'DB/XML Haskell binding' adventure by Stephen Blackheath
Please enter your name.
> Stephen
Welcome for the first time, Stephen.

For help, please type "help".

You are on a wide, white sandy beach. A bright blue ocean stretches to the
horizon. Along the beach to the north you can see some large rocks. There is
thick jungle to the west.
You can see
a starfish
> get starfish
You pick up a starfish.
> west
You are in a dense jungle.
You can see
a tall, twisty tree
> drop starfish
You drop a starfish.
> look
You are in a dense jungle.
You can see
a starfish
a tall, twisty tree
>
 
</pre>
 
 
== Berkeley DB ==
 
 
DB XML is built on top of Berkeley DB. This section describes the concepts specific to DB.
 
 
=== DB Environment ===
 
 
Berkeley DB is not client-server like most SQL databases. It accesses its database files in a local directory.

An "environment" is a directory with various odd-looking files such as __db.001 and log.0000000001, as well as various database files that your application has created. An application would normally have only one environment, where it would store all its databases. Database transactions operate within an environment, so a single transaction can span multiple database files in the same environment. This also means that you can update both Berkeley DB files and Berkeley DB XML files in a single transaction.
 
 
When DB_THREAD is enabled, the environment will work safely with multi-threaded applications, and also multiple processes accessing the databases at the same time. This means you can run the 'dbxml' command-line utility while your application is running.
 
 
When the DB_INIT_LOG flag is enabled, the environment contains a transaction log that forms part of the databases. Do not delete these log files, or you will corrupt your databases. Also, when DB_INIT_LOG is enabled, you cannot simply move your database files from one environment to another. The recommended way to transfer data between environments is to use Berkeley's dbxml_dump/dbxml_load for DB XML files and db_dump/db_load for DB files.
 
 
It is safe, however, to delete databases you have created without deleting the environment. The environment will detect this and adjust accordingly. You can, of course, start with a clean slate by deleting all the environment files and databases.
 
 
A production application must periodically call dbEnv_txn_checkpoint to clear old data from the log.* files. (The adventure game example does not do this.)
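
One way to arrange a periodic checkpoint is a background thread that sleeps and then runs a checkpoint action. The following is only a sketch and is not taken from the adventure example: <code>checkpointEvery</code> and <code>demo</code> are hypothetical names, and the checkpoint is passed in as a plain <code>IO ()</code> action because the argument list of dbEnv_txn_checkpoint in the binding is not shown on this page. In the demo, a counter stands in for the real call.

<haskell>
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Monad (forever)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Hypothetical helper (not part of the binding): run the given action
-- repeatedly on a background thread, waiting 'micros' microseconds before
-- each run.  In a real application the action would wrap a call to
-- dbEnv_txn_checkpoint.  An exception thrown by the action ends the thread.
checkpointEvery :: Int -> IO () -> IO ThreadId
checkpointEvery micros checkpoint = forkIO $ forever $ do
    threadDelay micros
    checkpoint

-- | Demonstration only: count how often the stand-in action runs
-- during roughly 0.2 seconds with a 10 ms interval.
demo :: IO Int
demo = do
    counter <- newIORef (0 :: Int)
    tid <- checkpointEvery 10000 (modifyIORef' counter (+ 1))
    threadDelay 200000
    killThread tid
    readIORef counter
</haskell>

A real server would use a much longer interval and would keep the thread running for the lifetime of the process.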
 
 
Here is some example code which will open an existing environment, or create it if it doesn't exist. These are the flags to use for a transactional, multi-threaded application:
 
 
<haskell>
 
dbenv <- dbEnv_create []

-- Enable automatic deadlock detection.
dbEnv_set_lk_detect dbenv DB_LOCK_DEFAULT

dbEnv_open dbenv "." [DB_CREATE,DB_INIT_LOCK,DB_INIT_LOG,DB_INIT_MPOOL,
    DB_INIT_TXN,DB_THREAD,DB_RECOVER] 0
 
</haskell>
 
 
=== Deadlock detection ===
 
 
Berkeley DB will automatically detect deadlocks for you, allowing you to re-start the deadlocked transaction. Because of the way Berkeley DB has been engineered, deadlock detection is '''not optional''' in multi-threaded applications. It is absolutely impossible to avoid deadlocks by the traditional method of carefully controlling the order of locking, because Berkeley DB will lock whole pages, which means it will unpredictably lock more than you told it to.

Your application needs one and only one lock detector thread or process running per environment. dbEnv_set_lk_detect is an easy way to spawn one such thread. See the Berkeley DB documentation for other ways.

If your application has more than one process, you can't do it the way this example does it. You would need to manage things so only one lock detector was running.

Because a detected deadlock is reported to one of the transactions as a DB_LOCK_DEADLOCK error, your code must catch that error and re-start the transaction. Here is some code to do this:
 
 
<haskell>
 
-- Execute the specified code within a database transaction, automatically
-- re-trying if a deadlock is detected.
inTransaction :: XmlManager -> (XmlTransaction -> IO a) -> IO a
inTransaction mgr code = inTransaction_ mgr code 0
  where
    inTransaction_ mgr code retryCount = do
        trans <- xmlManager_createTransaction mgr []
        catch
            (do
                result <- code trans
                xmlTransaction_commit trans
                return result
            )
            (\exc -> do
                hPutStrLn stderr $ "EXCEPTION "++show exc
                xmlTransaction_abort trans
                case fromException exc of
                    Just (DbException _ DB_LOCK_DEADLOCK) | retryCount < 20 -> do
                        hPutStrLn stderr "<<retry deadlocked thread>>"
                        inTransaction_ mgr code (retryCount+1)
                    _ -> throwIO exc)
 
</haskell>
 
 
Remember that the code above pre-supposes that you have started a deadlock detector. If this hasn't happened, the application will stall and never throw DB_LOCK_DEADLOCK.
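
The retry logic can be tried out without a database. The sketch below is a simplified simulation, not the binding's API: the <code>Deadlock</code> exception is a stand-in for the binding's DB_LOCK_DEADLOCK error, and <code>retryOnDeadlock</code>, <code>flakyBody</code> and <code>demo</code> are hypothetical names. The fake transaction body fails on its first two attempts and succeeds on the third.

<haskell>
import Control.Exception (Exception, throwIO, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Stand-in for the binding's deadlock error (assumption for this sketch).
data Deadlock = Deadlock deriving Show
instance Exception Deadlock

-- | Run an action, re-running it up to 'maxRetries' times if it throws
-- 'Deadlock'.  Any other exception propagates immediately.
retryOnDeadlock :: Int -> IO a -> IO a
retryOnDeadlock maxRetries action = go 0
  where
    go n = do
        outcome <- try action
        case outcome of
            Right value -> return value
            Left Deadlock
                | n < maxRetries -> go (n + 1)
                | otherwise      -> throwIO Deadlock

-- | A fake transaction body that "deadlocks" on its first two attempts.
flakyBody :: IORef Int -> IO String
flakyBody attempts = do
    modifyIORef' attempts (+ 1)
    n <- readIORef attempts
    if n <= 2 then throwIO Deadlock else return "committed"

-- | Returns the final result and the number of attempts that were made.
demo :: IO (String, Int)
demo = do
    attempts <- newIORef 0
    result <- retryOnDeadlock 20 (flakyBody attempts)
    n <- readIORef attempts
    return (result, n)
</haskell>

Unlike inTransaction, this simulation has no abort step, because there is no real transaction to roll back.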
 
 
Because your transaction can be re-started, you should not do any normal I/O inside your transaction. It would be even better if (as in Software Transactional Memory) the transactional code ran in a monad of its own that prevents normal access to the IO monad.
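
A restricted transaction monad of this kind could look roughly like the following. This is a hypothetical sketch, not something the binding provides: <code>Tx</code>, <code>say</code>, <code>runTx</code> and <code>demo</code> are made-up names, and the database handles that a real version would carry are left out. Output is queued rather than printed, so a caller can emit it once the transaction has committed and simply drop the queue when the transaction is re-started.

<haskell>
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Hypothetical transaction monad.  If the 'Tx' constructor is not exported
-- from its module, callers cannot embed arbitrary IO in a transaction.
newtype Tx a = Tx { unTx :: IORef [String] -> IO a }

instance Functor Tx where
    fmap f (Tx g) = Tx (fmap f . g)

instance Applicative Tx where
    pure x = Tx (\_ -> return x)
    Tx f <*> Tx g = Tx (\ref -> f ref <*> g ref)

instance Monad Tx where
    Tx g >>= k = Tx (\ref -> g ref >>= \x -> unTx (k x) ref)

-- | Queue a message instead of printing it inside the transaction.
say :: String -> Tx ()
say msg = Tx (\ref -> modifyIORef' ref (msg :))

-- | Run the body once and return its result together with the queued
-- messages, oldest first.  Each run starts with an empty queue.
runTx :: Tx a -> IO (a, [String])
runTx (Tx body) = do
    ref <- newIORef []
    result <- body ref
    msgs <- readIORef ref
    return (result, reverse msgs)

demo :: IO (Int, [String])
demo = runTx $ do
    say "You pick up a starfish."
    say "You drop a starfish."
    return 42
</haskell>

In a fuller version, runTx would be combined with a retry loop such as inTransaction, and the queued messages would be written out only after a successful commit.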
 
 
=== Environment recovery ===
 
 
Before you start your application, you must run a database recovery to return the database to a consistent state, in case of a dirty shutdown. This can either be done with the db_recover command line utility, or by specifying the DB_RECOVER flag to dbEnv_open.

An environment recovery must run without any other processes accessing the database environment. Therefore it must be performed before you start your application.

Because we are using the DB_RECOVER flag to do our recovery, we could not run multiple processes of 'adventure' at the same time unmodified. If we wanted this application to work with multiple processes, both the DB_RECOVER flag and the dbEnv_set_lk_detect call would need to be removed and run separately before the application was started.
 
 
== Berkeley DB XML ==
 
 
All the important topics are covered in the "Getting Started with Berkeley DB XML" guide that comes with the Berkeley DB XML distribution, so I will only cover more Haskell-specific things here.
 
 
Berkeley DBXML returns its document contents as a strict ByteString containing XML text. You need to use an XML library of some kind to handle these. The Haskell binding leaves you free to choose your own XML library. (I am also the developer of the [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hexpat hexpat] and [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hexpat-pickle hexpat-pickle] packages, so you might consider using them since I designed them to fit nicely with BerkeleyDBXML.)
 
 
Please take a look at the source code of the adventure example included with the DB XML binding distribution. Here are some examples from it:
 
 
=== Example 1: Querying documents ===
 
 
Here is an example that covers a lot of ground: The "query" function from the adventure game:
 
 
<haskell>
 
collectM :: Monad m => m (Maybe a) -> m [a]
collectM valueM = do
    value <- valueM
    case value of
        Just item -> do
            rest <- collectM valueM
            return (item:rest)
        Nothing -> do
            return []

query_ :: (XmlManager, XmlContainer, XmlTransaction) -> PU [UNode String] p
       -> String -> [(String, XmlValue)] -> [DbXmlFlag] -> IO [(XmlDocument, p)]
query_ (mgr, cont, trans) pickler queryText params flags = do
    qctx <- xmlManager_createQueryContext mgr LiveValues Eager
    let collection = xmlContainer_getName cont
    xmlQueryContext_setDefaultCollection qctx collection
    forM params $ \(name, value) -> do
        xmlQueryContext_setVariableValue qctx name value
    res <- xmlManager_query mgr (Just trans) queryText qctx flags
    docs <- collectM (xmlResults_next res)
    records <- forM docs $ \doc -> do
        text <- xmlDocument_getContent doc
        value <- case unpickleXML' defaultParserOptions (xpRoot pickler) text of
            Left err -> fail $ "unpickle failed: "++err
            Right value -> return value
        return (doc, value)
    return records

query :: XmlPickler [UNode String] p => (XmlManager, XmlContainer, XmlTransaction) -> PU [UNode String] p
      -> String -> [(String, XmlValue)] -> IO [p]
query ctx pickler queryText params = liftM (map snd) $ query_ ctx pickler queryText params []
 
</haskell>
 
 
The 'query' function is a helper that calls 'query_' and returns the results as Haskell data structures only, discarding the XmlDocument objects. (XmlDocuments are useful as a reference to a document for updating or deleting.)
 
 
Now look at 'query_'. First, we create a query context. This holds the variable assignments used in the XQuery. For example, if we call 'query' like this...
 
 
<haskell>
 
items <- query db xpItem "collection()/item[@location=$loc]"
             [("loc", xmlString loc)]
 
</haskell>
 
 
...then we push "loc" and its value into the query context, so the XQuery parser can resolve the variable $loc. This query says "give me all documents with a top-level tag of <item> containing a 'location' attribute matching $loc".
 
 
xmlQueryContext_setDefaultCollection allows the XQuery to refer to our document container as just "collection()" rather than having to name it explicitly in the XQuery string.
 
 
Then we run the query, and use a helper called 'collectM' to extract the results from the XmlResults object and return them as a list of XmlDocument objects.
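
To see what 'collectM' does in isolation, it can be run against any action that yields one item per call and Nothing when exhausted. In the sketch below, <code>nextFrom</code> is a hypothetical stand-in for xmlResults_next, backed by an in-memory list instead of an XmlResults object; <code>demo</code> is likewise only for illustration.

<haskell>
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- | Same helper as in the adventure example: call the action repeatedly
-- until it yields Nothing, collecting the items in order.
collectM :: Monad m => m (Maybe a) -> m [a]
collectM valueM = do
    value <- valueM
    case value of
        Just item -> do
            rest <- collectM valueM
            return (item : rest)
        Nothing -> return []

-- | Hypothetical stand-in for xmlResults_next: hands out one item per call
-- from a list held in an IORef, then Nothing once the list is empty.
nextFrom :: IORef [a] -> IO (Maybe a)
nextFrom ref = atomicModifyIORef' ref step
  where
    step []       = ([], Nothing)
    step (x : xs) = (xs, Just x)

demo :: IO [String]
demo = do
    cursor <- newIORef ["a starfish", "a tall, twisty tree"]
    collectM (nextFrom cursor)
</haskell>

Note that 'collectM' builds the whole list before returning, so it is best suited to result sets that fit comfortably in memory.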
 
 
The last step is to iterate over the returned documents, using hexpat-pickle's unpickle functionality to translate each XML document into Haskell data structures.
 
 
=== Example 2: Updating a document ===
 
 
<haskell>
 
-- | Query with write lock. Returned document allows the document to be updated
-- without having to specify its document name.
queryUpdate :: XmlPickler [UNode String] p => (XmlManager, XmlContainer, XmlTransaction) -> PU [UNode String] p
            -> String -> [(String, XmlValue)] -> IO [(XmlDocument, p)]
queryUpdate ctx pickler queryText params = query_ ctx pickler queryText params [DB_FLAG DB_RMW]

update :: forall p . XmlPickler [UNode String] p =>
          (XmlManager, XmlContainer, XmlTransaction)
       -> XmlDocument
       -> p
       -> IO ()
update (mgr, cont, trans) doc p = do
    xmlDocument_setContent doc (pickleXML' (xpRoot xpickle :: PU (UNode String) p) p)
    uctx <- xmlManager_createUpdateContext mgr
    xmlContainer_updateDocument cont (Just trans) doc uctx
 
</haskell>
 
 
'queryUpdate' works like 'query' in the previous example, except that it sets the DB_RMW flag, which means you get a write (exclusive) lock instead of the default read (non-exclusive) lock. Using a write lock makes no difference to the semantics of the transaction: It is just as atomic if you use a read lock. But, write locks can reduce the probability of transaction re-starts due to deadlocks, and so they improve efficiency when updating.
 
 
A caller would pass the XmlDocument returned by queryUpdate to 'update', along with a modified version of the Haskell data structure p.
 
 
'update' then pickles p, and proceeds to stuff the resulting XML string into the XmlDocument that queryUpdate gave us, and then issues the update.
 
 
=== Unicode ===
 
 
UTF-8 encoding is used throughout to encode Unicode text. All String arguments and return values in the binding are in Unicode, except for XML text, which is returned as 8-bit characters in a String data type. Your XML library will convert this to Unicode for you.

The function xmlValue_asString is a case where the caller has to make the right choice. xmlValue_asString converts an XmlValue to a Unicode Haskell String. This is appropriate if you are fetching the text contents of an XML tag, for instance.

However, if you are fetching XML text, you will want to call xmlValue_asString8Bit. This leaves out the conversion from UTF-8 to Unicode, so you can let your XML library convert this to Unicode.
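
The difference between the two representations can be illustrated without the binding. The sketch below assumes the bytestring and text packages are available; <code>decode8Bit</code> and <code>eightBit</code> are hypothetical names. It takes a String in which each Char holds one UTF-8 byte (the "8-bit" form described above) and decodes it into a Unicode String, which is the step an XML library would normally perform for you.

<haskell>
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Decode a String whose Chars are really UTF-8 bytes (code points 0-255)
-- into a proper Unicode String.  B8.pack truncates each Char to 8 bits, and
-- decodeUtf8 throws an exception if the bytes are not valid UTF-8.
decode8Bit :: String -> String
decode8Bit = T.unpack . TE.decodeUtf8 . B8.pack

-- | The word "café" in 8-bit form: the single character U+00E9 is held as
-- the two UTF-8 bytes 0xC3 0xA9, so this String has five Chars.
eightBit :: String
eightBit = "caf\xC3\xA9"
</haskell>

Here <code>decode8Bit eightBit</code> gives the four-character Unicode string "caf\xE9". Applying such a decoding step to a String that is already Unicode would be a mistake, which is why the caller has to pick the right xmlValue_asString variant.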
 
 
== Conclusion ==
 
 
I hope this gets you started writing DB XML applications. If you have any questions (so I can improve this page), or wish to report bugs in the Haskell binding, please contact me at [http://blacksapphire.com/antispam/ Stephen Blackheath's anti-spam page].
 
 
--[[User:Blackh|Blackh]] 10:28, 1 October 2008 (UTC)
 

Revision as of 14:47, 6 February 2021