Difference between revisions of "Internationalization of Haskell programs using gettext"

From HaskellWiki
(First version of text, copied from http://progandprog.blogspot.com/2009/03/i18n-and-haskell.html)
 
m (Internationalization of Haskell programs moved to Internationalization of Haskell programs using gettext: Allowing for listing other approaches to i18n on the general page.)

Revision as of 09:45, 3 October 2011

The approach described here is based on the GNU gettext utilities. My experience with gettext comes from internationalizing Python applications, and I have adapted that experience to the Haskell world.

Prepare program for internationalization

Let's start with an example. Suppose that we want to make the following program multilingual:

module Main where

import System.IO

main = do
  putStrLn "Please enter your name:"
  name <- getLine
  putStrLn $ "Hello, " ++ name ++ ", how are you?"

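Following the recommendations in the GNU gettext manual (http://www.gnu.org/software/gettext/manual/gettext.html#Preparing-Strings), prepare the strings and wrap them in a 'translation' function '__'. In particular, the greeting becomes a single format string instead of several concatenated pieces, so that translators see the whole sentence: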

module Main where

import System.IO
import Text.Printf

-- Placeholder: the real definition comes later
__ = id

main = do
  putStrLn (__ "Please enter your name:")
  name <- getLine
  printf (__ "Hello, %s, how are you?\n") name

We will return to the definition of '__' a bit later; for now it is simply the identity function (id).
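The reason for using one format string can be sketched in a few lines. This is only an illustration under assumptions: fillIn and greeting are hypothetical helper names, not part of any library, and '__' is still the identity placeholder. With a single format string, a translation is free to move the %s placeholder wherever the target language needs it.

```haskell
import Text.Printf (printf)

-- Placeholder translation function, as in the tutorial (identity for now).
__ :: String -> String
__ = id

-- Hypothetical helper: fill a (possibly translated) format string with a name.
fillIn :: String -> String -> String
fillIn fmt name = printf fmt name

-- The English message; the translator sees one complete sentence.
greeting :: String -> String
greeting = fillIn (__ "Hello, %s, how are you?\n")
```

Loaded in GHCi, greeting "Bond" yields the English sentence, and fillIn with a translated format string yields the translated one.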

Translate

The next step is to generate a POT file (a template that contains all the strings that need to be translated). For Python, C, C++ and Scheme there is the xgettext utility, but it doesn't support Haskell. So I wrote a simple utility, hgettext, that does the same thing for Haskell files. You can find it on Hackage.

Now, from the directory that contains your project, run this command:

hgettext -k __ -o messages.pot Main.hs

It gathers all strings marked with the function '__' in Main.hs and writes them to messages.pot.
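To make the idea of keyword-based extraction concrete, here is a deliberately naive sketch. It is not how hgettext is implemented; extractMarked is a made-up name, and the approach (plain text search for the keyword followed by a string literal) would be fooled by comments or by identifiers that merely end in '__'. It only shows what "gathering marked strings" means.

```haskell
import Data.List (isPrefixOf, tails)

-- Naive illustration only: collect every string literal that directly
-- follows the keyword "__ " in the given source text.
extractMarked :: String -> [String]
extractMarked src =
  [ literal
  | rest <- tails src
  , "__ \"" `isPrefixOf` rest                      -- keyword, then an opening quote
  , (literal, _) <- take 1 (reads (drop 3 rest))   -- parse the Haskell string literal
  ]
```

A real extractor would work on parsed source rather than raw text, so that it also records file positions and ignores comments.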

Now look at the resulting POT file:

# Translation file

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-13 06:05-0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: Main.hs:0
msgid "Please enter your name:"
msgstr ""

#: Main.hs:0
msgid "Hello, %s, how are you?\n"
msgstr ""

We are interested in the last part of this file: the entries beginning with "#: Main.hs:". Each is followed by a pair of lines beginning with msgid and msgstr. msgid is the original text from the code, and msgstr is the translated string (still empty in the template). Each language should have its own translation file. I will create two translations: German and English.

To create a PO file for a specific locale, use the msginit utility.
To generate the German translation template, run:

msginit --input=messages.pot --locale=de.UTF-8

And for English translations run:

msginit --input=messages.pot --locale=en.UTF-8

If we look at the generated files (en.po and de.po), we will see that the English translation is already completely filled in, so only the German PO file needs to be edited. Fill it with the following strings:

#: Main.hs:0
msgid "Please enter your name:"
msgstr "Wie heißen Sie?"

#: Main.hs:0
msgid "Hello, %s, how are you?\n"
msgstr "Hallo, %s, wie geht es Ihnen?\n"
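Conceptually, a filled-in PO file is a lookup table from msgid to msgstr. The sketch below models that with a plain association list; Catalog, germanCatalog and translate are hypothetical names for this illustration and are not part of hgettext. It also mirrors the gettext convention that a missing or empty translation falls back to the original msgid.

```haskell
-- A message catalog modeled as (msgid, msgstr) pairs.
type Catalog = [(String, String)]

-- The two German entries from de.po above.
germanCatalog :: Catalog
germanCatalog =
  [ ("Please enter your name:", "Wie heißen Sie?")
  , ("Hello, %s, how are you?\n", "Hallo, %s, wie geht es Ihnen?\n")
  ]

-- Look up a translation; fall back to the msgid itself when the entry
-- is missing or its msgstr is empty (that is, still untranslated).
translate :: Catalog -> String -> String
translate catalog msgid =
  case lookup msgid catalog of
    Just msgstr | not (null msgstr) -> msgstr
    _                               -> msgid
```

The fallback is why a partially translated program still runs: untranslated messages simply appear in the original language.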

Install translation files

Now we have to create the directories where these translations will be placed. Conventionally, translation files live under /usr/share/locale/, but you are free to choose a different place. Run:

mkdir -p {de,en}/LC_MESSAGES

This creates two subdirectories, 'de' and 'en', in the current directory, each containing LC_MESSAGES. Now we use the msgfmt tool to compile our PO files into MO files (binary translation files):

msgfmt --output-file=en/LC_MESSAGES/hello.mo en.po
msgfmt --output-file=de/LC_MESSAGES/hello.mo de.po
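The directory layout used by these commands follows one pattern: base directory, language, LC_MESSAGES, then the domain name with an .mo extension. A small sketch, with catalogPath as a hypothetical helper name and "/" assumed as the path separator:

```haskell
import Data.List (intercalate)

-- Build the path of a compiled catalog from the base locale directory,
-- the language code and the text domain.
catalogPath :: FilePath -> String -> String -> FilePath
catalogPath baseDir lang domain =
  intercalate "/" [baseDir, lang, "LC_MESSAGES", domain ++ ".mo"]
```

For example, catalogPath "." "de" "hello" gives "./de/LC_MESSAGES/hello.mo", which matches the second msgfmt command above.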

Turn on internationalization in the code

OK, the preparatory tasks are done. The final step is to modify the code to support internationalization:

module Main where
    
import System.IO
import Text.Printf
import Text.I18N.GetText
import System.Locale.SetLocale
import System.IO.Unsafe

__ :: String -> String
__ = unsafePerformIO . getText

main = do
  setLocale LC_ALL (Just "") 
  bindTextDomain "hello" "." 
  textDomain "hello" 

  putStrLn (__ "Please enter your name:")
  name <- getLine
  printf (__ "Hello, %s, how are you?\n") name

Here we added three initialization calls:

setLocale LC_ALL (Just "")
bindTextDomain "hello" "."
textDomain "hello"

You'll have to install the setlocale package to get the first function; called with (Just ""), it sets the current locale from the environment. The next two functions tell gettext to take the "hello.mo" message file from the locale directory (I set it to ".", but in the general case this directory should come from the package configuration).

The final step is to define the function '__'. It simply calls getText from the module Text.I18N.GetText. The type of getText is String -> IO String, so I used unsafePerformIO to make '__' simpler to use.
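If you would rather avoid unsafePerformIO, the lookup can stay in IO. The sketch below is one possible shape, not part of the original program: greetWith is a hypothetical name, and the lookup action is passed in as a parameter so that the example does not depend on hgettext being installed. In the real program you would pass getText. Note that the message is no longer wrapped in '__', so an extractor invoked with -k __ would presumably not find it without a different keyword.

```haskell
import Text.Printf (printf)

-- 'lookupMsg' stands in for getText (String -> IO String).
-- The translated format string is fetched in IO and then filled in.
greetWith :: (String -> IO String) -> String -> IO String
greetWith lookupMsg name = do
  fmt <- lookupMsg "Hello, %s, how are you?\n"
  return (printf fmt name)
```

Passing return as the lookup action gives the untranslated behaviour, which is handy for testing.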

Run and test the program

Now you can build and try the program in different locales:

user> ghc --make Main.hs
[1 of 1] Compiling Main         ( Main.hs, Main.o )
Linking Main ...

user> LANG=en_US.UTF-8 ./Main
Please enter your name:
Bond
Hello, Bond, how are you?

user> LANG=de_DE.UTF-8 ./Main
Wie heißen Sie?
Bond
Hallo, Bond, wie geht es Ihnen?

user>
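Which locale the program ends up with depends on the environment, because setLocale LC_ALL (Just "") asks the C library to read it from there. On POSIX systems the variables consulted for messages are LC_ALL, then LC_MESSAGES, then LANG, and the first non-empty one wins. The sketch below models just that precedence as a pure function over an environment list; messageLocale is a hypothetical name, and GNU gettext's additional LANGUAGE variable is not modeled.

```haskell
import Data.Maybe (listToMaybe)

-- Pick the locale for messages from an environment given as (name, value)
-- pairs, following the POSIX order: LC_ALL, LC_MESSAGES, LANG.
messageLocale :: [(String, String)] -> Maybe String
messageLocale env =
  listToMaybe
    [ value
    | key <- ["LC_ALL", "LC_MESSAGES", "LANG"]
    , Just value <- [lookup key env]
    , not (null value)               -- empty values are treated as unset
    ]
```

In a real program the environment list could come from System.Environment.getEnvironment.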

Distribute internationalized cabal package

Starting with version 0.1.5, the hgettext package includes a module that teaches Cabal to install language files.

Create directory structure

Currently we have the following files:

Main.hs
    The `hello` program itself.
messages.pot
    Template file, which contains all strings to be translated. This file
    should be included in the distribution to allow other users to
    generate a translation file for their language.
en.po, de.po
    Translations into English and German. These files should be
    installed to the `locale` folder, and our program has to be able to
    find them (it has to know where they are going to be installed).

Any other files can be generated from these, so they shouldn't be included in the distribution package.

Let's create the directory structure for our project. This is a simple project, so the directory structure should be simple too. Here it is:

hello\
   |
   |-po\
   |  |
   |  |-messages.pot
   |  |-en.po
   |  |-de.po
   |
   |-src\
       |
       |-Main.hs

Create install script

In order to create a cabal package, we have to add only two files. The first is hello.cabal:

Name:                   hello
Version:                0.1.3
Cabal-Version:          >= 1.6

License:                BSD3

Author:                 James Bond
Maintainer:             James.Bond@MI6.bi
Copyright:              2009 James Bond
Category:               Hello

Synopsis:               Internationalized Hello sample
Build-Type:             Simple

Extra-Source-Files:     po/*.po po/*.pot

x-gettext-po-files:     po/*.po 
x-gettext-domain-name:  hs-hello

Executable hello
        Main-Is:                Main.hs
        Hs-Source-Dirs:         src      
        Build-Depends:          base,hgettext >= 0.1.5, setlocale

This is a standard .cabal file, but we added two more lines:

x-gettext-po-files
    Tells Cabal where to find the PO files to install.
x-gettext-domain-name
    Sets the domain name under which the files will be installed.

For other details, see the documentation for the Distribution.Simple.I18N.GetText module of hgettext.

Note that we also listed the *.po and *.pot files in the Extra-Source-Files field to add them to the distribution package.

The second file to create is Setup.hs:

import Distribution.Simple.I18N.GetText

main = gettextDefaultMain

The gettextDefaultMain function replaces the usual defaultMain function and also adds several install hooks to the Cabal package to handle internationalization.

Update the program code

Now our installer knows where to put the *.po files and which domain name to use for them. Our code needs to know this too, so that it can initialize gettext properly. Duplicating the same information is not the Haskell way, so let's modify the code to get the initialization parameters directly from the installer:

module Main where

import Text.Printf
import Text.I18N.GetText
import System.Locale.SetLocale
import System.IO.Unsafe

__ :: String -> String
__ = unsafePerformIO . getText

main = do
  setLocale LC_ALL (Just "") 
  bindTextDomain __MESSAGE_CATALOG_DOMAIN__ (Just __MESSAGE_CATALOG_DIR__)
  textDomain __MESSAGE_CATALOG_DOMAIN__

  putStrLn (__ "Please enter your name:")
  name <- getLine
  printf (__ "Hello, %s, how are you?\n") name

So, the only lines that changed are:

  bindTextDomain __MESSAGE_CATALOG_DOMAIN__ (Just __MESSAGE_CATALOG_DIR__)
  textDomain __MESSAGE_CATALOG_DOMAIN__

Nice. __MESSAGE_CATALOG_DOMAIN__ and __MESSAGE_CATALOG_DIR__ are macro definitions that hold the strings configured by Cabal.

Build, install and run

Now you can configure, build and install the newly created package by running these commands:

runhaskell Setup.hs configure
runhaskell Setup.hs build
runhaskell Setup.hs install

And test it:

user> LANG=en_US.UTF-8 hello
Please enter your name:
Bond
Hello, Bond, how are you?

user> LANG=de_DE.UTF-8 hello
Wie heißen Sie?
Bond
Hallo, Bond, wie geht es Ihnen?

user>