Internationalization of Haskell programs

From HaskellWiki

Approaches to internationalization in Haskell

There are several different approaches you can use to internationalize your Haskell program.

Using GNU gettext

You can internationalize your program using GNU gettext and its Haskell bindings package hgettext.

Set up your translations and integrate them into your application using these instructions.

Using native Haskell data types

You can internationalize your program using native Haskell data types.

Represent each text to be translated as a constructor of a Haskell data type. Then provide a function that renders each text appropriately for the current language.

The Yesod web framework takes this approach but provides a more translator-friendly format.

See this description of a simple example of using Haskell data types for internationalization.
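The idea can be sketched in a few lines. This is only a minimal illustration, not the linked example: the names Lang, Msg and render are hypothetical, and a real program would likely use Text rather than String.

```haskell
-- All names below (Lang, Msg, render) are hypothetical, for illustration only.

-- | Languages the program supports.
data Lang = En | De
  deriving (Show, Eq)

-- | One constructor per translatable text; constructor arguments carry
-- the variable parts of the message.
data Msg
  = Greeting String   -- greet a user by name
  | FilesDeleted Int  -- report how many files were deleted
  | Goodbye
  deriving (Show, Eq)

-- | Render a message in the given language.
render :: Lang -> Msg -> String
render En (Greeting name)  = "Hello, " ++ name ++ "!"
render De (Greeting name)  = "Hallo, " ++ name ++ "!"
render En (FilesDeleted 1) = "1 file deleted."
render En (FilesDeleted n) = show n ++ " files deleted."
render De (FilesDeleted 1) = "1 Datei entfernt."
render De (FilesDeleted n) = show n ++ " Dateien entfernt."
render En Goodbye          = "Goodbye."
render De Goodbye          = "Auf Wiedersehen."
```

Because every text is a constructor, misspelling a message name is a compile-time error, and GHC's incomplete-pattern warnings can flag a language that lacks a translation for some message.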

Using the Grammatical Framework

You can internationalize your program using the Grammatical Framework (GF).

GF provides a way to define human-language-independent syntax for expressing texts in Haskell. GF can then render the texts automatically in any human language for which an appropriate GF grammar exists.

See this very simple example of an application internationalized using GF. It is based on the "Foods" grammar (included in the example), so it's quite contrived, but it should be enough to get you started. Usage instructions are in the file README.md.

See the GF download page for information about how to install GF. The standard installation of GF currently includes grammars for at least 26 languages.

Comparison of approaches to internationalization

GNU gettext

Advantages:

  • Easiest integration with other tools and programming languages
  • Little or no specialized knowledge required of translators
  • Little or no interaction needed between programmers and translators
  • Well-known and well-documented

Disadvantages:

  • Static text literals are inflexible, which becomes awkward when different languages express the same idea in structurally different ways
  • Translation selection happens at runtime, so there is no type safety
  • Texts are loaded from external files at runtime, which creates overhead and deployment issues
  • Requires a moderate amount of work to set up and integrate
  • Not well supported on MS Windows

Native Haskell data types

Advantages:

  • Easy to implement in Haskell
  • Compile-time type safety
  • Fast
  • Flexible in handling complex differences between languages
  • Flexible in implementation: e.g., use a type class if you don't want one big data type, use Text or Builder instead of String
  • Platform independent
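As a sketch of the type class variant mentioned above, each part of a program can own its own message type while sharing one rendering interface. All names here (Lang, Translatable, DocMsg, SessionMsg) are hypothetical. The Polish case shows a language difference that a single static string cannot express: the noun takes one of three forms depending on the number.

```haskell
-- Hypothetical names throughout; one possible layout, not a library API.

data Lang = En | Pl
  deriving (Show, Eq)

-- | Any type whose values can be rendered in a target language.
class Translatable a where
  render :: Lang -> a -> String

-- | Messages owned by one part of the program.
data DocMsg = PageCount Int

-- | Messages owned by another part; the two types stay independent.
data SessionMsg = Welcome String | Goodbye

instance Translatable DocMsg where
  render En (PageCount n)
    | n == 1    = "1 page"
    | otherwise = show n ++ " pages"
  -- Polish uses three forms: singular, "few" (numbers ending in 2-4,
  -- except those ending in 12-14), and "many" for everything else.
  render Pl (PageCount n)
    | n == 1 = "1 strona"
    | lastDigit `elem` [2, 3, 4] && lastTwo `notElem` [12, 13, 14] =
        show n ++ " strony"
    | otherwise = show n ++ " stron"
    where
      lastDigit = n `mod` 10
      lastTwo   = n `mod` 100

instance Translatable SessionMsg where
  render En (Welcome name) = "Welcome, " ++ name ++ "!"
  render Pl (Welcome name) = "Witaj, " ++ name ++ "!"
  render En Goodbye        = "Goodbye."
  render Pl Goodbye        = "Do widzenia."
```

Because the rendering function is ordinary Haskell, the plural rule lives next to the text it affects, and adding a message type for a new part of the program does not require touching the existing ones.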

Disadvantages:

  • May require some training of translators and/or cooperative integration work between translators and programmers, depending on the level of sophistication needed in the rendering functions
  • May lead to many string literals in a Haskell source file. This requires a work-around for a current limitation of GHC; see file-embed, below.

Yesod's native data type approach

Yesod's approach provides a translator-friendly veneer that gets rid of the above disadvantages. It is integrated with the Yesod web framework and its use of the Hamlet template language.

A portion of it has been abstracted out into the shakespeare-i18n package: http://hackage.haskell.org/package/shakespeare-i18n

Grammatical Framework

Advantages:

  • Translations automatically generated in all languages for which GF grammars exist, without the need for human translators
  • High quality translations
  • Platform independent

Disadvantages:

  • Learning curve for the programmer to express texts using existing GF grammars, and to extend GF grammars as needed
  • Extra work needed if you must support languages that do not yet have a GF grammar
  • Installing GF is not quite as simple as installing a typical Haskell package
  • A human translator may still be needed for domain-specific words and expressions not included in the standard GF grammars

Other tools

The following tools may also be useful when internationalizing your Haskell program:

file-embed

Internationalization often leads to a large number of literal strings in a Haskell source file. This creates a technical problem due to a current limitation of GHC: it does not behave well when compiling a source file with a large number of literal strings.

One classic work-around to this problem is to place the literal strings in a small C library and import them via the FFI.

Another work-around, based on Template Haskell, is provided by the file-embed package.

numerals

The numerals package renders numbers (currently only cardinal numbers) as text in many different languages.