| // |
| // Copyright (c) 2009-2011 Artyom Beilis (Tonkikh) |
| // |
| // Distributed under the Boost Software License, Version 1.0. (See |
| // accompanying file LICENSE_1_0.txt or copy at |
| // http://www.boost.org/LICENSE_1_0.txt) |
| // |
| |
| // vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen |
| /*! |
| \page messages_formatting Messages Formatting (Translation) |
| |
| - \ref messages_formatting_into |
| - \ref msg_loading_dictionaries |
| - \ref message_translation |
| - \ref indirect_message_translation |
| - \ref plural_forms |
| - \ref multiple_gettext_domain |
| - \ref direct_message_translation |
| - \ref extracting_messages_from_code |
| - \ref custom_file_system_support |
| - \ref msg_non_ascii_keys |
| - \ref msg_qna |
| |
| \section messages_formatting_into Introduction |
| |
| Message formatting is probably the most important part of |
| localization: making your application speak the user's language. |
| |
| Boost.Locale uses the <a href="http://www.gnu.org/software/gettext/">GNU Gettext</a> localization model. |
| We recommend you read the general <a href="http://www.gnu.org/software/gettext/manual/gettext.html">documentation</a> |
| of GNU Gettext, as it is outside the scope of this document. |
| |
| The model is as follows: |
| |
| - First, our application \c foo is prepared for localization by calling the \ref boost::locale::translate() "translate" function |
| for each message used in the user interface. |
| \n |
| For example: |
| \code |
| cout << "Hello World" << endl; |
| \endcode |
| is changed to |
| \n |
| \code |
| cout << translate("Hello World") << endl; |
| \endcode |
| - Then all messages are extracted from the source code and a special \c foo.po file is generated that contains all of the |
| original English strings. |
| \n |
| \verbatim |
| ... |
| msgid "Hello World" |
| msgstr "" |
| ... |
| \endverbatim |
| - The \c foo.po file is translated for the supported locales. For example, \c de.po, \c ar.po, \c en_CA.po, and \c he.po. |
| \n |
| \verbatim |
| ... |
| msgid "Hello World" |
| msgstr "שלום עולם" |
| \endverbatim |
| And then compiled to the binary \c mo format and stored in the following file structure: |
| \n |
| \verbatim |
| de |
| de/LC_MESSAGES |
| de/LC_MESSAGES/foo.mo |
| en_CA/ |
| en_CA/LC_MESSAGES |
| en_CA/LC_MESSAGES/foo.mo |
| ... |
| \endverbatim |
| \n |
| When the application starts, it loads the required dictionaries. Then when the \c translate function is called and the message is written |
| to an output stream, a dictionary lookup is performed and the localized message is written out instead. |
| |
| \section msg_loading_dictionaries Loading dictionaries |
| |
| All the dictionaries are loaded by the \ref boost::locale::generator "generator" class. |
| Using localized strings in the application requires specifying |
| the following parameters: |
| |
| -# The search path of the dictionaries |
| -# The application domain (or name) |
| |
| This is done by calling the following member functions of the \ref boost::locale::generator "generator" class: |
| |
| - \ref boost::locale::generator::add_messages_path() "add_messages_path" - add the root path to the dictionaries. |
| \n |
| For example: if the dictionary is located at \c /usr/share/locale/ar/LC_MESSAGES/foo.mo, then the path should be \c /usr/share/locale. |
| \n |
| - \ref boost::locale::generator::add_messages_domain() "add_messages_domain" - add the domain (name) of the application. In the above case it would be "foo". |
| |
| \note At least one domain and one path should be specified in order to load dictionaries. |
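
As a rough illustration of how the search path and the domain combine, the sketch below builds a catalog file name following the directory layout shown in the introduction. The helper \c catalog_file is hypothetical and is not part of Boost.Locale; the real lookup is more involved (for example, a locale such as \c en_CA has both a language and a country part), which this sketch does not model.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a Boost.Locale function: composes
// <path>/<locale>/LC_MESSAGES/<domain>.mo
std::string catalog_file(const std::string& path,
                         const std::string& locale_name,
                         const std::string& domain)
{
    return path + "/" + locale_name + "/LC_MESSAGES/" + domain + ".mo";
}

// Usage check based on the path example given above.
void catalog_file_example()
{
    assert(catalog_file("/usr/share/locale", "ar", "foo")
           == "/usr/share/locale/ar/LC_MESSAGES/foo.mo");
}
```

Here \c /usr/share/locale plays the role of the value passed to \c add_messages_path and \c foo the value passed to \c add_messages_domain.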
| |
| This is an example of our first fully localized program: |
| |
| \code |
| #include <boost/locale.hpp> |
| #include <iostream> |
| |
| using namespace std; |
| using namespace boost::locale; |
| |
| int main() |
| { |
| generator gen; |
| |
| // Specify location of dictionaries |
| gen.add_messages_path("."); |
| gen.add_messages_domain("hello"); |
| |
| // Generate locales and imbue them to iostream |
| locale::global(gen("")); |
| cout.imbue(locale()); |
| |
| // Display a message using current system locale |
| cout << translate("Hello World") << endl; |
| } |
| \endcode |
| |
| |
| \section message_translation Message Translation |
| |
| There are two ways to translate messages: |
| |
| - Using the \ref boost_locale_translate_family "boost::locale::translate()" family of functions: |
| \n |
| These functions create a special proxy object, \ref boost::locale::basic_message "basic_message", |
| that can be converted to a string according to a given locale or written to a \c std::ostream, |
| in which case the message is formatted in the \c std::ostream's locale. |
| \n |
| This is very convenient for working with \c std::ostream objects and for postponing message |
| translation. |
| - Using the \ref boost_locale_gettext_family "boost::locale::gettext()" family of functions: |
| \n |
| These functions are used for direct message translation: they receive |
| an original message or a key as a parameter and convert it to a \c std::basic_string in the given locale. |
| \n |
| These functions have names similar to those used in the GNU Gettext library. |
| |
| \subsection indirect_message_translation Indirect Message Translation |
| |
| The basic way to translate a message is to use the \ref boost_locale_translate_family "boost::locale::translate()" family of functions. |
| |
| These functions take a character type \c CharType as a template parameter and receive either <tt>CharType const *</tt> or <tt>std::basic_string<CharType></tt> as input. |
| |
| These functions receive an original message and return a special proxy |
| object - \ref boost::locale::basic_message "basic_message<CharType>". |
| This object holds all the required information for the message formatting. |
| |
| When this object is written to an output stream, it performs a dictionary lookup of the message according to the locale |
| imbued in that stream. |
| |
| If the message is found in the dictionary it is written to the output stream, |
| otherwise the original string is written to the stream. |
| |
| For example: |
| |
| \code |
| // Translate a simple message "Hello World!" |
| std::cout << boost::locale::translate("Hello World!") << std::endl; |
| \endcode |
| |
| This allows the program to postpone translation of the message until the translation is actually needed, and to translate |
| the same message for several different target locales. |
| |
| \code |
| // Several output streams that we write a message to: |
| // English, Japanese, Hebrew etc. |
| // Each one of them has an installed std::locale object that represents |
| // its specific locale |
| std::ofstream en,ja,he,de,ar; |
| |
| // Send single message to multiple streams |
| void send_to_all(message const &msg) |
| { |
| // in each of the cases below |
| // the message is translated to different |
| // language |
| en << msg; |
| ja << msg; |
| he << msg; |
| de << msg; |
| ar << msg; |
| } |
| |
| int main() |
| { |
| ... |
| send_to_all(translate("Hello World")); |
| } |
| \endcode |
| |
| \note |
| |
| - \ref boost::locale::basic_message "basic_message" can be implicitly converted |
| to an appropriate std::basic_string using |
| the global locale: |
| \n |
| \code |
| std::wstring msg = translate(L"Do you want to open the file?"); |
| \endcode |
| - \ref boost::locale::basic_message "basic_message" can be explicitly converted |
| to a string using the \ref boost::locale::basic_message::str() "str()" member function for a specific locale. |
| \n |
| \code |
| std::locale ru_RU = ... ; |
| std::string msg = translate("Do you want to open the file?").str(ru_RU); |
| \endcode |
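
The deferred lookup described in this subsection can be sketched with the standard library alone. This is not how Boost.Locale is implemented; \c dictionary_facet, \c deferred_message and \c render are hypothetical names used only to show the idea: the proxy stores the key, and the lookup happens when the proxy is written to a stream, using the locale imbued in that stream.

```cpp
#include <cassert>
#include <locale>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical facet that carries one language's dictionary.
class dictionary_facet : public std::locale::facet {
public:
    static std::locale::id id;

    explicit dictionary_facet(const std::map<std::string, std::string>& entries)
        : entries_(entries) {}

    // Returns the translation, or the key itself when it is missing.
    std::string lookup(const std::string& key) const
    {
        std::map<std::string, std::string>::const_iterator it = entries_.find(key);
        return it == entries_.end() ? key : it->second;
    }

private:
    std::map<std::string, std::string> entries_;
};

std::locale::id dictionary_facet::id;

// Proxy that only remembers the key; nothing is translated yet.
struct deferred_message {
    std::string key;
};

// The lookup happens here, using whatever locale the stream carries.
std::ostream& operator<<(std::ostream& out, const deferred_message& msg)
{
    const std::locale loc = out.getloc();
    if (std::has_facet<dictionary_facet>(loc))
        return out << std::use_facet<dictionary_facet>(loc).lookup(msg.key);
    return out << msg.key; // no dictionary installed: original string
}

// Helper: writes the message to a stream imbued with loc and returns the text.
std::string render(const deferred_message& msg, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << msg;
    return out.str();
}

void deferred_message_example()
{
    std::map<std::string, std::string> hebrew;
    hebrew["Hello World"] = "שלום עולם";
    const std::locale he(std::locale::classic(), new dictionary_facet(hebrew));

    const deferred_message msg = {"Hello World"};
    assert(render(msg, he) == "שלום עולם");                       // translated
    assert(render(msg, std::locale::classic()) == "Hello World"); // fallback
}
```

The same \c deferred_message object yields different text depending on the stream it is written to, which is the property that makes a function like \c send_to_all possible.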
| |
| |
| \subsection plural_forms Plural Forms |
| |
| GNU Gettext catalogs have simple, robust and yet powerful plural forms support. We recommend reading the |
| original GNU documentation <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">here</a>. |
| |
| Let's try to solve a simple problem, displaying a message to the user: |
| |
| \code |
| if(files == 1) |
| cout << translate("You have 1 file in the directory") << endl; |
| else |
| cout << format(translate("You have {1} files in the directory")) % files << endl; |
| \endcode |
| |
| This very simple task becomes quite complicated when we deal with languages other than English. Many languages have more |
| than two plural forms. For example, in Hebrew there are special forms for single, double, plural, and plural above 10. |
| They can't be distinguished by the simple rule "is n 1 or not". |
| |
| The correct solution is to let the translator choose the plural form. Thus the translate |
| function can receive two additional parameters, the English plural form and a number: <tt>translate(single,plural,count)</tt> |
| |
| For example: |
| |
| \code |
| cout << format(translate( "You have {1} file in the directory", |
| "You have {1} files in the directory", |
| files)) % files << endl; |
| \endcode |
| |
| A special entry in the dictionary specifies the rule to choose the correct plural form in the target language. |
| For example, the Slavic language family has 3 plural forms, which can be chosen using the following equation: |
| |
| \code |
| plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; |
| \endcode |
| |
| Such an equation is stored in the message catalog itself and is evaluated during translation to supply the correct form. |
| |
| So the code above would display 3 different forms in a Russian locale for values of 1, 3 and 5: |
| |
| \verbatim |
| У вас есть 1 файл в каталоге |
| У вас есть 3 файла в каталоге |
| У вас есть 5 файлов в каталоге |
| \endverbatim |
| |
| And for Japanese, which does not have plural forms at all, it would display the same message |
| for any numeric value. |
| |
| For more detailed information please refer to GNU Gettext: <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">11.2.6 Additional functions for plural forms</a> |
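
To make the Slavic rule above concrete, here is a direct transcription of it into a C++ function. The name \c slavic_plural_index is hypothetical; in practice the expression lives in the catalog header and is evaluated by the library, not written by the application.

```cpp
#include <cassert>

// Index of the plural form for a count n, transcribed from the rule above:
// 0 - "файл", 1 - "файла", 2 - "файлов"
int slavic_plural_index(unsigned long n)
{
    return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

void slavic_plural_index_example()
{
    assert(slavic_plural_index(1) == 0);  // 1 файл
    assert(slavic_plural_index(3) == 1);  // 3 файла
    assert(slavic_plural_index(5) == 2);  // 5 файлов
    assert(slavic_plural_index(11) == 2); // 11 is excluded from form 0
    assert(slavic_plural_index(21) == 0); // 21 follows the same form as 1
}
```

Note how 11 and 21 fall into different forms even though both end in 1, which is why a simple "is n 1 or not" test cannot work.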
| |
| |
| \subsection adding_context_information Adding Context Information |
| |
| In many cases it is not sufficient to provide only the original English string to get the correct translation. |
| You sometimes need to provide some context information. In German, for example, a button labeled "open" is translated to |
| "öffnen" in the context of "opening a file", or to "aufbauen" in the context of opening an internet connection. |
| |
| In these cases you must add some context information to the original string, by adding a comment. |
| |
| \code |
| button->setLabel(translate("File","open")); |
| \endcode |
| |
| The context information is provided as the first parameter to the \ref boost::locale::translate() "translate" |
| function in both singular and plural forms. The translator would see this context information and would be able to translate the |
| "open" string correctly. |
| |
| For example, this is how the \c po file would look: |
| |
| \code |
| msgctxt "File" |
| msgid "open" |
| msgstr "öffnen" |
| |
| msgctxt "Internet Connection" |
| msgid "open" |
| msgstr "aufbauen" |
| \endcode |
| |
| \note Context information requires more recent versions of the gettext tools (>=0.15) for extracting strings and |
| formatting message catalogs. |
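
The following sketch shows why the same \c msgid can carry two translations. In compiled GNU Gettext catalogs the context and the message id are joined with the EOT control character (code 4) to form a single lookup key. The functions \c context_key and \c translate_in_context are hypothetical and only model that idea; they say nothing about how Boost.Locale stores entries internally.

```cpp
#include <cassert>
#include <map>
#include <string>

// Joins context and message id the way GNU Gettext .mo files do.
std::string context_key(const std::string& context, const std::string& id)
{
    return context + '\x04' + id;
}

// Returns the translation for (context, id), or id itself when it is missing.
std::string translate_in_context(const std::map<std::string, std::string>& catalog,
                                 const std::string& context,
                                 const std::string& id)
{
    std::map<std::string, std::string>::const_iterator it =
        catalog.find(context_key(context, id));
    return it == catalog.end() ? id : it->second;
}

void context_lookup_example()
{
    std::map<std::string, std::string> german;
    german[context_key("File", "open")] = "öffnen";
    german[context_key("Internet Connection", "open")] = "aufbauen";

    assert(translate_in_context(german, "File", "open") == "öffnen");
    assert(translate_in_context(german, "Internet Connection", "open") == "aufbauen");
    assert(translate_in_context(german, "Door", "open") == "open"); // unknown context
}
```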
| |
| |
| \subsection multiple_gettext_domain Working with multiple messages domains |
| |
| In some cases it is useful to work with multiple message domains. |
| |
| For example, if an application consists of several independent modules, it may |
| have several domains - a separate domain for each module. |
| |
| For example, developing a FooBar office suite we might have: |
| |
| - a FooBar Word Processor, using the "foobarwriter" domain |
| - a FooBar Spreadsheet, using the "foobarspreadsheet" domain |
| - a FooBar Spell Checker, using the "foobarspell" domain |
| - a FooBar File handler, using the "foobarodt" domain |
| |
| There are three ways to use non-default domains: |
| |
| - When working with \c iostream, you can use the parameterized manipulator \ref |
| boost::locale::as::domain "as::domain(std::string const &)", which allows switching domains in a stream: |
| \n |
| \code |
| cout << as::domain("foo") << translate("Hello") << as::domain("bar") << translate("Hello"); |
| // First translation is taken from dictionary foo and the other from dictionary bar |
| \endcode |
| - You can specify the domain explicitly when converting a \c message object to a string: |
| \code |
| std::wstring foo_msg = translate(L"Hello World").str("foo"); |
| std::wstring bar_msg = translate(L"Hello World").str("bar"); |
| \endcode |
| - You can specify the domain directly using a \ref direct_message_translation "convenience" interface: |
| \code |
| MessageBox(dgettext("gui","Error Occurred")); |
| \endcode |
| |
| \subsection direct_message_translation Direct translation (Convenience Interface) |
| |
| Many applications do not write messages directly to an output stream or use only one locale in the process, so |
| calling <tt>translate("Hello World").str()</tt> for a single message would be annoying. Thus Boost.Locale provides |
| GNU Gettext-like localization functions for direct translation of the messages. However, unlike the GNU Gettext functions, |
| the Boost.Locale translation functions provide an additional optional parameter (locale), and support wide, u16 and u32 strings. |
| |
| The prototypes of the GNU Gettext-like functions can be found \ref boost_locale_gettext_family "in this section". |
| |
| |
| All of these functions can have different prefixes for different forms: |
| |
| - \c d - translation in specific domain |
| - \c n - plural form translation |
| - \c p - translation in specific context |
| |
| \code |
| MessageBoxW(0,pgettext(L"File Dialog",L"Open?").c_str(),gettext(L"Question").c_str(),MB_YESNO); |
| \endcode |
| |
| |
| \section extracting_messages_from_code Extracting messages from the source code |
| |
| There are many tools to extract messages from the source code into the \c .po file format. The most |
| popular and "native" tool is \c xgettext which is installed by default on most Unix systems and freely downloadable |
| for Windows (see \ref gettext_for_windows). |
| |
| For example, we have a source file called \c dir.cpp that prints: |
| |
| \code |
| cout << format(translate("Listing of catalog {1}:")) % file_name << endl; |
| cout << format(translate("Catalog {1} contains 1 file","Catalog {1} contains {2,num} files",files_no)) |
| % file_name % files_no << endl; |
| \endcode |
| |
| Now we run: |
| |
| \verbatim |
| xgettext --keyword=translate:1,1t --keyword=translate:1,2,3t dir.cpp |
| \endverbatim |
| |
| And a file called \c messages.po is created that looks like this (approximately): |
| |
| \code |
| #: dir.cpp:1 |
| msgid "Listing of catalog {1}:" |
| msgstr "" |
| |
| #: dir.cpp:2 |
| msgid "Catalog {1} contains 1 file" |
| msgid_plural "Catalog {1} contains {2,num} files" |
| msgstr[0] "" |
| msgstr[1] "" |
| \endcode |
| |
| This file can be given to translators to adapt it to specific languages. |
| |
| We used the \c --keyword parameter of \c xgettext to make it suitable for extracting messages from |
| source code localized with Boost.Locale, searching for <tt>translate()</tt> function calls instead of the default <tt>gettext()</tt> |
| and <tt>ngettext()</tt> ones. |
| The first parameter, <tt>--keyword=translate:1,1t</tt>, provides the template for basic messages: a \c translate function that is |
| called with 1 argument (1t), which is taken as the key. The second one, <tt>--keyword=translate:1,2,3t</tt>, is used |
| for plural forms. |
| It tells \c xgettext to match a <tt>translate()</tt> function call with 3 parameters (3t) and take the 1st and 2nd parameters as keys. An |
| additional marker \c Nc can be used to mark context information. |
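
To make the keyword notation easier to read, the sketch below splits a keyword specification into its parts. It is only a reading aid with hypothetical names (\c keyword_spec, \c parse_keyword); it does not reproduce \c xgettext's own parsing and handles just the forms used on this page: a plain argument number, a number followed by \c c (context) and a number followed by \c t (total argument count).

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

struct keyword_spec {
    std::string function; // function name before ':'
    int context;          // argument marked with 'c', 0 if absent
    int singular;         // first unmarked argument
    int plural;           // second unmarked argument, 0 if absent
    int total;            // argument count marked with 't', 0 if absent

    keyword_spec() : context(0), singular(0), plural(0), total(0) {}
};

keyword_spec parse_keyword(const std::string& spec)
{
    keyword_spec result;
    const std::string::size_type colon = spec.find(':');
    result.function = spec.substr(0, colon);
    if (colon == std::string::npos) {
        result.singular = 1; // a bare name means "string in the first argument"
        return result;
    }

    std::istringstream fields(spec.substr(colon + 1));
    std::string field;
    while (std::getline(fields, field, ',')) {
        if (field.empty())
            continue;
        const int number = std::atoi(field.c_str()); // stops at the suffix letter
        const char suffix = field[field.size() - 1];
        if (suffix == 'c')
            result.context = number;
        else if (suffix == 't')
            result.total = number;
        else if (result.singular == 0)
            result.singular = number;
        else
            result.plural = number;
    }
    return result;
}

void parse_keyword_example()
{
    const keyword_spec plural = parse_keyword("translate:1,2,3t");
    assert(plural.function == "translate");
    assert(plural.singular == 1 && plural.plural == 2 && plural.total == 3);
}
```

For instance, <tt>translate:1c,2,3,4t</tt> reads as: a call to \c translate with 4 arguments, where argument 1 is the context and arguments 2 and 3 are the singular and plural keys.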
| |
| The full set of xgettext parameters suitable for Boost.Locale is: |
| |
| \code |
| xgettext --keyword=translate:1,1t --keyword=translate:1c,2,2t \ |
| --keyword=translate:1,2,3t --keyword=translate:1c,2,3,4t \ |
| --keyword=gettext:1 --keyword=pgettext:1c,2 \ |
| --keyword=ngettext:1,2 --keyword=npgettext:1c,2,3 \ |
| source_file_1.cpp ... source_file_N.cpp |
| \endcode |
| |
| Of course, if you do not use the "gettext"-like translation functions, you |
| may ignore some of these parameters. |
| |
| \subsection custom_file_system_support Custom Filesystem Support |
| |
| When access to the actual file system is limited, as in ActiveX controls, or |
| when the developer wants to ship an all-in-one executable file, |
| it is useful to be able to load \c gettext catalogs from a custom location - |
| a custom file system. |
| |
| Boost.Locale provides an option to install the boost::locale::message_format facet |
| with customized options provided in the boost::locale::gnu_gettext::messages_info structure. |
| |
| This structure contains a \c boost::function based |
| \ref boost::locale::gnu_gettext::messages_info::callback_type "callback" |
| that allows the user to provide custom functionality to load message catalog files. |
| |
| For example: |
| |
| \code |
| // Configure all options for message catalog |
| namespace blg = boost::locale::gnu_gettext; |
| blg::messages_info info; |
| info.language = "he"; |
| info.country = "IL"; |
| info.encoding="UTF-8"; |
| info.paths.push_back(""); // At least one path is required, even an empty one |
| info.domains.push_back(blg::messages_info::domain("my_app")); |
| info.callback = some_file_loader; // Provide a callback |
| |
| // Create a basic locale without messages support |
| boost::locale::generator gen; |
| std::locale base_locale = gen("he_IL.UTF-8"); |
| |
| // Install messages catalogs for "char" support to the final locale |
| // we are going to use |
| std::locale real_locale(base_locale,blg::create_messages_facet<char>(info)); |
| \endcode |
| |
| In order to set up \ref boost::locale::gnu_gettext::messages_info::language "language", \ref boost::locale::gnu_gettext::messages_info::country "country" and other members, you may use the \ref boost::locale::info facet for convenience: |
| |
| \code |
| // Configure all options for message catalog |
| namespace blg = boost::locale::gnu_gettext; |
| blg::messages_info info; |
| |
| info.paths.push_back(""); // At least one path is required, even an empty one |
| info.domains.push_back(blg::messages_info::domain("my_app")); |
| info.callback = some_file_loader; // Provide a callback |
| |
| // Create an object with default locale |
| boost::locale::generator gen; |
| std::locale base_locale = gen(""); |
| |
| // Use boost::locale::info to configure all parameters |
| |
| boost::locale::info const &properties = std::use_facet<boost::locale::info>(base_locale); |
| info.language = properties.language(); |
| info.country = properties.country(); |
| info.encoding = properties.encoding(); |
| info.variant = properties.variant(); |
| |
| // Install messages catalogs to the final locale |
| std::locale real_locale(base_locale,blg::create_messages_facet<char>(info)); |
| \endcode |
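
The examples above leave \c some_file_loader undefined. A minimal sketch of such a loader is given below, under the assumption that the callback receives the file name and the encoding and returns the file contents as a <tt>std::vector<char></tt>, with an empty vector meaning that the file was not found; please check \c callback_type in your Boost version before relying on this. The in-memory table \c embedded_catalogs and the file name used in the check are hypothetical, and the exact names the library requests are not specified here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory "file system": file name -> raw .mo bytes.
typedef std::map<std::string, std::vector<char> > embedded_files;

embedded_files& embedded_catalogs()
{
    static embedded_files files;
    return files;
}

// Assumed callback shape: file name and encoding in, file contents out.
// An empty result signals that the catalog does not exist.
std::vector<char> some_file_loader(const std::string& file_name,
                                   const std::string& /*encoding*/)
{
    const embedded_files& files = embedded_catalogs();
    const embedded_files::const_iterator it = files.find(file_name);
    if (it == files.end())
        return std::vector<char>();
    return it->second;
}

void some_file_loader_example()
{
    const std::string name = "/he_IL/LC_MESSAGES/my_app.mo"; // illustrative key
    embedded_catalogs()[name] = std::vector<char>(4, '\0');   // placeholder bytes, not a real catalog
    assert(some_file_loader(name, "UTF-8").size() == 4);
    assert(some_file_loader("/missing.mo", "UTF-8").empty());
}
```

In an all-in-one executable, the table would be filled at startup from data compiled into the binary instead of from the placeholder bytes shown here.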
| |
| \section msg_non_ascii_keys Non US-ASCII Keys |
| |
| Boost.Locale assumes that you use English for original text messages, and the best |
| practice is to use US-ASCII characters for original keys. |
| |
| However, in some cases it is useful to insert some Unicode characters in the text, |
| for example the copyright character "©". |
| |
| As long as your narrow character string encoding is UTF-8, nothing further needs to be done. |
| |
| Boost.Locale assumes that your sources are encoded in UTF-8 and that the input narrow |
| strings use UTF-8, which is the default for most compilers (with the notable |
| exception of Microsoft Visual C++). |
| |
| However, if the encoding of the narrow strings in your source file is not UTF-8 but some other |
| encoding like windows-1252, the strings would be misinterpreted. |
| |
| You can specify the character set of the original strings when you specify the |
| domain name for the application. |
| |
| \code |
| #include <boost/locale.hpp> |
| #include <iostream> |
| |
| using namespace std; |
| using namespace boost::locale; |
| |
| int main() |
| { |
| generator gen; |
| |
| // Specify location of dictionaries |
| gen.add_messages_path("."); |
| // Specify the encoding of the source string |
| gen.add_messages_domain("copyrighted/windows-1255"); |
| |
| // Generate locales and imbue them to iostream |
| locale::global(gen("")); |
| cout.imbue(locale()); |
| |
| // In Windows 1255 (C) symbol is encoded as 0xA9 |
| cout << translate("© 2001 All Rights Reserved") << endl; |
| } |
| \endcode |
| |
| Thus if the program runs in a UTF-8 locale, the copyright symbol is |
| automatically converted to the appropriate UTF-8 sequence if the |
| key is missing in the dictionary. |
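
To show what that conversion amounts to for the copyright symbol, the sketch below encodes a code point as UTF-8. The single byte 0xA9 of the source encoding corresponds to the Unicode code point U+00A9, which becomes the two bytes 0xC2 0xA9 in UTF-8. The function \c to_utf8 is a hypothetical helper that covers only code points below U+0800; it is not the conversion routine Boost.Locale uses.

```cpp
#include <cassert>
#include <string>

// Encodes a code point below U+0800 as UTF-8 (one or two bytes).
std::string to_utf8(unsigned int code_point)
{
    std::string out;
    if (code_point < 0x80) {
        out += static_cast<char>(code_point);                  // plain ASCII
    } else {
        assert(code_point < 0x800);                            // sketch limit
        out += static_cast<char>(0xC0 | (code_point >> 6));    // leading byte
        out += static_cast<char>(0x80 | (code_point & 0x3F));  // continuation byte
    }
    return out;
}

void to_utf8_example()
{
    assert(to_utf8(0xA9) == "\xC2\xA9"); // copyright sign
    assert(to_utf8('A') == "A");         // ASCII is unchanged
}
```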
| |
| |
| \subsection msg_qna Questions and Answers |
| |
| - Do I need GNU Gettext to use Boost.Locale? |
| \n |
| Boost.Locale provides a run-time environment to load and use GNU Gettext message catalogs, but it does |
| not provide tools for generation, translation, compilation and management of these catalogs. |
| Boost.Locale only reimplements the GNU Gettext libintl. |
| \n |
| You would probably need: |
| \n |
| -# Boost.Locale itself -- for runtime. |
| -# A tool for extracting strings from source code, and managing them: GNU Gettext provides good tools, but other |
| implementations are available as well. |
| -# A good translation program like <a href="http://userbase.kde.org/Lokalize">Lokalize</a>, <a href="http://www.poedit.net/">Poedit</a> or <a href="http://projects.gnome.org/gtranslator/">GTranslator</a>. |
| |
| - Why doesn't Boost.Locale provide tools for extracting and managing message catalogs? Why should |
| I use GPL-ed software? Are my programs or message catalogs affected by its license? |
| \n |
| -# Boost.Locale does not link to or use any of the GNU Gettext code, so you need not worry about your code as |
| the runtime library is fully reimplemented. |
| -# You may freely use GPL-ed software for extracting and managing catalogs, the same way as you are free to use |
| a GPL-ed editor. It does not affect your message catalogs or your code. |
| -# I see no reason to reimplement well debugged, working tools like \c xgettext, \c msgfmt, \c msgmerge that |
| do a very fine job, especially as they are freely available for download and support almost any platform. |
| All Linux distributions, BSD flavors, Mac OS X and other Unix-like operating systems provide GNU Gettext tools |
| as a standard package.\n |
| Windows users can get GNU Gettext utilities via the MinGW project. See \ref gettext_for_windows. |
| |
| |
| - Is there any reason to prefer the Boost.Locale implementation to the original GNU Gettext runtime library? |
| In either case I would probably need some of the GNU tools. |
| \n |
| There are three important differences between the GNU Gettext runtime library and the Boost.Locale implementation: |
| \n |
| -# The GNU Gettext runtime supports only one locale per process. It is not thread-safe to use multiple locales |
| and encodings in the same process. This is perfectly fine for applications that interact directly with |
| a single user like most GUI applications, but is problematic for services and servers. |
| -# The GNU Gettext API supports only 8-bit encodings, making it irrelevant in environments that natively use |
| wide strings. |
| -# The GNU Gettext runtime library is distributed under the LGPL license, which may not be convenient for some users. |
| |
| */ |
| |
| |