//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//
// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page std_locales Introduction to C++ Standard Library localization support
\section std_locales_basics Getting familiar with standard C++ Locales
The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the \c
std::locale class, the container that holds all the required information about a specific culture, such as number formatting
patterns, date and time formatting, currency, case conversion etc.
All this information is provided by facets, special classes derived from the \c std::locale::facet base class. Such facets are
packed into the \c std::locale class and allow you to provide arbitrary information about the locale. The \c std::locale class
keeps reference counters on installed facets and can be efficiently copied.
Each facet that was installed into the \c std::locale object can be fetched using the \c std::use_facet function. For example,
the \c std::ctype<Char> facet provides rules for case conversion, so you can convert a character to upper-case like this:
\code
std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(some_locale);
char upper_a = ctype_facet.toupper('a');
\endcode
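As a self-contained variant of the snippet above, the following sketch wraps the facet lookup in a small helper. The name \c to_upper_with is hypothetical and not part of any library; the classic "C" locale is suggested in the usage note only because it is always available. Note that \c std::ctype<char> works on single \c char values, so this is not suitable for multi-byte encodings such as UTF-8.
\code
#include <locale>
#include <string>

// Hypothetical helper: upper-case a narrow string using the ctype facet
// installed in the given locale.
std::string to_upper_with(std::locale const &loc, std::string text)
{
    // use_facet throws std::bad_cast when the facet is missing,
    // so check with has_facet first.
    if(!std::has_facet<std::ctype<char> >(loc))
        return text;
    std::ctype<char> const &ct = std::use_facet<std::ctype<char> >(loc);
    if(!text.empty())
        ct.toupper(&text[0], &text[0] + text.size()); // converts the range in place
    return text;
}
\endcode
Calling <tt>to_upper_with(std::locale::classic(), "hello")</tt> yields "HELLO".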
A locale object can be imbued into an \c iostream so that the stream formats values according to that locale:
\code
std::cout.imbue(std::locale("en_US.UTF-8"));
std::cout << 1345.45 << std::endl;
std::cout.imbue(std::locale("ru_RU.UTF-8"));
std::cout << 1345.45 << std::endl;
\endcode
On a platform where both locales are available and correctly implemented, this would display:
\verbatim
1,345.45
1 345,45
\endverbatim
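Because named locales such as \c en_US.UTF-8 may be missing on a given system, the effect of \c imbue can also be tried out with a hand-made facet. The sketch below is only an illustration: \c continental_punct, \c make_continental_locale and \c format_number are hypothetical names, and the facet simply hard-codes "." as the thousands separator and "," as the decimal point.
\code
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: "." groups thousands, "," marks the decimal point.
class continental_punct : public std::numpunct<char> {
protected:
    virtual char do_decimal_point() const { return ','; }
    virtual char do_thousands_sep() const { return '.'; }
    // Group digits in threes.
    virtual std::string do_grouping() const { return "\003"; }
};

// Build a locale that is the classic locale plus the facet above.
std::locale make_continental_locale()
{
    return std::locale(std::locale::classic(), new continental_punct);
}

// Format a number with a stream imbued with the given locale.
std::string format_number(double value, std::locale const &loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}
\endcode
With these definitions, <tt>format_number(1345.45, std::locale::classic())</tt> gives "1345.45", while <tt>format_number(1345.45, make_continental_locale())</tt> gives "1.345,45".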
You can also create your own facets and install them into existing locale objects. For example:
\code
class measure : public std::locale::facet {
public:
    typedef enum { inches, ... } measure_type;
    // Every facet type must declare (and define) a static id member
    static std::locale::id id;
    measure(measure_type m,size_t refs=0);
    double from_metric(double value) const;
    std::string name() const;
    ...
};
\endcode
Now you can install this facet into a locale:
\code
// Create a locale based on en_US with the measure facet added,
// and install it as the global locale.
std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches)));
\endcode
Now you can print a distance according to the correct locale:
\code
void print_distance(std::ostream &out,double value)
{
    // Fetch the facet from the locale imbued into the stream
    measure const &m = std::use_facet<measure>(out.getloc());
    out << m.from_metric(value) << " " << m.name();
}
\endcode
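The pieces above can be put together into one compilable sketch. Everything the original snippets leave out is an assumption made for illustration: the metric input is taken to be in centimeters, the enumeration gains a hypothetical \c centimeters value, the unit names "in" and "cm" are invented, and \c distance_as_text is a hypothetical helper that imbues a string stream instead of changing the global locale.
\code
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

class measure : public std::locale::facet {
public:
    enum measure_type { centimeters, inches };

    // Required by std::use_facet and std::has_facet to identify the facet.
    static std::locale::id id;

    explicit measure(measure_type m, std::size_t refs = 0) :
        std::locale::facet(refs), type_(m)
    {
    }
    // Assumption: the metric value is given in centimeters.
    double from_metric(double value) const
    {
        return type_ == inches ? value / 2.54 : value;
    }
    std::string name() const
    {
        return type_ == inches ? "in" : "cm";
    }
private:
    measure_type type_;
};

// Definition of the static id member.
std::locale::id measure::id;

void print_distance(std::ostream &out, double value)
{
    // Fetch the facet from the locale imbued into the stream.
    measure const &m = std::use_facet<measure>(out.getloc());
    out << m.from_metric(value) << " " << m.name();
}

// Hypothetical helper: format a distance with a locale that carries
// the requested measure facet.
std::string distance_as_text(double value_cm, measure::measure_type unit)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new measure(unit)));
    print_distance(out, value_cm);
    return out.str();
}
\endcode
The locale takes ownership of the facet created with \c new because the default \c refs value of 0 asks the locale to manage its lifetime. Note that \c std::use_facet throws \c std::bad_cast if the stream's locale does not contain a \c measure facet.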
This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using
the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.
\section std_locales_common Common Critical Problems with the Standard Library
Numerous issues in the standard library and its implementations prevent you from using its full power.
The most critical ones are:
- Setting the global locale has bad side effects.
\n
Consider the following code:
\n
\code
int main()
{
    // Set the system's default locale as the global one
    std::locale::global(std::locale(""));
    std::ofstream csv("test.csv");
    csv << 1.1 << "," << 1.3 << std::endl;
}
\endcode
\n
What would the content of \c test.csv be? Depending on the system locale, it may be "1.1,1.3",
as you probably expected, or it may be "1,1,1,3".
\n
Moreover, it affects even \c printf and libraries like \c boost::lexical_cast, which then produce
incorrect or unexpected formatting. In fact, many third-party libraries break in such a
situation.
\n
Unlike the standard localization library, Boost.Locale never changes the basic number formatting,
even when it uses \c std based localization backends, so by default, numbers are always
formatted according to the C locale. Localized number formatting requires specific flags.
\n
- Number formatting is broken on some locales.
\n
Some locales use the non-breaking space character U+00A0 as the thousands separator, so
in the \c ru_RU.UTF-8 locale the number 1024 should be displayed as "1 024", where the space
is the Unicode character U+00A0. Unfortunately, many libraries don't handle
this correctly. For example, GCC and SunStudio emit only "\xC2", which is just
the first byte of the UTF-8 sequence "\xC2\xA0" that represents this code point, and
so generate invalid UTF-8.
\n
- Locale names are not standardized. For example, under MSVC you need to provide the name
\c en-US or \c English_USA.1252 , whereas on POSIX platforms it would be \c en_US.UTF-8
or \c en_US.ISO-8859-1 .
\n
Moreover, MSVC does not support UTF-8 locales at all.
\n
- Many standard libraries provide only the C and POSIX locales. For example, GCC supports localization
only under Linux; on all other platforms, attempting to create locales other than "C" or
"POSIX" would fail.
*/