<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>Boost.Locale: Recommendations and Myths</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
</head>
<body>
<div id="top"><!-- do not remove this div! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><img alt="Logo" src="boost-small.png"/></td>
<td style="padding-left: 0.5em;">
<div id="projectname">Boost.Locale
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- Generated by Doxygen 1.7.6.1 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main&#160;Page</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="namespaces.html"><span>Namespaces</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<li><a href="examples.html"><span>Examples</span></a></li>
</ul>
</div>
</div>
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
initNavTree('recommendations_and_myths.html','');
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">Recommendations and Myths </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><h2><a class="anchor" id="recommendations"></a>
Recommendations</h2>
<ul>
<li>The first and most important recommendation: prefer UTF-8 encoding for narrow strings. It can represent every Unicode character and is more convenient for general use than legacy encodings such as Latin1.</li>
<li>Remember that there are many different cultures, and you can assume very little about the user's language. The user's calendar may not have a "January". It may not be possible to convert strings to integers with <code>atoi</code>, because the text may not use the "ordinary" digits 0..9 at all. You cannot assume that space characters are frequent: in Chinese, spaces do not separate words. Text may be written right to left or top to bottom, and so on.</li>
<li>When using message formatting, provide as much context information as you can. Prefer translating entire sentences over single words, and when you must translate single words, <b>always</b> add some context information.</li>
</ul>
<h2><a class="anchor" id="myths"></a>
Myths</h2>
<h3><a class="anchor" id="myths_wide"></a>
To use Unicode in my application I should use wide strings everywhere.</h3>
<p>Unicode is not limited to wide strings. Both <code>std::string</code> and <code>std::wstring</code> can hold and process Unicode text. Moreover, the semantics of <code>std::string</code> are much cleaner in multi-platform applications, because all "Unicode" narrow strings are UTF-8. "Wide" strings may be encoded in UTF-16 or UTF-32, depending on the platform, so they may be even less convenient than <code>char</code>-based strings when dealing with Unicode.</p>
<h3><a class="anchor" id="myths_utf16"></a>
UTF-16 is the best encoding to work with.</h3>
<p>There is a common assumption that UTF-16 is the best encoding for storing information because it gives the "shortest" representation of strings.</p>
<p>In fact, it is probably the most error-prone encoding to work with. The biggest issue is code points that lie outside the BMP (Basic Multilingual Plane, U+0000 to U+FFFF), which must be represented with surrogate pairs. These characters are rare, so many applications are never tested with them.</p>
<p>For example:</p>
<ul>
<li>Qt3 could not handle characters outside the BMP.</li>
<li>Editing a character with a code point above 0xFFFF often exposes an unpleasant bug: for example, to erase such a character in Windows Notepad you have to press Backspace twice.</li>
</ul>
<p>So UTF-16 can be used for Unicode (in fact, ICU and many other applications use UTF-16 as their internal Unicode representation), but you should be very careful and never assume that one code point equals one UTF-16 code unit. </p>
</div></div><!-- contents -->
</div>
<div id="nav-path" class="navpath">
<ul>
<li class="footer">
&copy; Copyright 2009-2012 Artyom Beilis, Distributed under the <a href="http://www.boost.org/LICENSE_1_0.txt">Boost Software License</a>, Version 1.0.
</li>
</ul>
</div>
</body>
</html>