 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
 <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
 <meta http-equiv="X-UA-Compatible" content="IE=9"/>
 <title>Boost.Locale: Text Conversions</title>

 <link href="tabs.css" rel="stylesheet" type="text/css"/>
 <link href="doxygen.css" rel="stylesheet" type="text/css" />
 <link href="navtree.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="jquery.js"></script>
 <script type="text/javascript" src="resize.js"></script>
 <script type="text/javascript" src="navtree.js"></script>
 <script type="text/javascript">
   $(document).ready(initResizable);
 </script>


 </head>
 <body>
 <div id="top"><!-- do not remove this div! -->


 <div id="titlearea">
 <table cellspacing="0" cellpadding="0">
  <tbody>
  <tr style="height: 56px;">

   <td id="projectlogo"><img alt="Logo" src="boost-small.png"/></td>


   <td style="padding-left: 0.5em;">
    <div id="projectname">Boost.Locale

    </div>

   </td>


  </tr>
  </tbody>
 </table>
 </div>

 <!-- Generated by Doxygen 1.7.6.1 -->
   <div id="navrow1" class="tabs">
     <ul class="tablist">
       <li><a href="index.html"><span>Main&#160;Page</span></a></li>
       <li><a href="modules.html"><span>Modules</span></a></li>
       <li><a href="namespaces.html"><span>Namespaces</span></a></li>
       <li><a href="annotated.html"><span>Classes</span></a></li>
       <li><a href="files.html"><span>Files</span></a></li>
       <li><a href="examples.html"><span>Examples</span></a></li>
     </ul>
   </div>
 </div>
 <div id="side-nav" class="ui-resizable side-nav-resizable">
   <div id="nav-tree">
     <div id="nav-tree-contents">
     </div>
   </div>
   <div id="splitbar" style="-moz-user-select:none;"
        class="ui-resizable-handle">
   </div>
 </div>
 <script type="text/javascript">
   initNavTree('conversions.html','');
 </script>
 <div id="doc-content">
 <div class="header">
   <div class="headertitle">
 <div class="title">Text Conversions </div>  </div>
 </div><!--header-->
 <div class="contents">
 <div class="textblock"><p>Boost.Locale provides a set of functions that perform basic string conversion operations: upper, lower and <a class="el" href="glossary.html#term_title_case">title case</a> conversions, <a class="el" href="glossary.html#term_case_folding">case folding</a> and Unicode <a class="el" href="glossary.html#term_normalization">normalization</a>. These are <a class="el" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">to_upper</a>, <a class="el" href="group__convert.html#ga4a3eb15f42f5cbae7bdd00c9e9cac222">to_lower</a>, <a class="el" href="group__convert.html#ga684efb375e060c71cd3e1799a6329f7f">to_title</a>, <a class="el" href="group__convert.html#gadf59d16355babd955766deef89d470ea">fold_case</a> and <a class="el" href="group__convert.html#ga867733c9d4455aaa13a42cf67367d575">normalize</a>.</p>
 <p>All these functions take a <code>std::locale</code> object as a parameter, or use the global locale by default.</p>
 <p>The global locale is used in all examples below.</p>
 <h2><a class="anchor" id="conversions_case"></a>
 Case Handling</h2>
 <p>For example: </p>
 <div class="fragment"><pre class="fragment">    std::string grussen = <span class="stringliteral">&quot;grüßEN&quot;</span>;
     std::cout   &lt;&lt;<span class="stringliteral">&quot;Upper &quot;</span>&lt;&lt; <a class="code" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">boost::locale::to_upper</a>(grussen) &lt;&lt; std::endl
                 &lt;&lt;<span class="stringliteral">&quot;Lower &quot;</span>&lt;&lt; <a class="code" href="group__convert.html#ga4a3eb15f42f5cbae7bdd00c9e9cac222">boost::locale::to_lower</a>(grussen) &lt;&lt; std::endl
                 &lt;&lt;<span class="stringliteral">&quot;Title &quot;</span>&lt;&lt; <a class="code" href="group__convert.html#ga684efb375e060c71cd3e1799a6329f7f">boost::locale::to_title</a>(grussen) &lt;&lt; std::endl
                 &lt;&lt;<span class="stringliteral">&quot;Fold  &quot;</span>&lt;&lt; <a class="code" href="group__convert.html#gadf59d16355babd955766deef89d470ea">boost::locale::fold_case</a>(grussen) &lt;&lt; std::endl;
 </pre></div><p>This would print:</p>
 <div class="fragment"><pre class="fragment">
 Upper GRÜSSEN
 Lower grüßen
 Title Grüßen
 Fold  grüssen
 </pre></div><p>You may notice that the Boost.StringAlgo library already provides functions named <code>to_upper</code> and <code>to_lower</code>. The difference is that the Boost.Locale functions operate on an entire string instead of performing incorrect character-by-character conversions.</p>
 <p>For example:</p>
 <div class="fragment"><pre class="fragment">    std::wstring grussen = L<span class="stringliteral">&quot;grüßen&quot;</span>;
     std::wcout &lt;&lt; boost::algorithm::to_upper_copy(grussen) &lt;&lt; <span class="stringliteral">&quot; &quot;</span> &lt;&lt; <a class="code" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">boost::locale::to_upper</a>(grussen) &lt;&lt; std::endl;
 </pre></div><p>This would output:</p>
 <div class="fragment"><pre class="fragment">
 GRÜßEN GRÜSSEN
 </pre></div><p>In the first case, the letter "ß" was not correctly converted to "SS" because of a limitation of the <code>std::ctype</code> facet.</p>
 <p>The problem is even worse with UTF-8 encodings, where non-ASCII characters are not converted at all. For example, this code:</p>
 <div class="fragment"><pre class="fragment">    std::string grussen = <span class="stringliteral">&quot;grüßen&quot;</span>;
     std::cout &lt;&lt; boost::algorithm::to_upper_copy(grussen) &lt;&lt; <span class="stringliteral">&quot; &quot;</span> &lt;&lt; <a class="code" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">boost::locale::to_upper</a>(grussen) &lt;&lt; std::endl;
 </pre></div><p>would modify only the ASCII characters:</p>
 <div class="fragment"><pre class="fragment">
 GRüßEN GRÜSSEN
 </pre></div><h2><a class="anchor" id="conversions_normalization"></a>
 Unicode Normalization</h2>
 <p>Unicode normalization is the process of converting strings to a standard form, suitable for text processing and comparison. For example, the character "ü" can be represented by a single code point or by a combination of the character "u" and the combining diaeresis "¨". Normalization is an important part of Unicode text processing.</p>
 <p>Unicode defines four normalization forms. Each form is selected by a flag passed to the <a class="el" href="group__convert.html#ga867733c9d4455aaa13a42cf67367d575">normalize</a> function:</p>
 <ul>
 <li>NFD - Canonical decomposition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9fa6648d0eabb931f2e9d258570b297e98f" title="Canonical decomposition.">boost::locale::norm_nfd</a></li>
 <li>NFC - Canonical decomposition followed by canonical composition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9faf6fe7be275e5e13df415ab258105ada0" title="Canonical decomposition followed by canonical composition.">boost::locale::norm_nfc</a> or <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9faa29173d73d9be7fefcbb18c8712465d2" title="Default normalization - canonical decomposition followed by canonical composition.">boost::locale::norm_default</a></li>
 <li>NFKD - Compatibility decomposition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9fa0fbc2ac042fc6f58af5818bfd06d5379" title="Compatibility decomposition.">boost::locale::norm_nfkd</a></li>
 <li>NFKC - Compatibility decomposition followed by canonical composition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9fa0305c1f3405ea70facf4c6a5ffa40583" title="Compatibility decomposition followed by canonical composition.">boost::locale::norm_nfkc</a></li>
 </ul>
 <p>For more details on normalization forms, see <a href="http://unicode.org/reports/tr15/#Norm_Forms">Unicode Standard Annex #15, Unicode Normalization Forms</a>.</p>
 <h2><a class="anchor" id="conversions_notes"></a>
 Notes</h2>
 <ul>
 <li><a class="el" href="group__convert.html#ga867733c9d4455aaa13a42cf67367d575">normalize</a> operates only on Unicode-encoded strings, that is, UTF-8, UTF-16 or UTF-32 depending on the character width. Be careful with non-UTF encodings, as they may be handled incorrectly.</li>
 <li><a class="el" href="group__convert.html#gadf59d16355babd955766deef89d470ea">fold_case</a> is generally a locale-independent operation, but it receives a locale as a parameter to determine the 8-bit encoding.</li>
 <li>All of these functions accept an STL string, a NUL-terminated string, or a range defined by two pointers. They always return a newly created STL string.</li>
 <li>The length of the string may change: in the examples above, upper-casing "grüßen" yields "GRÜSSEN".</li>
 </ul>
 </div></div><!-- contents -->
 </div>
   <div id="nav-path" class="navpath">
     <ul>
       <li class="navelem"><a class="el" href="index.html">Boost.Locale</a>      </li>
       <li class="navelem"><a class="el" href="using_boost_locale.html">Using Boost.Locale</a>      </li>

     <li class="footer">
 &copy; Copyright 2009-2012 Artyom Beilis,  Distributed under the <a href="http://www.boost.org/LICENSE_1_0.txt">Boost Software License</a>, Version 1.0.
     </li>
    </ul>
  </div>


 </body>
 </html>