
 <html lang="en">
 <head>
 <title>Generic Charset Conversion - The GNU C Library</title>
 <meta http-equiv="Content-Type" content="text/html">
 <meta name="description" content="The GNU C Library">
 <meta name="generator" content="makeinfo 4.13">
 <link title="Top" rel="start" href="index.html#Top">
 <link rel="up" href="Character-Set-Handling.html#Character-Set-Handling" title="Character Set Handling">
 <link rel="prev" href="Non_002dreentrant-Conversion.html#Non_002dreentrant-Conversion" title="Non-reentrant Conversion">
 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
 <!--
 This file documents the GNU C library.

 This is Edition 0.12, last updated 2007-10-27,
 of `The GNU C Library Reference Manual', for version
 2.8 (Sourcery G++ Lite 2011.03-41).

 Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
 2003, 2007, 2008, 2010 Free Software Foundation, Inc.

 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
 any later version published by the Free Software Foundation; with the
 Invariant Sections being ``Free Software Needs Free Documentation''
 and ``GNU Lesser General Public License'', the Front-Cover texts being
 ``A GNU Manual'', and with the Back-Cover Texts as in (a) below.  A
 copy of the license is included in the section entitled "GNU Free
 Documentation License".

 (a) The FSF's Back-Cover Text is: ``You have the freedom to
 copy and modify this GNU manual.  Buying copies from the FSF
 supports it in developing GNU and promoting software freedom.''-->
 <meta http-equiv="Content-Style-Type" content="text/css">
 <style type="text/css"><!--
   pre.display { font-family:inherit }
   pre.format  { font-family:inherit }
   pre.smalldisplay { font-family:inherit; font-size:smaller }
   pre.smallformat  { font-family:inherit; font-size:smaller }
   pre.smallexample { font-size:smaller }
   pre.smalllisp    { font-size:smaller }
   span.sc    { font-variant:small-caps }
   span.roman { font-family:serif; font-weight:normal; }
   span.sansserif { font-family:sans-serif; font-weight:normal; }
 --></style>
 <link rel="stylesheet" type="text/css" href="../cs.css">
 </head>
 <body>
 <div class="node">
 <a name="Generic-Charset-Conversion"></a>
 <p>
 Previous:&nbsp;<a rel="previous" accesskey="p" href="Non_002dreentrant-Conversion.html#Non_002dreentrant-Conversion">Non-reentrant Conversion</a>,
 Up:&nbsp;<a rel="up" accesskey="u" href="Character-Set-Handling.html#Character-Set-Handling">Character Set Handling</a>
 <hr>
 </div>

 <h3 class="section">6.5 Generic Charset Conversion</h3>

 <p>The conversion functions mentioned so far in this chapter all have in
 common that they operate on character sets that are not directly
 specified by the functions.  The multibyte encoding used is specified by
 the currently selected locale for the <code>LC_CTYPE</code> category.  The
 wide character set is fixed by the implementation (in the case of the GNU C
 library it is always UCS-4 encoded ISO&nbsp;10646<!-- /@w -->).

    <p>This causes several problems when it comes to general character
 conversion:

      <ul>
 <li>For every conversion where neither the source nor the destination
 character set is the character set of the locale for the <code>LC_CTYPE</code>
 category, one has to change the <code>LC_CTYPE</code> locale using
 <code>setlocale</code>.

      <p>Changing the <code>LC_CTYPE</code> locale introduces major problems for the rest
 of the program since several more functions (e.g., the character
 classification functions, see <a href="Classification-of-Characters.html#Classification-of-Characters">Classification of Characters</a>) use the
 <code>LC_CTYPE</code> category.

      <li>Parallel conversions to and from different character sets are not
 possible since the <code>LC_CTYPE</code> selection is global and shared by all
 threads.

      <li>If neither the source nor the destination character set is the character
 set used for <code>wchar_t</code> representation, there is at least a two-step
 process necessary to convert a text using the functions above.  One would
 have to select the source character set as the multibyte encoding,
 convert the text into a <code>wchar_t</code> text, select the destination
 character set as the multibyte encoding, and convert the wide character
 text to the multibyte (= destination) character set.

      <p>Even if this is possible (which is not guaranteed), it is very tedious
 work.  It also suffers even more from the two problems raised above,
 because the locale has to be changed repeatedly.
 </ul>
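
    <p>The two-step process described in the last item can be sketched as
 follows.  This is only an illustration under stated assumptions, not code
 from the library: <code>convert_via_wchar</code> is a hypothetical helper,
 the locale names are supplied by the caller and must be installed on the
 system, and the source text is assumed to be a null-terminated multibyte
 string.  Because the helper changes the process-wide <code>LC_CTYPE</code>
 locale twice, it shows exactly the problems listed above and must not be
 used while other threads are running.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of any library): convert SRC from the
   encoding of FROM_LOCALE to the encoding of TO_LOCALE by way of wchar_t.
   Returns 0 on success, -1 on failure.  Not thread-safe, because it
   changes the global LC_CTYPE locale.  */
int
convert_via_wchar (const char *from_locale, const char *to_locale,
                   const char *src, char *dst, size_t dstlen)
{
  int result = -1;
  wchar_t *wide = NULL;
  char *saved;
  const char *old;
  size_t srclen = strlen (src);
  size_t n;

  /* Save a copy of the current locale name so it can be restored.  */
  old = setlocale (LC_CTYPE, NULL);
  if (old == NULL)
    return -1;
  saved = malloc (strlen (old) + 1);
  if (saved == NULL)
    return -1;
  strcpy (saved, old);

  /* A multibyte string never yields more wide characters than bytes.  */
  wide = malloc ((srclen + 1) * sizeof (wchar_t));
  if (wide == NULL)
    goto out;

  /* Step 1: select the source character set, convert to wchar_t.  */
  if (setlocale (LC_CTYPE, from_locale) == NULL)
    goto out;
  n = mbstowcs (wide, src, srclen + 1);
  if (n == (size_t) -1)
    goto out;

  /* Step 2: select the destination character set, convert back.  */
  if (setlocale (LC_CTYPE, to_locale) == NULL)
    goto out;
  n = wcstombs (dst, wide, dstlen);
  if (n == (size_t) -1 || n >= dstlen)   /* invalid or truncated */
    goto out;

  result = 0;

 out:
  setlocale (LC_CTYPE, saved);           /* restore the original locale */
  free (saved);
  free (wide);
  return result;
}
```

    <p>A caller would write something like
 <code>convert_via_wchar (src_locale, dst_locale, text, buf, sizeof buf)</code>,
 where both locale names are placeholders for locales that actually exist
 on the system.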

    <p>The XPG2 standard defines a completely new set of functions, which have
 none of these limitations.  They are not at all coupled to the selected
 locales, and they have no constraints on the character sets selected for
 source and destination.  Only the set of available conversions limits
 them.  The standard does not specify that any conversion at all must be
 available.  Such availability is a measure of the quality of the
 implementation.
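
    <p>As a rough sketch of how these functions are used, the following
 helper converts a buffer from one named character set to another without
 touching any locale.  The helper name <code>convert_charset</code> and its
 return convention are assumptions made for this illustration.  The
 functions <code>iconv_open</code>, <code>iconv</code>, and
 <code>iconv_close</code> are the standard interface, but whether a given
 pair of character set names is accepted depends on the implementation.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert SRCLEN bytes at SRC from character set
   FROM to character set TO, storing a null-terminated result in DST.
   Returns the number of bytes written (not counting the terminator),
   or (size_t) -1 on any failure.  */
size_t
convert_charset (const char *from, const char *to,
                 const char *src, size_t srclen,
                 char *dst, size_t dstlen)
{
  iconv_t cd;
  char *in = (char *) src;      /* iconv takes a non-const char ** */
  char *out = dst;
  size_t inleft = srclen;
  size_t outleft;
  size_t rc;

  if (dstlen == 0)
    return (size_t) -1;
  outleft = dstlen - 1;         /* keep room for the terminator */

  /* Note the argument order: destination first, then source.  */
  cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;         /* conversion not available */

  rc = iconv (cd, &in, &inleft, &out, &outleft);
  if (rc != (size_t) -1)
    /* Emit any final shift sequence for stateful encodings.  */
    rc = iconv (cd, NULL, NULL, &out, &outleft);

  iconv_close (cd);

  if (rc == (size_t) -1)
    return (size_t) -1;         /* invalid input or DST too small */

  *out = '\0';
  return (size_t) (out - dst);
}
```

    <p>For example, assuming the implementation knows the names
 <code>ISO-8859-1</code> and <code>UTF-8</code>, the call
 <code>convert_charset ("ISO-8859-1", "UTF-8", text, len, buf, sizeof buf)</code>
 converts Latin-1 text to UTF-8.  Each conversion descriptor is private to
 its caller, so several threads can convert between different character
 sets at the same time.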

    <p>The following text first describes the interface to <code>iconv</code>
 and then the conversion function itself.  Comparisons with other
 implementations show what obstacles stand in the way of portable
 applications.  Finally, the implementation is described insofar as it might
 interest the advanced user who wants to extend the conversion capabilities.

 <ul class="menu">
 <li><a accesskey="1" href="Generic-Conversion-Interface.html#Generic-Conversion-Interface">Generic Conversion Interface</a>:     Generic Character Set Conversion Interface.
 <li><a accesskey="2" href="iconv-Examples.html#iconv-Examples">iconv Examples</a>:                   A complete <code>iconv</code> example.
 <li><a accesskey="3" href="Other-iconv-Implementations.html#Other-iconv-Implementations">Other iconv Implementations</a>:      Some Details about other <code>iconv</code>
                                      Implementations.
 <li><a accesskey="4" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation">glibc iconv Implementation</a>:       The <code>iconv</code> Implementation in the GNU C
                                      library.
 </ul>

    </body></html>