arm-2011.03/share/doc/arm-arm-none-linux-gnueabi/html/libc/Other-iconv-Implementations.html - nest-cam/v366/arm-none-linux-gnueabi-i686-pc-linux-gnu - Git at Google

 <html lang="en">
 <head>
 <title>Other iconv Implementations - The GNU C Library</title>
 <meta http-equiv="Content-Type" content="text/html">
 <meta name="description" content="The GNU C Library">
 <meta name="generator" content="makeinfo 4.13">
 <link title="Top" rel="start" href="index.html#Top">
 <link rel="up" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion" title="Generic Charset Conversion">
 <link rel="prev" href="iconv-Examples.html#iconv-Examples" title="iconv Examples">
 <link rel="next" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation" title="glibc iconv Implementation">
 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
 <!--
 This file documents the GNU C library.

 This is Edition 0.12, last updated 2007-10-27,
 of `The GNU C Library Reference Manual', for version
 2.8 (Sourcery G++ Lite 2011.03-41).

 Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
 2003, 2007, 2008, 2010 Free Software Foundation, Inc.

 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
 any later version published by the Free Software Foundation; with the
 Invariant Sections being ``Free Software Needs Free Documentation''
 and ``GNU Lesser General Public License'', the Front-Cover texts being
 ``A GNU Manual'', and with the Back-Cover Texts as in (a) below.  A
 copy of the license is included in the section entitled "GNU Free
 Documentation License".

 (a) The FSF's Back-Cover Text is: ``You have the freedom to
 copy and modify this GNU manual.  Buying copies from the FSF
 supports it in developing GNU and promoting software freedom.''-->
 <meta http-equiv="Content-Style-Type" content="text/css">
 <style type="text/css"><!--
   pre.display { font-family:inherit }
   pre.format  { font-family:inherit }
   pre.smalldisplay { font-family:inherit; font-size:smaller }
   pre.smallformat  { font-family:inherit; font-size:smaller }
   pre.smallexample { font-size:smaller }
   pre.smalllisp    { font-size:smaller }
   span.sc    { font-variant:small-caps }
   span.roman { font-family:serif; font-weight:normal; }
   span.sansserif { font-family:sans-serif; font-weight:normal; }
 --></style>
 <link rel="stylesheet" type="text/css" href="../cs.css">
 </head>
 <body>
 <div class="node">
 <a name="Other-iconv-Implementations"></a>
 <p>
 Next:&nbsp;<a rel="next" accesskey="n" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation">glibc iconv Implementation</a>,
 Previous:&nbsp;<a rel="previous" accesskey="p" href="iconv-Examples.html#iconv-Examples">iconv Examples</a>,
 Up:&nbsp;<a rel="up" accesskey="u" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion">Generic Charset Conversion</a>
 <hr>
 </div>

 <h4 class="subsection">6.5.3 Some Details about other <code>iconv</code> Implementations</h4>

 <p>This is not really the place to discuss the <code>iconv</code> implementation
 of other systems but it is necessary to know a bit about them to write
 portable programs.  The above mentioned problems with the specification
 of the <code>iconv</code> functions can lead to portability issues.

    <p>The first thing to notice is that, due to the large number of character
 sets in use, it is certainly not practical to encode the conversions
 directly in the C library.  Therefore, the conversion information must
 come from files outside the C library.  This is usually done in one or
 both of the following ways:

      <ul>
 <li>The C library contains a set of generic conversion functions that can
 read the needed conversion tables and other information from data files.
 These files get loaded when necessary.

      <p>This solution is problematic as it requires a great deal of effort to
 apply to all character sets (potentially an infinite set).  The
 differences in the structure of the different character sets is so large
 that many different variants of the table-processing functions must be
 developed.  In addition, the generic nature of these functions make them
 slower than specifically implemented functions.

      <li>The C library only contains a framework that can dynamically load
 object files and execute the conversion functions contained therein.

      <p>This solution provides much more flexibility.  The C library itself
 contains only very little code and therefore reduces the general memory
 footprint.  Also, with a documented interface between the C library and
 the loadable modules it is possible for third parties to extend the set
 of available conversion modules.  A drawback of this solution is that
 dynamic loading must be available.
 </ul>

    <p>Some implementations in commercial Unices implement a mixture of these
 possibilities; the majority implement only the second solution.  Using
 loadable modules moves the code out of the library itself and keeps
 the door open for extensions and improvements, but this design is also
 limiting on some platforms since not many platforms support dynamic
 loading in statically linked programs.  On platforms without this
 capability it is therefore not possible to use this interface in
 statically linked programs.  The GNU C library has, on ELF platforms, no
 problems with dynamic loading in these situations; therefore, this
 point is moot.  The danger is that one gets acquainted with this
 situation and forgets about the restrictions on other systems.

    <p>A second thing to know about other <code>iconv</code> implementations is that
 the number of available conversions is often very limited.  Some
 implementations provide, in the standard release (not special
 international or developer releases), at most 100 to 200 conversion
 possibilities.  This does not mean 200 different character sets are
 supported; for example, conversions from one character set to a set of 10
 others might count as 10 conversions.  Together with the other direction
 this makes 20 conversion possibilities used up by one character set.  One
 can imagine the thin coverage these platform provide.  Some Unix vendors
 even provide only a handful of conversions, which renders them useless for
 almost all uses.

    <p>This directly leads to a third and probably the most problematic point.
 The way the <code>iconv</code> conversion functions are implemented on all
 known Unix systems and the availability of the conversion functions from
 character set A to B and the conversion from
 B to C does <em>not</em> imply that the
 conversion from A to C is available.

    <p>This might not seem unreasonable and problematic at first, but it is a
 quite big problem as one will notice shortly after hitting it.  To show
 the problem we assume to write a program that has to convert from
 A to C.  A call like

 <pre class="smallexample">     cd = iconv_open ("C", "A");
 </pre>
    <p class="noindent">fails according to the assumption above.  But what does the program
 do now?  The conversion is necessary; therefore, simply giving up is not
 an option.

    <p>This is a nuisance.  The <code>iconv</code> function should take care of this.
 But how should the program proceed from here on?  If it tries to convert
 to character set B, first the two <code>iconv_open</code>
 calls

 <pre class="smallexample">     cd1 = iconv_open ("B", "A");
 </pre>
    <p class="noindent">and

 <pre class="smallexample">     cd2 = iconv_open ("C", "B");
 </pre>
    <p class="noindent">will succeed, but how to find B?

    <p>Unfortunately, the answer is: there is no general solution.  On some
 systems guessing might help.  On those systems most character sets can
 convert to and from UTF-8 encoded ISO&nbsp;10646<!-- /@w --> or Unicode text. Beside
 this only some very system-specific methods can help.  Since the
 conversion functions come from loadable modules and these modules must
 be stored somewhere in the filesystem, one <em>could</em> try to find them
 and determine from the available file which conversions are available
 and whether there is an indirect route from A to
 C.

    <p>This example shows one of the design errors of <code>iconv</code> mentioned
 above.  It should at least be possible to determine the list of available
 conversion programmatically so that if <code>iconv_open</code> says there is no
 such conversion, one could make sure this also is true for indirect
 routes.

    </body></html>
	<html lang="en">
	<head>
	<title>Other iconv Implementations - The GNU C Library</title>
	<meta http-equiv="Content-Type" content="text/html">
	<meta name="description" content="The GNU C Library">
	<meta name="generator" content="makeinfo 4.13">
	<link title="Top" rel="start" href="index.html#Top">
	<link rel="up" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion" title="Generic Charset Conversion">
	<link rel="prev" href="iconv-Examples.html#iconv-Examples" title="iconv Examples">
	<link rel="next" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation" title="glibc iconv Implementation">
	<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
	<!--
	This file documents the GNU C library.

	This is Edition 0.12, last updated 2007-10-27,
	of `The GNU C Library Reference Manual', for version
	2.8 (Sourcery G++ Lite 2011.03-41).

	Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
	2003, 2007, 2008, 2010 Free Software Foundation, Inc.

	Permission is granted to copy, distribute and/or modify this document
	under the terms of the GNU Free Documentation License, Version 1.3 or
	any later version published by the Free Software Foundation; with the
	Invariant Sections being ``Free Software Needs Free Documentation''
	and ``GNU Lesser General Public License'', the Front-Cover texts being
	``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A
	copy of the license is included in the section entitled "GNU Free
	Documentation License".

	(a) The FSF's Back-Cover Text is: ``You have the freedom to
	copy and modify this GNU manual. Buying copies from the FSF
	supports it in developing GNU and promoting software freedom.''-->
	<meta http-equiv="Content-Style-Type" content="text/css">
	<style type="text/css"><!--
	pre.display { font-family:inherit }
	pre.format { font-family:inherit }
	pre.smalldisplay { font-family:inherit; font-size:smaller }
	pre.smallformat { font-family:inherit; font-size:smaller }
	pre.smallexample { font-size:smaller }
	pre.smalllisp { font-size:smaller }
	span.sc { font-variant:small-caps }
	span.roman { font-family:serif; font-weight:normal; }
	span.sansserif { font-family:sans-serif; font-weight:normal; }
	--></style>
	<link rel="stylesheet" type="text/css" href="../cs.css">
	</head>
	<body>
	<div class="node">
	<a name="Other-iconv-Implementations"></a>
	<p>
	Next: <a rel="next" accesskey="n" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation">glibc iconv Implementation</a>,
	Previous: <a rel="previous" accesskey="p" href="iconv-Examples.html#iconv-Examples">iconv Examples</a>,
	Up: <a rel="up" accesskey="u" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion">Generic Charset Conversion</a>
	<hr>
	</div>

	<h4 class="subsection">6.5.3 Some Details about other <code>iconv</code> Implementations</h4>

	<p>This is not really the place to discuss the <code>iconv</code> implementation
	of other systems but it is necessary to know a bit about them to write
	portable programs. The above mentioned problems with the specification
	of the <code>iconv</code> functions can lead to portability issues.

	<p>The first thing to notice is that, due to the large number of character
	sets in use, it is certainly not practical to encode the conversions
	directly in the C library. Therefore, the conversion information must
	come from files outside the C library. This is usually done in one or
	both of the following ways:

	<ul>
	<li>The C library contains a set of generic conversion functions that can
	read the needed conversion tables and other information from data files.
	These files get loaded when necessary.

	<p>This solution is problematic as it requires a great deal of effort to
	apply to all character sets (potentially an infinite set). The
	differences in the structure of the different character sets is so large
	that many different variants of the table-processing functions must be
	developed. In addition, the generic nature of these functions make them
	slower than specifically implemented functions.

	<li>The C library only contains a framework that can dynamically load
	object files and execute the conversion functions contained therein.

	<p>This solution provides much more flexibility. The C library itself
	contains only very little code and therefore reduces the general memory
	footprint. Also, with a documented interface between the C library and
	the loadable modules it is possible for third parties to extend the set
	of available conversion modules. A drawback of this solution is that
	dynamic loading must be available.
	</ul>

	<p>Some implementations in commercial Unices implement a mixture of these
	possibilities; the majority implement only the second solution. Using
	loadable modules moves the code out of the library itself and keeps
	the door open for extensions and improvements, but this design is also
	limiting on some platforms since not many platforms support dynamic
	loading in statically linked programs. On platforms without this
	capability it is therefore not possible to use this interface in
	statically linked programs. The GNU C library has, on ELF platforms, no
	problems with dynamic loading in these situations; therefore, this
	point is moot. The danger is that one gets acquainted with this
	situation and forgets about the restrictions on other systems.

	<p>A second thing to know about other <code>iconv</code> implementations is that
	the number of available conversions is often very limited. Some
	implementations provide, in the standard release (not special
	international or developer releases), at most 100 to 200 conversion
	possibilities. This does not mean 200 different character sets are
	supported; for example, conversions from one character set to a set of 10
	others might count as 10 conversions. Together with the other direction
	this makes 20 conversion possibilities used up by one character set. One
	can imagine the thin coverage these platform provide. Some Unix vendors
	even provide only a handful of conversions, which renders them useless for
	almost all uses.

	<p>This directly leads to a third and probably the most problematic point.
	The way the <code>iconv</code> conversion functions are implemented on all
	known Unix systems and the availability of the conversion functions from
	character set A to B and the conversion from
	B to C does <em>not</em> imply that the
	conversion from A to C is available.

	<p>This might not seem unreasonable and problematic at first, but it is a
	quite big problem as one will notice shortly after hitting it. To show
	the problem we assume to write a program that has to convert from
	A to C. A call like

	<pre class="smallexample"> cd = iconv_open ("C", "A");
	</pre>
	<p class="noindent">fails according to the assumption above. But what does the program
	do now? The conversion is necessary; therefore, simply giving up is not
	an option.

	<p>This is a nuisance. The <code>iconv</code> function should take care of this.
	But how should the program proceed from here on? If it tries to convert
	to character set B, first the two <code>iconv_open</code>
	calls

	<pre class="smallexample"> cd1 = iconv_open ("B", "A");
	</pre>
	<p class="noindent">and

	<pre class="smallexample"> cd2 = iconv_open ("C", "B");
	</pre>
	<p class="noindent">will succeed, but how to find B?

	<p>Unfortunately, the answer is: there is no general solution. On some
	systems guessing might help. On those systems most character sets can
	convert to and from UTF-8 encoded ISO 10646<!-- /@w --> or Unicode text. Beside
	this only some very system-specific methods can help. Since the
	conversion functions come from loadable modules and these modules must
	be stored somewhere in the filesystem, one <em>could</em> try to find them
	and determine from the available file which conversions are available
	and whether there is an indirect route from A to
	C.

	<p>This example shows one of the design errors of <code>iconv</code> mentioned
	above. It should at least be possible to determine the list of available
	conversion programmatically so that if <code>iconv_open</code> says there is no
	such conversion, one could make sure this also is true for indirect
	routes.

	</body></html>