blob: bd76e1f902e0a5df2a050d160a4fc6e99e9dabfe [file] [log] [blame]
<html lang="en">
<head>
<title>Other iconv Implementations - The GNU C Library</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The GNU C Library">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion" title="Generic Charset Conversion">
<link rel="prev" href="iconv-Examples.html#iconv-Examples" title="iconv Examples">
<link rel="next" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation" title="glibc iconv Implementation">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This file documents the GNU C library.
This is Edition 0.12, last updated 2007-10-27,
of `The GNU C Library Reference Manual', for version
2.8 (Sourcery G++ Lite 2011.03-41).
Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
2003, 2007, 2008, 2010 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Free Software Needs Free Documentation''
and ``GNU Lesser General Public License'', the Front-Cover texts being
``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
(a) The FSF's Back-Cover Text is: ``You have the freedom to
copy and modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software freedom.''-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
<link rel="stylesheet" type="text/css" href="../cs.css">
</head>
<body>
<div class="node">
<a name="Other-iconv-Implementations"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation">glibc iconv Implementation</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="iconv-Examples.html#iconv-Examples">iconv Examples</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion">Generic Charset Conversion</a>
<hr>
</div>
<h4 class="subsection">6.5.3 Some Details about other <code>iconv</code> Implementations</h4>
<p>This is not really the place to discuss the <code>iconv</code> implementation
of other systems but it is necessary to know a bit about them to write
portable programs. The above mentioned problems with the specification
of the <code>iconv</code> functions can lead to portability issues.
<p>The first thing to notice is that, due to the large number of character
sets in use, it is certainly not practical to encode the conversions
directly in the C library. Therefore, the conversion information must
come from files outside the C library. This is usually done in one or
both of the following ways:
<ul>
<li>The C library contains a set of generic conversion functions that can
read the needed conversion tables and other information from data files.
These files get loaded when necessary.
<p>This solution is problematic as it requires a great deal of effort to
apply to all character sets (potentially an infinite set). The
differences in the structure of the different character sets is so large
that many different variants of the table-processing functions must be
developed. In addition, the generic nature of these functions make them
slower than specifically implemented functions.
<li>The C library only contains a framework that can dynamically load
object files and execute the conversion functions contained therein.
<p>This solution provides much more flexibility. The C library itself
contains only very little code and therefore reduces the general memory
footprint. Also, with a documented interface between the C library and
the loadable modules it is possible for third parties to extend the set
of available conversion modules. A drawback of this solution is that
dynamic loading must be available.
</ul>
<p>Some implementations in commercial Unices implement a mixture of these
possibilities; the majority implement only the second solution. Using
loadable modules moves the code out of the library itself and keeps
the door open for extensions and improvements, but this design is also
limiting on some platforms since not many platforms support dynamic
loading in statically linked programs. On platforms without this
capability it is therefore not possible to use this interface in
statically linked programs. The GNU C library has, on ELF platforms, no
problems with dynamic loading in these situations; therefore, this
point is moot. The danger is that one gets acquainted with this
situation and forgets about the restrictions on other systems.
<p>A second thing to know about other <code>iconv</code> implementations is that
the number of available conversions is often very limited. Some
implementations provide, in the standard release (not special
international or developer releases), at most 100 to 200 conversion
possibilities. This does not mean 200 different character sets are
supported; for example, conversions from one character set to a set of 10
others might count as 10 conversions. Together with the other direction
this makes 20 conversion possibilities used up by one character set. One
can imagine the thin coverage these platform provide. Some Unix vendors
even provide only a handful of conversions, which renders them useless for
almost all uses.
<p>This directly leads to a third and probably the most problematic point.
The way the <code>iconv</code> conversion functions are implemented on all
known Unix systems and the availability of the conversion functions from
character set A to B and the conversion from
B to C does <em>not</em> imply that the
conversion from A to C is available.
<p>This might not seem unreasonable and problematic at first, but it is a
quite big problem as one will notice shortly after hitting it. To show
the problem we assume to write a program that has to convert from
A to C. A call like
<pre class="smallexample"> cd = iconv_open ("C", "A");
</pre>
<p class="noindent">fails according to the assumption above. But what does the program
do now? The conversion is necessary; therefore, simply giving up is not
an option.
<p>This is a nuisance. The <code>iconv</code> function should take care of this.
But how should the program proceed from here on? If it tries to convert
to character set B, first the two <code>iconv_open</code>
calls
<pre class="smallexample"> cd1 = iconv_open ("B", "A");
</pre>
<p class="noindent">and
<pre class="smallexample"> cd2 = iconv_open ("C", "B");
</pre>
<p class="noindent">will succeed, but how to find B?
<p>Unfortunately, the answer is: there is no general solution. On some
systems guessing might help. On those systems most character sets can
convert to and from UTF-8 encoded ISO&nbsp;10646<!-- /@w --> or Unicode text. Beside
this only some very system-specific methods can help. Since the
conversion functions come from loadable modules and these modules must
be stored somewhere in the filesystem, one <em>could</em> try to find them
and determine from the available file which conversions are available
and whether there is an indirect route from A to
C.
<p>This example shows one of the design errors of <code>iconv</code> mentioned
above. It should at least be possible to determine the list of available
conversion programmatically so that if <code>iconv_open</code> says there is no
such conversion, one could make sure this also is true for indirect
routes.
</body></html>