| <html lang="en"> |
| <head> |
| <title>Other iconv Implementations - The GNU C Library</title> |
| <meta http-equiv="Content-Type" content="text/html"> |
| <meta name="description" content="The GNU C Library"> |
| <meta name="generator" content="makeinfo 4.13"> |
| <link title="Top" rel="start" href="index.html#Top"> |
| <link rel="up" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion" title="Generic Charset Conversion"> |
| <link rel="prev" href="iconv-Examples.html#iconv-Examples" title="iconv Examples"> |
| <link rel="next" href="glibc-iconv-Implementation.html#glibc-iconv-Implementation" title="glibc iconv Implementation"> |
| <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage"> |
| <!-- |
| This file documents the GNU C library. |
| |
| This is Edition 0.12, last updated 2007-10-27, |
| of `The GNU C Library Reference Manual', for version |
| 2.8 (Sourcery G++ Lite 2011.03-41). |
| |
| Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, |
| 2003, 2007, 2008, 2010 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 or |
| any later version published by the Free Software Foundation; with the |
| Invariant Sections being ``Free Software Needs Free Documentation'' |
| and ``GNU Lesser General Public License'', the Front-Cover texts being |
| ``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A |
| copy of the license is included in the section entitled "GNU Free |
| Documentation License". |
| |
| (a) The FSF's Back-Cover Text is: ``You have the freedom to |
| copy and modify this GNU manual. Buying copies from the FSF |
| supports it in developing GNU and promoting software freedom.''--> |
| <meta http-equiv="Content-Style-Type" content="text/css"> |
| <style type="text/css"><!-- |
| pre.display { font-family:inherit } |
| pre.format { font-family:inherit } |
| pre.smalldisplay { font-family:inherit; font-size:smaller } |
| pre.smallformat { font-family:inherit; font-size:smaller } |
| pre.smallexample { font-size:smaller } |
| pre.smalllisp { font-size:smaller } |
| span.sc { font-variant:small-caps } |
| span.roman { font-family:serif; font-weight:normal; } |
| span.sansserif { font-family:sans-serif; font-weight:normal; } |
| --></style> |
| <link rel="stylesheet" type="text/css" href="../cs.css"> |
| </head> |
| <body> |
| <div class="node"> |
| <a name="Other-iconv-Implementations"></a> |
| </div> |
| |
| <h4 class="subsection">6.5.3 Some Details about other <code>iconv</code> Implementations</h4> |
| |
| <p>This is not really the place to discuss the <code>iconv</code> implementations |
| of other systems, but it is necessary to know a bit about them to write |
| portable programs. The above-mentioned problems with the specification |
| of the <code>iconv</code> functions can lead to portability issues. |
| |
| <p>The first thing to notice is that, due to the large number of character |
| sets in use, it is certainly not practical to encode the conversions |
| directly in the C library. Therefore, the conversion information must |
| come from files outside the C library. This is usually done in one or |
| both of the following ways: |
| |
| <ul> |
| <li>The C library contains a set of generic conversion functions that can |
| read the needed conversion tables and other information from data files. |
| These files get loaded when necessary. |
| |
| <p>This solution is problematic as it requires a great deal of effort to |
| apply to all character sets (potentially an infinite set). The |
| differences in the structure of the different character sets are so large |
| that many different variants of the table-processing functions must be |
| developed. In addition, the generic nature of these functions makes them |
| slower than specifically implemented functions. |
| |
| <li>The C library only contains a framework that can dynamically load |
| object files and execute the conversion functions contained therein. |
| |
| <p>This solution provides much more flexibility. The C library itself |
| contains very little code, which reduces the general memory |
| footprint. Also, with a documented interface between the C library and |
| the loadable modules, it is possible for third parties to extend the set |
| of available conversion modules. A drawback of this solution is that |
| dynamic loading must be available. |
| </ul> |
| |
| <p>Some implementations in commercial Unices implement a mixture of these |
| possibilities; the majority implement only the second solution. Using |
| loadable modules moves the code out of the library itself and keeps |
| the door open for extensions and improvements, but this design is also |
| limiting on some platforms since not many platforms support dynamic |
| loading in statically linked programs. On platforms without this |
| capability it is therefore not possible to use this interface in |
| statically linked programs. The GNU C library has, on ELF platforms, no |
| problems with dynamic loading in these situations; therefore, this |
| point is moot. The danger is that one gets accustomed to this |
| situation and forgets about the restrictions on other systems. |
| |
| <p>A second thing to know about other <code>iconv</code> implementations is that |
| the number of available conversions is often very limited. Some |
| implementations provide, in the standard release (not special |
| international or developer releases), at most 100 to 200 conversion |
| possibilities. This does not mean 200 different character sets are |
| supported; for example, conversions from one character set to a set of 10 |
| others might count as 10 conversions. Together with the other direction |
| this makes 20 conversion possibilities used up by one character set. One |
| can imagine the thin coverage these platforms provide. Some Unix vendors |
| even provide only a handful of conversions, which renders them useless for |
| almost all uses. |
| |
| <p>This leads directly to a third and probably the most problematic point. |
| Given the way the <code>iconv</code> conversion functions are implemented on all |
| known Unix systems, the availability of the conversions from |
| character set A to B and from |
| B to C does <em>not</em> imply that the |
| conversion from A to C is available. |
| |
| <p>This might not seem unreasonable or problematic at first, but it is |
| quite a big problem, as one will notice shortly after hitting it. To show |
| the problem, assume we are writing a program that has to convert from |
| A to C. A call like |
| |
| <pre class="smallexample"> cd = iconv_open ("C", "A"); |
| </pre> |
| <p class="noindent">fails according to the assumption above. But what does the program |
| do now? The conversion is necessary; therefore, simply giving up is not |
| an option. |
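<p>As a minimal sketch of how a program can at least detect this failure:
<code>iconv_open</code> returns <code>(iconv_t) -1</code> on error, and POSIX
specifies <code>EINVAL</code> in <code>errno</code> when the requested conversion
is not supported. The helper name <code>conversion_available</code> is
hypothetical and used only for illustration.

```c
#include <errno.h>
#include <iconv.h>

/* Hypothetical helper, not part of any library.
   Returns 1 if a conversion from FROM to TO can be opened,
   0 if the implementation reports it as unsupported (EINVAL),
   and -1 for any other failure (for example, lack of memory).  */
int
conversion_available (const char *to, const char *from)
{
  iconv_t cd = iconv_open (to, from);

  if (cd == (iconv_t) -1)
    return errno == EINVAL ? 0 : -1;

  /* Only probing: release the descriptor again.  */
  iconv_close (cd);
  return 1;
}
```

<p>A result of 0 says only that this direct route is missing; it says nothing
about whether an indirect route exists.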
| |
| <p>This is a nuisance. The <code>iconv</code> function should take care of this. |
| But how should the program proceed from here on? If it first tries to convert |
| to character set B, the two <code>iconv_open</code> |
| calls |
| |
| <pre class="smallexample"> cd1 = iconv_open ("B", "A"); |
| </pre> |
| <p class="noindent">and |
| |
| <pre class="smallexample"> cd2 = iconv_open ("C", "B"); |
| </pre> |
| <p class="noindent">will succeed, but how does the program find B? |
| |
| <p>Unfortunately, the answer is: there is no general solution. On some |
| systems guessing might help. On those systems most character sets can |
| convert to and from UTF-8 encoded ISO 10646 or Unicode text. Beyond |
| this, only some very system-specific methods can help. Since the |
| conversion functions come from loadable modules and these modules must |
| be stored somewhere in the filesystem, one <em>could</em> try to find them |
| and determine from the available files which conversions are available |
| and whether there is an indirect route from A to |
| C. |
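<p>The guessing approach can be sketched as follows, assuming that UTF-8 is a
usable intermediate character set on the system at hand. That is an assumption,
not a guarantee. The function names <code>convert_once</code> and
<code>convert_with_pivot</code> are hypothetical, and the size of the
intermediate buffer (four bytes per input byte plus a little slack) is a rough
guess; if it turns out to be too small, the sketch simply reports failure.

```c
#include <iconv.h>
#include <stdlib.h>

/* Run one complete conversion with descriptor CD.  Returns the number
   of bytes written to OUT, or (size_t) -1 if iconv reports an error.  */
static size_t
convert_once (iconv_t cd, const char *in, size_t inlen,
              char *out, size_t outcap)
{
  char *inp = (char *) in;   /* iconv does not modify the input bytes */
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outcap;

  if (iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
    return (size_t) -1;
  /* Emit any pending shift sequence for stateful target encodings.  */
  if (iconv (cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
    return (size_t) -1;
  return outcap - outleft;
}

/* Hypothetical helper: try the direct conversion first and, if it
   cannot be opened, guess at an indirect route through UTF-8.  */
size_t
convert_with_pivot (const char *to, const char *from,
                    const char *in, size_t inlen,
                    char *out, size_t outcap)
{
  size_t result = (size_t) -1;
  iconv_t cd = iconv_open (to, from);

  if (cd != (iconv_t) -1)
    {
      result = convert_once (cd, in, inlen, out, outcap);
      iconv_close (cd);
      return result;
    }

  /* No direct route: assume both character sets can reach UTF-8.  */
  iconv_t cd1 = iconv_open ("UTF-8", from);
  iconv_t cd2 = iconv_open (to, "UTF-8");

  if (cd1 != (iconv_t) -1 && cd2 != (iconv_t) -1)
    {
      size_t tmpcap = inlen * 4 + 4;   /* rough guess, see above */
      char *tmp = malloc (tmpcap);

      if (tmp != NULL)
        {
          size_t tmplen = convert_once (cd1, in, inlen, tmp, tmpcap);
          if (tmplen != (size_t) -1)
            result = convert_once (cd2, tmp, tmplen, out, outcap);
          free (tmp);
        }
    }

  if (cd1 != (iconv_t) -1)
    iconv_close (cd1);
  if (cd2 != (iconv_t) -1)
    iconv_close (cd2);
  return result;
}
```

<p>A call such as <code>convert_with_pivot ("C", "A", buf, len, out, sizeof out)</code>
then succeeds whenever either the direct route or the route through UTF-8 can
be opened. On a system where neither exists, the caller is still left with the
system-specific methods described above.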
| |
| <p>This situation shows one of the design errors of <code>iconv</code> mentioned |
| above. It should at least be possible to determine the list of available |
| conversions programmatically so that if <code>iconv_open</code> says there is no |
| such conversion, one could make sure this is also true for indirect |
| routes. |
| |
| </body></html> |
| |