| <html lang="en"> |
| <head> |
| <title>iconv Examples - The GNU C Library</title> |
| <meta http-equiv="Content-Type" content="text/html"> |
| <meta name="description" content="The GNU C Library"> |
| <meta name="generator" content="makeinfo 4.13"> |
| <link title="Top" rel="start" href="index.html#Top"> |
| <link rel="up" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion" title="Generic Charset Conversion"> |
| <link rel="prev" href="Generic-Conversion-Interface.html#Generic-Conversion-Interface" title="Generic Conversion Interface"> |
| <link rel="next" href="Other-iconv-Implementations.html#Other-iconv-Implementations" title="Other iconv Implementations"> |
| <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage"> |
| <!-- |
| This file documents the GNU C library. |
| |
| This is Edition 0.12, last updated 2007-10-27, |
| of `The GNU C Library Reference Manual', for version |
| 2.8 (Sourcery G++ Lite 2011.03-41). |
| |
| Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, |
| 2003, 2007, 2008, 2010 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 or |
| any later version published by the Free Software Foundation; with the |
| Invariant Sections being ``Free Software Needs Free Documentation'' |
| and ``GNU Lesser General Public License'', the Front-Cover texts being |
| ``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A |
| copy of the license is included in the section entitled "GNU Free |
| Documentation License". |
| |
| (a) The FSF's Back-Cover Text is: ``You have the freedom to |
| copy and modify this GNU manual. Buying copies from the FSF |
| supports it in developing GNU and promoting software freedom.''--> |
| <meta http-equiv="Content-Style-Type" content="text/css"> |
| <style type="text/css"><!-- |
| pre.display { font-family:inherit } |
| pre.format { font-family:inherit } |
| pre.smalldisplay { font-family:inherit; font-size:smaller } |
| pre.smallformat { font-family:inherit; font-size:smaller } |
| pre.smallexample { font-size:smaller } |
| pre.smalllisp { font-size:smaller } |
| span.sc { font-variant:small-caps } |
| span.roman { font-family:serif; font-weight:normal; } |
| span.sansserif { font-family:sans-serif; font-weight:normal; } |
| --></style> |
| <link rel="stylesheet" type="text/css" href="../cs.css"> |
| </head> |
| <body> |
| <div class="node"> |
| <a name="iconv-Examples"></a> |
| <p> |
| Next: <a rel="next" accesskey="n" href="Other-iconv-Implementations.html#Other-iconv-Implementations">Other iconv Implementations</a>, |
| Previous: <a rel="previous" accesskey="p" href="Generic-Conversion-Interface.html#Generic-Conversion-Interface">Generic Conversion Interface</a>, |
| Up: <a rel="up" accesskey="u" href="Generic-Charset-Conversion.html#Generic-Charset-Conversion">Generic Charset Conversion</a> |
| <hr> |
| </div> |
| |
| <h4 class="subsection">6.5.2 A complete <code>iconv</code> example</h4> |
| |
| <p>The example below features a solution for a common problem. Given that |
| one knows the internal encoding used by the system for <code>wchar_t</code> |
| strings, one often is in the position to read text from a file and store |
| it in wide character buffers. One can do this using <code>mbsrtowcs</code>, |
| but then we run into the problems discussed above. |
| |
| <pre class="smallexample"> int |
| file2wcs (int fd, const char *charset, wchar_t *outbuf, size_t avail) |
| { |
| char inbuf[BUFSIZ]; |
| size_t insize = 0; |
| char *wrptr = (char *) outbuf; |
| int result = 0; |
| iconv_t cd; |
| |
| cd = iconv_open ("WCHAR_T", charset); |
| if (cd == (iconv_t) -1) |
| { |
| /* <span class="roman">Something went wrong.</span> */ |
| if (errno == EINVAL) |
| error (0, 0, "conversion from '%s' to wchar_t not available", |
| charset); |
| else |
| perror ("iconv_open"); |
| |
| /* <span class="roman">Terminate the output string.</span> */ |
| *outbuf = L'\0'; |
| |
| return -1; |
| } |
| |
| while (avail > 0) |
| { |
| size_t nread; |
| size_t nconv; |
| char *inptr = inbuf; |
| |
| /* <span class="roman">Read more input.</span> */ |
| nread = read (fd, inbuf + insize, sizeof (inbuf) - insize); |
| if (nread == 0) |
| { |
| /* <span class="roman">When we come here the file is completely read.</span> |
| <span class="roman">This still could mean there are some unused</span> |
| <span class="roman">characters in the </span><code>inbuf</code><span class="roman">. Put them back.</span> */ |
| if (lseek (fd, -insize, SEEK_CUR) == -1) |
| result = -1; |
| |
| /* <span class="roman">Now write out the byte sequence to get into the</span> |
| <span class="roman">initial state if this is necessary.</span> */ |
| iconv (cd, NULL, NULL, &wrptr, &avail); |
| |
| break; |
| } |
| insize += nread; |
| |
| /* <span class="roman">Do the conversion.</span> */ |
| nconv = iconv (cd, &inptr, &insize, &wrptr, &avail); |
| if (nconv == (size_t) -1) |
| { |
| /* <span class="roman">Not everything went right. It might only be</span> |
| <span class="roman">an unfinished byte sequence at the end of the</span> |
| <span class="roman">buffer. Or it is a real problem.</span> */ |
| if (errno == EINVAL) |
| /* <span class="roman">This is harmless. Simply move the unused</span> |
| <span class="roman">bytes to the beginning of the buffer so that</span> |
| <span class="roman">they can be used in the next round.</span> */ |
| memmove (inbuf, inptr, insize); |
| else |
| { |
| /* <span class="roman">It is a real problem. Maybe we ran out of</span> |
| <span class="roman">space in the output buffer or we have invalid</span> |
| <span class="roman">input. In any case back the file pointer to</span> |
| <span class="roman">the position of the last processed byte.</span> */ |
| lseek (fd, -insize, SEEK_CUR); |
| result = -1; |
| break; |
| } |
| } |
| } |
| |
| /* <span class="roman">Terminate the output string.</span> */ |
| if (avail >= sizeof (wchar_t)) |
| *((wchar_t *) wrptr) = L'\0'; |
| |
| if (iconv_close (cd) != 0) |
| perror ("iconv_close"); |
| |
| return (wchar_t *) wrptr - outbuf; |
| } |
| </pre> |
| <p><a name="index-stateful-676"></a>This example shows the most important aspects of using the <code>iconv</code> |
| functions. It shows how successive calls to <code>iconv</code> can be used to |
| convert large amounts of text. The user does not have to care about |
| stateful encodings as the functions take care of everything. |
| |
| <p>An interesting point is the case where <code>iconv</code> returns an error and |
| <code>errno</code> is set to <code>EINVAL</code>. This is not really an error in the |
| transformation. It can happen whenever the input character set contains |
| byte sequences of more than one byte for some character and texts are not |
| processed in one piece. In this case there is a chance that a multibyte |
| sequence is cut. The caller can then simply read the remainder of the |
| takes and feed the offending bytes together with new character from the |
| input to <code>iconv</code> and continue the work. The internal state kept in |
| the descriptor is <em>not</em> unspecified after such an event as is the |
| case with the conversion functions from the ISO C<!-- /@w --> standard. |
| |
| <p>The example also shows the problem of using wide character strings with |
| <code>iconv</code>. As explained in the description of the <code>iconv</code> |
| function above, the function always takes a pointer to a <code>char</code> |
| array and the available space is measured in bytes. In the example, the |
| output buffer is a wide character buffer; therefore, we use a local |
| variable <var>wrptr</var> of type <code>char *</code>, which is used in the |
| <code>iconv</code> calls. |
| |
| <p>This looks rather innocent but can lead to problems on platforms that |
| have tight restriction on alignment. Therefore the caller of <code>iconv</code> |
| has to make sure that the pointers passed are suitable for access of |
| characters from the appropriate character set. Since, in the |
| above case, the input parameter to the function is a <code>wchar_t</code> |
| pointer, this is the case (unless the user violates alignment when |
| computing the parameter). But in other situations, especially when |
| writing generic functions where one does not know what type of character |
| set one uses and, therefore, treats text as a sequence of bytes, it might |
| become tricky. |
| |
| </body></html> |
| |