<html lang="en">
<head>
<title>Advanced gettext functions - The GNU C Library</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The GNU C Library">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Message-catalogs-with-gettext.html#Message-catalogs-with-gettext" title="Message catalogs with gettext">
<link rel="prev" href="Locating-gettext-catalog.html#Locating-gettext-catalog" title="Locating gettext catalog">
<link rel="next" href="Charset-conversion-in-gettext.html#Charset-conversion-in-gettext" title="Charset conversion in gettext">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This file documents the GNU C library.
This is Edition 0.12, last updated 2007-10-27,
of `The GNU C Library Reference Manual', for version
2.8 (Sourcery G++ Lite 2011.03-41).
Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
2003, 2007, 2008, 2010 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Free Software Needs Free Documentation''
and ``GNU Lesser General Public License'', the Front-Cover texts being
``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
(a) The FSF's Back-Cover Text is: ``You have the freedom to
copy and modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software freedom.''-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
<link rel="stylesheet" type="text/css" href="../cs.css">
</head>
<body>
<div class="node">
<a name="Advanced-gettext-functions"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="Charset-conversion-in-gettext.html#Charset-conversion-in-gettext">Charset conversion in gettext</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="Locating-gettext-catalog.html#Locating-gettext-catalog">Locating gettext catalog</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Message-catalogs-with-gettext.html#Message-catalogs-with-gettext">Message catalogs with gettext</a>
<hr>
</div>
<h5 class="subsubsection">8.2.1.3 Additional functions for more complicated situations</h5>
<p>The functions of the <code>gettext</code> family described so far (and all the
<code>catgets</code> functions as well) have one problem in the real world
which has been neglected completely in all existing approaches: the
handling of plural forms.
<p>Looking through Unix source code written before anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:
<pre class="smallexample"> printf ("%d file%s deleted", n, n == 1 ? "" : "s");
</pre>
<p class="noindent">After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
<code>"file(s)"</code>. Both look unnatural and should be avoided. The first
attempts to solve the problem correctly looked like this:
<pre class="smallexample">     if (n == 1)
       printf ("%d file deleted", n);
     else
       printf ("%d files deleted", n);
</pre>
<p>But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an `s', but
that is all. Once again people fell into the trap of believing the
rules their language uses are universal. But the handling of plural
forms differs widely between the language families. Two things can
differ between (and even inside) language families:
<ul>
<li>The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.
<li>The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).
<p>But other language families have only one form or many forms. More
information on this is given in a separate section.
</ul>
<p>The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization since it is
only usable for certain, hardcoded language environments. Instead the
extended <code>gettext</code> interface should be used.
<p>Instead of the one key string, these extra functions take two
strings and a numerical argument. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the translator.
The two string arguments are also used to provide a return
value in case no message catalog is found (similar to the normal
<code>gettext</code> behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.
<p>This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation but since the GNU C library
(as well as the GNU <code>gettext</code> package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.
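<p>The fallback rule described above can be sketched in plain C. This is
only an illustration of the documented behavior when no message catalog is
found, not the library's implementation; the names <code>fallback_plural</code>
and <code>show_fallback</code> are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper, not part of libintl: mirrors the documented
   fallback that applies when no message catalog is found.  */
const char *
fallback_plural (const char *msgid1, const char *msgid2,
                 unsigned long int n)
{
  /* Germanic rule: singular for exactly one, plural otherwise.  */
  return n == 1 ? msgid1 : msgid2;
}

/* Print which form the fallback picks for a few counts.  */
void
show_fallback (void)
{
  unsigned long int counts[] = { 0, 1, 2 };
  for (size_t i = 0; i < sizeof counts / sizeof counts[0]; i++)
    printf ("%lu -> %s\n", counts[i],
            fallback_plural ("file", "files", counts[i]));
}
```

<p>Note that zero selects the plural form under this rule, which is correct
for English but not for every language.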
<!-- libintl.h -->
<!-- GNU -->
<div class="defun">
&mdash; Function: char * <b>ngettext</b> (<var>const char *msgid1, const char *msgid2, unsigned long int n</var>)<var><a name="index-ngettext-813"></a></var><br>
<blockquote><p>The <code>ngettext</code> function is similar to the <code>gettext</code> function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The <var>msgid1</var> parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The <var>msgid2</var> parameter is the plural form.
The parameter <var>n</var> is used to determine the plural form. If no
message catalog is found, <var>msgid1</var> is returned if <code>n == 1</code>,
otherwise <var>msgid2</var>.
<p>An example of the use of this function is:
<pre class="smallexample"> printf (ngettext ("%d file removed", "%d files removed", n), n);
</pre>
<p>Please note that the numeric value <var>n</var> has to be passed to the
<code>printf</code> function as well. It is not sufficient to pass it only to
<code>ngettext</code>.
</p></blockquote></div>
<!-- libintl.h -->
<!-- GNU -->
<div class="defun">
&mdash; Function: char * <b>dngettext</b> (<var>const char *domain, const char *msgid1, const char *msgid2, unsigned long int n</var>)<var><a name="index-dngettext-814"></a></var><br>
<blockquote><p>The <code>dngettext</code> function is similar to the <code>dgettext</code> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <code>ngettext</code> handles them.
</p></blockquote></div>
<!-- libintl.h -->
<!-- GNU -->
<div class="defun">
&mdash; Function: char * <b>dcngettext</b> (<var>const char *domain, const char *msgid1, const char *msgid2, unsigned long int n, int category</var>)<var><a name="index-dcngettext-815"></a></var><br>
<blockquote><p>The <code>dcngettext</code> function is similar to the <code>dcgettext</code> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <code>ngettext</code> handles them.
</p></blockquote></div>
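<p>A minimal sketch of how the domain-specific variants might be called
follows. The text domain <code>"myapp"</code> and both function names are
hypothetical and exist only for this illustration. The sketch assumes that
no catalog for this domain is installed, in which case the untranslated
English strings are returned.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* "myapp" is a hypothetical text domain used only for illustration.  */
#define EXAMPLE_DOMAIN "myapp"

/* Format a message about N removed files into BUF, looking the
   string up in an explicitly named domain.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  const char *fmt = dngettext (EXAMPLE_DOMAIN,
                               "%lu file removed", "%lu files removed", n);
  /* N must also be passed to the formatting function.  */
  return snprintf (buf, size, fmt, n);
}

/* The same lookup, additionally naming the locale category.  */
int
format_removed_category (char *buf, size_t size, unsigned long int n)
{
  const char *fmt = dcngettext (EXAMPLE_DOMAIN,
                                "%lu file removed", "%lu files removed",
                                n, LC_MESSAGES);
  return snprintf (buf, size, fmt, n);
}
```

<p>Because <var>n</var> has the type <code>unsigned long int</code>, the sketch
uses the <code>%lu</code> conversion in both message strings.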
<h5 class="subsubheading">The problem of plural forms</h5>
<p>A description of the problem can be found at the beginning of the last
section. Now there is the question how to solve it. Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different forms in which plural forms are
formed or whether the number can increase with every new supported
language.
<p>Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form. Since the formula varies
with every language this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages). The
details are explained in the GNU <code>gettext</code> manual. Here only a
bit of information is provided.
<p>The information about the plural form selection has to be stored in the
header entry (the one with the empty <code>msgid</code> string). It looks
like this:
<pre class="smallexample"> Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
</pre>
<p>The <code>nplurals</code> value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following <code>plural</code> is an expression which uses the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is <code>n</code>. This
expression will be evaluated whenever one of the functions
<code>ngettext</code>, <code>dngettext</code>, or <code>dcngettext</code> is called. The
numeric value passed to these functions is then substituted for all uses
of the variable <code>n</code> in the expression. The resulting value then
must be greater than or equal to zero and smaller than the value given as the
value of <code>nplurals</code>.
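<p>The evaluation step can be pictured with a small C sketch. It assumes the
rule has been transcribed by hand into a C function; the names
<code>plural_two_forms</code> and <code>select_form</code> are hypothetical, and
returning <code>NULL</code> for an out-of-range result is a choice made for this
sketch, not a statement about what the library does.

```c
#include <stddef.h>

/* C transcription of the header expression
   "plural=n == 1 ? 0 : 1".  */
unsigned long int
plural_two_forms (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Hypothetical selection step: apply RULE to N and return the
   matching entry of FORMS, or NULL if the result does not lie in
   the range 0 .. NPLURALS-1.  */
const char *
select_form (const char *const forms[], unsigned long int nplurals,
             unsigned long int (*rule) (unsigned long int),
             unsigned long int n)
{
  unsigned long int index = rule (n);
  return index < nplurals ? forms[index] : NULL;
}
```

<p>With two German forms such as <code>"Datei"</code> and <code>"Dateien"</code>,
a count of one selects the first entry and every other count selects the
second.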
<p class="noindent">The following rules are known at this point. The languages are listed
with their families. But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).<a rel="footnote" href="#fn-1" name="fnd-1"><sup>1</sup></a>
<dl>
<dt>Only one form:<dd>Some languages only require one single form. There is no distinction
between the singular and plural form. An appropriate header entry
would look like this:
<pre class="smallexample"> Plural-Forms: nplurals=1; plural=0;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Finno-Ugric family<dd>Hungarian
<br><dt>Asian family<dd>Japanese, Korean
<br><dt>Turkic/Altaic family<dd>Turkish
</dl>
<br><dt>Two forms, singular used for one only<dd>This is the form used in most existing programs since it is what English
is using. A header entry would look like this:
<pre class="smallexample"> Plural-Forms: nplurals=2; plural=n != 1;
</pre>
<p>(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
<p class="noindent">Languages with this property include:
<dl>
<dt>Germanic family<dd>Danish, Dutch, English, German, Norwegian, Swedish
<br><dt>Finno-Ugric family<dd>Estonian, Finnish
<br><dt>Latin/Greek family<dd>Greek
<br><dt>Semitic family<dd>Hebrew
<br><dt>Romance family<dd>Italian, Portuguese, Spanish
<br><dt>Artificial<dd>Esperanto
</dl>
<br><dt>Two forms, singular used for zero and one<dd>Exceptional case in the language family. The header entry would be:
<pre class="smallexample"> Plural-Forms: nplurals=2; plural=n&gt;1;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Romance family<dd>French, Brazilian Portuguese
</dl>
<br><dt>Three forms, special case for zero<dd>The header entry would be:
<pre class="smallexample"> Plural-Forms: nplurals=3; plural=n%10==1 &amp;&amp; n%100!=11 ? 0 : n != 0 ? 1 : 2;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Baltic family<dd>Latvian
</dl>
<br><dt>Three forms, special cases for one and two<dd>The header entry would be:
<pre class="smallexample"> Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Celtic<dd>Gaeilge (Irish)
</dl>
<br><dt>Three forms, special case for numbers ending in 1[2-9]<dd>The header entry would look like this:
<pre class="smallexample"> Plural-Forms: nplurals=3; \
     plural=n%10==1 &amp;&amp; n%100!=11 ? 0 : \
            n%10&gt;=2 &amp;&amp; (n%100&lt;10 || n%100&gt;=20) ? 1 : 2;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Baltic family<dd>Lithuanian
</dl>
<br><dt>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]<dd>The header entry would look like this:
<pre class="smallexample"> Plural-Forms: nplurals=3; \
     plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10&gt;3 ? 2 : 1;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Slavic family<dd>Croatian, Czech, Russian, Ukrainian
</dl>
<br><dt>Three forms, special cases for 1 and 2, 3, 4<dd>The header entry would look like this:
<pre class="smallexample"> Plural-Forms: nplurals=3; \
     plural=(n==1) ? 1 : (n&gt;=2 &amp;&amp; n&lt;=4) ? 2 : 0;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Slavic family<dd>Slovak
</dl>
<br><dt>Three forms, special case for one and some numbers ending in 2, 3, or 4<dd>The header entry would look like this:
<pre class="smallexample"> Plural-Forms: nplurals=3; \
     plural=n==1 ? 0 : \
            n%10&gt;=2 &amp;&amp; n%10&lt;=4 &amp;&amp; (n%100&lt;10 || n%100&gt;=20) ? 1 : 2;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Slavic family<dd>Polish
</dl>
<br><dt>Four forms, special case for one and all numbers ending in 02, 03, or 04<dd>The header entry would look like this:
<pre class="smallexample"> Plural-Forms: nplurals=4; \
     plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
</pre>
<p class="noindent">Languages with this property include:
<dl>
<dt>Slavic family<dd>Slovenian
</dl>
</dl>
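<p>Since the <code>plural</code> expressions use C syntax, a rule can be checked
by transcribing it into a C function and trying a few values. The sketch
below does this for three of the header entries listed above; the function
names are made up for this illustration.

```c
/* Each function transcribes one "plural=" expression from the
   list above.  The function names are hypothetical.  */

/* Lithuanian: special case for numbers ending in 1[2-9].  */
unsigned long int
plural_lt (unsigned long int n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && (n % 100 < 10 || n % 100 >= 20) ? 1 : 2;
}

/* Russian: numbers ending in 1 and in 2, 3, 4, except those
   ending in 1[1-4].  */
unsigned long int
plural_ru (unsigned long int n)
{
  return n % 100 / 10 == 1 ? 2
         : n % 10 == 1 ? 0
         : (n + 9) % 10 > 3 ? 2 : 1;
}

/* Polish: one, and some numbers ending in 2, 3, or 4.  */
unsigned long int
plural_pl (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4
           && (n % 100 < 10 || n % 100 >= 20) ? 1 : 2;
}
```

<p>For example, the count 21 selects form 0 under the Russian rule but
form 2 under the Polish rule, which shows why the rule cannot be
generalized even within one language family.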
<div class="footnote">
<hr>
<h4>Footnotes</h4><p class="footnote"><small>[<a name="fn-1" href="#fnd-1">1</a>]</small> Additions are welcome. Send appropriate information to
<a href="mailto:bug-glibc-manual@gnu.org">bug-glibc-manual@gnu.org</a>.</p>
<hr></div>
</body></html>