<html lang="en">
<head>
<title>The message catalog files - The GNU C Library</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The GNU C Library">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Message-catalogs-a-la-X_002fOpen.html#Message-catalogs-a-la-X_002fOpen" title="Message catalogs a la X/Open">
<link rel="prev" href="The-catgets-Functions.html#The-catgets-Functions" title="The catgets Functions">
<link rel="next" href="The-gencat-program.html#The-gencat-program" title="The gencat program">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This file documents the GNU C library.
This is Edition 0.12, last updated 2007-10-27,
of `The GNU C Library Reference Manual', for version
2.8 (Sourcery G++ Lite 2011.03-41).
Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
2003, 2007, 2008, 2010 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Free Software Needs Free Documentation''
and ``GNU Lesser General Public License'', the Front-Cover texts being
``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
(a) The FSF's Back-Cover Text is: ``You have the freedom to
copy and modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software freedom.''-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
<link rel="stylesheet" type="text/css" href="../cs.css">
</head>
<body>
<div class="node">
<a name="The-message-catalog-files"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="The-gencat-program.html#The-gencat-program">The gencat program</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="The-catgets-Functions.html#The-catgets-Functions">The catgets Functions</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Message-catalogs-a-la-X_002fOpen.html#Message-catalogs-a-la-X_002fOpen">Message catalogs a la X/Open</a>
<hr>
</div>
<h4 class="subsection">8.1.2 Format of the message catalog files</h4>
<p>The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
<code>catopen</code> function is to hand all the message text to the
translator and let her/him translate it. That is, we must have a
file with entries which associate each set/message tuple with a specific
translation. This file format is specified in the X/Open standard and
is as follows:
<ul>
<li>Lines containing only whitespace characters or empty lines are ignored.
<li>Lines which contain as the first non-whitespace character a <code>$</code>
followed by a whitespace character are comments and are also ignored.
<li>If a line contains as the first non-whitespace characters the sequence
<code>$set</code> followed by a whitespace character, an additional argument
is required to follow. This argument can either be:
<ul>
<li>a number. In this case the value of this number determines the set
to which the following messages are added.
<li>an identifier consisting of alphanumeric characters plus the underscore
character. In this case the set is automatically assigned a number.
This value is one more than the largest set number which has appeared so far.
<p>How to use the symbolic names is explained in section <a href="Common-Usage.html#Common-Usage">Common Usage</a>.
<p>It is an error if a symbol name appears more than once. All following
messages are placed in a set with this number.
</ul>
<li>If a line contains as the first non-whitespace characters the sequence
<code>$delset</code> followed by a whitespace character, an additional argument
is required to follow. This argument can either be:
<ul>
<li>a number. In this case the value of this number determines the set
which will be deleted.
<li>an identifier consisting of alphanumeric characters plus the underscore
character. This symbolic identifier must match a name for a set which
previously was defined. It is an error if the name is unknown.
</ul>
<p>In both cases all messages in the specified set will be removed. They
will not appear in the output. But if this set is selected again later
with a <code>$set</code> command, messages can be added to it again and these
messages will appear in the output.
<li>If a line contains after leading whitespace the sequence
<code>$quote</code>, the quoting character used for this input file is
changed to the first non-whitespace character following the
<code>$quote</code>. If no non-whitespace character is present before the
line ends, quoting is disabled.
<p>By default no quoting character is used. In this mode strings are
terminated with the first unescaped line break. If a quote character
has been set with <code>$quote</code>, newlines need not be escaped. Instead a
string is terminated with the first unescaped appearance of the quote
character.
<p>A common usage of this feature would be to set the quote character to
<code>"</code>. Then any appearance of the <code>"</code> in the strings must
be escaped using the backslash (i.e., <code>\"</code> must be written).
<li>Any other line must start with a number or an alphanumeric identifier
(with the underscore character included). The following characters
(starting after the first whitespace character) form the string
which gets associated with the currently selected set and the message
number represented by the number or identifier, respectively.
<p>If the start of the line is a number the message number is obvious. It
is an error if the same message number already appeared for this set.
<p>If the leading token is an identifier, the message number is assigned
automatically. The value is the current maximum message
number for this set plus one. It is an error if the identifier was
already used for a message in this set. It is OK to reuse the
identifier for a message in another set. How to use the symbolic
identifiers will be explained below (see <a href="Common-Usage.html#Common-Usage">Common Usage</a>). There is
one limitation with the identifier: it must not be <code>Set</code>. The
reason will be explained below.
<p>The text of the messages can contain escape sequences. The usual
sequences known from the ISO&nbsp;C<!-- /@w --> language are recognized
(<code>\n</code>, <code>\t</code>, <code>\v</code>, <code>\b</code>, <code>\r</code>, <code>\f</code>,
<code>\\</code>, and <code>\</code><var>nnn</var>, where <var>nnn</var> is the octal coding of
a character code).
</ul>
<p><strong>Important:</strong> The handling of identifiers instead of numbers for
the sets and messages is a GNU extension. Systems strictly following the
X/Open specification do not have this feature. An example of a message
catalog file is this:
<pre class="smallexample"> $ This is a leading comment.
$quote "
$set SetOne
1 Message with ID 1.
two " Message with ID \"two\", which gets the value 2 assigned"
$set SetTwo
$ Since the last set got the number 1 assigned this set has number 2.
4000 "The numbers can be arbitrary, they need not start at one."
</pre>
<p>This small example shows various aspects:
<ul>
<li>The first line and the line after <code>$set SetTwo</code> are comments
since they start with <code>$</code> followed by a whitespace character.
<li>The quoting character is set to <code>"</code>. Otherwise the quotes in the
message definition would have to be omitted and in this case the
message with the identifier <code>two</code> would lose its leading whitespace.
<li>Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
</ul>
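<p>As a further illustration, the following sketch exercises rules from
the format description which the example above does not touch: unquoted
messages with an escaped line break, automatic numbering after explicit
numbers, and <code>$delset</code>. The names <code>Extra</code> and
<code>next</code> are made up for this illustration, and the comments
describe the behavior as specified above rather than verified output of
any particular <code>gencat</code> version.
<pre class="smallexample">$ No quote character is set, so each message ends at the
$ first unescaped line break.
$set 3
10 A message on a single line.
11 A message continued \
on a second line.
$ The largest set number so far is 3, so Extra becomes set 4.
$set Extra
7 First message of set 4.
$ The largest message number in this set is 7, so next becomes 8.
next Second message of set 4.
$ From here on, messages end at the closing quote character.
$quote "
9 "Tab:\tquote: \" backslash: \\"
$ Remove all messages of set 3, then add one message to it again.
$delset 3
$set 3
20 "The only message left in set 3."
</pre>
<p>Because <code>Extra</code> and <code>next</code> are symbolic names, this
file relies on the GNU extension mentioned above. Replacing them with the
numbers 4 and 8 should give a file which uses only numeric sets and messages.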
<p>While this file format is pretty easy to write, it is not the best possible for
use in a running program. The <code>catopen</code> function would have to
parse the file and handle syntactic errors gracefully. This is not so
easy and the whole process is pretty slow. Therefore the <code>catgets</code>
functions expect the data in another, more compact and ready-to-use file
format. Files in this format are generated by a special program,
<code>gencat</code>, which is explained in detail in the next section.
<p>Files in this other format are not human readable. To be easy for
programs to use, it is a binary format. But the format is byte order independent
so translation files can be shared by systems of arbitrary architecture
(as long as they use the GNU C Library).
<p>Details about the binary file format are not important to know since
these files are always created by the <code>gencat</code> program. The
sources of the GNU C Library also provide the sources for the
<code>gencat</code> program and so the interested reader can look through
these source files to learn about the file format.
</body></html>