<html lang="en">
<head>
<title>Low-Level Time String Parsing - The GNU C Library</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The GNU C Library">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Parsing-Date-and-Time.html#Parsing-Date-and-Time" title="Parsing Date and Time">
<link rel="next" href="General-Time-String-Parsing.html#General-Time-String-Parsing" title="General Time String Parsing">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This file documents the GNU C library.
This is Edition 0.12, last updated 2007-10-27,
of `The GNU C Library Reference Manual', for version
2.8 (Sourcery G++ Lite 2011.03-41).
Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
2003, 2007, 2008, 2010 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Free Software Needs Free Documentation''
and ``GNU Lesser General Public License'', the Front-Cover texts being
``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
(a) The FSF's Back-Cover Text is: ``You have the freedom to
copy and modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software freedom.''-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
<link rel="stylesheet" type="text/css" href="../cs.css">
</head>
<body>
<div class="node">
<a name="Low-Level-Time-String-Parsing"></a>
<a name="Low_002dLevel-Time-String-Parsing"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="General-Time-String-Parsing.html#General-Time-String-Parsing">General Time String Parsing</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Parsing-Date-and-Time.html#Parsing-Date-and-Time">Parsing Date and Time</a>
<hr>
</div>
<h5 class="subsubsection">21.4.6.1 Interpret string according to given format</h5>
<p>The first function is rather low-level. It is nevertheless frequently
used in software since it is better known. Its interface and
implementation are heavily influenced by the <code>getdate</code> function,
which is defined and implemented in terms of calls to <code>strptime</code>.
<!-- time.h -->
<!-- XPG4 -->
<div class="defun">
&mdash; Function: char * <b>strptime</b> (<var>const char *s, const char *fmt, struct tm *tp</var>)<var><a name="index-strptime-2663"></a></var><br>
<blockquote><p>The <code>strptime</code> function parses the input string <var>s</var> according
to the format string <var>fmt</var> and stores its results in the
structure <var>tp</var>.
<p>The input string could be generated by a <code>strftime</code> call or
obtained any other way. It does not need to be in a human-recognizable
format; e.g. a date passed as <code>"02:1999:9"</code> is acceptable, even
though it is ambiguous without context. As long as the format string
<var>fmt</var> matches the input string the function will succeed.
<p>The user has to make sure, though, that the input can be parsed in an
unambiguous way. The string <code>"1999112"</code> can be parsed using the
format <code>"%Y%m%d"</code> as 1999-1-12, 1999-11-2, or even 19991-1-2. It
is necessary to add appropriate separators to reliably get results.
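<p>As a minimal sketch of this advice, the following wraps <code>strptime</code>
with a format whose fields are separated by hyphens, so that each field has
only one possible reading. The helper name <code>parse_ymd</code> is
hypothetical and not part of the library; the feature test macro is an
assumption about how the declaration is requested from <code>time.h</code>.

```c
/* Request the XPG declarations, including strptime, from <time.h>.  */
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse a date written as YEAR-MONTH-DAY.
   The hyphens in the format must match hyphens in the input, so
   the field boundaries are never in doubt.  Returns 0 on success
   and -1 if the input does not match the format.  */
static int
parse_ymd (const char *input, struct tm *tm)
{
  memset (tm, 0, sizeof *tm);
  return strptime (input, "%Y-%m-%d", tm) != NULL ? 0 : -1;
}
```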
<p>The format string consists of the same components as the format string
of the <code>strftime</code> function. The only difference is that the flags
<code>_</code>, <code>-</code>, <code>0</code>, and <code>^</code> are not allowed.
<!-- Is this really the intention? -drepper -->
Several of the distinct formats of <code>strftime</code> do the same work in
<code>strptime</code> since differences like case of the input do not matter.
For reasons of symmetry all formats are supported, though.
<p>The modifiers <code>E</code> and <code>O</code> are also allowed everywhere the
<code>strftime</code> function allows them.
<p>The formats are:
<dl>
<dt><code>%a</code><dt><code>%A</code><dd>The weekday name according to the current locale, in abbreviated form or
the full name.
<br><dt><code>%b</code><dt><code>%B</code><dt><code>%h</code><dd>The month name according to the current locale, in abbreviated form or
the full name.
<br><dt><code>%c</code><dd>The date and time representation for the current locale.
<br><dt><code>%Ec</code><dd>Like <code>%c</code> but the locale's alternative date and time format is used.
<br><dt><code>%C</code><dd>The century of the year.
<p>It makes sense to use this format only if the format string also
contains the <code>%y</code> format.
<br><dt><code>%EC</code><dd>The locale's representation of the period.
<p>Unlike <code>%C</code> it sometimes makes sense to use this format since some
cultures represent years relative to the beginning of eras instead of
using the Gregorian years.
<br><dt><code>%d</code><br><dt><code>%e</code><dd>The day of the month as a decimal number (range <code>1</code> through <code>31</code>).
Leading zeroes are permitted but not required.
<br><dt><code>%Od</code><dt><code>%Oe</code><dd>Same as <code>%d</code> but using the locale's alternative numeric symbols.
<p>Leading zeroes are permitted but not required.
<br><dt><code>%D</code><dd>Equivalent to <code>%m/%d/%y</code>.
<br><dt><code>%F</code><dd>Equivalent to <code>%Y-%m-%d</code>, which is the ISO&nbsp;8601<!-- /@w --> date
format.
<p>This is a GNU extension following an ISO&nbsp;C99<!-- /@w --> extension to
<code>strftime</code>.
<br><dt><code>%g</code><dd>The year corresponding to the ISO week number, but without the century
(range <code>00</code> through <code>99</code>).
<p><em>Note:</em> Currently, this is not fully implemented. The format is
recognized, input is consumed but no field in <var>tm</var> is set.
<p>This format is a GNU extension following a GNU extension of <code>strftime</code>.
<br><dt><code>%G</code><dd>The year corresponding to the ISO week number.
<p><em>Note:</em> Currently, this is not fully implemented. The format is
recognized, input is consumed but no field in <var>tm</var> is set.
<p>This format is a GNU extension following a GNU extension of <code>strftime</code>.
<br><dt><code>%H</code><dt><code>%k</code><dd>The hour as a decimal number, using a 24-hour clock (range <code>00</code> through
<code>23</code>).
<p><code>%k</code> is a GNU extension following a GNU extension of <code>strftime</code>.
<br><dt><code>%OH</code><dd>Same as <code>%H</code> but using the locale's alternative numeric symbols.
<br><dt><code>%I</code><dt><code>%l</code><dd>The hour as a decimal number, using a 12-hour clock (range <code>01</code> through
<code>12</code>).
<p><code>%l</code> is a GNU extension following a GNU extension of <code>strftime</code>.
<br><dt><code>%OI</code><dd>Same as <code>%I</code> but using the locale's alternative numeric symbols.
<br><dt><code>%j</code><dd>The day of the year as a decimal number (range <code>1</code> through <code>366</code>).
<p>Leading zeroes are permitted but not required.
<br><dt><code>%m</code><dd>The month as a decimal number (range <code>1</code> through <code>12</code>).
<p>Leading zeroes are permitted but not required.
<br><dt><code>%Om</code><dd>Same as <code>%m</code> but using the locale's alternative numeric symbols.
<br><dt><code>%M</code><dd>The minute as a decimal number (range <code>0</code> through <code>59</code>).
<p>Leading zeroes are permitted but not required.
<br><dt><code>%OM</code><dd>Same as <code>%M</code> but using the locale's alternative numeric symbols.
<br><dt><code>%n</code><dt><code>%t</code><dd>Matches any white space.
<br><dt><code>%p</code><br><dt><code>%P</code><dd>The locale-dependent equivalent to &lsquo;<samp><span class="samp">AM</span></samp>&rsquo; or &lsquo;<samp><span class="samp">PM</span></samp>&rsquo;.
<p>This format is not useful unless <code>%I</code> or <code>%l</code> is also used.
Another complication is that the locale might not define these values at
all and therefore the conversion fails.
<p><code>%P</code> is a GNU extension following a GNU extension to <code>strftime</code>.
<br><dt><code>%r</code><dd>The complete time using the AM/PM format of the current locale.
<p>A complication is that the locale might not define this format at all
and therefore the conversion fails.
<br><dt><code>%R</code><dd>The hour and minute in decimal numbers using the format <code>%H:%M</code>.
<p><code>%R</code> is a GNU extension following a GNU extension to <code>strftime</code>.
<br><dt><code>%s</code><dd>The number of seconds since the epoch, i.e., since 1970-01-01 00:00:00 UTC.
Leap seconds are not counted unless leap second support is available.
<p><code>%s</code> is a GNU extension following a GNU extension to <code>strftime</code>.
<br><dt><code>%S</code><dd>The seconds as a decimal number (range <code>0</code> through <code>60</code>).
<p>Leading zeroes are permitted but not required.
<p><strong>NB:</strong> The Unix specification says the upper bound on this value
is <code>61</code>, a result of a decision to allow double leap seconds. You
will not see the value <code>61</code> because no minute has more than one
leap second, but the myth persists.
<br><dt><code>%OS</code><dd>Same as <code>%S</code> but using the locale's alternative numeric symbols.
<br><dt><code>%T</code><dd>Equivalent to the use of <code>%H:%M:%S</code> in this place.
<br><dt><code>%u</code><dd>The day of the week as a decimal number (range <code>1</code> through
<code>7</code>), Monday being <code>1</code>.
<p>Leading zeroes are permitted but not required.
<p><em>Note:</em> Currently, this is not fully implemented. The format is
recognized, input is consumed but no field in <var>tm</var> is set.
<br><dt><code>%U</code><dd>The week number of the current year as a decimal number (range <code>0</code>
through <code>53</code>).
<p>Leading zeroes are permitted but not required.
<br><dt><code>%OU</code><dd>Same as <code>%U</code> but using the locale's alternative numeric symbols.
<br><dt><code>%V</code><dd>The ISO&nbsp;8601:1988<!-- /@w --> week number as a decimal number (range <code>1</code>
through <code>53</code>).
<p>Leading zeroes are permitted but not required.
<p><em>Note:</em> Currently, this is not fully implemented. The format is
recognized, input is consumed but no field in <var>tm</var> is set.
<br><dt><code>%w</code><dd>The day of the week as a decimal number (range <code>0</code> through
<code>6</code>), Sunday being <code>0</code>.
<p>Leading zeroes are permitted but not required.
<p><em>Note:</em> Currently, this is not fully implemented. The format is
recognized, input is consumed but no field in <var>tm</var> is set.
<br><dt><code>%Ow</code><dd>Same as <code>%w</code> but using the locale's alternative numeric symbols.
<br><dt><code>%W</code><dd>The week number of the current year as a decimal number (range <code>0</code>
through <code>53</code>).
<p>Leading zeroes are permitted but not required.
<p><em>Note:</em> Currently, this is not fully implemented. The format is
recognized, input is consumed but no field in <var>tm</var> is set.
<br><dt><code>%OW</code><dd>Same as <code>%W</code> but using the locale's alternative numeric symbols.
<br><dt><code>%x</code><dd>The date using the locale's date format.
<br><dt><code>%Ex</code><dd>Like <code>%x</code> but the locale's alternative date representation is used.
<br><dt><code>%X</code><dd>The time using the locale's time format.
<br><dt><code>%EX</code><dd>Like <code>%X</code> but the locale's alternative time representation is used.
<br><dt><code>%y</code><dd>The year without a century as a decimal number (range <code>0</code> through
<code>99</code>).
<p>Leading zeroes are permitted but not required.
<p>Note that it is questionable to use this format without
the <code>%C</code> format. The <code>strptime</code> function regards input
values in the range 69 to 99 as the years 1969 to
1999 and the values 0 to 68 as the years
2000 to 2068, but this heuristic can fail for some
input data.
<p>Therefore it is best to avoid <code>%y</code> completely and use <code>%Y</code>
instead.
<br><dt><code>%Ey</code><dd>The offset from <code>%EC</code> in the locale's alternative representation.
<br><dt><code>%Oy</code><dd>The offset of the year (from <code>%C</code>) using the locale's alternative
numeric symbols.
<br><dt><code>%Y</code><dd>The year as a decimal number, using the Gregorian calendar.
<br><dt><code>%EY</code><dd>The full alternative year representation.
<br><dt><code>%z</code><dd>The offset from GMT in ISO&nbsp;8601<!-- /@w -->/RFC822 format.
<br><dt><code>%Z</code><dd>The timezone name.
<p><em>Note:</em> Currently, this is not fully implemented. The format is
recognized, input is consumed but no field in <var>tm</var> is set.
<br><dt><code>%%</code><dd>A literal &lsquo;<samp><span class="samp">%</span></samp>&rsquo; character.
</dl>
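<p>The two-digit year handling described for <code>%y</code> can be observed
with a small sketch. The helper <code>year_from_two_digits</code> is
hypothetical and exists only for illustration; it reports the full year that
<code>strptime</code> stores for a given two-digit input.

```c
/* Request the XPG declarations, including strptime, from <time.h>.  */
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: return the full year that strptime assigns
   to a two-digit year string via %y, or -1 if parsing fails.  */
static int
year_from_two_digits (const char *yy)
{
  struct tm tm;

  memset (&tm, 0, sizeof tm);
  if (strptime (yy, "%y", &tm) == NULL)
    return -1;
  /* tm_year counts years since 1900.  */
  return tm.tm_year + 1900;
}
```

<p>Input near the pivot shows why <code>%Y</code> is the safer choice: a record
from 1968 written with two digits would come back as a year in the 21st
century.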
<p>All other characters in the format string must have a matching character
in the input string. The exception is white space: a whitespace character
in the format string matches zero or more whitespace characters in the
input string.
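<p>A short sketch of literal and white-space matching follows. The helper
<code>parse_year_month</code> and its format are hypothetical, chosen only to
show a literal hyphen surrounded by white space in the format string.

```c
/* Request the XPG declarations, including strptime, from <time.h>.  */
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse "YEAR - MONTH".  The hyphen in the
   format must appear in the input; the spaces around it in the
   format let the input carry any amount of white space there,
   including none.  Returns 0 on success and -1 on a mismatch.  */
static int
parse_year_month (const char *input, struct tm *tm)
{
  memset (tm, 0, sizeof *tm);
  return strptime (input, "%Y - %m", tm) != NULL ? 0 : -1;
}
```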
<p><strong>Portability Note:</strong> The XPG standard advises applications to use
at least one whitespace character (as specified by <code>isspace</code>) or
other non-alphanumeric characters between any two conversion
specifications. The GNU&nbsp;C&nbsp;Library<!-- /@w --> does not have this limitation but
other libraries might have trouble parsing formats like
<code>"%d%m%Y%H%M%S"</code>.
<p>The <code>strptime</code> function processes the input string from left to
right. Each of the three possible input elements (white space, literal,
or format) is handled one after the other. If the input cannot be
matched to the format string the function stops. The remainder of the
format and input strings is not processed.
<p>The function returns a pointer to the first character it was unable to
process. If the input string contains more characters than required by
the format string the return value points right after the last consumed
input character. If the whole input string is consumed the return value
points to the terminating null byte at the end of the string. If an error
occurs, i.e., <code>strptime</code> fails to match all of the format string,
the function returns <code>NULL</code>.
</p></blockquote></div>
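<p>The three possible outcomes of the return value (mismatch, complete match,
match with trailing input) can be told apart as in the following sketch. The
helper <code>parse_checked</code> and its return codes are hypothetical
conventions chosen for this illustration.

```c
/* Request the XPG declarations, including strptime, from <time.h>.  */
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse INPUT with FMT and classify the result.
   Returns -1 if strptime reports a mismatch, 0 if the whole input
   was consumed, and 1 if unparsed characters remain.  On success
   *REST points at the first character that was not consumed.  */
static int
parse_checked (const char *input, const char *fmt, struct tm *tm,
               const char **rest)
{
  const char *end;

  memset (tm, 0, sizeof *tm);
  end = strptime (input, fmt, tm);
  if (end == NULL)
    return -1;
  *rest = end;
  return *end == '\0' ? 0 : 1;
}
```

<p>A caller that wants to reject trailing garbage treats the return value
<code>1</code> as an error; a caller that parses a longer record continues
from <code>*rest</code>.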
<p>The specification of the function in the XPG standard is rather vague,
leaving out a few important pieces of information. Most importantly, it
does not specify what happens to those elements of <var>tm</var> which are
not directly initialized by the different formats. The
implementations on different Unix systems vary here.
<p>The GNU libc implementation does not touch those fields which are not
directly initialized. Exceptions are the <code>tm_wday</code> and
<code>tm_yday</code> elements, which are recomputed if any of the year, month,
or date elements changed. This has two implications:
<ul>
<li>Before calling the <code>strptime</code> function for a new input string, you
should prepare the <var>tm</var> structure you pass. Normally this will mean
initializing all values to zero. Alternatively, you can set all
fields to values like <code>INT_MAX</code>, allowing you to determine which
elements were set by the function call. Zero does not work here since
it is a valid value for many of the fields.
<p>Careful initialization is necessary if you want to find out whether a
certain field in <var>tm</var> was initialized by the function call.
<li>You can construct a <code>struct tm</code> value with several consecutive
<code>strptime</code> calls. A useful application of this is e.g. the parsing
of two separate strings, one containing date information and the other
time information. By parsing one after the other without clearing the
structure in-between, you can construct a complete broken-down time.
</ul>
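<p>Both points can be combined in one sketch: the structure is first filled
with a sentinel, then two consecutive <code>strptime</code> calls add the date
and the time. The helper names <code>tm_mark_unset</code> and
<code>parse_date_and_time</code>, the chosen formats, and the use of
<code>INT_MAX</code> as the sentinel are assumptions made for this
illustration; the sentinel check relies on the GNU behavior described above
of leaving unmatched fields untouched.

```c
/* Request the XPG declarations, including strptime, from <time.h>.  */
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: mark the standard integer fields as unset.  */
static void
tm_mark_unset (struct tm *tm)
{
  memset (tm, 0, sizeof *tm);
  tm->tm_sec = tm->tm_min = tm->tm_hour = INT_MAX;
  tm->tm_mday = tm->tm_mon = tm->tm_year = INT_MAX;
  tm->tm_wday = tm->tm_yday = INT_MAX;
}

/* Hypothetical helper: fill *TM from a date string and a separate
   time string.  Returns 0 on success and -1 if either string fails
   to match its format completely.  */
static int
parse_date_and_time (const char *date, const char *time_of_day,
                     struct tm *tm)
{
  const char *end;

  tm_mark_unset (tm);

  end = strptime (date, "%Y-%m-%d", tm);
  if (end == NULL || *end != '\0')
    return -1;

  /* Do not clear *TM here: the second call adds to the first.  */
  end = strptime (time_of_day, "%H:%M", tm);
  if (end == NULL || *end != '\0')
    return -1;

  return 0;
}
```

<p>After a successful call, a field that still holds <code>INT_MAX</code>
(here <code>tm_sec</code>, since neither format contains <code>%S</code>) was
not supplied by the input.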
<p>The following example shows a function which parses a string that
contains date information in either US style or ISO&nbsp;8601<!-- /@w --> form:
<pre class="smallexample">     const char *
     parse_date (const char *input, struct tm *tm)
     {
       const char *cp;

       /* <span class="roman">First clear the result structure.</span>  */
       memset (tm, '\0', sizeof (*tm));

       /* <span class="roman">Try the ISO format first.</span>  */
       cp = strptime (input, "%F", tm);
       if (cp == NULL)
         {
           /* <span class="roman">Does not match.  Try the US form.</span>  */
           cp = strptime (input, "%D", tm);
         }

       return cp;
     }
</pre>
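<p>The same idea extends to more than two candidate formats. The following
sketch is a hypothetical variant, <code>parse_date_any</code>, that walks a
table of formats; the table spells out the expansions given above for
<code>%F</code> and <code>%D</code>. Clearing the structure before each
attempt is a design choice of this sketch, so that a failed attempt cannot
leave partial results behind.

```c
/* Request the XPG declarations, including strptime, from <time.h>.  */
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical variant of parse_date: try each candidate format in
   turn.  Returns the strptime result of the first format that
   matches, or NULL if none does.  */
static const char *
parse_date_any (const char *input, struct tm *tm)
{
  /* ISO 8601 form first, then the US form.  */
  static const char *const formats[] = { "%Y-%m-%d", "%m/%d/%y" };
  size_t i;

  for (i = 0; i < sizeof formats / sizeof formats[0]; i++)
    {
      const char *cp;

      /* Start every attempt from a cleared structure.  */
      memset (tm, 0, sizeof *tm);
      cp = strptime (input, formats[i], tm);
      if (cp != NULL)
        return cp;
    }

  return NULL;
}
```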
</body></html>