| <html lang="en"> |
| <head> |
| <title>String Input Conversions - The GNU C Library</title> |
| <meta http-equiv="Content-Type" content="text/html"> |
| <meta name="description" content="The GNU C Library"> |
| <meta name="generator" content="makeinfo 4.13"> |
| <link title="Top" rel="start" href="index.html#Top"> |
| <link rel="up" href="Formatted-Input.html#Formatted-Input" title="Formatted Input"> |
| <link rel="prev" href="Numeric-Input-Conversions.html#Numeric-Input-Conversions" title="Numeric Input Conversions"> |
| <link rel="next" href="Dynamic-String-Input.html#Dynamic-String-Input" title="Dynamic String Input"> |
| <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage"> |
| <!-- |
| This file documents the GNU C library. |
| |
| This is Edition 0.12, last updated 2007-10-27, |
| of `The GNU C Library Reference Manual', for version |
| 2.8 (Sourcery G++ Lite 2011.03-41). |
| |
| Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, |
| 2003, 2007, 2008, 2010 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 or |
| any later version published by the Free Software Foundation; with the |
| Invariant Sections being ``Free Software Needs Free Documentation'' |
| and ``GNU Lesser General Public License'', the Front-Cover texts being |
| ``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A |
| copy of the license is included in the section entitled "GNU Free |
| Documentation License". |
| |
| (a) The FSF's Back-Cover Text is: ``You have the freedom to |
| copy and modify this GNU manual. Buying copies from the FSF |
| supports it in developing GNU and promoting software freedom.''--> |
| <meta http-equiv="Content-Style-Type" content="text/css"> |
| <style type="text/css"><!-- |
| pre.display { font-family:inherit } |
| pre.format { font-family:inherit } |
| pre.smalldisplay { font-family:inherit; font-size:smaller } |
| pre.smallformat { font-family:inherit; font-size:smaller } |
| pre.smallexample { font-size:smaller } |
| pre.smalllisp { font-size:smaller } |
| span.sc { font-variant:small-caps } |
| span.roman { font-family:serif; font-weight:normal; } |
| span.sansserif { font-family:sans-serif; font-weight:normal; } |
| --></style> |
| <link rel="stylesheet" type="text/css" href="../cs.css"> |
| </head> |
| <body> |
| <div class="node"> |
| <a name="String-Input-Conversions"></a> |
| </div> |
| |
| <h4 class="subsection">12.14.5 String Input Conversions</h4> |
| |
| <p>This section describes the <code>scanf</code> input conversions for reading |
| string and character values: ‘<samp><span class="samp">%s</span></samp>’, ‘<samp><span class="samp">%S</span></samp>’, ‘<samp><span class="samp">%[</span></samp>’, ‘<samp><span class="samp">%c</span></samp>’, |
| and ‘<samp><span class="samp">%C</span></samp>’. |
| |
| <p>You have two options for how to receive the input from these |
| conversions: |
| |
| <ul> |
| <li>Provide a buffer to store it in. This is the default. You should |
| provide an argument of type <code>char *</code> or <code>wchar_t *</code> (the |
| latter if the ‘<samp><span class="samp">l</span></samp>’ modifier is present). |
| |
| <p><strong>Warning:</strong> To make a robust program, you must make sure that the |
| input (plus its terminating null) cannot possibly exceed the size of the |
| buffer you provide. In general, the only way to do this is to specify a |
| maximum field width one less than the buffer size. <strong>If you |
| provide the buffer, always specify a maximum field width to prevent |
| overflow.</strong> |
| |
| <li>Ask <code>scanf</code> to allocate a big enough buffer, by specifying the |
| ‘<samp><span class="samp">a</span></samp>’ flag character. This is a GNU extension. You should provide |
| an argument of type <code>char **</code> for the buffer address to be stored |
| in. See <a href="Dynamic-String-Input.html#Dynamic-String-Input">Dynamic String Input</a>. |
| </ul> |
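| |
| <p>The first option can be sketched as follows. This is a minimal |
| illustration, not part of the library: the helper <code>read_word</code> and |
| the constant <code>WORD_SIZE</code> are hypothetical names, and the field width |
| 15 is chosen to be one less than the assumed buffer size of 16. |

```c
#include <stdio.h>
#include <string.h>

/* Assumed buffer size for this sketch. */
enum { WORD_SIZE = 16 };

/* Hypothetical helper: read one whitespace-delimited word from INPUT
   into WORD.  The maximum field width 15 is WORD_SIZE - 1, which
   leaves room for the terminating null.  Returns the length of the
   stored word, or -1 if no word could be read.  */
int
read_word (const char *input, char word[WORD_SIZE])
{
  if (sscanf (input, "%15s", word) != 1)
    return -1;
  return (int) strlen (word);
}
```

| <p>With input longer than 15 characters, only the first 15 are stored; the |
| rest stays unread for the next conversion. |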
| |
| <p>The ‘<samp><span class="samp">%c</span></samp>’ conversion is the simplest: it matches a fixed number of |
| characters, always. The maximum field width says how many characters to |
| read; if you don't specify the maximum, the default is 1. This |
| conversion doesn't append a null character to the end of the text it |
| reads. It also does not skip over initial whitespace characters. It |
| reads precisely the next <var>n</var> characters, and fails if it cannot get |
| that many. Since there is always a maximum field width with ‘<samp><span class="samp">%c</span></samp>’ |
| (whether specified, or 1 by default), you can always prevent overflow by |
| making the buffer long enough. |
| <!-- Is character == byte here??? -drepper --> |
| |
| <p>If the format is ‘<samp><span class="samp">%lc</span></samp>’ or ‘<samp><span class="samp">%C</span></samp>’, the function stores wide |
| characters, converting the external byte stream with the conversion that |
| was determined when the stream was opened. The number of |
| bytes read from the medium is limited by <code>MB_CUR_MAX * </code><var>n</var>, but |
| at most <var>n</var> wide characters are stored in the output string. |
| |
| <p>The ‘<samp><span class="samp">%s</span></samp>’ conversion matches a string of non-whitespace characters. |
| It skips and discards initial whitespace, but stops when it encounters |
| more whitespace after having read something. It stores a null character |
| at the end of the text that it reads. |
| |
| <p>For example, reading the input: |
| |
| <pre class="smallexample"> hello, world |
| </pre> |
| <p class="noindent">with the conversion ‘<samp><span class="samp">%10c</span></samp>’ produces <code>" hello, wo"</code>, but |
| reading the same input with the conversion ‘<samp><span class="samp">%10s</span></samp>’ produces |
| <code>"hello,"</code>. |
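| |
| <p>The difference can be reproduced with a short sketch. The helper names |
| <code>read_ten_chars</code> and <code>read_ten_word</code> are hypothetical, and |
| <code>sscanf</code> is used in place of a stream only to keep the example |
| self-contained. Because ‘<samp><span class="samp">%10c</span></samp>’ does not store a null character, the |
| sketch clears the buffer first and so relies on its eleventh element as the terminator. |

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the next 10 characters of INPUT with
   `%10c'.  The conversion stores no terminating null, so OUT is
   zero-filled beforehand and needs room for 11 characters.
   Returns 1 if the conversion was assigned, 0 otherwise.  */
int
read_ten_chars (const char *input, char out[11])
{
  memset (out, 0, 11);
  return sscanf (input, "%10c", out) == 1;
}

/* Hypothetical helper: read one word of at most 10 characters with
   `%10s'.  Initial whitespace is skipped and a null is appended.  */
int
read_ten_word (const char *input, char out[11])
{
  return sscanf (input, "%10s", out) == 1;
}
```

| <p>Called on <code>" hello, world"</code>, the first helper should leave |
| <code>" hello, wo"</code> in the buffer and the second <code>"hello,"</code>. |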
| |
| <p><strong>Warning:</strong> If you do not specify a field width for ‘<samp><span class="samp">%s</span></samp>’, |
| then the number of characters read is limited only by where the next |
| whitespace character appears. This almost certainly means that invalid |
| input can make your program crash—which is a bug. |
| |
| <p>The ‘<samp><span class="samp">%ls</span></samp>’ and ‘<samp><span class="samp">%S</span></samp>’ formats are handled just like ‘<samp><span class="samp">%s</span></samp>’, |
| except that the external byte sequence is converted to wide characters, |
| in their own encoding, using the conversion associated with the stream. |
| A field width specified with the format does not directly determine |
| how many bytes are read from the stream, since it is measured in wide |
| characters. But an upper limit can be computed by multiplying the |
| width by <code>MB_CUR_MAX</code>. |
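| |
| <p>A minimal sketch of a bounded wide-character read follows. The names |
| <code>read_wide_word</code>, <code>max_bytes_for_width</code> and <code>WIDE_SIZE</code> |
| are hypothetical, and the example assumes input that is valid in the |
| current locale (plain ASCII is enough in the default <code>"C"</code> locale). |

```c
#include <stdio.h>
#include <stdlib.h>   /* MB_CUR_MAX */
#include <wchar.h>

/* Assumed buffer size, in wide characters, for this sketch. */
enum { WIDE_SIZE = 8 };

/* Hypothetical helper: read one word of at most WIDE_SIZE - 1 wide
   characters from INPUT with `%7ls'.  The width counts wide
   characters, not bytes.  Returns the number of wide characters
   stored, or -1 if nothing could be read.  */
int
read_wide_word (const char *input, wchar_t out[WIDE_SIZE])
{
  if (sscanf (input, "%7ls", out) != 1)
    return -1;
  return (int) wcslen (out);
}

/* Upper limit on the bytes that make up a field of WIDTH wide
   characters.  Skipped initial whitespace is not included.  */
size_t
max_bytes_for_width (size_t width)
{
  return width * MB_CUR_MAX;
}
```

| <p>The byte limit depends on the locale in effect, so it should be computed |
| at run time rather than hard-coded. |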
| |
| <p>To read in characters that belong to an arbitrary set of your choice, |
| use the ‘<samp><span class="samp">%[</span></samp>’ conversion. You specify the set between the ‘<samp><span class="samp">[</span></samp>’ |
| character and a following ‘<samp><span class="samp">]</span></samp>’ character, using the same syntax used |
| in regular expressions. As special cases: |
| |
| <ul> |
| <li>A literal ‘<samp><span class="samp">]</span></samp>’ character can be specified as the first character |
| of the set. |
| |
| <li>An embedded ‘<samp><span class="samp">-</span></samp>’ character (that is, one that is not the first or |
| last character of the set) is used to specify a range of characters. |
| |
| <li>If a caret character ‘<samp><span class="samp">^</span></samp>’ immediately follows the initial ‘<samp><span class="samp">[</span></samp>’, |
| then the set of allowed input characters is everything <em>except</em> |
| the characters listed. |
| </ul> |
| |
| <p>The ‘<samp><span class="samp">%[</span></samp>’ conversion does not skip over initial whitespace |
| characters. |
| |
| <p>Here are some examples of ‘<samp><span class="samp">%[</span></samp>’ conversions and what they mean: |
| |
| <dl> |
| <dt>‘<samp><span class="samp">%25[1234567890]</span></samp>’<dd>Matches a string of up to 25 digits. |
| |
| <br><dt>‘<samp><span class="samp">%25[][]</span></samp>’<dd>Matches a string of up to 25 square brackets. |
| |
| <br><dt>‘<samp><span class="samp">%25[^ \f\n\r\t\v]</span></samp>’<dd>Matches a string up to 25 characters long that doesn't contain any of |
| the standard whitespace characters. This is slightly different from |
| ‘<samp><span class="samp">%s</span></samp>’, because if the input begins with a whitespace character, |
| ‘<samp><span class="samp">%[</span></samp>’ reports a matching failure while ‘<samp><span class="samp">%s</span></samp>’ simply discards the |
| initial whitespace. |
| |
| <br><dt>‘<samp><span class="samp">%25[a-z]</span></samp>’<dd>Matches up to 25 lowercase characters. |
| </dl> |
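| |
| <p>Some of these conversions can be tried out with a short sketch. The |
| helper names <code>read_digits</code>, <code>read_lowercase</code> and |
| <code>read_non_space</code> are hypothetical; each one assumes a 26-character |
| buffer so that a field of 25 characters plus the terminating null fits. |

```c
#include <stdio.h>

/* Each hypothetical helper returns 1 if the conversion matched and
   stored a null-terminated string in OUT, and 0 otherwise.  */

/* Matches a string of up to 25 digits.  */
int
read_digits (const char *input, char out[26])
{
  return sscanf (input, "%25[1234567890]", out) == 1;
}

/* Matches up to 25 characters in the range `a' to `z'.  */
int
read_lowercase (const char *input, char out[26])
{
  return sscanf (input, "%25[a-z]", out) == 1;
}

/* Matches up to 25 non-whitespace characters.  Unlike `%s', this
   does not skip initial whitespace, so input that begins with a
   space yields a matching failure.  */
int
read_non_space (const char *input, char out[26])
{
  return sscanf (input, "%25[^ \f\n\r\t\v]", out) == 1;
}
```

| <p>For instance, <code>read_digits</code> applied to <code>"2024abc"</code> should |
| store <code>"2024"</code>, while <code>read_non_space</code> applied to |
| <code>" word"</code> should report failure. |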
| |
| <p>As with ‘<samp><span class="samp">%c</span></samp>’ and ‘<samp><span class="samp">%s</span></samp>’, the ‘<samp><span class="samp">%[</span></samp>’ format is also modified to |
| produce wide characters if the ‘<samp><span class="samp">l</span></samp>’ modifier is present. Everything |
| said about ‘<samp><span class="samp">%ls</span></samp>’ above also applies to ‘<samp><span class="samp">%l[</span></samp>’. |
| |
| <p>One more reminder: the ‘<samp><span class="samp">%s</span></samp>’ and ‘<samp><span class="samp">%[</span></samp>’ conversions are |
| <strong>dangerous</strong> if you don't specify a maximum width or use the |
| ‘<samp><span class="samp">a</span></samp>’ flag, because input that is too long would overflow whatever |
| buffer you have provided for it. No matter how long your buffer is, a user could |
| supply input that is longer. A well-written program reports invalid |
| input with a comprehensible error message, not with a crash. |
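| |
| <p>One possible way to detect overlong input, instead of silently |
| truncating it, is sketched below. The function <code>parse_name</code>, the |
| <code>parse_status</code> values and the buffer size <code>NAME_SIZE</code> are |
| all hypothetical. The sketch uses the standard ‘<samp><span class="samp">%n</span></samp>’ conversion to find |
| out where the bounded ‘<samp><span class="samp">%7s</span></samp>’ stopped reading. |

```c
#include <ctype.h>
#include <stdio.h>

/* Assumed buffer size for this sketch. */
enum { NAME_SIZE = 8 };

/* Hypothetical result codes. */
enum parse_status { PARSE_OK, PARSE_EMPTY, PARSE_TOO_LONG };

/* Hypothetical helper: read one word from LINE into NAME without
   ever writing more than NAME_SIZE characters.  If the word was cut
   off by the field width, report that instead of accepting a
   truncated value.  */
enum parse_status
parse_name (const char *line, char name[NAME_SIZE])
{
  int consumed = 0;

  /* `%7s' stores at most NAME_SIZE - 1 characters plus a null;
     `%n' records how many input characters were consumed so far.  */
  if (sscanf (line, "%7s%n", name, &consumed) != 1)
    return PARSE_EMPTY;

  /* If the word continues past the point where `%7s' stopped, the
     input was longer than the buffer allows.  */
  if (line[consumed] != '\0' && !isspace ((unsigned char) line[consumed]))
    return PARSE_TOO_LONG;

  return PARSE_OK;
}
```

| <p>A caller can then print an error message for <code>PARSE_EMPTY</code> or |
| <code>PARSE_TOO_LONG</code> rather than continue with a truncated value. |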
| |
| </body></html> |
| |