arm-2011.03/share/doc/arm-arm-none-linux-gnueabi/html/libc/Subexpression-Complications.html - nest-cam/v366/arm-none-linux-gnueabi-i686-pc-linux-gnu - Git at Google

 <html lang="en">
 <head>
 <title>Subexpression Complications - The GNU C Library</title>
 <meta http-equiv="Content-Type" content="text/html">
 <meta name="description" content="The GNU C Library">
 <meta name="generator" content="makeinfo 4.13">
 <link title="Top" rel="start" href="index.html#Top">
 <link rel="up" href="Regular-Expressions.html#Regular-Expressions" title="Regular Expressions">
 <link rel="prev" href="Regexp-Subexpressions.html#Regexp-Subexpressions" title="Regexp Subexpressions">
 <link rel="next" href="Regexp-Cleanup.html#Regexp-Cleanup" title="Regexp Cleanup">
 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
 <!--
 This file documents the GNU C library.

 This is Edition 0.12, last updated 2007-10-27,
 of `The GNU C Library Reference Manual', for version
 2.8 (Sourcery G++ Lite 2011.03-41).

 Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
 2003, 2007, 2008, 2010 Free Software Foundation, Inc.

 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
 any later version published by the Free Software Foundation; with the
 Invariant Sections being ``Free Software Needs Free Documentation''
 and ``GNU Lesser General Public License'', the Front-Cover texts being
 ``A GNU Manual'', and with the Back-Cover Texts as in (a) below.  A
 copy of the license is included in the section entitled "GNU Free
 Documentation License".

 (a) The FSF's Back-Cover Text is: ``You have the freedom to
 copy and modify this GNU manual.  Buying copies from the FSF
 supports it in developing GNU and promoting software freedom.''-->
 <meta http-equiv="Content-Style-Type" content="text/css">
 <style type="text/css"><!--
   pre.display { font-family:inherit }
   pre.format  { font-family:inherit }
   pre.smalldisplay { font-family:inherit; font-size:smaller }
   pre.smallformat  { font-family:inherit; font-size:smaller }
   pre.smallexample { font-size:smaller }
   pre.smalllisp    { font-size:smaller }
   span.sc    { font-variant:small-caps }
   span.roman { font-family:serif; font-weight:normal; }
   span.sansserif { font-family:sans-serif; font-weight:normal; }
 --></style>
 <link rel="stylesheet" type="text/css" href="../cs.css">
 </head>
 <body>
 <div class="node">
 <a name="Subexpression-Complications"></a>
 <p>
 Next:&nbsp;<a rel="next" accesskey="n" href="Regexp-Cleanup.html#Regexp-Cleanup">Regexp Cleanup</a>,
 Previous:&nbsp;<a rel="previous" accesskey="p" href="Regexp-Subexpressions.html#Regexp-Subexpressions">Regexp Subexpressions</a>,
 Up:&nbsp;<a rel="up" accesskey="u" href="Regular-Expressions.html#Regular-Expressions">Regular Expressions</a>
 <hr>
 </div>

 <h4 class="subsection">10.3.5 Complications in Subexpression Matching</h4>

 <p>Sometimes a subexpression matches a substring of no characters.  This
 happens when &lsquo;<samp><span class="samp">f\(o*\)</span></samp>&rsquo; matches the string &lsquo;<samp><span class="samp">fum</span></samp>&rsquo;.  (It really
 matches just the &lsquo;<samp><span class="samp">f</span></samp>&rsquo;.)  In this case, both of the offsets identify
 the point in the string where the null substring was found.  In this
 example, the offsets are both <code>1</code>.

    <p>Sometimes the entire regular expression can match without using some of
 its subexpressions at all&mdash;for example, when &lsquo;<samp><span class="samp">ba\(na\)*</span></samp>&rsquo; matches the
 string &lsquo;<samp><span class="samp">ba</span></samp>&rsquo;, the parenthetical subexpression is not used.  When
 this happens, <code>regexec</code> stores <code>-1</code> in both fields of the
 element for that subexpression.

    <p>Sometimes matching the entire regular expression can match a particular
 subexpression more than once&mdash;for example, when &lsquo;<samp><span class="samp">ba\(na\)*</span></samp>&rsquo;
 matches the string &lsquo;<samp><span class="samp">bananana</span></samp>&rsquo;, the parenthetical subexpression
 matches three times.  When this happens, <code>regexec</code> usually stores
 the offsets of the last part of the string that matched the
 subexpression.  In the case of &lsquo;<samp><span class="samp">bananana</span></samp>&rsquo;, these offsets are
 <code>6</code> and <code>8</code>.

    <p>But the last match is not always the one that is chosen.  It's more
 accurate to say that the last <em>opportunity</em> to match is the one
 that takes precedence.  What this means is that when one subexpression
 appears within another, then the results reported for the inner
 subexpression reflect whatever happened on the last match of the outer
 subexpression.  For an example, consider &lsquo;<samp><span class="samp">\(ba\(na\)*s \)*</span></samp>&rsquo; matching
 the string &lsquo;<samp><span class="samp">bananas bas </span></samp>&rsquo;.  The last time the inner expression
 actually matches is near the end of the first word.  But it is
 <em>considered</em> again in the second word, and fails to match there.
 <code>regexec</code> reports nonuse of the &ldquo;na&rdquo; subexpression.

    <p>Another place where this rule applies is when the regular expression
 <pre class="smallexample">     \(ba\(na\)*s \|nefer\(ti\)* \)*
 </pre>
    <p class="noindent">matches &lsquo;<samp><span class="samp">bananas nefertiti</span></samp>&rsquo;.  The &ldquo;na&rdquo; subexpression does match
 in the first word, but it doesn't match in the second word because the
 other alternative is used there.  Once again, the second repetition of
 the outer subexpression overrides the first, and within that second
 repetition, the &ldquo;na&rdquo; subexpression is not used.  So <code>regexec</code>
 reports nonuse of the &ldquo;na&rdquo; subexpression.

    </body></html>
	<html lang="en">
	<head>
	<title>Subexpression Complications - The GNU C Library</title>
	<meta http-equiv="Content-Type" content="text/html">
	<meta name="description" content="The GNU C Library">
	<meta name="generator" content="makeinfo 4.13">
	<link title="Top" rel="start" href="index.html#Top">
	<link rel="up" href="Regular-Expressions.html#Regular-Expressions" title="Regular Expressions">
	<link rel="prev" href="Regexp-Subexpressions.html#Regexp-Subexpressions" title="Regexp Subexpressions">
	<link rel="next" href="Regexp-Cleanup.html#Regexp-Cleanup" title="Regexp Cleanup">
	<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
	<!--
	This file documents the GNU C library.

	This is Edition 0.12, last updated 2007-10-27,
	of `The GNU C Library Reference Manual', for version
	2.8 (Sourcery G++ Lite 2011.03-41).

	Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
	2003, 2007, 2008, 2010 Free Software Foundation, Inc.

	Permission is granted to copy, distribute and/or modify this document
	under the terms of the GNU Free Documentation License, Version 1.3 or
	any later version published by the Free Software Foundation; with the
	Invariant Sections being ``Free Software Needs Free Documentation''
	and ``GNU Lesser General Public License'', the Front-Cover texts being
	``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A
	copy of the license is included in the section entitled "GNU Free
	Documentation License".

	(a) The FSF's Back-Cover Text is: ``You have the freedom to
	copy and modify this GNU manual. Buying copies from the FSF
	supports it in developing GNU and promoting software freedom.''-->
	<meta http-equiv="Content-Style-Type" content="text/css">
	<style type="text/css"><!--
	pre.display { font-family:inherit }
	pre.format { font-family:inherit }
	pre.smalldisplay { font-family:inherit; font-size:smaller }
	pre.smallformat { font-family:inherit; font-size:smaller }
	pre.smallexample { font-size:smaller }
	pre.smalllisp { font-size:smaller }
	span.sc { font-variant:small-caps }
	span.roman { font-family:serif; font-weight:normal; }
	span.sansserif { font-family:sans-serif; font-weight:normal; }
	--></style>
	<link rel="stylesheet" type="text/css" href="../cs.css">
	</head>
	<body>
	<div class="node">
	<a name="Subexpression-Complications"></a>
	<p>
	Next: <a rel="next" accesskey="n" href="Regexp-Cleanup.html#Regexp-Cleanup">Regexp Cleanup</a>,
	Previous: <a rel="previous" accesskey="p" href="Regexp-Subexpressions.html#Regexp-Subexpressions">Regexp Subexpressions</a>,
	Up: <a rel="up" accesskey="u" href="Regular-Expressions.html#Regular-Expressions">Regular Expressions</a>
	<hr>
	</div>

	<h4 class="subsection">10.3.5 Complications in Subexpression Matching</h4>

	<p>Sometimes a subexpression matches a substring of no characters. This
	happens when ‘<samp><span class="samp">f\(o*\)</span></samp>’ matches the string ‘<samp><span class="samp">fum</span></samp>’. (It really
	matches just the ‘<samp><span class="samp">f</span></samp>’.) In this case, both of the offsets identify
	the point in the string where the null substring was found. In this
	example, the offsets are both <code>1</code>.

	<p>Sometimes the entire regular expression can match without using some of
	its subexpressions at all—for example, when ‘<samp><span class="samp">ba\(na\)*</span></samp>’ matches the
	string ‘<samp><span class="samp">ba</span></samp>’, the parenthetical subexpression is not used. When
	this happens, <code>regexec</code> stores <code>-1</code> in both fields of the
	element for that subexpression.

	<p>Sometimes matching the entire regular expression can match a particular
	subexpression more than once—for example, when ‘<samp><span class="samp">ba\(na\)*</span></samp>’
	matches the string ‘<samp><span class="samp">bananana</span></samp>’, the parenthetical subexpression
	matches three times. When this happens, <code>regexec</code> usually stores
	the offsets of the last part of the string that matched the
	subexpression. In the case of ‘<samp><span class="samp">bananana</span></samp>’, these offsets are
	<code>6</code> and <code>8</code>.

	<p>But the last match is not always the one that is chosen. It's more
	accurate to say that the last <em>opportunity</em> to match is the one
	that takes precedence. What this means is that when one subexpression
	appears within another, then the results reported for the inner
	subexpression reflect whatever happened on the last match of the outer
	subexpression. For an example, consider ‘<samp><span class="samp">\(ba\(na\)s \)</span></samp>’ matching
	the string ‘<samp><span class="samp">bananas bas </span></samp>’. The last time the inner expression
	actually matches is near the end of the first word. But it is
	<em>considered</em> again in the second word, and fails to match there.
	<code>regexec</code> reports nonuse of the “na” subexpression.

	<p>Another place where this rule applies is when the regular expression
	<pre class="smallexample"> \(ba\(na\)s \\|nefer\(ti\) \)*
	</pre>
	<p class="noindent">matches ‘<samp><span class="samp">bananas nefertiti</span></samp>’. The “na” subexpression does match
	in the first word, but it doesn't match in the second word because the
	other alternative is used there. Once again, the second repetition of
	the outer subexpression overrides the first, and within that second
	repetition, the “na” subexpression is not used. So <code>regexec</code>
	reports nonuse of the “na” subexpression.

	</body></html>