<html lang="en">
<head>
<title>Subexpression Complications - The GNU C Library</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The GNU C Library">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Regular-Expressions.html#Regular-Expressions" title="Regular Expressions">
<link rel="prev" href="Regexp-Subexpressions.html#Regexp-Subexpressions" title="Regexp Subexpressions">
<link rel="next" href="Regexp-Cleanup.html#Regexp-Cleanup" title="Regexp Cleanup">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This file documents the GNU C library.
This is Edition 0.12, last updated 2007-10-27,
of `The GNU C Library Reference Manual', for version
2.8 (Sourcery G++ Lite 2011.03-41).
Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
2003, 2007, 2008, 2010 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Free Software Needs Free Documentation''
and ``GNU Lesser General Public License'', the Front-Cover texts being
``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
(a) The FSF's Back-Cover Text is: ``You have the freedom to
copy and modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software freedom.''-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
<link rel="stylesheet" type="text/css" href="../cs.css">
</head>
<body>
<div class="node">
<a name="Subexpression-Complications"></a>
</div>
<h4 class="subsection">10.3.5 Complications in Subexpression Matching</h4>
<p>Sometimes a subexpression matches a substring of no characters. This
happens, for example, when &lsquo;<samp><span class="samp">f\(o*\)</span></samp>&rsquo; matches the string &lsquo;<samp><span class="samp">fum</span></samp>&rsquo;. (The
whole expression really matches just the &lsquo;<samp><span class="samp">f</span></samp>&rsquo;.) In this case, both
offsets, <code>rm_so</code> and <code>rm_eo</code>, identify the point in the string
where the null substring was found. In this example, the offsets are
both <code>1</code>.
<p>Sometimes the entire regular expression can match without using some of
its subexpressions at all&mdash;for example, when &lsquo;<samp><span class="samp">ba\(na\)*</span></samp>&rsquo; matches the
string &lsquo;<samp><span class="samp">ba</span></samp>&rsquo;, the parenthetical subexpression is not used. When
this happens, <code>regexec</code> stores <code>-1</code> in both fields of the
element for that subexpression.
<p>Sometimes, in the course of matching the entire regular expression, a
particular subexpression matches more than once. For example, when
&lsquo;<samp><span class="samp">ba\(na\)*</span></samp>&rsquo; matches the string &lsquo;<samp><span class="samp">bananana</span></samp>&rsquo;, the parenthetical
subexpression matches three times. When this happens, <code>regexec</code>
usually stores the offsets of the last part of the string that matched
the subexpression. In the case of &lsquo;<samp><span class="samp">bananana</span></samp>&rsquo;, these offsets are
<code>6</code> and <code>8</code>.
<p>But the last match is not always the one that is chosen. It's more
accurate to say that the last <em>opportunity</em> to match is the one
that takes precedence. What this means is that when one subexpression
appears within another, then the results reported for the inner
subexpression reflect whatever happened on the last match of the outer
subexpression. For example, consider &lsquo;<samp><span class="samp">\(ba\(na\)*s \)*</span></samp>&rsquo; matching
the string &lsquo;<samp><span class="samp">bananas bas </span></samp>&rsquo;. The last time the inner expression
actually matches is near the end of the first word. But it is
<em>considered</em> again in the second word, and fails to match there.
<code>regexec</code> reports nonuse of the &ldquo;na&rdquo; subexpression.
<p>Another place where this rule applies is when the regular expression
<pre class="smallexample"> \(ba\(na\)*s \|nefer\(ti\)* \)*
</pre>
<p class="noindent">matches &lsquo;<samp><span class="samp">bananas nefertiti </span></samp>&rsquo;. The &ldquo;na&rdquo; subexpression does match
in the first word, but it doesn't match in the second word because the
other alternative is used there. Once again, the second repetition of
the outer subexpression overrides the first, and within that second
repetition, the &ldquo;na&rdquo; subexpression is not used. So <code>regexec</code>
reports nonuse of the &ldquo;na&rdquo; subexpression.
</body></html>