| <html lang="en"> |
| <head> |
| <title>Subexpression Complications - The GNU C Library</title> |
| <meta http-equiv="Content-Type" content="text/html"> |
| <meta name="description" content="The GNU C Library"> |
| <meta name="generator" content="makeinfo 4.13"> |
| <link title="Top" rel="start" href="index.html#Top"> |
| <link rel="up" href="Regular-Expressions.html#Regular-Expressions" title="Regular Expressions"> |
| <link rel="prev" href="Regexp-Subexpressions.html#Regexp-Subexpressions" title="Regexp Subexpressions"> |
| <link rel="next" href="Regexp-Cleanup.html#Regexp-Cleanup" title="Regexp Cleanup"> |
| <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage"> |
| <!-- |
| This file documents the GNU C library. |
| |
| This is Edition 0.12, last updated 2007-10-27, |
| of `The GNU C Library Reference Manual', for version |
| 2.8 (Sourcery G++ Lite 2011.03-41). |
| |
| Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, |
| 2003, 2007, 2008, 2010 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 or |
| any later version published by the Free Software Foundation; with the |
| Invariant Sections being ``Free Software Needs Free Documentation'' |
| and ``GNU Lesser General Public License'', the Front-Cover texts being |
| ``A GNU Manual'', and with the Back-Cover Texts as in (a) below. A |
| copy of the license is included in the section entitled "GNU Free |
| Documentation License". |
| |
| (a) The FSF's Back-Cover Text is: ``You have the freedom to |
| copy and modify this GNU manual. Buying copies from the FSF |
| supports it in developing GNU and promoting software freedom.''--> |
| <meta http-equiv="Content-Style-Type" content="text/css"> |
| <style type="text/css"><!-- |
| pre.display { font-family:inherit } |
| pre.format { font-family:inherit } |
| pre.smalldisplay { font-family:inherit; font-size:smaller } |
| pre.smallformat { font-family:inherit; font-size:smaller } |
| pre.smallexample { font-size:smaller } |
| pre.smalllisp { font-size:smaller } |
| span.sc { font-variant:small-caps } |
| span.roman { font-family:serif; font-weight:normal; } |
| span.sansserif { font-family:sans-serif; font-weight:normal; } |
| --></style> |
| <link rel="stylesheet" type="text/css" href="../cs.css"> |
| </head> |
| <body> |
| <div class="node"> |
| <a name="Subexpression-Complications"></a> |
| <p> |
| Next: <a rel="next" accesskey="n" href="Regexp-Cleanup.html#Regexp-Cleanup">Regexp Cleanup</a>, |
| Previous: <a rel="previous" accesskey="p" href="Regexp-Subexpressions.html#Regexp-Subexpressions">Regexp Subexpressions</a>, |
| Up: <a rel="up" accesskey="u" href="Regular-Expressions.html#Regular-Expressions">Regular Expressions</a> |
| <hr> |
| </div> |
| |
| <h4 class="subsection">10.3.5 Complications in Subexpression Matching</h4> |
| |
| <p>Sometimes a subexpression matches a substring of no characters. This |
| happens when ‘<samp><span class="samp">f\(o*\)</span></samp>’ matches the string ‘<samp><span class="samp">fum</span></samp>’. (It really |
| matches just the ‘<samp><span class="samp">f</span></samp>’.) In this case, both of the offsets identify |
| the point in the string where the null substring was found. In this |
| example, the offsets are both <code>1</code>. |
| |
| <p>Sometimes the entire regular expression can match without using some of |
| its subexpressions at all—for example, when ‘<samp><span class="samp">ba\(na\)*</span></samp>’ matches the |
| string ‘<samp><span class="samp">ba</span></samp>’, the parenthetical subexpression is not used. When |
| this happens, <code>regexec</code> stores <code>-1</code> in both fields of the |
| element for that subexpression. |
| |
| <p>Sometimes matching the entire regular expression can match a particular |
| subexpression more than once—for example, when ‘<samp><span class="samp">ba\(na\)*</span></samp>’ |
| matches the string ‘<samp><span class="samp">bananana</span></samp>’, the parenthetical subexpression |
| matches three times. When this happens, <code>regexec</code> usually stores |
| the offsets of the last part of the string that matched the |
| subexpression. In the case of ‘<samp><span class="samp">bananana</span></samp>’, these offsets are |
| <code>6</code> and <code>8</code>. |
| |
| <p>But the last match is not always the one that is chosen. It's more |
| accurate to say that the last <em>opportunity</em> to match is the one |
| that takes precedence. What this means is that when one subexpression |
| appears within another, then the results reported for the inner |
| subexpression reflect whatever happened on the last match of the outer |
| subexpression. For an example, consider ‘<samp><span class="samp">\(ba\(na\)*s \)*</span></samp>’ matching |
| the string ‘<samp><span class="samp">bananas bas </span></samp>’. The last time the inner expression |
| actually matches is near the end of the first word. But it is |
| <em>considered</em> again in the second word, and fails to match there. |
| <code>regexec</code> reports nonuse of the “na” subexpression. |
| |
| <p>Another place where this rule applies is when the regular expression |
| <pre class="smallexample"> \(ba\(na\)*s \|nefer\(ti\)* \)* |
| </pre> |
| <p class="noindent">matches ‘<samp><span class="samp">bananas nefertiti</span></samp>’. The “na” subexpression does match |
| in the first word, but it doesn't match in the second word because the |
| other alternative is used there. Once again, the second repetition of |
| the outer subexpression overrides the first, and within that second |
| repetition, the “na” subexpression is not used. So <code>regexec</code> |
| reports nonuse of the “na” subexpression. |
| |
| </body></html> |
| |