| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> |
| <title>User's Guide</title> |
| <link rel="stylesheet" href="../../../doc/src/boostbook.css" type="text/css"> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.75.2"> |
| <link rel="home" href="../index.html" title="The Boost C++ Libraries BoostBook Documentation Subset"> |
| <link rel="up" href="../xpressive.html" title="Chapter 29. Boost.Xpressive"> |
| <link rel="prev" href="../xpressive.html" title="Chapter 29. Boost.Xpressive"> |
| <link rel="next" href="reference.html" title="Reference"> |
| </head> |
| <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| <div class="section"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="xpressive.user_s_guide"></a><a class="link" href="user_s_guide.html" title="User's Guide">User's Guide</a> |
| </h2></div></div></div> |
| <div class="toc"><dl> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.introduction">Introduction</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive">Installing |
| xpressive</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start">Quick Start</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#xpressive.user_s_guide.creating_a_regex_object">Creating |
| a Regex Object</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching">Matching |
| and Searching</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results">Accessing |
| Results</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions">String |
| Substitutions</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization">String |
| Splitting and Tokenization</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures">Named Captures</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches">Grammars |
| and Nested Matches</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions">Semantic |
| Actions and User-Defined Assertions</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes">Symbol |
| Tables and Attributes</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits">Localization |
| and Regex Traits</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks">Tips 'N Tricks</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.concepts">Concepts</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.examples">Examples</a></span></dt> |
| </dl></div> |
| <p> |
| This section describes how to use xpressive to accomplish text manipulation |
| and parsing tasks. If you are looking for detailed information regarding specific |
| components in xpressive, check the <a class="link" href="reference.html" title="Reference">Reference</a> |
| section. |
| </p> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.introduction"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.introduction" title="Introduction">Introduction</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.introduction.what_is_xpressive_"></a><h3> |
| <a name="id3091401"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.introduction.what_is_xpressive_">What |
| is xpressive?</a> |
| </h3> |
| <p> |
| xpressive is a regular expression template library. Regular expressions (regexes) |
| can be written as strings that are parsed dynamically at runtime (dynamic |
| regexes), or as <span class="emphasis"><em>expression templates</em></span><sup>[<a name="id3091425" href="#ftn.id3091425" class="footnote">4</a>]</sup> that are parsed at compile-time (static regexes). Dynamic regexes |
| have the advantage that they can be accepted from the user as input at runtime |
| or read from an initialization file. Static regexes have several advantages. |
| Since they are C++ expressions instead of strings, they can be syntax-checked |
| at compile-time. Also, they can naturally refer to code and data elsewhere |
| in your program, giving you the ability to call back into your code from |
| within a regex match. Finally, since they are statically bound, the compiler |
| can generate faster code for static regexes. |
| </p> |
| <p> |
| xpressive's dual nature is unique and powerful. Static xpressive is a bit |
| like the <a href="http://spirit.sourceforge.net" target="_top">Spirit Parser Framework</a>. |
| As with <a href="http://spirit.sourceforge.net" target="_top">Spirit</a>, you can build |
| grammars with static regexes using expression templates. (Unlike <a href="http://spirit.sourceforge.net" target="_top">Spirit</a>, |
| xpressive does exhaustive backtracking, trying every possibility to find |
| a match for your pattern.) Dynamic xpressive is a bit like <a href="../../../libs/regex" target="_top">Boost.Regex</a>. |
| In fact, xpressive's interface should be familiar to anyone who has used |
| <a href="../../../libs/regex" target="_top">Boost.Regex</a>. xpressive's innovation |
| comes from allowing you to mix and match static and dynamic regexes in the |
| same program, and even in the same expression! You can embed a dynamic regex |
| in a static regex, or <span class="emphasis"><em>vice versa</em></span>, and the embedded regex |
| will participate fully in the search, back-tracking as needed to make the |
| match succeed. |
| </p> |
| <a name="boost_xpressive.user_s_guide.introduction.hello__world_"></a><h3> |
| <a name="id3091497"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.introduction.hello__world_">Hello, |
| world!</a> |
| </h3> |
| <p> |
| Enough theory. Let's have a look at <span class="emphasis"><em>Hello World</em></span>, xpressive |
| style: |
| </p> |
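| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> |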
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">hello</span><span class="special">(</span> <span class="string">"hello world!"</span> <span class="special">);</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\w+) (\\w+)!"</span> <span class="special">);</span> |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">hello</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rex</span> <span class="special">)</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// whole match |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// first capture |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">2</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// second capture |
| </span> <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program outputs the following: |
| </p> |
| <pre class="programlisting">hello world! |
| hello |
| world |
| </pre> |
| <p> |
| The first thing you'll notice about the code is that all the types in xpressive |
| live in the <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span></code> namespace. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| Most of the rest of the examples in this document will leave off the <code class="computeroutput"><span class="keyword">using</span> <span class="keyword">namespace</span> |
| <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span></code> |
| directive. Just pretend it's there. |
| </p></td></tr> |
| </table></div> |
| <p> |
| Next, you'll notice the type of the regular expression object is <code class="computeroutput"><span class="identifier">sregex</span></code>. If you are familiar with <a href="../../../libs/regex" target="_top">Boost.Regex</a>, this is different from what you |
| are used to. The "<code class="computeroutput"><span class="identifier">s</span></code>" |
| in "<code class="computeroutput"><span class="identifier">sregex</span></code>" stands |
| for "<code class="computeroutput"><span class="identifier">string</span></code>", indicating |
| that this regex can be used to find patterns in <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> |
| objects. I'll discuss this difference and its implications in detail later. |
| </p> |
| <p> |
| Notice how the regex object is initialized: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\w+) (\\w+)!"</span> <span class="special">);</span> |
| </pre> |
| <p> |
| To create a regular expression object from a string, you must call a factory |
| method such as <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1532404-bb">basic_regex<>::compile()</a></code></code>. |
| This is another area in which xpressive differs from other object-oriented |
| regular expression libraries. Other libraries encourage you to think of a |
| regular expression as a kind of string on steroids. In xpressive, regular |
| expressions are not strings; they are little programs in a domain-specific |
| language. Strings are only one <span class="emphasis"><em>representation</em></span> of that |
| language. Another representation is an expression template. For example, |
| the above line of code is equivalent to the following: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">' '</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">'!'</span><span class="special">;</span> |
| </pre> |
| <p> |
| This describes the same regular expression, except it uses the domain-specific |
| embedded language defined by static xpressive. |
| </p> |
| <p> |
| As you can see, static regexes have a syntax that is noticeably different |
| from standard Perl syntax. That is because we are constrained by C++'s syntax. |
| The biggest difference is the use of <code class="computeroutput"><span class="special">>></span></code> |
| to mean "followed by". For instance, in Perl you can just put sub-expressions |
| next to each other: |
| </p> |
| <pre class="programlisting"><span class="identifier">abc</span> |
| </pre> |
| <p> |
| But in C++, there must be an operator separating sub-expressions: |
| </p> |
| <pre class="programlisting"><span class="identifier">a</span> <span class="special">>></span> <span class="identifier">b</span> <span class="special">>></span> <span class="identifier">c</span> |
| </pre> |
| <p> |
| In Perl, parentheses <code class="computeroutput"><span class="special">()</span></code> have |
| special meaning. They group, but as a side-effect they also create back-references |
| like <code class="literal">$1</code> and <code class="literal">$2</code>. In C++, there is no |
| way to overload parentheses to give them side-effects. To get the same effect, |
| we use the special <code class="computeroutput"><span class="identifier">s1</span></code>, <code class="computeroutput"><span class="identifier">s2</span></code>, etc. tokens. Assign to one to create |
| a back-reference (known as a sub-match in xpressive). |
| </p> |
| <p> |
| You'll also notice that the one-or-more repetition operator <code class="computeroutput"><span class="special">+</span></code> has moved from postfix to prefix position. |
| That's because C++ doesn't have a postfix <code class="computeroutput"><span class="special">+</span></code> |
| operator. So: |
| </p> |
| <pre class="programlisting"><span class="string">"\\w+"</span> |
| </pre> |
| <p> |
| is the same as: |
| </p> |
| <pre class="programlisting"><span class="special">+</span><span class="identifier">_w</span> |
| </pre> |
| <p> |
| We'll cover all the other differences <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes" title="Static Regexes">later</a>. |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.installing_xpressive"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive" title="Installing xpressive">Installing |
| xpressive</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.installing_xpressive.getting_xpressive"></a><h3> |
| <a name="id3092554"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.getting_xpressive">Getting |
| xpressive</a> |
| </h3> |
| <p> |
| There are two ways to get xpressive. The first and simplest is to download |
| the latest version of Boost. Just go to <a href="http://sf.net/projects/boost" target="_top">http://sf.net/projects/boost</a> |
| and follow the <span class="quote">“<span class="quote">Download</span>”</span> link. |
| </p> |
| <p> |
| The second way is by directly accessing the Boost Subversion repository. |
| Just go to <a href="http://svn.boost.org/trac/boost/" target="_top">http://svn.boost.org/trac/boost/</a> |
| and follow the instructions there for anonymous Subversion access. The version |
| in Boost Subversion is unstable. |
| </p> |
| <a name="boost_xpressive.user_s_guide.installing_xpressive.building_with_xpressive"></a><h3> |
| <a name="id3092607"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.building_with_xpressive">Building |
| with xpressive</a> |
| </h3> |
| <p> |
| xpressive is a header-only template library, which means you don't need to |
| alter your build scripts or link to any separate library file to use it. All |
| you need to do is <code class="computeroutput"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>. |
| If you use only static regexes, you can improve compile times by including |
| only <code class="computeroutput"><span class="identifier">xpressive_static</span><span class="special">.</span><span class="identifier">hpp</span></code>. Likewise, |
| you can include only <code class="computeroutput"><span class="identifier">xpressive_dynamic</span><span class="special">.</span><span class="identifier">hpp</span></code> if |
| you plan to use only dynamic regexes. |
| </p> |
| <p> |
| If you would also like to use semantic actions or custom assertions with |
| your static regexes, you will need to additionally include <code class="computeroutput"><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span></code>. |
| </p> |
| <a name="boost_xpressive.user_s_guide.installing_xpressive.requirements"></a><h3> |
| <a name="id3092614"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.requirements">Requirements</a> |
| </h3> |
| <p> |
| xpressive requires Boost version 1.34.1 or higher. |
| </p> |
| <a name="boost_xpressive.user_s_guide.installing_xpressive.supported_compilers"></a><h3> |
| <a name="id3092775"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.installing_xpressive.supported_compilers">Supported |
| Compilers</a> |
| </h3> |
| <p> |
| Currently, Boost.Xpressive is known to work on the following compilers: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| Visual C++ 7.1 and higher |
| </li> |
| <li class="listitem"> |
| GNU C++ 3.4 and higher |
| </li> |
| <li class="listitem"> |
| Intel for Linux 8.1 and higher |
| </li> |
| <li class="listitem"> |
| Intel for Windows 10 and higher |
| </li> |
| <li class="listitem"> |
| tru64cxx 71 and higher |
| </li> |
| <li class="listitem"> |
| MinGW 3.4 and higher |
| </li> |
| <li class="listitem"> |
| HP C/aC++ A.06.14 |
| </li> |
| </ul></div> |
| <p> |
| Check the latest test results at Boost's <a href="http://beta.boost.org/development/tests/trunk/developer/xpressive.html" target="_top">Regression |
| Results Page</a>. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| Please send any questions, comments and bug reports to eric <at> |
| boost-consulting <dot> com. |
| </p></td></tr> |
| </table></div> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.quick_start"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start" title="Quick Start">Quick Start</a> |
| </h3></div></div></div> |
| <p> |
| You don't need to know much to start being productive with xpressive. Let's |
| begin with the nickel tour of the types and algorithms xpressive provides. |
| </p> |
| <div class="table"> |
| <a name="id3092894"></a><p class="title"><b>Table 29.1. xpressive's Tool-Box</b></p> |
| <div class="table-contents"><table class="table" summary="xpressive's Tool-Box"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Tool |
| </p> |
| </th> |
| <th> |
| <p> |
| Description |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Contains a compiled regular expression. <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| is the most important type in xpressive. Everything you do with |
| xpressive will begin with creating an object of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>, |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| contains the results of a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| operation. It acts like a vector of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| objects. A <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| object contains a marked sub-expression (also known as a back-reference |
| in Perl). It is basically just a pair of iterators representing |
| the beginning and end of the marked sub-expression. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Checks to see if a string matches a regex. For <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| to succeed, the <span class="emphasis"><em>whole string</em></span> must match the |
| regex, from beginning to end. If you give <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>, |
| it will write into it any marked sub-expressions it finds. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Searches a string to find a sub-string that matches the regex. |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| will try to find a match at every position in the string, starting |
| at the beginning, and stopping when it finds a match or when the |
| string is exhausted. As with <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code>, |
| if you give <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>, |
| it will write into it any marked sub-expressions it finds. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Given an input string, a regex, and a substitution string, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| builds a new string by replacing those parts of the input string |
| that match the regex with the substitution string. The substitution |
| string can contain references to marked sub-expressions. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| An STL-compatible iterator that makes it easy to find all the places |
| in a string that match a regex. Dereferencing a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> |
| returns a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>. |
| Incrementing a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> |
| finds the next match. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Like <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code>, |
| except dereferencing a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| returns a string. By default, it will return the whole sub-string |
| that the regex matched, but it can be configured to return any |
| or all of the marked sub-expressions one at a time, or even the |
| parts of the string that <span class="emphasis"><em>didn't</em></span> match the |
| regex. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A factory for <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| objects. It "compiles" a string into a regular expression. |
| You will not usually have to deal directly with <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| because the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| class has a factory method that uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| internally. But if you need to do anything fancy like create a |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| object with a different <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>, |
| you will need to use a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| explicitly. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| Now that you know a bit about the tools xpressive provides, you can pick |
| the right tool for you by answering the following two questions: |
| </p> |
| <div class="orderedlist"><ol class="orderedlist" type="1"> |
| <li class="listitem"> |
| What <span class="emphasis"><em>iterator</em></span> type will you use to traverse your |
| data? |
| </li> |
| <li class="listitem"> |
| What do you want to <span class="emphasis"><em>do</em></span> to your data? |
| </li> |
| </ol></div> |
| <a name="boost_xpressive.user_s_guide.quick_start.know_your_iterator_type"></a><h3> |
| <a name="id3093949"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start.know_your_iterator_type">Know |
| Your Iterator Type</a> |
| </h3> |
| <p> |
| Most of the classes in xpressive are templates that are parameterized on |
| the iterator type. xpressive defines some common typedefs to make the job |
| of choosing the right types easier. You can use the table below to find the |
| right types based on the type of your iterator. |
| </p> |
| <div class="table"> |
| <a name="id3093973"></a><p class="title"><b>Table 29.2. xpressive Typedefs vs. Iterator Types</b></p> |
| <div class="table-contents"><table class="table" summary="xpressive Typedefs vs. Iterator Types"> |
| <colgroup> |
| <col> |
| <col> |
| <col> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| </th> |
| <th> |
| <p> |
| std::string::const_iterator |
| </p> |
| </th> |
| <th> |
| <p> |
| char const * |
| </p> |
| </th> |
| <th> |
| <p> |
| std::wstring::const_iterator |
| </p> |
| </th> |
| <th> |
| <p> |
| wchar_t const * |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">sregex</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">cregex</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wsregex</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wcregex</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">smatch</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">cmatch</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wsmatch</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wcmatch</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">sregex_compiler</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">cregex_compiler</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wsregex_compiler</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wcregex_compiler</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">sregex_iterator</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">cregex_iterator</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wsregex_iterator</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wcregex_iterator</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">sregex_token_iterator</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">cregex_token_iterator</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wsregex_token_iterator</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">wcregex_token_iterator</span></code> |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| Notice the systematic naming convention: each typedef carries the same prefix |
| as the others in its column. Many of these types are used together, so the |
| convention helps you use them consistently. For instance, if you have an |
| <code class="computeroutput"><span class="identifier">sregex</span></code>, |
| you should also be using an <code class="computeroutput"><span class="identifier">smatch</span></code>. |
| </p> |
| <p> |
| If you are not using one of those four iterator types, then you can use the |
| templates directly and specify your iterator type. |
| </p> |
| <a name="boost_xpressive.user_s_guide.quick_start.know_your_task"></a><h3> |
| <a name="id3094491"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start.know_your_task">Know |
| Your Task</a> |
| </h3> |
| <p> |
| Do you want to find a pattern once? Many times? Search and replace? xpressive |
| has tools for all that and more. Below is a quick reference: |
| </p> |
| <div class="table"> |
| <a name="id3094512"></a><p class="title"><b>Table 29.3. Tasks and Tools</b></p> |
| <div class="table-contents"><table class="table" summary="Tasks and Tools"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| To do this ... |
| </p> |
| </th> |
| <th> |
| <p> |
| Use this ... |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex">See |
| if a whole string matches a regex</a> |
| </p> |
| </td> |
| <td> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| algorithm |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex">See |
| if a string contains a sub-string that matches a regex</a> |
| </p> |
| </td> |
| <td> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| algorithm |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex">Replace |
| all sub-strings that match a regex</a> |
| </p> |
| </td> |
| <td> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| algorithm |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.find_all_the_sub_strings_that_match_a_regex_and_step_through_them_one_at_a_time">Find |
| all the sub-strings that match a regex and step through them one |
| at a time</a> |
| </p> |
| </td> |
| <td> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> |
| class |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_into_tokens_that_each_match_a_regex">Split |
| a string into tokens that each match a regex</a> |
| </p> |
| </td> |
| <td> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| class |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <span class="inlinemediaobject"><img src="../images/tip.png" alt="tip"></span> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_using_a_regex_as_a_delimiter">Split |
| a string using a regex as a delimiter</a> |
| </p> |
| </td> |
| <td> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| class |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| These algorithms and classes are described in excruciating detail in the |
| Reference section. |
| </p> |
| <div class="tip"><table border="0" summary="Tip"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> |
| <th align="left">Tip</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| Try clicking on a task in the table above to see a complete example program |
| that uses xpressive to solve that particular task. |
| </p></td></tr> |
| </table></div> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="xpressive.user_s_guide.creating_a_regex_object"></a><a class="link" href="user_s_guide.html#xpressive.user_s_guide.creating_a_regex_object" title="Creating a Regex Object">Creating |
| a Regex Object</a> |
| </h3></div></div></div> |
| <div class="toc"><dl> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes">Static |
| Regexes</a></span></dt> |
| <dt><span class="section"><a href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes">Dynamic |
| Regexes</a></span></dt> |
| </dl></div> |
| <p> |
| When using xpressive, the first thing you'll do is create a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| object. This section goes over the nuts and bolts of building a regular expression |
| in the two dialects xpressive supports: static and dynamic. |
| </p> |
| <div class="section"> |
| <div class="titlepage"><div><div><h4 class="title"> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes" title="Static Regexes">Static |
| Regexes</a> |
| </h4></div></div></div> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.overview"></a><h3> |
| <a name="id3094974"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.overview">Overview</a> |
| </h3> |
| <p> |
| The feature that really sets xpressive apart from other C/C++ regular expression |
| libraries is the ability to author a regular expression using C++ expressions. |
| xpressive achieves this through operator overloading, using a technique |
| called <span class="emphasis"><em>expression templates</em></span> to embed a mini-language |
| dedicated to pattern matching within C++. These "static regexes" |
| have many advantages over their string-based brethren. In particular, static |
| regexes: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| are syntax-checked at compile-time; they will never fail at run-time |
| due to a syntax error. |
| </li> |
| <li class="listitem"> |
| can naturally refer to other C++ data and code, including other regexes, |
| making it simple to build grammars out of regular expressions and bind |
| user-defined actions that execute when parts of your regex match. |
| </li> |
| <li class="listitem"> |
| are statically bound for better inlining and optimization. Static regexes |
| require no state tables, virtual functions, byte-code or calls through |
| function pointers that cannot be resolved at compile time. |
| </li> |
| <li class="listitem"> |
| are not limited to searching for patterns in strings. You can declare |
| a static regex that finds patterns in an array of integers, for instance. |
| </li> |
| </ul></div> |
| <p> |
| Since we compose static regexes using C++ expressions, we are constrained |
| by the rules for legal C++ expressions. Unfortunately, that means that |
| "classic" regular expression syntax cannot always be mapped cleanly |
| into C++. Rather, we map the regex <span class="emphasis"><em>constructs</em></span>, picking |
| new syntax that is legal C++. |
| </p> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.construction_and_assignment"></a><h3> |
| <a name="id3095069"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.construction_and_assignment">Construction |
| and Assignment</a> |
| </h3> |
| <p> |
| You create a static regex by assigning one to an object of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>. |
| For instance, the following defines a regex that can be used to find patterns |
| in objects of type <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="char">'$'</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_d</span> <span class="special">>></span> <span class="char">'.'</span> <span class="special">>></span> <span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">;</span> |
| </pre> |
| <p> |
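| Assignment works similarly: a static regex can be assigned to an existing <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> object, replacing its previous pattern. |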
| </p> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.character_and_string_literals"></a><h3> |
| <a name="id3095216"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.character_and_string_literals">Character |
| and String Literals</a> |
| </h3> |
| <p> |
| In static regexes, character and string literals match themselves. For |
| instance, in the regex above, <code class="computeroutput"><span class="char">'$'</span></code> |
| and <code class="computeroutput"><span class="char">'.'</span></code> match the characters |
| <code class="computeroutput"><span class="char">'$'</span></code> and <code class="computeroutput"><span class="char">'.'</span></code> |
| respectively. Don't be confused by the fact that <code class="literal">$</code> and |
| <code class="literal">.</code> are meta-characters in Perl. In xpressive, literals |
| always represent themselves. |
| </p> |
| <p> |
| When using literals in static regexes, you must take care that at least |
| one operand of each operator is not a literal. For instance, the following are <span class="emphasis"><em>not</em></span> |
| valid regexes: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re1</span> <span class="special">=</span> <span class="char">'a'</span> <span class="special">>></span> <span class="char">'b'</span><span class="special">;</span> <span class="comment">// ERROR! |
| </span><span class="identifier">sregex</span> <span class="identifier">re2</span> <span class="special">=</span> <span class="special">+</span><span class="char">'a'</span><span class="special">;</span> <span class="comment">// ERROR! |
| </span></pre> |
| <p> |
| The two operands to the binary <code class="computeroutput"><span class="special">>></span></code> |
| operator are both literals, and the operand of the unary <code class="computeroutput"><span class="special">+</span></code> operator is also a literal, so these statements |
| will call the native C++ binary right-shift and unary plus operators, respectively. |
| That's not what we want. To get operator overloading to kick in, at least |
| one operand must be a user-defined type. We can use xpressive's <code class="computeroutput"><span class="identifier">as_xpr</span><span class="special">()</span></code> |
| helper function to "taint" an expression with regex-ness, forcing |
| operator overloading to find the correct operators. The two regexes above |
| should be written as: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re1</span> <span class="special">=</span> <span class="identifier">as_xpr</span><span class="special">(</span><span class="char">'a'</span><span class="special">)</span> <span class="special">>></span> <span class="char">'b'</span><span class="special">;</span> <span class="comment">// OK |
| </span><span class="identifier">sregex</span> <span class="identifier">re2</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">as_xpr</span><span class="special">(</span><span class="char">'a'</span><span class="special">);</span> <span class="comment">// OK |
| </span></pre> |
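| <p> |
| To see why plain literals never reach xpressive's overloaded operators, the following standard C++ sketch (no xpressive headers involved, C++11 assumed) checks what the built-in operators produce for <code class="computeroutput">char</code> operands. The function name <code class="computeroutput">plus_on_literal</code> is hypothetical. |
| </p> |

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// With two char operands, >> is the built-in shift on promoted ints,
// so the result type is int and no regex is ever built.
static_assert(std::is_same<decltype(std::declval<char>() >> std::declval<char>()),
                           int>::value,
              "char >> char is the built-in shift");

// Unary + on a char literal is the built-in integral promotion.
static_assert(std::is_same<decltype(+'a'), int>::value,
              "+'a' is just an int");

// Returns the promoted value of the literal, i.e. its character code.
int plus_on_literal()
{
    return +'a';
}
```

| <p> |
| Because both expressions have type <code class="computeroutput">int</code>, wrapping one operand in <code class="computeroutput">as_xpr()</code> is what gives overload resolution a user-defined type to work with. |
| </p> |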
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.sequencing_and_alternation"></a><h3> |
| <a name="id3095533"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.sequencing_and_alternation">Sequencing |
| and Alternation</a> |
| </h3> |
| <p> |
| As you've probably already noticed, sub-expressions in static regexes must |
| be separated by the sequencing operator, <code class="computeroutput"><span class="special">>></span></code>. |
| You can read this operator as "followed by". |
| </p> |
| <pre class="programlisting"><span class="comment">// Match an 'a' followed by a digit |
| </span><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="char">'a'</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">;</span> |
| </pre> |
| <p> |
| Alternation works just as it does in Perl with the <code class="computeroutput"><span class="special">|</span></code> |
| operator. You can read this operator as "or". For example: |
| </p> |
| <pre class="programlisting"><span class="comment">// Match one or more characters, each either a digit or a word character |
| </span><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">|</span> <span class="identifier">_w</span> <span class="special">);</span> |
| </pre> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.grouping_and_captures"></a><h3> |
| <a name="id3095686"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.grouping_and_captures">Grouping |
| and Captures</a> |
| </h3> |
| <p> |
| In Perl, parentheses <code class="computeroutput"><span class="special">()</span></code> have |
| special meaning. They group, but as a side-effect they also create back-references |
| like <code class="literal">$1</code> and <code class="literal">$2</code>. In C++, parentheses |
| only group -- there is no way to give them side-effects. To get the same |
| effect, we use the special <code class="computeroutput"><span class="identifier">s1</span></code>, |
| <code class="computeroutput"><span class="identifier">s2</span></code>, etc. tokens. Assigning |
| to one creates a back-reference. You can then use the back-reference later |
| in your expression, like using <code class="literal">\1</code> and <code class="literal">\2</code> |
| in Perl. For example, consider the following regex, which finds matching |
| HTML tags: |
| </p> |
| <pre class="programlisting"><span class="string">"&lt;(\\w+)&gt;.*?&lt;/\\1&gt;"</span> |
| </pre> |
| <p> |
| In static xpressive, this would be: |
| </p> |
| <pre class="programlisting"><span class="char">'&lt;'</span> <span class="special">&gt;&gt;</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">&gt;&gt;</span> <span class="char">'&gt;'</span> <span class="special">&gt;&gt;</span> <span class="special">-*</span><span class="identifier">_</span> <span class="special">&gt;&gt;</span> <span class="string">"&lt;/"</span> <span class="special">&gt;&gt;</span> <span class="identifier">s1</span> <span class="special">&gt;&gt;</span> <span class="char">'&gt;'</span> |
| </pre> |
| <p> |
| Notice how you capture a back-reference by assigning to <code class="computeroutput"><span class="identifier">s1</span></code>, |
| and then you use <code class="computeroutput"><span class="identifier">s1</span></code> later |
| in the pattern to find the matching end tag. |
| </p> |
| <div class="tip"><table border="0" summary="Tip"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> |
| <th align="left">Tip</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| <span class="bold"><strong>Grouping without capturing a back-reference</strong></span> |
| <br> <br> In xpressive, if you just want grouping without capturing |
| a back-reference, you can just use <code class="computeroutput"><span class="special">()</span></code> |
| without <code class="computeroutput"><span class="identifier">s1</span></code>. That is the |
| equivalent of Perl's <code class="literal">(?:)</code> non-capturing grouping construct. |
| </p></td></tr> |
| </table></div> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.case_insensitivity_and_internationalization"></a><h3> |
| <a name="id3095958"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.case_insensitivity_and_internationalization">Case-Insensitivity |
| and Internationalization</a> |
| </h3> |
| <p> |
| Perl lets you make part of your regular expression case-insensitive by |
| using the <code class="literal">(?i:)</code> pattern modifier. xpressive also has |
| a case-insensitivity pattern modifier, called <code class="computeroutput"><span class="identifier">icase</span></code>. |
| You can use it as follows: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="string">"this"</span> <span class="special">>></span> <span class="identifier">icase</span><span class="special">(</span> <span class="string">"that"</span> <span class="special">);</span> |
| </pre> |
| <p> |
| In this regular expression, <code class="computeroutput"><span class="string">"this"</span></code> |
| will be matched exactly, but <code class="computeroutput"><span class="string">"that"</span></code> |
| will be matched irrespective of case. |
| </p> |
| <p> |
| Case-insensitive regular expressions raise the issue of internationalization: |
| how should case-insensitive character comparisons be evaluated? Also, many |
| character classes are locale-specific. Which characters are matched by |
| <code class="computeroutput"><span class="identifier">digit</span></code> and which are matched |
| by <code class="computeroutput"><span class="identifier">alpha</span></code>? The answer depends |
| on the <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code> object the regular expression |
| object is using. By default, all regular expression objects use the global |
| locale. You can override the default by using the <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> pattern modifier, as follows: |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">my_locale</span> <span class="special">=</span> <span class="comment">/* initialize a std::locale object */</span><span class="special">;</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span> <span class="identifier">my_locale</span> <span class="special">)(</span> <span class="special">+</span><span class="identifier">alpha</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">digit</span> <span class="special">);</span> |
| </pre> |
| <p> |
| This regular expression will evaluate <code class="computeroutput"><span class="identifier">alpha</span></code> |
| and <code class="computeroutput"><span class="identifier">digit</span></code> according to |
| <code class="computeroutput"><span class="identifier">my_locale</span></code>. See the section |
| on <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits" title="Localization and Regex Traits">Localization |
| and Regex Traits</a> for more information about how to customize the |
| behavior of your regexes. |
| </p> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.static_xpressive_syntax_cheat_sheet"></a><h3> |
| <a name="id3096293"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes.static_xpressive_syntax_cheat_sheet">Static |
| xpressive Syntax Cheat Sheet</a> |
| </h3> |
| <p> |
| The table below lists the familiar regex constructs and their equivalents |
| in static xpressive. |
| </p> |
| <div class="table"> |
| <a name="id3096315"></a><p class="title"><b>Table 29.4. Perl syntax vs. Static xpressive syntax</b></p> |
| <div class="table-contents"><table class="table" summary="Perl syntax vs. Static xpressive syntax"> |
| <colgroup> |
| <col> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Perl |
| </p> |
| </th> |
| <th> |
| <p> |
| Static xpressive |
| </p> |
| </th> |
| <th> |
| <p> |
| Meaning |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">.</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/_.html" title="Global _">_</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| any character (assuming Perl's /s modifier). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">ab</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">a</span> <span class="special">>></span> |
| <span class="identifier">b</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| sequencing of <code class="literal">a</code> and <code class="literal">b</code> sub-expressions. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a|b</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">a</span> <span class="special">|</span> |
| <span class="identifier">b</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| alternation of <code class="literal">a</code> and <code class="literal">b</code> |
| sub-expressions. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(a)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">(</span><a class="link" href="../boost/xpressive/s1.html" title="Global s1">s1</a><span class="special">=</span> <span class="identifier">a</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| group and capture a back-reference. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?:a)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">(</span><span class="identifier">a</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| group and do not capture a back-reference. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\1</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/s1.html" title="Global s1">s1</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a previously captured back-reference. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a*</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">*</span><span class="identifier">a</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| zero or more times, greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a+</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">+</span><span class="identifier">a</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| one or more times, greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a?</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">!</span><span class="identifier">a</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| zero or one time, greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a{n,m}</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/repeat.html" title="Function repeat">repeat</a><span class="special"><</span><span class="identifier">n</span><span class="special">,</span><span class="identifier">m</span><span class="special">>(</span><span class="identifier">a</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| between <code class="literal">n</code> and <code class="literal">m</code> times, |
| greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a*?</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">-*</span><span class="identifier">a</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| zero or more times, non-greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a+?</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">-+</span><span class="identifier">a</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| one or more times, non-greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a??</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">-!</span><span class="identifier">a</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| zero or one time, non-greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">a{n,m}?</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">-</span><a class="link" href="../boost/xpressive/repeat.html" title="Function repeat">repeat</a><span class="special">&lt;</span><span class="identifier">n</span><span class="special">,</span><span class="identifier">m</span><span class="special">&gt;(</span><span class="identifier">a</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| between <code class="literal">n</code> and <code class="literal">m</code> times, |
| non-greedy. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">^</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/bos.html" title="Global bos">bos</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| beginning of sequence assertion. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/eos.html" title="Global eos">eos</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| end of sequence assertion. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\b</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/_b.html" title="Global _b">_b</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| word boundary assertion. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\B</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_b.html" title="Global _b">_b</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| not word boundary assertion. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\n</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/_n.html" title="Global _n">_n</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| literal newline. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">.</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_n.html" title="Global _n">_n</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| any character except a literal newline (without Perl's /s modifier). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\r?\n|\r</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/_ln.html" title="Global _ln">_ln</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| logical newline. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[^\r\n]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_ln.html" title="Global _ln">_ln</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| any single character not a logical newline. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\w</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/_w.html" title="Global _w">_w</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a word character, equivalent to set[alnum | '_']. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\W</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_w.html" title="Global _w">_w</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| not a word character, equivalent to ~set[alnum | '_']. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\d</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/_d.html" title="Global _d">_d</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a digit character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\D</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_d.html" title="Global _d">_d</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| not a digit character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\s</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/_s.html" title="Global _s">_s</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a white-space character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\S</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/_s.html" title="Global _s">_s</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| not a white-space character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:alnum:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/alnum.html" title="Global alnum">alnum</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| an alpha-numeric character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:alpha:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/alpha.html" title="Global alpha">alpha</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| an alphabetic character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:blank:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/blank.html" title="Global blank">blank</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a horizontal white-space character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:cntrl:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/cntrl.html" title="Global cntrl">cntrl</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a control character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:digit:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/digit.html" title="Global digit">digit</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a digit character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:graph:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/graph.html" title="Global graph">graph</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a graphical character (printable, excluding space). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:lower:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/lower.html" title="Global lower">lower</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a lower-case character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:print:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/print.html" title="Global print">print</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a printing character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:punct:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/punct.html" title="Global punct">punct</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a punctuation character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:space:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/space.html" title="Global space">space</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a white-space character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:upper:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/upper.html" title="Global upper">upper</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| an upper-case character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[:xdigit:]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/xdigit.html" title="Global xdigit">xdigit</a></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a hexadecimal digit character. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[0-9]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/range.html" title="Function template range">range</a><span class="special">(</span><span class="char">'0'</span><span class="special">,</span><span class="char">'9'</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| characters in range <code class="computeroutput"><span class="char">'0'</span></code> |
| through <code class="computeroutput"><span class="char">'9'</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[abc]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">as_xpr</span><span class="special">(</span><span class="char">'a'</span><span class="special">)</span> <span class="special">|</span> <span class="char">'b'</span> <span class="special">|</span> <span class="char">'c'</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| characters <code class="computeroutput"><span class="char">'a'</span></code>, <code class="computeroutput"><span class="char">'b'</span></code>, or <code class="computeroutput"><span class="char">'c'</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[abc]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">(</span><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">=</span> <span class="char">'a'</span><span class="special">,</span><span class="char">'b'</span><span class="special">,</span><span class="char">'c'</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <span class="emphasis"><em>same as above</em></span> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[0-9abc]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">[</span> <a class="link" href="../boost/xpressive/range.html" title="Function template range">range</a><span class="special">(</span><span class="char">'0'</span><span class="special">,</span><span class="char">'9'</span><span class="special">)</span> <span class="special">|</span> |
| <span class="char">'a'</span> <span class="special">|</span> |
| <span class="char">'b'</span> <span class="special">|</span> |
| <span class="char">'c'</span> <span class="special">]</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| characters <code class="computeroutput"><span class="char">'a'</span></code>, <code class="computeroutput"><span class="char">'b'</span></code>, <code class="computeroutput"><span class="char">'c'</span></code> |
| or in range <code class="computeroutput"><span class="char">'0'</span></code> through |
| <code class="computeroutput"><span class="char">'9'</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[0-9abc]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">[</span> <a class="link" href="../boost/xpressive/range.html" title="Function template range">range</a><span class="special">(</span><span class="char">'0'</span><span class="special">,</span><span class="char">'9'</span><span class="special">)</span> <span class="special">|</span> |
| <span class="special">(</span><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">=</span> <span class="char">'a'</span><span class="special">,</span><span class="char">'b'</span><span class="special">,</span><span class="char">'c'</span><span class="special">)</span> <span class="special">]</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <span class="emphasis"><em>same as above</em></span> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">[^abc]</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~(</span><a class="link" href="../boost/xpressive/set.html" title="Global set">set</a><span class="special">=</span> <span class="char">'a'</span><span class="special">,</span><span class="char">'b'</span><span class="special">,</span><span class="char">'c'</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| not characters <code class="computeroutput"><span class="char">'a'</span></code>, |
| <code class="computeroutput"><span class="char">'b'</span></code>, or <code class="computeroutput"><span class="char">'c'</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?i:<span class="emphasis"><em>stuff</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/icase.html" title="Function template icase">icase</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| match <span class="emphasis"><em>stuff</em></span> disregarding case. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?><span class="emphasis"><em>stuff</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/keep.html" title="Function template keep">keep</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| independent sub-expression, match <span class="emphasis"><em>stuff</em></span> |
| and turn off backtracking. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?=<span class="emphasis"><em>stuff</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/before.html" title="Function template before">before</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| positive look-ahead assertion, match if before <span class="emphasis"><em>stuff</em></span> |
| but don't include <span class="emphasis"><em>stuff</em></span> in the match. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?!<span class="emphasis"><em>stuff</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/before.html" title="Function template before">before</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| negative look-ahead assertion, match if not before <span class="emphasis"><em>stuff</em></span>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?&lt;=<span class="emphasis"><em>stuff</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><a class="link" href="../boost/xpressive/after.html" title="Function template after">after</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| positive look-behind assertion, match if after <span class="emphasis"><em>stuff</em></span> |
| but don't include <span class="emphasis"><em>stuff</em></span> in the match. (<span class="emphasis"><em>stuff</em></span> |
| must be constant-width.) |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?&lt;!<span class="emphasis"><em>stuff</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="special">~</span><a class="link" href="../boost/xpressive/after.html" title="Function template after">after</a><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| negative look-behind assertion, match if not after <span class="emphasis"><em>stuff</em></span>. |
| (<span class="emphasis"><em>stuff</em></span> must be constant-width.) |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?P&lt;<span class="emphasis"><em>name</em></span>&gt;<span class="emphasis"><em>stuff</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><code class="literal"><a class="link" href="../boost/xpressive/mark_tag.html" title="Struct mark_tag">mark_tag</a></code> |
| </code><code class="literal"><span class="emphasis"><em>name</em></span></code><code class="computeroutput"><span class="special">(</span></code><span class="emphasis"><em>n</em></span><code class="computeroutput"><span class="special">);</span></code><br> ...<br> <code class="computeroutput"><span class="special">(</span></code><code class="literal"><span class="emphasis"><em>name</em></span></code><code class="computeroutput"><span class="special">=</span> </code><code class="literal"><span class="emphasis"><em>stuff</em></span></code><code class="computeroutput"><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Create a named capture. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">(?P=<span class="emphasis"><em>name</em></span>)</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><code class="literal"><a class="link" href="../boost/xpressive/mark_tag.html" title="Struct mark_tag">mark_tag</a></code> |
| </code><code class="literal"><span class="emphasis"><em>name</em></span></code><code class="computeroutput"><span class="special">(</span></code><span class="emphasis"><em>n</em></span><code class="computeroutput"><span class="special">);</span></code><br> ...<br> <code class="literal"><span class="emphasis"><em>name</em></span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Refer back to a previously created named capture. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| <br> |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h4 class="title"> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes" title="Dynamic Regexes">Dynamic |
| Regexes</a> |
| </h4></div></div></div> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.overview"></a><h3> |
| <a name="id3099458"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.overview">Overview</a> |
| </h3> |
| <p> |
| Static regexes are dandy, but sometimes you need something a bit more ... |
| dynamic. Imagine you are developing a text editor with a regex search/replace |
| feature. You need to accept a regular expression from the end user as input |
| at run-time. There should be a way to parse a string into a regular expression. |
| That's what xpressive's dynamic regexes are for. They are built from the |
| same core components as their static counterparts, but they are late-bound |
| so you can specify them at run-time. |
| </p> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.construction_and_assignment"></a><h3> |
| <a name="id3099486"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.construction_and_assignment">Construction |
| and Assignment</a> |
| </h3> |
| <p> |
| There are two ways to create a dynamic regex: with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1532404-bb">basic_regex<>::compile()</a></code></code> |
| function or with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| class template. Use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1532404-bb">basic_regex<>::compile()</a></code></code> |
| if you want the default locale. Use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| if you need to specify a different locale. In the section on <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches" title="Grammars and Nested Matches">regex |
| grammars</a>, we'll see another use for <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. |
| </p> |
| <p> |
| Here is an example of using <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><>::</span><span class="identifier">compile</span><span class="special">()</span></code>: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"this|that"</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">icase</span> <span class="special">);</span> |
| </pre> |
| <p> |
| Here is the same example using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex_compiler</span> <span class="identifier">compiler</span><span class="special">;</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"this|that"</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">icase</span> <span class="special">);</span> |
| </pre> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html#id1532404-bb">basic_regex<>::compile()</a></code></code> |
| is implemented in terms of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. |
| </p> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.dynamic_xpressive_syntax"></a><h3> |
| <a name="id3099827"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.dynamic_xpressive_syntax">Dynamic |
| xpressive Syntax</a> |
| </h3> |
| <p> |
| Since the dynamic syntax is not constrained by the rules for valid C++ |
| expressions, we are free to use familiar syntax for dynamic regexes. For |
| this reason, the syntax used by xpressive for dynamic regexes follows the |
| lead set by John Maddock's <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1429.htm" target="_top">proposal</a> |
| to add regular expressions to the Standard Library. It is essentially the |
| syntax standardized by <a href="http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf" target="_top">ECMAScript</a>, |
| with minor changes in support of internationalization. |
| </p> |
| <p> |
| Since the syntax is documented exhaustively elsewhere, I will simply refer |
| you to the existing standards, rather than duplicate the specification |
| here. |
| </p> |
| <a name="boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.internationalization"></a><h3> |
| <a name="id3099882"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.creating_a_regex_object.dynamic_regexes.internationalization">Internationalization</a> |
| </h3> |
| <p> |
| As with static regexes, dynamic regexes support internationalization by |
| allowing you to specify a different <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>. |
| To do this, you must use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| class template has an <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> member |
| function. After you have imbued a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| object with a custom <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>, |
| all regex objects compiled by that <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| will use that locale. For example: |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">my_locale</span> <span class="special">=</span> <span class="comment">/* initialize your locale object here */</span><span class="special">;</span> |
| <span class="identifier">sregex_compiler</span> <span class="identifier">compiler</span><span class="special">;</span> |
| <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">imbue</span><span class="special">(</span> <span class="identifier">my_locale</span> <span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"\\w+|\\d+"</span> <span class="special">);</span> |
| </pre> |
| <p> |
| This regex will use <code class="computeroutput"><span class="identifier">my_locale</span></code> |
| when evaluating the intrinsic character sets <code class="computeroutput"><span class="string">"\\w"</span></code> |
| and <code class="computeroutput"><span class="string">"\\d"</span></code>. |
| </p> |
| </div> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.matching_and_searching"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching" title="Matching and Searching">Matching |
| and Searching</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.matching_and_searching.overview"></a><h3> |
| <a name="id3100200"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching.overview">Overview</a> |
| </h3> |
| <p> |
| Once you have created a regex object, you can use the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| algorithms to find patterns in strings. This page covers the basics of regex |
| matching and searching. In all cases, if you are familiar with how <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| in the <a href="../../../libs/regex" target="_top">Boost.Regex</a> library work, xpressive's |
| versions work the same way. |
| </p> |
| <a name="boost_xpressive.user_s_guide.matching_and_searching.seeing_if_a_string_matches_a_regex"></a><h3> |
| <a name="id3100294"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching.seeing_if_a_string_matches_a_regex">Seeing |
| if a String Matches a Regex</a> |
| </h3> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| algorithm checks to see if a regex matches a given input. |
| </p> |
| <div class="warning"><table border="0" summary="Warning"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="../../../doc/src/images/warning.png"></td> |
| <th align="left">Warning</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| algorithm will only report success if the regex matches the <span class="emphasis"><em>whole |
| input</em></span>, from beginning to end. If the regex matches only a part |
| of the input, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| will return false. If you want to search through the string looking for |
| sub-strings that the regex matches, use the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| algorithm. |
| </p></td></tr> |
| </table></div> |
| <p> |
| The input can be a bidirectional range such as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>, |
| a C-style null-terminated string or a pair of iterators. In all cases, the |
| type of the iterator used to traverse the input sequence must match the iterator |
| type used to declare the regex object. (You can use the table in the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.quick_start.know_your_iterator_type">Quick |
| Start</a> to find the correct regex type for your iterator.) |
| </p> |
| <pre class="programlisting"><span class="identifier">cregex</span> <span class="identifier">cre</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> <span class="comment">// this regex can match C-style strings |
| </span><span class="identifier">sregex</span> <span class="identifier">sre</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> <span class="comment">// this regex can match std::strings |
| </span> |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="string">"hello"</span><span class="special">,</span> <span class="identifier">cre</span> <span class="special">)</span> <span class="special">)</span> <span class="comment">// OK |
| </span> <span class="special">{</span> <span class="comment">/*...*/</span> <span class="special">}</span> |
| |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">(</span><span class="string">"hello"</span><span class="special">),</span> <span class="identifier">sre</span> <span class="special">)</span> <span class="special">)</span> <span class="comment">// OK |
| </span> <span class="special">{</span> <span class="comment">/*...*/</span> <span class="special">}</span> |
| |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="string">"hello"</span><span class="special">,</span> <span class="identifier">sre</span> <span class="special">)</span> <span class="special">)</span> <span class="comment">// ERROR! iterator mis-match! |
| </span> <span class="special">{</span> <span class="comment">/*...*/</span> <span class="special">}</span> |
| </pre> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| algorithm optionally accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| struct as an out parameter. If given, the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| algorithm fills in the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| struct with information about which parts of the regex matched which parts |
| of the input. |
| </p> |
| <pre class="programlisting"><span class="identifier">cmatch</span> <span class="identifier">what</span><span class="special">;</span> |
| <span class="identifier">cregex</span> <span class="identifier">cre</span> <span class="special">=</span> <span class="special">+(</span><span class="identifier">s1</span><span class="special">=</span> <span class="identifier">_w</span><span class="special">);</span> |
| |
| <span class="comment">// store the results of the regex_match in "what" |
| </span><span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="string">"hello"</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">cre</span> <span class="special">)</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// prints "o" |
| </span><span class="special">}</span> |
| </pre> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| algorithm also optionally accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code> |
| bitmask. With <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code>, |
| you can control certain aspects of how the match is evaluated. See the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code> |
| reference for a complete list of the flags and their meanings. |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"hello"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">sre</span> <span class="special">=</span> <span class="identifier">bol</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> |
| |
| <span class="comment">// match_not_bol means that "bol" should not match at [begin,begin) |
| </span><span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">sre</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">match_not_bol</span> <span class="special">)</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="comment">// should never get here!!! |
| </span><span class="special">}</span> |
| </pre> |
| <p> |
| Click <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex">here</a> |
| to see a complete example program that shows how to use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code>. |
| And check the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| reference to see a complete list of the available overloads. |
| </p> |
| <a name="boost_xpressive.user_s_guide.matching_and_searching.searching_for_matching_sub_strings"></a><h3> |
| <a name="id3101277"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.matching_and_searching.searching_for_matching_sub_strings">Searching |
| for Matching Sub-Strings</a> |
| </h3> |
| <p> |
| Use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| when you want to know if an input sequence contains a sub-sequence that a |
| regex matches. <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| will try to match the regex at the beginning of the input sequence and scan |
| forward in the sequence until it either finds a match or exhausts the sequence. |
| </p> |
| <p> |
| In all other regards, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| behaves like <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| <span class="emphasis"><em>(see above)</em></span>. In particular, it can operate on a bidirectional |
| range such as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>, C-style null-terminated strings |
| or iterator ranges. The same care must be taken to ensure that the iterator |
| type of your regex matches the iterator type of your input sequence. As with |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code>, |
| you can optionally provide a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| struct to receive the results of the search, and a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_constants/match_flag_type.html" title="Type match_flag_type">match_flag_type</a></code></code> |
| bitmask to control how the match is evaluated. |
| </p> |
| <p> |
| Click <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex">here</a> |
| to see a complete example program that shows how to use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code>. |
| And check the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| reference to see a complete list of the available overloads. |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.accessing_results"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results" title="Accessing Results">Accessing |
| Results</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.accessing_results.overview"></a><h3> |
| <a name="id3101498"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results.overview">Overview</a> |
| </h3> |
| <p> |
| Sometimes, it is not enough to know simply whether a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| was successful or not. If you pass an object of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code>, |
| then after the algorithm has completed successfully the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| will contain extra information about which parts of the regex matched which |
| parts of the sequence. In Perl, these sub-sequences are called <span class="emphasis"><em>back-references</em></span>, |
| and they are stored in the variables <code class="literal">$1</code>, <code class="literal">$2</code>, |
| etc. In xpressive, they are objects of type <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>, |
| and they are stored in the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| structure, which acts as a vector of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| objects. |
| </p> |
| <a name="boost_xpressive.user_s_guide.accessing_results.match_results"></a><h3> |
| <a name="id3101672"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results.match_results">match_results</a> |
| </h3> |
| <p> |
| So, you've passed a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object to a regex algorithm, and the algorithm has succeeded. Now you want |
| to examine the results. Most of what you'll be doing with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object is indexing into it to access its internally stored <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| objects, but there are a few other things you can do with a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object besides. |
| </p> |
| <p> |
| The table below shows how to access the information stored in a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object named <code class="computeroutput"><span class="identifier">what</span></code>. |
| </p> |
| <div class="table"> |
| <a name="id3101780"></a><p class="title"><b>Table 29.5. match_results<> Accessors</b></p> |
| <div class="table-contents"><table class="table" summary="match_results<> Accessors"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Accessor |
| </p> |
| </th> |
| <th> |
| <p> |
| Effects |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the number of sub-matches, which is always greater than |
| zero after a successful match because the full match is stored |
| in the zero-th sub-match. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">[</span><span class="identifier">n</span><span class="special">]</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the <span class="emphasis"><em>n</em></span>-th sub-match. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">length</span><span class="special">(</span><span class="identifier">n</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the length of the <span class="emphasis"><em>n</em></span>-th sub-match. |
| Same as <code class="computeroutput"><span class="identifier">what</span><span class="special">[</span><span class="identifier">n</span><span class="special">].</span><span class="identifier">length</span><span class="special">()</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">position</span><span class="special">(</span><span class="identifier">n</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the offset into the input sequence at which the <span class="emphasis"><em>n</em></span>-th |
| sub-match begins. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">str</span><span class="special">(</span><span class="identifier">n</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code> |
| constructed from the <span class="emphasis"><em>n</em></span>-th sub-match. Same |
| as <code class="computeroutput"><span class="identifier">what</span><span class="special">[</span><span class="identifier">n</span><span class="special">].</span><span class="identifier">str</span><span class="special">()</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">prefix</span><span class="special">()</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| object which represents the sub-sequence from the beginning of |
| the input sequence to the start of the full match. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">suffix</span><span class="special">()</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| object which represents the sub-sequence from the end of the full |
| match to the end of the input sequence. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">regex_id</span><span class="special">()</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the <code class="computeroutput"><span class="identifier">regex_id</span></code> |
| of the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| object that was last used with this <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| There is more you can do with the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object, but that will be covered when we talk about <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches" title="Grammars and Nested Matches">Grammars |
| and Nested Matches</a>. |
| </p> |
| <a name="boost_xpressive.user_s_guide.accessing_results.sub_match"></a><h3> |
| <a name="id3102363"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results.sub_match">sub_match</a> |
| </h3> |
| <p> |
| When you index into a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object, you get back a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| object. A <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| is basically a pair of iterators. It is defined like this: |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span><span class="special"><</span> <span class="keyword">class</span> <span class="identifier">BidirectionalIterator</span> <span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">sub_match</span> |
| <span class="special">:</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">pair</span><span class="special"><</span> <span class="identifier">BidirectionalIterator</span><span class="special">,</span> <span class="identifier">BidirectionalIterator</span> <span class="special">></span> |
| <span class="special">{</span> |
| <span class="keyword">bool</span> <span class="identifier">matched</span><span class="special">;</span> |
| <span class="comment">// ... |
| </span><span class="special">};</span> |
| </pre> |
| <p> |
| Since it inherits publicly from <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">pair</span><span class="special"><></span></code>, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| has <code class="computeroutput"><span class="identifier">first</span></code> and <code class="computeroutput"><span class="identifier">second</span></code> data members of type <code class="computeroutput"><span class="identifier">BidirectionalIterator</span></code>. These are the beginning |
| and end of the sub-sequence this <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| represents. <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| also has a Boolean <code class="computeroutput"><span class="identifier">matched</span></code> |
| data member, which is true if this <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| participated in the full match. |
| </p> |
| <p> |
| The following table shows how you might access the information stored in |
| a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| object called <code class="computeroutput"><span class="identifier">sub</span></code>. |
| </p> |
| <div class="table"> |
| <a name="id3102694"></a><p class="title"><b>Table 29.6. sub_match<> Accessors</b></p> |
| <div class="table-contents"><table class="table" summary="sub_match<> Accessors"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Accessor |
| </p> |
| </th> |
| <th> |
| <p> |
| Effects |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">length</span><span class="special">()</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the length of the sub-match. Same as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">distance</span><span class="special">(</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">first</span><span class="special">,</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">second</span><span class="special">)</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">str</span><span class="special">()</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code> |
| constructed from the sub-match. Same as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><</span><span class="identifier">char_type</span><span class="special">>(</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">first</span><span class="special">,</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">second</span><span class="special">)</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">compare</span><span class="special">(</span><span class="identifier">str</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Performs a string comparison between the sub-match and <code class="computeroutput"><span class="identifier">str</span></code>, where <code class="computeroutput"><span class="identifier">str</span></code> |
| can be a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code>, |
| C-style null-terminated string, or another sub-match. Same as |
| <code class="computeroutput"><span class="identifier">sub</span><span class="special">.</span><span class="identifier">str</span><span class="special">().</span><span class="identifier">compare</span><span class="special">(</span><span class="identifier">str</span><span class="special">)</span></code>. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><a name="boost_xpressive.user_s_guide.accessing_results._inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject__results_invalidation__inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject_"></a><h3> |
| <a name="id3103093"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.accessing_results._inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject__results_invalidation__inlinemediaobject__imageobject__imagedata_fileref__images_caution_png____imagedata___imageobject__textobject__phrase_caution__phrase___textobject___inlinemediaobject_"><span class="inlinemediaobject"><img src="../images/caution.png" alt="caution"></span> Results Invalidation <span class="inlinemediaobject"><img src="../images/caution.png" alt="caution"></span></a> |
| </h3> |
| <p> |
| Results are stored as iterators into the input sequence. Anything which invalidates |
| the input sequence will invalidate the match results. For instance, if you |
| match a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> object, the results are only valid |
| until your next call to a non-const member function of that <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> |
| object. After that, the results held by the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object are invalid. Don't use them! |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.string_substitutions"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions" title="String Substitutions">String |
| Substitutions</a> |
| </h3></div></div></div> |
| <p> |
| Regular expressions are not only good for searching text; they're also good at |
| <span class="emphasis"><em>manipulating</em></span> it. And one of the most common text manipulation |
| tasks is search-and-replace. xpressive provides the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| algorithm for searching and replacing. |
| </p> |
| <a name="boost_xpressive.user_s_guide.string_substitutions.regex_replace__"></a><h3> |
| <a name="id3103267"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.regex_replace__">regex_replace()</a> |
| </h3> |
| <p> |
| Performing search-and-replace using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| is simple. All you need is an input sequence, a regex object, and a format |
| string or a formatter object. There are several versions of the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| algorithm. Some accept the input sequence as a bidirectional container such |
| as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code> and return the result in a new
| container of the same type. Others accept the input as a null-terminated
| string and return a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>. Still others accept the input sequence
| as a pair of iterators and write the result into an output iterator. The
| substitution may be specified as a string with format sequences or as a formatter
| object. Below are some simple examples of string-based substitutions.
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"This is his face"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">as_xpr</span><span class="special">(</span><span class="string">"his"</span><span class="special">);</span> <span class="comment">// find all occurrences of "his" ... |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">format</span><span class="special">(</span><span class="string">"her"</span><span class="special">);</span> <span class="comment">// ... and replace them with "her" |
| </span> |
| <span class="comment">// use the version of regex_replace() that operates on strings |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span> <span class="identifier">input</span><span class="special">,</span> <span class="identifier">re</span><span class="special">,</span> <span class="identifier">format</span> <span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| |
| <span class="comment">// use the version of regex_replace() that operates on iterators |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="keyword">char</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">);</span> |
| <span class="identifier">regex_replace</span><span class="special">(</span> <span class="identifier">out_iter</span><span class="special">,</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="identifier">format</span> <span class="special">);</span> |
| </pre> |
| <p> |
| The above code prints the following:
| </p> |
| <pre class="programlisting">Ther is her face |
| Ther is her face |
| </pre> |
| <p> |
| Notice that <span class="emphasis"><em>all</em></span> the occurrences of <code class="computeroutput"><span class="string">"his"</span></code> |
| have been replaced with <code class="computeroutput"><span class="string">"her"</span></code>. |
| </p> |
| <p> |
| Click <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex">here</a> |
| to see a complete example program that shows how to use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>. |
| And check the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| reference to see a complete list of the available overloads. |
| </p> |
| <a name="boost_xpressive.user_s_guide.string_substitutions.replace_options"></a><h3> |
| <a name="id3103818"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.replace_options">Replace |
| Options</a> |
| </h3> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| algorithm takes an optional bitmask parameter to control the formatting. |
| The possible values of the bitmask are: |
| </p> |
| <div class="table"> |
| <a name="id3103853"></a><p class="title"><b>Table 29.7. Format Flags</b></p> |
| <div class="table-contents"><table class="table" summary="Format Flags"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Flag |
| </p> |
| </th> |
| <th> |
| <p> |
| Meaning |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">format_default</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Recognize the ECMA-262 format sequences (see below). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">format_first_only</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Only replace the first match, not all of them. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">format_no_copy</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Don't copy the parts of the input sequence that didn't match the |
| regex to the output sequence. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">format_literal</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Treat the format string as a literal; that is, don't recognize |
| any escape sequences. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">format_perl</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Recognize the Perl format sequences (see below). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">format_sed</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Recognize the sed format sequences (see below). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">format_all</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| In addition to the Perl format sequences, recognize some Boost-specific |
| format sequences. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| These flags live in the <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">regex_constants</span></code> |
| namespace. If the substitution parameter is a function object instead of |
| a string, the flags <code class="computeroutput"><span class="identifier">format_literal</span></code>, |
| <code class="computeroutput"><span class="identifier">format_perl</span></code>, <code class="computeroutput"><span class="identifier">format_sed</span></code>, and <code class="computeroutput"><span class="identifier">format_all</span></code> |
| are ignored. |
| </p> |
| <a name="boost_xpressive.user_s_guide.string_substitutions.the_ecma_262_format_sequences"></a><h3> |
| <a name="id3104147"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_ecma_262_format_sequences">The |
| ECMA-262 Format Sequences</a> |
| </h3> |
| <p> |
| When you haven't specified a substitution string dialect with one of the |
| format flags above, you get the dialect defined by ECMA-262, the standard |
| for ECMAScript. The table below shows the escape sequences recognized in |
| ECMA-262 mode. |
| </p> |
| <div class="table"> |
| <a name="id3104170"></a><p class="title"><b>Table 29.8. Format Escape Sequences</b></p> |
| <div class="table-contents"><table class="table" summary="Format Escape Sequences"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Escape Sequence |
| </p> |
| </th> |
| <th> |
| <p> |
| Meaning |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$1</code>, <code class="literal">$2</code>, etc. |
| </p> |
| </td> |
| <td> |
| <p> |
| the corresponding sub-match |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$&</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| the full match |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$`</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| the match prefix |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$'</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| the match suffix |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$$</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a literal <code class="computeroutput"><span class="char">'$'</span></code> character |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| Any other sequence beginning with <code class="computeroutput"><span class="char">'$'</span></code> |
| simply represents itself. For example, if the format string were <code class="computeroutput"><span class="string">"$a"</span></code> then <code class="computeroutput"><span class="string">"$a"</span></code> |
| would be inserted into the output sequence. |
| </p> |
| <a name="boost_xpressive.user_s_guide.string_substitutions.the_sed_format_sequences"></a><h3> |
| <a name="id3104377"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_sed_format_sequences">The |
| Sed Format Sequences</a> |
| </h3> |
| <p> |
| When specifying the <code class="computeroutput"><span class="identifier">format_sed</span></code> |
| flag to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>, |
| the following escape sequences are recognized: |
| </p> |
| <div class="table"> |
| <a name="id3104420"></a><p class="title"><b>Table 29.9. Sed Format Escape Sequences</b></p> |
| <div class="table-contents"><table class="table" summary="Sed Format Escape Sequences"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Escape Sequence |
| </p> |
| </th> |
| <th> |
| <p> |
| Meaning |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\1</code>, <code class="literal">\2</code>, etc. |
| </p> |
| </td> |
| <td> |
| <p> |
| The corresponding sub-match |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">&</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| The full match
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\a</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\a'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\e</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">27</span><span class="special">)</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\f</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\f'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\n</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\n'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\r</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\r'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\t</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\t'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\v</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\v'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\xFF</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFF</span><span class="special">)</span></code>, where <code class="literal"><span class="emphasis"><em>F</em></span></code> |
| is any hex digit |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\x{FFFF}</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFFFF</span><span class="special">)</span></code>, where <code class="literal"><span class="emphasis"><em>F</em></span></code> |
| is any hex digit |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\cX</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| The control character <code class="literal"><span class="emphasis"><em>X</em></span></code> |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><a name="boost_xpressive.user_s_guide.string_substitutions.the_perl_format_sequences"></a><h3> |
| <a name="id3104880"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_perl_format_sequences">The |
| Perl Format Sequences</a> |
| </h3> |
| <p> |
| When specifying the <code class="computeroutput"><span class="identifier">format_perl</span></code> |
| flag to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>, |
| the following escape sequences are recognized: |
| </p> |
| <div class="table"> |
| <a name="id3104922"></a><p class="title"><b>Table 29.10. Perl Format Escape Sequences</b></p> |
| <div class="table-contents"><table class="table" summary="Perl Format Escape Sequences"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Escape Sequence |
| </p> |
| </th> |
| <th> |
| <p> |
| Meaning |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$1</code>, <code class="literal">$2</code>, etc. |
| </p> |
| </td> |
| <td> |
| <p> |
| the corresponding sub-match |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$&</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| the full match |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$`</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| the match prefix |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$'</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| the match suffix |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">$$</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| a literal <code class="computeroutput"><span class="char">'$'</span></code> character |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\a</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\a'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\e</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">27</span><span class="special">)</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\f</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\f'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\n</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\n'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\r</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\r'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\t</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\t'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\v</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="char">'\v'</span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\xFF</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFF</span><span class="special">)</span></code>, where <code class="literal"><span class="emphasis"><em>F</em></span></code> |
| is any hex digit |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\x{FFFF}</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| A literal <code class="computeroutput"><span class="identifier">char_type</span><span class="special">(</span><span class="number">0xFFFF</span><span class="special">)</span></code>, where <code class="literal"><span class="emphasis"><em>F</em></span></code> |
| is any hex digit |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\cX</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| The control character <code class="literal"><span class="emphasis"><em>X</em></span></code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\l</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Make the next character lowercase |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\L</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Make the rest of the substitution lowercase until the next <code class="literal">\E</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\u</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Make the next character uppercase |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\U</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Make the rest of the substitution uppercase until the next <code class="literal">\E</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\E</code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Terminate <code class="literal">\L</code> or <code class="literal">\U</code> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\1</code>, <code class="literal">\2</code>, etc. |
| </p> |
| </td> |
| <td> |
| <p> |
| The corresponding sub-match |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="literal">\g<name></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| The named backref <span class="emphasis"><em>name</em></span> |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><a name="boost_xpressive.user_s_guide.string_substitutions.the_boost_specific_format_sequences"></a><h3> |
| <a name="id3105648"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.the_boost_specific_format_sequences">The |
| Boost-Specific Format Sequences</a> |
| </h3> |
| <p> |
| When specifying the <code class="computeroutput"><span class="identifier">format_all</span></code> |
| flag to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>, |
| the escape sequences recognized are the same as those above for <code class="computeroutput"><span class="identifier">format_perl</span></code>. In addition, conditional expressions |
| of the following form are recognized: |
| </p> |
| <pre class="programlisting">?Ntrue-expression:false-expression |
| </pre> |
| <p> |
| where <span class="emphasis"><em>N</em></span> is a decimal digit representing a sub-match. |
| If the corresponding sub-match participated in the full match, then the substitution |
| is <span class="emphasis"><em>true-expression</em></span>. Otherwise, it is <span class="emphasis"><em>false-expression</em></span>. |
| In this mode, you can use parentheses <code class="literal">()</code> for grouping. If you
| want a literal parenthesis, you must escape it as <code class="literal">\(</code>.
| </p> |
| <a name="boost_xpressive.user_s_guide.string_substitutions.formatter_objects"></a><h3> |
| <a name="id3105747"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.formatter_objects">Formatter |
| Objects</a> |
| </h3> |
| <p> |
| Format strings are not always expressive enough for all your text substitution |
| needs. Consider the simple example of wanting to map input strings to output |
| strings, as you may want to do with environment variables. Rather than a |
| format <span class="emphasis"><em>string</em></span>, for this you would use a formatter <span class="emphasis"><em>object</em></span>. |
| Consider the following code, which finds embedded environment variables of |
| the form <code class="computeroutput"><span class="string">"$(XYZ)"</span></code> and |
| computes the substitution string by looking up the environment variable in |
| a map. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">;</span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">></span> <span class="identifier">env</span><span class="special">;</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">format_fun</span><span class="special">(</span><span class="identifier">smatch</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span><span class="special">)</span> |
| <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">env</span><span class="special">[</span><span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">].</span><span class="identifier">str</span><span class="special">()];</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">env</span><span class="special">[</span><span class="string">"X"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"this"</span><span class="special">;</span> |
| <span class="identifier">env</span><span class="special">[</span><span class="string">"Y"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"that"</span><span class="special">;</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"\"$(X)\" has the value \"$(Y)\""</span><span class="special">);</span> |
| |
| <span class="comment">// replace strings like "$(XYZ)" with the result of env["XYZ"] |
| </span> <span class="identifier">sregex</span> <span class="identifier">envar</span> <span class="special">=</span> <span class="string">"$("</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s1</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">input</span><span class="special">,</span> <span class="identifier">envar</span><span class="special">,</span> <span class="identifier">format_fun</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| In this case, we use a function, <code class="computeroutput"><span class="identifier">format_fun</span><span class="special">()</span></code>, to compute the substitution string on the
| fly. It accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object which contains the results of the current match. <code class="computeroutput"><span class="identifier">format_fun</span><span class="special">()</span></code> uses the first submatch as a key into the |
| global <code class="computeroutput"><span class="identifier">env</span></code> map. The above |
| code displays: |
| </p> |
| <pre class="programlisting">"this" has the value "that" |
| </pre> |
| <p> |
| The formatter need not be an ordinary function. It may be an object of class |
| type. And rather than return a string, it may accept an output iterator into |
| which it writes the substitution. Consider the following, which is functionally |
| equivalent to the above. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">;</span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">struct</span> <span class="identifier">formatter</span> |
| <span class="special">{</span> |
| <span class="keyword">typedef</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">></span> <span class="identifier">env_map</span><span class="special">;</span> |
| <span class="identifier">env_map</span> <span class="identifier">env</span><span class="special">;</span> |
| |
| <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Out</span><span class="special">></span> |
| <span class="identifier">Out</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">smatch</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span><span class="special">,</span> <span class="identifier">Out</span> <span class="identifier">out</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="identifier">env_map</span><span class="special">::</span><span class="identifier">const_iterator</span> <span class="identifier">where</span> <span class="special">=</span> <span class="identifier">env</span><span class="special">.</span><span class="identifier">find</span><span class="special">(</span><span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]);</span> |
| <span class="keyword">if</span><span class="special">(</span><span class="identifier">where</span> <span class="special">!=</span> <span class="identifier">env</span><span class="special">.</span><span class="identifier">end</span><span class="special">())</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">sub</span> <span class="special">=</span> <span class="identifier">where</span><span class="special">-></span><span class="identifier">second</span><span class="special">;</span> |
| <span class="identifier">out</span> <span class="special">=</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span><span class="identifier">sub</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">sub</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">out</span><span class="special">);</span> |
| <span class="special">}</span> |
| <span class="keyword">return</span> <span class="identifier">out</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="special">};</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">formatter</span> <span class="identifier">fmt</span><span class="special">;</span> |
| <span class="identifier">fmt</span><span class="special">.</span><span class="identifier">env</span><span class="special">[</span><span class="string">"X"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"this"</span><span class="special">;</span> |
| <span class="identifier">fmt</span><span class="special">.</span><span class="identifier">env</span><span class="special">[</span><span class="string">"Y"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"that"</span><span class="special">;</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"\"$(X)\" has the value \"$(Y)\""</span><span class="special">);</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">envar</span> <span class="special">=</span> <span class="string">"$("</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s1</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">input</span><span class="special">,</span> <span class="identifier">envar</span><span class="special">,</span> <span class="identifier">fmt</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| The formatter must be a callable object -- a function or a function object |
| -- that has one of three possible signatures, detailed in the table below. |
| For the table, <code class="computeroutput"><span class="identifier">fmt</span></code> is a function |
| pointer or function object, <code class="computeroutput"><span class="identifier">what</span></code> |
| is a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object, <code class="computeroutput"><span class="identifier">out</span></code> is an OutputIterator, |
| and <code class="computeroutput"><span class="identifier">flags</span></code> is a value of |
| <code class="computeroutput"><span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">match_flag_type</span></code>: |
| </p> |
| <div class="table"> |
| <a name="id3107534"></a><p class="title"><b>Table 29.11. Formatter Signatures</b></p> |
| <div class="table-contents"><table class="table" summary="Formatter Signatures"> |
| <colgroup> |
| <col> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| Formatter Invocation |
| </p> |
| </th> |
| <th> |
| <p> |
| Return Type |
| </p> |
| </th> |
| <th> |
| <p> |
| Semantics |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">fmt</span><span class="special">(</span><span class="identifier">what</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Range of characters (e.g. <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span></code>) |
| or null-terminated string |
| </p> |
| </td> |
| <td> |
| <p> |
| The string matched by the regex is replaced with the string returned |
| by the formatter. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">fmt</span><span class="special">(</span><span class="identifier">what</span><span class="special">,</span> |
| <span class="identifier">out</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| OutputIterator |
| </p> |
| </td> |
| <td> |
| <p> |
| The formatter writes the replacement string into <code class="computeroutput"><span class="identifier">out</span></code> and returns the advanced iterator, positioned just past the last character written. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">fmt</span><span class="special">(</span><span class="identifier">what</span><span class="special">,</span> |
| <span class="identifier">out</span><span class="special">,</span> |
| <span class="identifier">flags</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| OutputIterator |
| </p> |
| </td> |
| <td> |
| <p> |
| The formatter writes the replacement string into <code class="computeroutput"><span class="identifier">out</span></code> and returns the advanced iterator, positioned just past the last character written. The <code class="computeroutput"><span class="identifier">flags</span></code> |
| parameter is the value of the match flags passed to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| algorithm. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><a name="boost_xpressive.user_s_guide.string_substitutions.formatter_expressions"></a><h3> |
| <a name="id3107836"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_substitutions.formatter_expressions">Formatter |
| Expressions</a> |
| </h3> |
| <p> |
| In addition to format <span class="emphasis"><em>strings</em></span> and formatter <span class="emphasis"><em>objects</em></span>, |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| also accepts formatter <span class="emphasis"><em>expressions</em></span>. A formatter expression |
| is a lambda expression that generates a string. It uses the same syntax as |
| that for <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions" title="Semantic Actions and User-Defined Assertions">Semantic |
| Actions</a>, which are covered later. The above example, which uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code> |
| to substitute strings for environment variables, is repeated here using a |
| formatter expression. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">></span> <span class="identifier">env</span><span class="special">;</span> |
| <span class="identifier">env</span><span class="special">[</span><span class="string">"X"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"this"</span><span class="special">;</span> |
| <span class="identifier">env</span><span class="special">[</span><span class="string">"Y"</span><span class="special">]</span> <span class="special">=</span> <span class="string">"that"</span><span class="special">;</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"\"$(X)\" has the value \"$(Y)\""</span><span class="special">);</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">envar</span> <span class="special">=</span> <span class="string">"$("</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s1</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">output</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">input</span><span class="special">,</span> <span class="identifier">envar</span><span class="special">,</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">env</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">output</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| In the above, the formatter expression is <code class="computeroutput"><span class="identifier">ref</span><span class="special">(</span><span class="identifier">env</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span></code>. This |
| means to use the value of the first submatch, <code class="computeroutput"><span class="identifier">s1</span></code>, |
| as a key into the <code class="computeroutput"><span class="identifier">env</span></code> map. |
| The purpose of <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> |
| here is to make the reference to the <code class="computeroutput"><span class="identifier">env</span></code> |
| local variable <span class="emphasis"><em>lazy</em></span> so that the index operation is deferred |
| until we know what to replace <code class="computeroutput"><span class="identifier">s1</span></code> |
| with. |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization" title="String Splitting and Tokenization">String |
| Splitting and Tokenization</a> |
| </h3></div></div></div> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| is the Ginsu knife of the text manipulation world. It slices! It dices! This |
| section describes how to use the highly-configurable <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| to chop up input sequences. |
| </p> |
| <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.overview"></a><h3> |
| <a name="id3108650"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.overview">Overview</a> |
| </h3> |
| <p> |
| You initialize a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| with an input sequence, a regex, and some optional configuration parameters. |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| will use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| to find the first place in the sequence that the regex matches. When dereferenced, |
| the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| returns a <span class="emphasis"><em>token</em></span> in the form of a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><></span></code>. Which string it returns depends |
| on the configuration parameters. By default it returns a string corresponding |
| to the full match, but it could also return a string corresponding to a particular |
| marked sub-expression, or even the part of the sequence that <span class="emphasis"><em>didn't</em></span> |
| match. When you increment the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code>, |
| it will move to the next token. Which token is next depends on the configuration |
| parameters. It could simply be a different marked sub-expression in the current |
| match, or it could be part or all of the next match. Or it could be the part |
| that <span class="emphasis"><em>didn't</em></span> match. |
| </p> |
| <p> |
| As you can see, <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| can do a lot. That makes it hard to describe, but some examples should make |
| it clear. |
| </p> |
| <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_1__simple_tokenization"></a><h3> |
| <a name="id3108818"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_1__simple_tokenization">Example |
| 1: Simple Tokenization</a> |
| </h3> |
| <p> |
| This example uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| to chop a sequence into a series of tokens consisting of words. |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"This is his face"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">;</span> <span class="comment">// find a word |
| </span> |
| <span class="comment">// iterate over all the words in the input |
| </span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="comment">// write all the words to std::cout |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span> |
| </pre> |
| <p> |
| This program displays the following: |
| </p> |
| <pre class="programlisting">This |
| is |
| his |
| face |
| </pre> |
| <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_2__simple_tokenization__reloaded"></a><h3> |
| <a name="id3109156"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_2__simple_tokenization__reloaded">Example |
| 2: Simple Tokenization, Reloaded</a> |
| </h3> |
| <p> |
| This example also uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| to chop a sequence into a series of tokens consisting of words, but it uses |
| the regex as a delimiter. When we pass a <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code> as the last parameter to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| constructor, it instructs the token iterator to consider as tokens those |
| parts of the input that <span class="emphasis"><em>didn't</em></span> match the regex. |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"This is his face"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_s</span><span class="special">;</span> <span class="comment">// find white space |
| </span> |
| <span class="comment">// iterate over all non-white space in the input. Note the -1 below: |
| </span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="special">-</span><span class="number">1</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="comment">// write all the words to std::cout |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span> |
| </pre> |
| <p> |
| This program displays the following: |
| </p> |
| <pre class="programlisting">This |
| is |
| his |
| face |
| </pre> |
| <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_3__simple_tokenization__revolutions"></a><h3> |
| <a name="id3109545"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_3__simple_tokenization__revolutions">Example |
| 3: Simple Tokenization, Revolutions</a> |
| </h3> |
| <p> |
| This example also uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| to chop a sequence containing a bunch of dates into a series of tokens consisting |
| of just the years. When we pass a positive integer <code class="literal"><span class="emphasis"><em>N</em></span></code> |
| as the last parameter to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| constructor, it instructs the token iterator to consider as tokens only the |
| <code class="literal"><span class="emphasis"><em>N</em></span></code>-th marked sub-expression of each |
| match. |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"01/02/2003 blahblah 04/23/1999 blahblah 11/13/1981"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(\\d{2})/(\\d{2})/(\\d{4})"</span><span class="special">);</span> <span class="comment">// find a date |
| </span> |
| <span class="comment">// iterate over all the years in the input. Note the 3 below, corresponding to the 3rd sub-expression: |
| </span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="number">3</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="comment">// write all the years to std::cout |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span> |
| </pre> |
| <p> |
| This program displays the following: |
| </p> |
| <pre class="programlisting">2003 |
| 1999 |
| 1981 |
| </pre> |
| <a name="boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_4__not_so_simple_tokenization"></a><h3> |
| <a name="id3109940"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization.example_4__not_so_simple_tokenization">Example |
| 4: Not-So-Simple Tokenization</a> |
| </h3> |
| <p> |
| This example is like the previous one, except that instead of tokenizing |
| just the years, this program turns the days, months and years into tokens. |
| When we pass an array of integers <code class="literal"><span class="emphasis"><em>{I,J,...}</em></span></code> |
| as the last parameter to the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| constructor, it instructs the token iterator to consider as tokens the <code class="literal"><span class="emphasis"><em>I</em></span></code>-th, |
| <code class="literal"><span class="emphasis"><em>J</em></span></code>-th, etc. marked sub-expression |
| of each match. |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">input</span><span class="special">(</span><span class="string">"01/02/2003 blahblah 04/23/1999 blahblah 11/13/1981"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(\\d{2})/(\\d{2})/(\\d{4})"</span><span class="special">);</span> <span class="comment">// find a date |
| </span> |
| <span class="comment">// iterate over the days, months and years in the input |
| </span><span class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">sub_matches</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">2</span><span class="special">,</span> <span class="number">1</span><span class="special">,</span> <span class="number">3</span> <span class="special">};</span> <span class="comment">// day, month, year |
| </span><span class="identifier">sregex_token_iterator</span> <span class="identifier">begin</span><span class="special">(</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">input</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">re</span><span class="special">,</span> <span class="identifier">sub_matches</span> <span class="special">),</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="comment">// write all the days, months and years to std::cout |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="special">></span> <span class="identifier">out_iter</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span><span class="special">,</span> <span class="string">"\n"</span> <span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">copy</span><span class="special">(</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">out_iter</span> <span class="special">);</span> |
| </pre> |
| <p> |
| This program displays the following: |
| </p> |
| <pre class="programlisting">02 |
| 01 |
| 2003 |
| 23 |
| 04 |
| 1999 |
| 13 |
| 11 |
| 1981 |
| </pre> |
| <p> |
| The <code class="computeroutput"><span class="identifier">sub_matches</span></code> array instructs |
| the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| to first take the value of the 2nd sub-match, then the 1st sub-match, and |
| finally the 3rd. Incrementing the iterator again instructs it to use <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| again to find the next match. At that point, the process repeats: the token
| iterator takes the value of the 2nd sub-match, then the 1st, et cetera. |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.named_captures"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures" title="Named Captures">Named Captures</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.named_captures.overview"></a><h3> |
| <a name="id3110456"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures.overview">Overview</a> |
| </h3> |
| <p> |
| For complicated regular expressions, dealing with numbered captures can be |
| a pain. Counting left parentheses to figure out which capture to reference |
| is no fun. Less fun is the fact that merely editing a regular expression |
| could cause a capture to be assigned a new number, invalidating code that refers
| back to it by the old number. |
| </p> |
| <p> |
| Other regular expression engines solve this problem with a feature called |
| <span class="emphasis"><em>named captures</em></span>. This feature allows you to assign a |
| name to a capture, and to refer back to the capture by name rather than by number.
| Xpressive also supports named captures, both in dynamic and in static regexes. |
| </p> |
| <a name="boost_xpressive.user_s_guide.named_captures.dynamic_named_captures"></a><h3> |
| <a name="id3110502"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures.dynamic_named_captures">Dynamic |
| Named Captures</a> |
| </h3> |
| <p> |
| For dynamic regular expressions, xpressive follows the lead of other popular |
| regex engines with the syntax of named captures. You can create a named capture |
| with <code class="computeroutput"><span class="string">"(?P<xxx>...)"</span></code> |
| and refer back to that capture with <code class="computeroutput"><span class="string">"(?P=xxx)"</span></code>. |
| Here, for instance, is a regular expression that creates a named capture |
| and refers back to it: |
| </p> |
| <pre class="programlisting"><span class="comment">// Create a named capture called "char" that matches a single |
| </span><span class="comment">// character and refer back to that capture by name. |
| </span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(?P<char>.)(?P=char)"</span><span class="special">);</span> |
| </pre> |
| <p> |
| The effect of the above regular expression is to find the first doubled character. |
| </p> |
| <p> |
| Once you have executed a match or search operation using a regex with named |
| captures, you can access the named capture through the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object using the capture's name. |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(?P<char>.)(?P=char)"</span><span class="special">);</span> |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="string">"char = "</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="string">"char"</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| The above code displays: |
| </p> |
| <pre class="programlisting">char = e |
| </pre> |
| <p> |
| You can also refer back to a named capture from within a substitution string. |
| The syntax for that is <code class="computeroutput"><span class="string">"\\g<xxx>"</span></code>. |
| Below is some code that demonstrates how to use named captures when doing |
| string substitution. |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(?P<char>.)(?P=char)"</span><span class="special">);</span> |
| <span class="identifier">str</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">,</span> <span class="string">"**\\g<char>**"</span><span class="special">,</span> <span class="identifier">regex_constants</span><span class="special">::</span><span class="identifier">format_perl</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">str</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| </pre> |
| <p> |
| Notice that you have to specify <code class="computeroutput"><span class="identifier">format_perl</span></code>
| when using named captures in a substitution string. Only the Perl format syntax recognizes <code class="computeroutput"><span class="string">"\\g<xxx>"</span></code>. The above
| code displays:
| </p> |
| <pre class="programlisting">tw**e**t |
| </pre> |
| <a name="boost_xpressive.user_s_guide.named_captures.static_named_captures"></a><h3> |
| <a name="id3111388"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.named_captures.static_named_captures">Static |
| Named Captures</a> |
| </h3> |
| <p> |
| If you're using static regular expressions, creating and using named captures |
| is even easier. You can use the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/mark_tag.html" title="Struct mark_tag">mark_tag</a></code></code> |
| type to create a variable that you can use like <code class="computeroutput"><a class="link" href="../boost/xpressive/s1.html" title="Global s1">s1</a></code>, <code class="computeroutput"><a class="link" href="../boost/xpressive/s2.html" title="Global s2">s2</a></code> and friends, but with a
| name that is more meaningful. Below is how the above example would look using |
| static regexes: |
| </p> |
| <pre class="programlisting"><span class="identifier">mark_tag</span> <span class="identifier">char_</span><span class="special">(</span><span class="number">1</span><span class="special">);</span> <span class="comment">// char_ is now a synonym for s1 |
| </span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">char_</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">char_</span><span class="special">;</span> |
| </pre> |
| <p> |
| After a match operation, you can use the <code class="computeroutput"><span class="identifier">mark_tag</span></code> |
| to index into the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| to access the named capture: |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span> |
| <span class="identifier">mark_tag</span> <span class="identifier">char_</span><span class="special">(</span><span class="number">1</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">char_</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">char_</span><span class="special">;</span> |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">char_</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| The above code displays: |
| </p> |
| <pre class="programlisting">e
| </pre> |
| <p> |
| When doing string substitutions with <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_replace.html" title="Function regex_replace">regex_replace()</a></code></code>, |
| you can use named captures to create <span class="emphasis"><em>format expressions</em></span> |
| as below: |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"tweet"</span><span class="special">);</span> |
| <span class="identifier">mark_tag</span> <span class="identifier">char_</span><span class="special">(</span><span class="number">1</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">char_</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">char_</span><span class="special">;</span> |
| <span class="identifier">str</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">,</span> <span class="string">"**"</span> <span class="special">+</span> <span class="identifier">char_</span> <span class="special">+</span> <span class="string">"**"</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">str</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| </pre> |
| <p> |
| The above code displays: |
| </p> |
| <pre class="programlisting">tw**e**t |
| </pre> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| You need to include <code class="literal"><boost/xpressive/regex_actions.hpp></code> |
| to use format expressions. |
| </p></td></tr> |
| </table></div> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches" title="Grammars and Nested Matches">Grammars |
| and Nested Matches</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.overview"></a><h3> |
| <a name="id3112133"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.overview">Overview</a> |
| </h3> |
| <p> |
| One of the key benefits of representing regexes as C++ expressions is the |
| ability to easily refer to other C++ code and data from within the regex. |
| This enables programming idioms that are not possible with other regular |
| expression libraries. Of particular note is the ability for one regex to |
| refer to another regex, allowing you to build grammars out of regular expressions. |
| This section describes how to embed one regex in another by value and by |
| reference, how regex objects behave when they refer to other regexes, and |
| how to access the tree of results after a successful parse. |
| </p> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_value"></a><h3> |
| <a name="id3112160"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_value">Embedding |
| a Regex by Value</a> |
| </h3> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| object has value semantics. When a regex object appears on the right-hand |
| side in the definition of another regex, it is as if the regex were embedded |
| by value; that is, a copy of the nested regex is stored by the enclosing |
| regex. The inner regex is invoked by the outer regex during pattern matching. |
| The inner regex participates fully in the match, back-tracking as needed |
| to make the match succeed. |
| </p> |
| <p> |
| Consider a text editor that has a regex-find feature with a whole-word option. |
| You can implement this with xpressive as follows: |
| </p> |
| <pre class="programlisting"><span class="identifier">find_dialog</span> <span class="identifier">dlg</span><span class="special">;</span> |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">dialog_ok</span> <span class="special">==</span> <span class="identifier">dlg</span><span class="special">.</span><span class="identifier">do_modal</span><span class="special">()</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">pattern</span> <span class="special">=</span> <span class="identifier">dlg</span><span class="special">.</span><span class="identifier">get_text</span><span class="special">();</span> <span class="comment">// the pattern the user entered |
| </span> <span class="keyword">bool</span> <span class="identifier">whole_word</span> <span class="special">=</span> <span class="identifier">dlg</span><span class="special">.</span><span class="identifier">whole_word</span><span class="special">.</span><span class="identifier">is_checked</span><span class="special">();</span> <span class="comment">// did the user select the whole-word option? |
| </span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="identifier">pattern</span> <span class="special">);</span> <span class="comment">// try to compile the pattern |
| </span> |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">whole_word</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="comment">// wrap the regex in begin-word / end-word assertions |
| </span> <span class="identifier">re</span> <span class="special">=</span> <span class="identifier">bow</span> <span class="special">>></span> <span class="identifier">re</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="comment">// ... use re ... |
| </span><span class="special">}</span> |
| </pre> |
| <p> |
| Look closely at this line: |
| </p> |
| <pre class="programlisting"><span class="comment">// wrap the regex in begin-word / end-word assertions |
| </span><span class="identifier">re</span> <span class="special">=</span> <span class="identifier">bow</span> <span class="special">>></span> <span class="identifier">re</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">;</span> |
| </pre> |
| <p> |
| This line creates a new regex that embeds the old regex by value. Then, the |
| new regex is assigned back to the original regex. Since a copy of the old |
| regex was made on the right-hand side, this works as you might expect: the |
| new regex has the behavior of the old regex wrapped in begin- and end-word |
| assertions. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| Note that <code class="computeroutput"><span class="identifier">re</span> <span class="special">=</span> |
| <span class="identifier">bow</span> <span class="special">>></span> |
| <span class="identifier">re</span> <span class="special">>></span> |
| <span class="identifier">eow</span></code> does <span class="emphasis"><em>not</em></span> |
| define a recursive regular expression, since regex objects embed by value |
| by default. The next section shows how to define a recursive regular expression |
| by embedding a regex by reference. |
| </p></td></tr> |
| </table></div> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference"></a><h3> |
| <a name="id3112648"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference">Embedding |
| a Regex by Reference</a> |
| </h3> |
| <p> |
| If you want to be able to build recursive regular expressions and context-free |
| grammars, embedding a regex by value is not enough. You need to be able to |
| make your regular expressions self-referential. Most regular expression engines |
| don't give you that power, but xpressive does. |
| </p> |
| <div class="tip"><table border="0" summary="Tip"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> |
| <th align="left">Tip</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| The theoretical computer scientists out there will correctly point out |
| that a self-referential regular expression is not "regular", |
| so in the strict sense, xpressive isn't really a <span class="emphasis"><em>regular</em></span> |
| expression engine at all. But as Larry Wall once said, "the term [regular expression] has |
| grown with the capabilities of our pattern matching engines, so I'm not |
| going to try to fight linguistic necessity here." |
| </p></td></tr> |
| </table></div> |
| <p> |
| Consider the following code, which uses the <code class="computeroutput"><span class="identifier">by_ref</span><span class="special">()</span></code> helper to define a recursive regular expression |
| that matches balanced, nested parentheses: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">parentheses</span><span class="special">;</span> |
| <span class="identifier">parentheses</span> <span class="comment">// A balanced set of parentheses ... |
| </span> <span class="special">=</span> <span class="char">'('</span> <span class="comment">// is an opening parenthesis ... |
| </span> <span class="special">>></span> <span class="comment">// followed by ... |
| </span> <span class="special">*(</span> <span class="comment">// zero or more ... |
| </span> <span class="identifier">keep</span><span class="special">(</span> <span class="special">+~(</span><span class="identifier">set</span><span class="special">=</span><span class="char">'('</span><span class="special">,</span><span class="char">')'</span><span class="special">)</span> <span class="special">)</span> <span class="comment">// of a bunch of things that are not parentheses ... |
| </span> <span class="special">|</span> <span class="comment">// or ... |
| </span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">parentheses</span><span class="special">)</span> <span class="comment">// a balanced set of parentheses |
| </span> <span class="special">)</span> <span class="comment">// (ooh, recursion!) ... |
| </span> <span class="special">>></span> <span class="comment">// followed by ... |
| </span> <span class="char">')'</span> <span class="comment">// a closing parenthesis |
| </span> <span class="special">;</span> |
| </pre> |
| <p> |
| Matching balanced, nested tags is an important text processing task, and |
| it is one that "classic" regular expressions cannot do. The <code class="computeroutput"><span class="identifier">by_ref</span><span class="special">()</span></code> |
| helper makes it possible. It allows one regex object to be embedded in another |
| <span class="emphasis"><em>by reference</em></span>. Since the right-hand side holds <code class="computeroutput"><span class="identifier">parentheses</span></code> by reference, assigning the |
| right-hand side back to <code class="computeroutput"><span class="identifier">parentheses</span></code> |
| creates a cycle, which will execute recursively. |
| </p> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.building_a_grammar"></a><h3> |
| <a name="id3112962"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.building_a_grammar">Building |
| a Grammar</a> |
| </h3> |
| <p> |
| Once we allow self-reference in our regular expressions, the genie is out |
| of the bottle and all manner of fun things are possible. In particular, we |
| can now build grammars out of regular expressions. Let's have a look at the |
| text-book grammar example: the humble calculator. |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">group</span><span class="special">,</span> <span class="identifier">factor</span><span class="special">,</span> <span class="identifier">term</span><span class="special">,</span> <span class="identifier">expression</span><span class="special">;</span> |
| |
| <span class="identifier">group</span> <span class="special">=</span> <span class="char">'('</span> <span class="special">>></span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">expression</span><span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> |
| <span class="identifier">factor</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span> <span class="special">|</span> <span class="identifier">group</span><span class="special">;</span> |
| <span class="identifier">term</span> <span class="special">=</span> <span class="identifier">factor</span> <span class="special">>></span> <span class="special">*((</span><span class="char">'*'</span> <span class="special">>></span> <span class="identifier">factor</span><span class="special">)</span> <span class="special">|</span> <span class="special">(</span><span class="char">'/'</span> <span class="special">>></span> <span class="identifier">factor</span><span class="special">));</span> |
| <span class="identifier">expression</span> <span class="special">=</span> <span class="identifier">term</span> <span class="special">>></span> <span class="special">*((</span><span class="char">'+'</span> <span class="special">>></span> <span class="identifier">term</span><span class="special">)</span> <span class="special">|</span> <span class="special">(</span><span class="char">'-'</span> <span class="special">>></span> <span class="identifier">term</span><span class="special">));</span> |
| </pre> |
| <p> |
| The regex <code class="computeroutput"><span class="identifier">expression</span></code> defined |
| above does something rather remarkable for a regular expression: it matches |
| mathematical expressions. For example, if the input string were <code class="computeroutput"><span class="string">"foo 9*(10+3) bar"</span></code>, this pattern |
| would match <code class="computeroutput"><span class="string">"9*(10+3)"</span></code>. |
| It only matches well-formed mathematical expressions, where the parentheses |
| are balanced and the infix operators have two arguments each. Don't try this |
| with just any regular expression engine! |
| </p> |
| <p> |
| Let's take a closer look at this regular expression grammar. Notice that |
| it is cyclic: <code class="computeroutput"><span class="identifier">expression</span></code> |
| is implemented in terms of <code class="computeroutput"><span class="identifier">term</span></code>, |
| which is implemented in terms of <code class="computeroutput"><span class="identifier">factor</span></code>, |
| which is implemented in terms of <code class="computeroutput"><span class="identifier">group</span></code>, |
| which is implemented in terms of <code class="computeroutput"><span class="identifier">expression</span></code>, |
| closing the loop. In general, the way to define a cyclic grammar is to forward-declare |
| the regex objects and embed by reference those regular expressions that have |
| not yet been initialized. In the above grammar, there is only one place where |
| we need to reference a regex object that has not yet been initialized: the |
| definition of <code class="computeroutput"><span class="identifier">group</span></code>. In that |
| place, we use <code class="computeroutput"><span class="identifier">by_ref</span><span class="special">()</span></code> |
| to embed <code class="computeroutput"><span class="identifier">expression</span></code> by reference. |
| In all other places, it is sufficient to embed the other regex objects by |
| value, since they have already been initialized and their values will not |
| change. |
| </p> |
| <div class="tip"><table border="0" summary="Tip"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> |
| <th align="left">Tip</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| <span class="bold"><strong>Embed by value if possible</strong></span> <br> <br> |
| In general, prefer embedding regular expressions by value rather than by |
| reference. It involves one less indirection, making your patterns match |
| a little faster. Besides, value semantics are simpler and will make your |
| grammars easier to reason about. Don't worry about the expense of "copying" |
| a regex. Each regex object shares its implementation with all of its copies. |
| </p></td></tr> |
| </table></div> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.dynamic_regex_grammars"></a><h3> |
| <a name="id3113448"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.dynamic_regex_grammars">Dynamic |
| Regex Grammars</a> |
| </h3> |
| <p> |
| Using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>, |
| you can also build grammars out of dynamic regular expressions. You do that |
| by creating named regexes, and referring to other regexes by name. Each |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code> |
| instance keeps a mapping from names to regexes that have been created with |
| it. |
| </p> |
| <p> |
| You can create a named dynamic regex by prefacing your regex with <code class="computeroutput"><span class="string">"(?$name=)"</span></code>, where <span class="emphasis"><em>name</em></span> |
| is the name of the regex. You can refer to a named regex from another regex |
| with <code class="computeroutput"><span class="string">"(?$name)"</span></code>. The |
| named regex does not need to exist yet at the time it is referenced in another |
| regex, but it must exist by the time you use the referencing regex for matching.
| </p> |
| <p> |
| Below is a code fragment that uses dynamic regex grammars to implement the |
| calculator example from above. |
| </p> |
| <pre class="programlisting"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">regex_constants</span><span class="special">;</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">expr</span><span class="special">;</span> |
| |
| <span class="special">{</span> |
| <span class="identifier">sregex_compiler</span> <span class="identifier">compiler</span><span class="special">;</span> |
| <span class="identifier">syntax_option_type</span> <span class="identifier">x</span> <span class="special">=</span> <span class="identifier">ignore_white_space</span><span class="special">;</span> |
| |
| <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $group = ) \\( (? $expr ) \\) "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> |
| <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $factor = ) \\d+ | (? $group ) "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> |
| <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $term = ) (? $factor )"</span> |
| <span class="string">" ( \\* (? $factor ) | / (? $factor ) )* "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> |
| <span class="identifier">expr</span> <span class="special">=</span> <span class="identifier">compiler</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span><span class="string">"(? $expr = ) (? $term )"</span> |
| <span class="string">" ( \\+ (? $term ) | - (? $term ) )* "</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span> |
| <span class="special">}</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"foo 9*(10+3) bar"</span><span class="special">);</span> |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| |
| <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">expr</span><span class="special">))</span> |
| <span class="special">{</span> |
| <span class="comment">// This prints "9*(10+3)": |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| As with static regex grammars, nested regex invocations create nested match |
| results (see <span class="emphasis"><em>Nested Results</em></span> below). The result is a |
| complete parse tree for the string that matched. Unlike static regexes, dynamic |
| regexes are always embedded by reference, not by value. |
| </p> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.cyclic_patterns__copying_and_memory_management__oh_my_"></a><h3> |
| <a name="id3114027"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.cyclic_patterns__copying_and_memory_management__oh_my_">Cyclic |
| Patterns, Copying and Memory Management, Oh My!</a> |
| </h3> |
| <p> |
| The calculator examples above raise a number of complicated memory-management |
| issues. The four regex objects refer to each other, some directly |
| and some indirectly, some by value and some by reference. What if we were |
| to return one of them from a function and let the others go out of scope? |
| What becomes of the references? The answer is that the regex objects are |
| internally reference counted, such that they keep their referenced regex |
| objects alive as long as they need them. So passing a regex object by value |
| is never a problem, even if it refers to other regex objects that have gone |
| out of scope. |
| </p> |
| <p> |
| Those of you who have dealt with reference counting are probably familiar |
| with its Achilles' heel: cyclic references. If regex objects are reference |
| counted, what happens to cycles like the one created in the calculator examples? |
| Are they leaked? The answer is no, they are not leaked. The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| object has some tricky reference tracking code that ensures that even cyclic |
| regex grammars are cleaned up when the last external reference goes away. |
| So don't worry about it. Create cyclic grammars, pass your regex objects |
| around and copy them all you want. It is fast and efficient and guaranteed |
| not to leak or result in dangling references. |
| </p> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_regexes_and_sub_match_scoping"></a><h3> |
| <a name="id3114084"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_regexes_and_sub_match_scoping">Nested |
| Regexes and Sub-Match Scoping</a> |
| </h3> |
| <p> |
| Nested regular expressions raise the issue of sub-match scoping. If both |
| the inner and outer regex wrote to and read from the same sub-match vector, |
| chaos would ensue. The inner regex would stomp on the sub-matches written |
| by the outer regex. For example, what does this do? |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">inner</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(.)\\1"</span> <span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">outer</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="identifier">_</span><span class="special">)</span> <span class="special">>></span> <span class="identifier">inner</span> <span class="special">>></span> <span class="identifier">s1</span><span class="special">;</span> |
| </pre> |
| <p> |
| The author probably didn't intend for the inner regex to overwrite the sub-match |
| written by the outer regex. The problem is particularly acute when the inner |
| regex is accepted from the user as input. The author has no way of knowing |
| whether the inner regex will stomp the sub-match vector or not. This is clearly |
| not acceptable. |
| </p> |
| <p> |
| Instead, what actually happens is that each invocation of a nested regex |
| gets its own scope. Sub-matches belong to that scope. That is, each nested |
| regex invocation gets its own copy of the sub-match vector to play with, |
| so there is no way for an inner regex to stomp on the sub-matches of an outer |
| regex. So, for example, the regex <code class="computeroutput"><span class="identifier">outer</span></code> |
| defined above would match <code class="computeroutput"><span class="string">"ABBA"</span></code>, |
| as it should. |
| </p> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_results"></a><h3> |
| <a name="id3114265"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.nested_results">Nested |
| Results</a> |
| </h3> |
| <p> |
| If nested regexes have their own sub-matches, there should be a way to access |
| them after a successful match. In fact, there is. After a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code>, |
| the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| struct behaves like the root of a tree of nested results. The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| struct provides a <code class="computeroutput"><span class="identifier">nested_results</span><span class="special">()</span></code> member function that returns an ordered |
| sequence of <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| structures, representing the results of the nested regexes. The order of |
| the nested results is the same as the order in which the nested regex objects |
| matched. |
| </p> |
| <p> |
| Take as an example the regex for balanced, nested parentheses we saw earlier: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">parentheses</span><span class="special">;</span> |
| <span class="identifier">parentheses</span> <span class="special">=</span> <span class="char">'('</span> <span class="special">>></span> <span class="special">*(</span> <span class="identifier">keep</span><span class="special">(</span> <span class="special">+~(</span><span class="identifier">set</span><span class="special">=</span><span class="char">'('</span><span class="special">,</span><span class="char">')'</span><span class="special">)</span> <span class="special">)</span> <span class="special">|</span> <span class="identifier">by_ref</span><span class="special">(</span><span class="identifier">parentheses</span><span class="special">)</span> <span class="special">)</span> <span class="special">>></span> <span class="char">')'</span><span class="special">;</span> |
| |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"blah blah( a(b)c (c(e)f (g)h )i (j)6 )blah"</span> <span class="special">);</span> |
| |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_search</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">parentheses</span> <span class="special">)</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="comment">// display the whole match |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| |
| <span class="comment">// display the nested results |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> |
| <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">begin</span><span class="special">(),</span> |
| <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">end</span><span class="special">(),</span> |
| <span class="identifier">output_nested_results</span><span class="special">()</span> <span class="special">);</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program displays the following: |
| </p> |
| <pre class="programlisting">( a(b)c (c(e)f (g)h )i (j)6 ) |
|     (b) |
|     (c(e)f (g)h ) |
|         (e) |
|         (g) |
|     (j) |
| </pre> |
| <p> |
| Here you can see how the results are nested and that they are stored in the |
| order in which they are found. |
| </p> |
| <div class="tip"><table border="0" summary="Tip"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../doc/src/images/tip.png"></td> |
| <th align="left">Tip</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| See the definition of <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.display_a_tree_of_nested_results">output_nested_results</a> |
| in the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">Examples</a> |
| section. |
| </p></td></tr> |
| </table></div> |
| <a name="boost_xpressive.user_s_guide.grammars_and_nested_matches.filtering_nested_results"></a><h3> |
| <a name="id3114840"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.filtering_nested_results">Filtering |
| Nested Results</a> |
| </h3> |
| <p> |
| Sometimes a regex will have several nested regex objects, and you want to |
| know which result corresponds to which regex object. That's where <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><>::</span><span class="identifier">regex_id</span><span class="special">()</span></code> |
| and <code class="computeroutput"><span class="identifier">match_results</span><span class="special"><>::</span><span class="identifier">regex_id</span><span class="special">()</span></code> |
| come in handy. When iterating over the nested results, you can compare the |
| regex id from the results to the id of the regex object you're interested |
| in. |
| </p> |
| <p> |
| To make this easier, xpressive provides a predicate for iterating over |
| just the results that correspond to a certain nested regex. |
| It is called <code class="computeroutput"><span class="identifier">regex_id_filter_predicate</span></code>, |
| and it is intended to be used with <a href="../../../libs/iterator/doc/index.html" target="_top">Boost.Iterator</a>. |
| You can use it as follows: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">name</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">alpha</span><span class="special">;</span> |
| <span class="identifier">sregex</span> <span class="identifier">integer</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">;</span> |
| <span class="identifier">sregex</span> <span class="identifier">re</span> <span class="special">=</span> <span class="special">*(</span> <span class="special">*</span><span class="identifier">_s</span> <span class="special">>></span> <span class="special">(</span> <span class="identifier">name</span> <span class="special">|</span> <span class="identifier">integer</span> <span class="special">)</span> <span class="special">);</span> |
| |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"marsha 123 jan 456 cindy 789"</span> <span class="special">);</span> |
| |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">re</span> <span class="special">)</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">smatch</span><span class="special">::</span><span class="identifier">nested_results_type</span><span class="special">::</span><span class="identifier">const_iterator</span> <span class="identifier">begin</span> <span class="special">=</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">begin</span><span class="special">();</span> |
| <span class="identifier">smatch</span><span class="special">::</span><span class="identifier">nested_results_type</span><span class="special">::</span><span class="identifier">const_iterator</span> <span class="identifier">end</span> <span class="special">=</span> <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">end</span><span class="special">();</span> |
| |
| <span class="comment">// declare filter predicates to select just the names or the integers |
| </span> <span class="identifier">sregex_id_filter_predicate</span> <span class="identifier">name_id</span><span class="special">(</span> <span class="identifier">name</span><span class="special">.</span><span class="identifier">regex_id</span><span class="special">()</span> <span class="special">);</span> |
| <span class="identifier">sregex_id_filter_predicate</span> <span class="identifier">integer_id</span><span class="special">(</span> <span class="identifier">integer</span><span class="special">.</span><span class="identifier">regex_id</span><span class="special">()</span> <span class="special">);</span> |
| |
| <span class="comment">// iterate over only the results from the name regex |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> |
| <span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">name_id</span><span class="special">,</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> |
| <span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">name_id</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> |
| <span class="identifier">output_result</span> |
| <span class="special">);</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| |
| <span class="comment">// iterate over only the results from the integer regex |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> |
| <span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">integer_id</span><span class="special">,</span> <span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> |
| <span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">(</span> <span class="identifier">integer_id</span><span class="special">,</span> <span class="identifier">end</span><span class="special">,</span> <span class="identifier">end</span> <span class="special">),</span> |
| <span class="identifier">output_result</span> |
| <span class="special">);</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| where <code class="computeroutput"><span class="identifier">output_result</span></code> is a |
| simple function that takes a <code class="computeroutput"><span class="identifier">smatch</span></code> |
| and displays the full match. Notice how we use the <code class="computeroutput"><span class="identifier">regex_id_filter_predicate</span></code> |
| together with <code class="computeroutput"><span class="identifier">basic_regex</span><span class="special"><>::</span><span class="identifier">regex_id</span><span class="special">()</span></code> and <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">make_filter_iterator</span><span class="special">()</span></code> from <a href="../../../libs/iterator/doc/index.html" target="_top">Boost.Iterator</a> |
| to select only those results corresponding to a particular nested regex. |
| This program displays the following: |
| </p> |
| <pre class="programlisting">marsha |
| jan |
| cindy |
| 123 |
| 456 |
| 789 |
| </pre> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions" title="Semantic Actions and User-Defined Assertions">Semantic |
| Actions and User-Defined Assertions</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.overview"></a><h3> |
| <a name="id3115810"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.overview">Overview</a> |
| </h3> |
| <p> |
| Imagine you want to parse an input string and build a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code> |
| from it. For something like that, matching a regular expression isn't enough. |
| You want to <span class="emphasis"><em>do something</em></span> when parts of your regular |
| expression match. Xpressive lets you attach semantic actions to parts of |
| your static regular expressions. This section shows you how. |
| </p> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.semantic_actions"></a><h3> |
| <a name="id3115869"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.semantic_actions">Semantic |
| Actions</a> |
| </h3> |
| <p> |
| Consider the following code, which uses xpressive's semantic actions to parse |
| a string of word/integer pairs and stuff them into a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>. |
| It is described below. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> |
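| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span> |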
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">result</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"aaa=>1 bbb=>23 ccc=>456"</span><span class="special">);</span> |
| |
| <span class="comment">// Match a word and an integer, separated by =>, |
| </span> <span class="comment">// and then stuff the result into a std::map<> |
| </span> <span class="identifier">sregex</span> <span class="identifier">pair</span> <span class="special">=</span> <span class="special">(</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="string">"=>"</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">)</span> |
| <span class="special">[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">];</span> |
| |
| <span class="comment">// Match one or more word/integer pairs, separated |
| </span> <span class="comment">// by whitespace. |
| </span> <span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">pair</span> <span class="special">>></span> <span class="special">*(+</span><span class="identifier">_s</span> <span class="special">>></span> <span class="identifier">pair</span><span class="special">);</span> |
| |
| <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"aaa"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"bbb"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"ccc"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program prints the following: |
| </p> |
| <pre class="programlisting">1 |
| 23 |
| 456 |
| </pre> |
| <p> |
| The regular expression <code class="computeroutput"><span class="identifier">pair</span></code> |
| has two parts: the pattern and the action. The pattern says to match a word, |
| capturing it in sub-match 1, and an integer, capturing it in sub-match 2, |
| separated by <code class="computeroutput"><span class="string">"=>"</span></code>. |
| The action is the part in square brackets: <code class="computeroutput"><span class="special">[</span> |
| <span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> |
| <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">]</span></code>. It says |
| to take sub-match 1 and use it to index into the <code class="computeroutput"><span class="identifier">result</span></code> |
| map, and assign to it the result of converting sub-match 2 to an integer. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| To use semantic actions with your static regexes, you must <code class="computeroutput"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>. |
| </p></td></tr> |
| </table></div> |
| <p> |
| How does this work? Just like the rest of the static regular expression, the |
| part between brackets is an expression template. It encodes the action and |
| executes it later. The expression <code class="computeroutput"><span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)</span></code> creates a lazy reference to the <code class="computeroutput"><span class="identifier">result</span></code> object. The larger expression <code class="computeroutput"><span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span></code> |
| is a lazy map index operation. Later, when this action is executed, |
| <code class="computeroutput"><span class="identifier">s1</span></code> gets replaced with the |
| first <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>. |
| Likewise, when <code class="computeroutput"><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span></code> gets executed, <code class="computeroutput"><span class="identifier">s2</span></code> |
| is replaced with the second <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>. |
| The <code class="computeroutput"><span class="identifier">as</span><span class="special"><></span></code> |
| action converts its argument to the requested type using Boost.Lexical_cast. |
| The effect of the whole action is to insert a new word/integer pair into |
| the map. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| There is an important difference between the function <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> in <code class="computeroutput"><span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">ref</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code> |
| and <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> |
| in <code class="computeroutput"><span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span></code>. The first returns a plain <code class="computeroutput"><span class="identifier">reference_wrapper</span><span class="special"><></span></code> |
| which behaves in many respects like an ordinary reference. By contrast, |
| <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code> |
| returns a <span class="emphasis"><em>lazy</em></span> reference that you can use in expressions |
| that are executed lazily. That is why we can say <code class="computeroutput"><span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)[</span><span class="identifier">s1</span><span class="special">]</span></code>, even though <code class="computeroutput"><span class="identifier">result</span></code> |
| doesn't have an <code class="computeroutput"><span class="keyword">operator</span><span class="special">[]</span></code> |
| that would accept <code class="computeroutput"><span class="identifier">s1</span></code>. |
| </p></td></tr> |
| </table></div> |
| <p> |
| In addition to the sub-match placeholders <code class="computeroutput"><span class="identifier">s1</span></code>, |
| <code class="computeroutput"><span class="identifier">s2</span></code>, etc., you can also use |
| the placeholder <code class="computeroutput"><span class="identifier">_</span></code> within |
| an action to refer back to the string matched by the sub-expression to which |
| the action is attached. For instance, you can use the following regex to |
| match a bunch of digits, interpret them as an integer and assign the result |
| to a local variable: |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| <span class="comment">// Here, _ refers back to all the |
| </span><span class="comment">// characters matched by (+_d) |
| </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(+</span><span class="identifier">_d</span><span class="special">)[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special">];</span> |
| </pre> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_action_execution"></a><h4> |
| <a name="id3117436"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_action_execution">Lazy |
| Action Execution</a> |
| </h4> |
| <p> |
| What does it mean, exactly, to attach an action to part of a regular expression |
| and perform a match? When does the action execute? If the action is part |
| of a repeated sub-expression, does the action execute once or many times? |
| And if the sub-expression initially matches, but ultimately fails because |
| the rest of the regular expression fails to match, is the action executed |
| at all? |
| </p> |
| <p> |
| The answer is that by default, actions are executed <span class="emphasis"><em>lazily</em></span>. |
| When a sub-expression matches a string, its action is placed on a queue, |
| along with the current values of any sub-matches to which the action refers. |
| If the match algorithm must backtrack, actions are popped off the queue as |
| necessary. Only after the entire regex has matched successfully are the actions |
| actually executed. They are executed all at once, in the order in which they |
| were added to the queue, as the last step before <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| returns. |
| </p> |
| <p> |
| For example, consider the following regex that increments a counter whenever |
| it finds a digit. |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"1!2!3?"</span><span class="special">);</span> |
| <span class="comment">// count the exciting digits, but not the |
| </span><span class="comment">// questionable ones. |
| </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> |
| <span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> |
| <span class="identifier">assert</span><span class="special">(</span> <span class="identifier">i</span> <span class="special">==</span> <span class="number">2</span> <span class="special">);</span> |
| </pre> |
| <p> |
| The action <code class="computeroutput"><span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span></code> |
| is queued three times: once for each found digit. But it is only <span class="emphasis"><em>executed</em></span> |
| twice: once for each digit that precedes a <code class="computeroutput"><span class="char">'!'</span></code> |
| character. When the <code class="computeroutput"><span class="char">'?'</span></code> character |
| is encountered, the match algorithm backtracks, removing the final action |
| from the queue. |
| </p> |
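| <p> |
| The queue-and-backtrack behavior described above can be modeled with a toy |
| matcher written in plain C++. This is only a conceptual sketch, not how xpressive |
| is implemented: the function name <code class="computeroutput">match_exciting_digits</code> |
| is hypothetical, the pattern (one or more digits, each followed by an exclamation |
| mark) is hard-coded, and the match is anchored at the start of the input. |
| </p> |

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy model of lazy action execution for the pattern +( digit '!' ).
// Returns true on a match; `counter` is modified only after overall success.
bool match_exciting_digits(std::string const &input, int &counter)
{
    std::vector<std::function<void()> > queue;   // pending actions
    std::size_t pos = 0;
    std::size_t repetitions = 0;

    while (pos < input.size() && input[pos] >= '0' && input[pos] <= '9')
    {
        // The digit matched: queue the action, but do not run it yet.
        queue.push_back([&counter] { ++counter; });
        ++pos;

        if (pos < input.size() && input[pos] == '!')
        {
            ++pos;              // this repetition matched completely
            ++repetitions;
        }
        else
        {
            queue.pop_back();   // backtrack: discard the speculative action
            break;
        }
    }

    if (repetitions == 0)
        return false;           // no match, so nothing is executed

    // Overall success: run the surviving actions in queue order.
    for (std::size_t k = 0; k < queue.size(); ++k)
        queue[k]();
    return true;
}
```

| <p> |
| For the input <code class="computeroutput">"1!2!3?"</code>, three actions are |
| queued, the last one is popped when the question mark is seen, and the counter |
| ends up at 2. |
| </p> |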
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.immediate_action_execution"></a><h4> |
| <a name="id3117770"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.immediate_action_execution">Immediate |
| Action Execution</a> |
| </h4> |
| <p> |
| When you want semantic actions to execute immediately, you can wrap the sub-expression |
| containing the action in a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/keep.html" title="Function template keep">keep()</a></code></code>. |
| <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code> |
| turns off backtracking for its sub-expression, but it also causes any actions |
| queued by the sub-expression to execute at the end of the <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code>. It is as if the sub-expression in the |
| <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code> |
| were compiled into an independent regex object, and matching the <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code> |
| is like a separate invocation of <code class="computeroutput"><span class="identifier">regex_search</span><span class="special">()</span></code>. It matches characters and executes actions |
| but never backtracks or unwinds. For example, imagine the above example had |
| been written as follows: |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"1!2!3?"</span><span class="special">);</span> |
| <span class="comment">// count all the digits. |
| </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">keep</span><span class="special">(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">)</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> |
| <span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> |
| <span class="identifier">assert</span><span class="special">(</span> <span class="identifier">i</span> <span class="special">==</span> <span class="number">3</span> <span class="special">);</span> |
| </pre> |
| <p> |
| We have wrapped the sub-expression <code class="computeroutput"><span class="identifier">_d</span> |
| <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span></code> in <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code>. |
| Now, whenever this regex matches a digit, the action will be queued and then |
| immediately executed before we try to match a <code class="computeroutput"><span class="char">'!'</span></code> |
| character. In this case, the action executes three times. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| Like <code class="computeroutput"><span class="identifier">keep</span><span class="special">()</span></code>, |
| actions within <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/before.html" title="Function template before">before()</a></code></code> |
| and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/after.html" title="Function template after">after()</a></code></code> |
| are also executed early when their sub-expressions have matched. |
| </p></td></tr> |
| </table></div> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_functions"></a><h4> |
| <a name="id3118228"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.lazy_functions">Lazy |
| Functions</a> |
| </h4> |
| <p> |
| So far, we've seen how to write semantic actions consisting of variables |
| and operators. But what if you want to be able to call a function from a |
| semantic action? Xpressive provides a mechanism to do this. |
| </p> |
| <p> |
| The first step is to define a function object type. Here, for instance, is |
| a function object type that calls <code class="computeroutput"><span class="identifier">push</span><span class="special">()</span></code> on its argument: |
| </p> |
| <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">push_impl</span> |
| <span class="special">{</span> |
| <span class="comment">// Result type, needed for tr1::result_of |
| </span> <span class="keyword">typedef</span> <span class="keyword">void</span> <span class="identifier">result_type</span><span class="special">;</span> |
| |
| <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Sequence</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Value</span><span class="special">></span> |
| <span class="keyword">void</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">Sequence</span> <span class="special">&</span><span class="identifier">seq</span><span class="special">,</span> <span class="identifier">Value</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">val</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="identifier">seq</span><span class="special">.</span><span class="identifier">push</span><span class="special">(</span><span class="identifier">val</span><span class="special">);</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| The next step is to use xpressive's <code class="computeroutput"><span class="identifier">function</span><span class="special"><></span></code> template to define a function object |
| named <code class="computeroutput"><span class="identifier">push</span></code>: |
| </p> |
| <pre class="programlisting"><span class="comment">// Global "push" function object. |
| </span><span class="identifier">function</span><span class="special"><</span><span class="identifier">push_impl</span><span class="special">>::</span><span class="identifier">type</span> <span class="keyword">const</span> <span class="identifier">push</span> <span class="special">=</span> <span class="special">{{}};</span> |
| </pre> |
| <p> |
| The initialization looks a bit odd, but this is because <code class="computeroutput"><span class="identifier">push</span></code> |
| is being statically initialized. That means it doesn't need to be constructed |
| at runtime. We can use <code class="computeroutput"><span class="identifier">push</span></code> |
| in semantic actions as follows: |
| </p> |
| <pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">stack</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">ints</span><span class="special">;</span> |
| <span class="comment">// Match digits, cast them to an int |
| </span><span class="comment">// and push it on the stack. |
| </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(+</span><span class="identifier">_d</span><span class="special">)[</span><span class="identifier">push</span><span class="special">(</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">ints</span><span class="special">),</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">))];</span> |
| </pre> |
| <p> |
| You'll notice that doing it this way causes member function invocations to |
| look like ordinary function invocations. You can choose to write your semantic |
| action in a different way that makes it look a bit more like a member function |
| call: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">(+</span><span class="identifier">_d</span><span class="special">)[</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">ints</span><span class="special">)->*</span><span class="identifier">push</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">))];</span> |
| </pre> |
| <p> |
| Xpressive recognizes the <code class="computeroutput"><span class="special">->*</span></code> |
| operator and treats this expression exactly the same as the one above. |
| </p> |
| <p> |
| When your function object must return a type that depends on its arguments, |
| you can use a <code class="computeroutput"><span class="identifier">result</span><span class="special"><></span></code> |
| member template instead of the <code class="computeroutput"><span class="identifier">result_type</span></code> |
| typedef. Here, for example, is a <code class="computeroutput"><span class="identifier">first</span></code> |
| function object that returns the <code class="computeroutput"><span class="identifier">first</span></code> |
| member of a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">pair</span><span class="special"><></span></code> |
| or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code>: |
| </p> |
| <pre class="programlisting"><span class="comment">// Function object that returns the |
| </span><span class="comment">// first element of a pair. |
| </span><span class="keyword">struct</span> <span class="identifier">first_impl</span> |
| <span class="special">{</span> |
| <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Sig</span><span class="special">></span> <span class="keyword">struct</span> <span class="identifier">result</span> <span class="special">{};</span> |
| |
| <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">This</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Pair</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">result</span><span class="special"><</span><span class="identifier">This</span><span class="special">(</span><span class="identifier">Pair</span><span class="special">)></span> |
| <span class="special">{</span> |
| <span class="keyword">typedef</span> <span class="keyword">typename</span> <span class="identifier">remove_reference</span><span class="special"><</span><span class="identifier">Pair</span><span class="special">></span> |
| <span class="special">::</span><span class="identifier">type</span><span class="special">::</span><span class="identifier">first_type</span> <span class="identifier">type</span><span class="special">;</span> |
| <span class="special">};</span> |
| |
| <span class="keyword">template</span><span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Pair</span><span class="special">></span> |
| <span class="keyword">typename</span> <span class="identifier">Pair</span><span class="special">::</span><span class="identifier">first_type</span> |
| <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">Pair</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">p</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">p</span><span class="special">.</span><span class="identifier">first</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| |
| <span class="comment">// OK, use as first(s1) to get the begin iterator |
| </span><span class="comment">// of the sub-match referred to by s1. |
| </span><span class="identifier">function</span><span class="special"><</span><span class="identifier">first_impl</span><span class="special">>::</span><span class="identifier">type</span> <span class="keyword">const</span> <span class="identifier">first</span> <span class="special">=</span> <span class="special">{{}};</span> |
| </pre> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_local_variables"></a><h4> |
| <a name="id3119309"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_local_variables">Referring |
| to Local Variables</a> |
| </h4> |
| <p> |
| As we've seen in the examples above, we can refer to local variables within |
| an action using <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">ref</span><span class="special">()</span></code>. |
| Any such variables are held by reference by the regular expression, and care |
| should be taken to avoid letting those references dangle. For instance, in |
| the following code, the reference to <code class="computeroutput"><span class="identifier">i</span></code> |
| is left to dangle when <code class="computeroutput"><span class="identifier">bad_voodoo</span><span class="special">()</span></code> returns: |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">bad_voodoo</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> |
| <span class="comment">// ERROR! rex refers by reference to a local |
| </span> <span class="comment">// variable, which will dangle after bad_voodoo() |
| </span> <span class="comment">// returns. |
| </span> <span class="keyword">return</span> <span class="identifier">rex</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| When writing semantic actions, it is your responsibility to ensure that |
| none of these references dangle. One way to do that is to make the |
| variables shared pointers that the regex holds by value. |
| </p> |
| <pre class="programlisting"><span class="identifier">sregex</span> <span class="identifier">good_voodoo</span><span class="special">(</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">pi</span><span class="special">)</span> |
| <span class="special">{</span> |
| <span class="comment">// Use val() to hold the shared_ptr by value: |
| </span> <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++*</span><span class="identifier">val</span><span class="special">(</span><span class="identifier">pi</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> |
| <span class="comment">// OK, rex holds a reference count to the integer. |
| </span> <span class="keyword">return</span> <span class="identifier">rex</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| In the above code, we use <code class="computeroutput"><span class="identifier">xpressive</span><span class="special">::</span><span class="identifier">val</span><span class="special">()</span></code> |
| to hold the shared pointer by value. That's not normally necessary because |
| local variables appearing in actions are held by value by default, but in |
| this case, it is necessary. Had we written the action as <code class="computeroutput"><span class="special">++*</span><span class="identifier">pi</span></code>, it would have executed immediately. |
| That's because <code class="computeroutput"><span class="special">++*</span><span class="identifier">pi</span></code> |
| is not an expression template, but <code class="computeroutput"><span class="special">++*</span><span class="identifier">val</span><span class="special">(</span><span class="identifier">pi</span><span class="special">)</span></code> is. |
| </p> |
| <p> |
| It can be tedious to wrap all your variables in <code class="computeroutput"><span class="identifier">ref</span><span class="special">()</span></code> and <code class="computeroutput"><span class="identifier">val</span><span class="special">()</span></code> in your semantic actions. Xpressive provides |
| the <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code> |
| and <code class="computeroutput"><span class="identifier">value</span><span class="special"><></span></code> |
| templates to make things easier. The following table shows the equivalencies: |
| </p> |
| <div class="table"> |
| <a name="id3119864"></a><p class="title"><b>Table 29.12. reference<> and value<></b></p> |
| <div class="table-contents"><table class="table" summary="reference<> and value<>"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| This ... |
| </p> |
| </th> |
| <th> |
| <p> |
| ... is equivalent to this ... |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">i</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> |
| <p> |
| </p> |
| </td> |
| <td> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| <span class="identifier">reference</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">ri</span><span class="special">(</span><span class="identifier">i</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ri</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> |
| <p> |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">pi</span><span class="special">(</span><span class="keyword">new</span> <span class="keyword">int</span><span class="special">(</span><span class="number">0</span><span class="special">));</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++*</span><span class="identifier">val</span><span class="special">(</span><span class="identifier">pi</span><span class="special">)</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> |
| <p> |
| </p> |
| </td> |
| <td> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">pi</span><span class="special">(</span><span class="keyword">new</span> <span class="keyword">int</span><span class="special">(</span><span class="number">0</span><span class="special">));</span> |
| <span class="identifier">value</span><span class="special"><</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">shared_ptr</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="special">></span> <span class="identifier">vpi</span><span class="special">(</span><span class="identifier">pi</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++*</span><span class="identifier">vpi</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> |
| <p> |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| As you can see, when using <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code>, you need to first declare a local |
| variable and then declare a <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code> to it. These two steps can be combined |
| into one using <code class="computeroutput"><span class="identifier">local</span><span class="special"><></span></code>. |
| </p> |
| <div class="table"> |
| <a name="id3120542"></a><p class="title"><b>Table 29.13. local<> vs. reference<></b></p> |
| <div class="table-contents"><table class="table" summary="local<> vs. reference<>"> |
| <colgroup> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| This ... |
| </p> |
| </th> |
| <th> |
| <p> |
| ... is equivalent to this ... |
| </p> |
| </th> |
| </tr></thead> |
| <tbody><tr> |
| <td> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="identifier">local</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">i</span><span class="special">(</span><span class="number">0</span><span class="special">);</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">i</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> |
| <p> |
| </p> |
| </td> |
| <td> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| <span class="identifier">reference</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">ri</span><span class="special">(</span><span class="identifier">i</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">ri</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span></pre> |
| <p> |
| </p> |
| </td> |
| </tr></tbody> |
| </table></div> |
| </div> |
| <br class="table-break"><p> |
| We can use <code class="computeroutput"><span class="identifier">local</span><span class="special"><></span></code> |
| to rewrite the above example as follows: |
| </p> |
| <pre class="programlisting"><span class="identifier">local</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">i</span><span class="special">(</span><span class="number">0</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"1!2!3?"</span><span class="special">);</span> |
| <span class="comment">// count the exciting digits, but not the |
| </span><span class="comment">// questionable ones. |
| </span><span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="special">+(</span> <span class="identifier">_d</span> <span class="special">[</span> <span class="special">++</span><span class="identifier">i</span> <span class="special">]</span> <span class="special">>></span> <span class="char">'!'</span> <span class="special">);</span> |
| <span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> |
| <span class="identifier">assert</span><span class="special">(</span> <span class="identifier">i</span><span class="special">.</span><span class="identifier">get</span><span class="special">()</span> <span class="special">==</span> <span class="number">2</span> <span class="special">);</span> |
| </pre> |
| <p> |
| Notice that we use <code class="computeroutput"><span class="identifier">local</span><span class="special"><>::</span><span class="identifier">get</span><span class="special">()</span></code> to access the value of the local variable. |
| Also, beware that <code class="computeroutput"><span class="identifier">local</span><span class="special"><></span></code> |
| can be used to create a dangling reference, just as <code class="computeroutput"><span class="identifier">reference</span><span class="special"><></span></code> can. |
| </p> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_non_local_variables"></a><h4> |
| <a name="id3121134"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.referring_to_non_local_variables">Referring |
| to Non-Local Variables</a> |
| </h4> |
| <p> |
| In the beginning of this section, we used a regex with a semantic action |
| to parse a string of word/integer pairs and stuff them into a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>. That required that the map and the |
| regex be defined together and used before either could go out of scope. What |
| if we wanted to define the regex once and use it to fill lots of different |
| maps? We would rather pass the map into the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
|       algorithm than embed a reference to it directly in the regex object. |
| What we can do instead is define a placeholder and use that in the semantic |
| action instead of the map itself. Later, when we call one of the regex algorithms, |
|       we can bind the placeholder to an actual map object. The following code shows |
| how. |
| </p> |
| <pre class="programlisting"><span class="comment">// Define a placeholder for a map object: |
| </span><span class="identifier">placeholder</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="special">></span> <span class="identifier">_map</span><span class="special">;</span> |
| |
| <span class="comment">// Match a word and an integer, separated by =>, |
| </span><span class="comment">// and then stuff the result into a std::map<> |
| </span><span class="identifier">sregex</span> <span class="identifier">pair</span> <span class="special">=</span> <span class="special">(</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="string">"=>"</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">)</span> |
| <span class="special">[</span> <span class="identifier">_map</span><span class="special">[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">];</span> |
| |
| <span class="comment">// Match one or more word/integer pairs, separated |
| </span><span class="comment">// by whitespace. |
| </span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="identifier">pair</span> <span class="special">>></span> <span class="special">*(+</span><span class="identifier">_s</span> <span class="special">>></span> <span class="identifier">pair</span><span class="special">);</span> |
| |
| <span class="comment">// The string to parse |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"aaa=>1 bbb=>23 ccc=>456"</span><span class="special">);</span> |
| |
| <span class="comment">// Here is the actual map to fill in: |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">result</span><span class="special">;</span> |
| |
| <span class="comment">// Bind the _map placeholder to the actual map |
| </span><span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| <span class="identifier">what</span><span class="special">.</span><span class="identifier">let</span><span class="special">(</span> <span class="identifier">_map</span> <span class="special">=</span> <span class="identifier">result</span> <span class="special">);</span> |
| |
| <span class="comment">// Execute the match and fill in result map |
| </span><span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">))</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"aaa"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"bbb"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"ccc"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program displays: |
| </p> |
| <pre class="programlisting">1 |
| 23 |
| 456 |
| </pre> |
| <p> |
| We use <code class="computeroutput"><span class="identifier">placeholder</span><span class="special"><></span></code> |
| here to define <code class="computeroutput"><span class="identifier">_map</span></code>, which |
| stands in for a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code> |
| variable. We can use the placeholder in the semantic action as if it were |
| a map. Then, we define a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| struct and bind an actual map to the placeholder with "<code class="computeroutput"><span class="identifier">what</span><span class="special">.</span><span class="identifier">let</span><span class="special">(</span> <span class="identifier">_map</span> <span class="special">=</span> <span class="identifier">result</span> <span class="special">);</span></code>". The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| call behaves as if the placeholder in the semantic action had been replaced |
| with a reference to <code class="computeroutput"><span class="identifier">result</span></code>. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| Placeholders in semantic actions are not <span class="emphasis"><em>actually</em></span> |
| replaced at runtime with references to variables. The regex object is never |
|         mutated in any way during any of the regex algorithms, so a single regex |
|         object is safe to use from multiple threads. |
| </p></td></tr> |
| </table></div> |
| <p> |
| The syntax for late-bound action arguments is a little different if you are |
| using <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_iterator.html" title="Struct template regex_iterator">regex_iterator<></a></code></code> |
| or <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code>. |
| The regex iterators accept an extra constructor parameter for specifying |
| the argument bindings. There is a <code class="computeroutput"><span class="identifier">let</span><span class="special">()</span></code> function that you can use to bind variables |
| to their placeholders. The following code demonstrates how. |
| </p> |
| <pre class="programlisting"><span class="comment">// Define a placeholder for a map object: |
| </span><span class="identifier">placeholder</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="special">></span> <span class="identifier">_map</span><span class="special">;</span> |
| |
| <span class="comment">// Match a word and an integer, separated by =>, |
| </span><span class="comment">// and then stuff the result into a std::map<> |
| </span><span class="identifier">sregex</span> <span class="identifier">pair</span> <span class="special">=</span> <span class="special">(</span> <span class="special">(</span><span class="identifier">s1</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_w</span><span class="special">)</span> <span class="special">>></span> <span class="string">"=>"</span> <span class="special">>></span> <span class="special">(</span><span class="identifier">s2</span><span class="special">=</span> <span class="special">+</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">)</span> |
| <span class="special">[</span> <span class="identifier">_map</span><span class="special">[</span><span class="identifier">s1</span><span class="special">]</span> <span class="special">=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">s2</span><span class="special">)</span> <span class="special">];</span> |
| |
| <span class="comment">// The string to parse |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"aaa=>1 bbb=>23 ccc=>456"</span><span class="special">);</span> |
| |
| <span class="comment">// Here is the actual map to fill in: |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">result</span><span class="special">;</span> |
| |
| <span class="comment">// Create a regex_iterator to find all the matches |
| </span><span class="identifier">sregex_iterator</span> <span class="identifier">it</span><span class="special">(</span><span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">pair</span><span class="special">,</span> <span class="identifier">let</span><span class="special">(</span><span class="identifier">_map</span><span class="special">=</span><span class="identifier">result</span><span class="special">));</span> |
| <span class="identifier">sregex_iterator</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="comment">// step through all the matches, and fill in |
| </span><span class="comment">// the result map |
| </span><span class="keyword">while</span><span class="special">(</span><span class="identifier">it</span> <span class="special">!=</span> <span class="identifier">end</span><span class="special">)</span> |
| <span class="special">++</span><span class="identifier">it</span><span class="special">;</span> |
| |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"aaa"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"bbb"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span><span class="special">[</span><span class="string">"ccc"</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| </pre> |
| <p> |
| This program displays: |
| </p> |
| <pre class="programlisting">1 |
| 23 |
| 456 |
| </pre> |
| <a name="boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.user_defined_assertions"></a><h3> |
| <a name="id3122803"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions.user_defined_assertions">User-Defined |
| Assertions</a> |
| </h3> |
| <p> |
| You are probably already familiar with regular expression <span class="emphasis"><em>assertions</em></span>. |
| In Perl, some examples are the <code class="literal">^</code> and <code class="literal">$</code> |
| assertions, which you can use to match the beginning and end of a string, |
| respectively. Xpressive lets you define your own assertions. A custom assertion |
|       is a condition which must be true at a point in the match in order for the |
| match to succeed. You can check a custom assertion with xpressive's <code class="literal"><code class="computeroutput">check()</code></code> function. |
| </p> |
| <p> |
| There are a couple of ways to define a custom assertion. The simplest is |
| to use a function object. Let's say that you want to ensure that a sub-expression |
| matches a sub-string that is either 3 or 6 characters long. The following |
| struct defines such a predicate: |
| </p> |
| <pre class="programlisting"><span class="comment">// A predicate that is true IFF a sub-match is |
| </span><span class="comment">// either 3 or 6 characters long. |
| </span><span class="keyword">struct</span> <span class="identifier">three_or_six</span> |
| <span class="special">{</span> |
| <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">ssub_match</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">sub</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">sub</span><span class="special">.</span><span class="identifier">length</span><span class="special">()</span> <span class="special">==</span> <span class="number">3</span> <span class="special">||</span> <span class="identifier">sub</span><span class="special">.</span><span class="identifier">length</span><span class="special">()</span> <span class="special">==</span> <span class="number">6</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| You can use this predicate within a regular expression as follows: |
| </p> |
| <pre class="programlisting"><span class="comment">// match words of 3 characters or 6 characters. |
| </span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">bow</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">)[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">three_or_six</span><span class="special">())</span> <span class="special">]</span> <span class="special">;</span> |
| </pre> |
| <p> |
| The above regular expression will find whole words that are either 3 or 6 |
| characters long. The <code class="computeroutput"><span class="identifier">three_or_six</span></code> |
| predicate accepts a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/sub_match.html" title="Struct template sub_match">sub_match<></a></code></code> |
| that refers back to the part of the string matched by the sub-expression |
| to which the custom assertion is attached. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| The custom assertion participates in determining whether the match succeeds |
| or fails. Unlike actions, which execute lazily, custom assertions execute |
| immediately while the regex engine is searching for a match. |
| </p></td></tr> |
| </table></div> |
| <p> |
| Custom assertions can also be defined inline using the same syntax as for |
| semantic actions. Below is the same custom assertion written inline: |
| </p> |
| <pre class="programlisting"><span class="comment">// match words of 3 characters or 6 characters. |
| </span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">bow</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">>></span> <span class="identifier">eow</span><span class="special">)[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">length</span><span class="special">(</span><span class="identifier">_</span><span class="special">)==</span><span class="number">3</span> <span class="special">||</span> <span class="identifier">length</span><span class="special">(</span><span class="identifier">_</span><span class="special">)==</span><span class="number">6</span><span class="special">)</span> <span class="special">]</span> <span class="special">;</span> |
| </pre> |
| <p> |
| In the above, <code class="computeroutput"><span class="identifier">length</span><span class="special">()</span></code> |
| is a lazy function that calls the <code class="computeroutput"><span class="identifier">length</span><span class="special">()</span></code> member function of its argument, and <code class="computeroutput"><span class="identifier">_</span></code> is a placeholder that receives the <code class="computeroutput"><span class="identifier">sub_match</span></code>. |
| </p> |
| <p> |
| Once you get the hang of writing custom assertions inline, they can be very |
| powerful. For example, you can write a regular expression that only matches |
| valid dates (for some suitably liberal definition of the term <span class="quote">“<span class="quote">valid</span>”</span>). |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">days_per_month</span><span class="special">[]</span> <span class="special">=</span> |
|   <span class="special">{</span><span class="number">31</span><span class="special">,</span> <span class="number">29</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">,</span> <span class="number">30</span><span class="special">,</span> <span class="number">31</span><span class="special">};</span> |
| |
| <span class="identifier">mark_tag</span> <span class="identifier">month</span><span class="special">(</span><span class="number">1</span><span class="special">),</span> <span class="identifier">day</span><span class="special">(</span><span class="number">2</span><span class="special">);</span> |
| <span class="comment">// find a valid date of the form month/day/year. |
| </span><span class="identifier">sregex</span> <span class="identifier">date</span> <span class="special">=</span> |
| <span class="special">(</span> |
| <span class="comment">// Month must be between 1 and 12 inclusive |
| </span> <span class="special">(</span><span class="identifier">month</span><span class="special">=</span> <span class="identifier">_d</span> <span class="special">>></span> <span class="special">!</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special">>=</span> <span class="number">1</span> |
| <span class="special">&&</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special"><=</span> <span class="number">12</span><span class="special">)</span> <span class="special">]</span> |
| <span class="special">>></span> <span class="char">'/'</span> |
| <span class="comment">// Day must be between 1 and 31 inclusive |
| </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">day</span><span class="special">=</span> <span class="identifier">_d</span> <span class="special">>></span> <span class="special">!</span><span class="identifier">_d</span><span class="special">)</span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special">>=</span> <span class="number">1</span> |
| <span class="special">&&</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special"><=</span> <span class="number">31</span><span class="special">)</span> <span class="special">]</span> |
| <span class="special">>></span> <span class="char">'/'</span> |
| <span class="comment">// Only consider years between 1970 and 2038 |
| </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">)</span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special">>=</span> <span class="number">1970</span> |
| <span class="special">&&</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">_</span><span class="special">)</span> <span class="special"><=</span> <span class="number">2038</span><span class="special">)</span> <span class="special">]</span> |
| <span class="special">)</span> |
| <span class="comment">// Ensure the month actually has that many days! |
| </span> <span class="special">[</span> <span class="identifier">check</span><span class="special">(</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">days_per_month</span><span class="special">)[</span><span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">month</span><span class="special">)-</span><span class="number">1</span><span class="special">]</span> <span class="special">>=</span> <span class="identifier">as</span><span class="special"><</span><span class="keyword">int</span><span class="special">>(</span><span class="identifier">day</span><span class="special">)</span> <span class="special">)</span> <span class="special">]</span> |
| <span class="special">;</span> |
| |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span><span class="string">"99/99/9999 2/30/2006 2/28/2006"</span><span class="special">);</span> |
| |
| <span class="keyword">if</span><span class="special">(</span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">date</span><span class="special">))</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">endl</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| The above program prints out the following: |
| </p> |
| <pre class="programlisting">2/28/2006 |
| </pre> |
| <p> |
| Notice how the inline custom assertions are used to range-check the values |
| for the month, day and year. The regular expression doesn't match <code class="computeroutput"><span class="string">"99/99/9999"</span></code> or <code class="computeroutput"><span class="string">"2/30/2006"</span></code> |
| because they are not valid dates. (There is no 99th month, and February doesn't |
| have 30 days.) |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes" title="Symbol Tables and Attributes">Symbol |
| Tables and Attributes</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes.overview"></a><h3> |
| <a name="id3124450"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes.overview">Overview</a> |
| </h3> |
| <p> |
| Symbol tables can be built into xpressive regular expressions with just a |
| <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>. |
| The map keys are the strings to be matched and the map values are the data |
| to be returned to your semantic action. Xpressive attributes, named <code class="computeroutput"><span class="identifier">a1</span></code>, <code class="computeroutput"><span class="identifier">a2</span></code>, |
| through <code class="computeroutput"><span class="identifier">a9</span></code>, hold the value |
| corresponding to a matching key so that it can be used in a semantic action. |
| A default value can be specified for an attribute if a symbol is not found. |
| </p> |
| <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes.symbol_tables"></a><h3> |
| <a name="id3124533"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes.symbol_tables">Symbol |
| Tables</a> |
| </h3> |
| <p> |
| An xpressive symbol table is just a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><></span></code>, |
| where the key is a string type and the value can be anything. For example, |
| the following regular expression matches a key from map1 and assigns the |
| corresponding value to the attribute <code class="computeroutput"><span class="identifier">a1</span></code>. |
| Then, in the semantic action, it assigns the value stored in attribute <code class="computeroutput"><span class="identifier">a1</span></code> to an integer result. |
| </p> |
| <pre class="programlisting"><span class="keyword">int</span> <span class="identifier">result</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">map1</span><span class="special">;</span> |
| <span class="comment">// ... (fill the map) |
| </span><span class="identifier">sregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">(</span> <span class="identifier">a1</span> <span class="special">=</span> <span class="identifier">map1</span> <span class="special">)</span> <span class="special">[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)</span> <span class="special">=</span> <span class="identifier">a1</span> <span class="special">];</span> |
| </pre> |
| <p> |
| Consider the following example code, which translates number names into integers. |
| It is described below. |
| </p> |
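| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">string</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">map</span><span class="special">></span> |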
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">regex_actions</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">map</span><span class="special"><</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">,</span> <span class="keyword">int</span><span class="special">></span> <span class="identifier">number_map</span><span class="special">;</span> |
| <span class="identifier">number_map</span><span class="special">[</span><span class="string">"one"</span><span class="special">]</span> <span class="special">=</span> <span class="number">1</span><span class="special">;</span> |
| <span class="identifier">number_map</span><span class="special">[</span><span class="string">"two"</span><span class="special">]</span> <span class="special">=</span> <span class="number">2</span><span class="special">;</span> |
| <span class="identifier">number_map</span><span class="special">[</span><span class="string">"three"</span><span class="special">]</span> <span class="special">=</span> <span class="number">3</span><span class="special">;</span> |
| <span class="comment">// Match a string from number_map |
| </span> <span class="comment">// and store the integer value in 'result' |
| </span> <span class="comment">// if not found, store -1 in 'result' |
| </span> <span class="keyword">int</span> <span class="identifier">result</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> |
| <span class="identifier">cregex</span> <span class="identifier">rx</span> <span class="special">=</span> <span class="special">((</span><span class="identifier">a1</span> <span class="special">=</span> <span class="identifier">number_map</span> <span class="special">)</span> <span class="special">|</span> <span class="special">*</span><span class="identifier">_</span><span class="special">)</span> |
| <span class="special">[</span> <span class="identifier">ref</span><span class="special">(</span><span class="identifier">result</span><span class="special">)</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">a1</span> <span class="special">|</span> <span class="special">-</span><span class="number">1</span><span class="special">)];</span> |
| |
| <span class="identifier">regex_match</span><span class="special">(</span><span class="string">"three"</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">regex_match</span><span class="special">(</span><span class="string">"two"</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="identifier">regex_match</span><span class="special">(</span><span class="string">"stuff"</span><span class="special">,</span> <span class="identifier">rx</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">result</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program prints the following: |
| </p> |
| <pre class="programlisting">3 |
| 2 |
| -1 |
| </pre> |
| <p> |
| First the program builds a number map, with number names as string keys and |
| the corresponding integers as values. Then it constructs a static regular |
| expression using an attribute <code class="computeroutput"><span class="identifier">a1</span></code> |
| to represent the result of the symbol table lookup. In the semantic action, |
| the attribute is assigned to an integer variable <code class="computeroutput"><span class="identifier">result</span></code>. |
| If the symbol was not found, a default value of <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code> is assigned to <code class="computeroutput"><span class="identifier">result</span></code>. |
| A wildcard, <code class="computeroutput"><span class="special">*</span><span class="identifier">_</span></code>, |
| makes sure the regex matches even if the symbol is not found. |
| </p> |
| <p> |
| A more complete version of this example can be found in <code class="literal">libs/xpressive/example/numbers.cpp</code><sup>[<a name="id3125582" href="#ftn.id3125582" class="footnote">5</a>]</sup>. It translates number names up to "nine hundred ninety nine |
| million nine hundred ninety nine thousand nine hundred ninety nine" |
| along with some special number names like "dozen". |
| </p> |
| <p> |
| Symbol table matches are case-sensitive by default, but they can be made |
| case-insensitive by enclosing the expression in <code class="computeroutput"><span class="identifier">icase</span><span class="special">()</span></code>. |
| </p> |
| <a name="boost_xpressive.user_s_guide.symbol_tables_and_attributes.attributes"></a><h3> |
| <a name="id3125623"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.symbol_tables_and_attributes.attributes">Attributes</a> |
| </h3> |
| <p> |
| Up to nine attributes can be used in a regular expression. They are named |
| <code class="computeroutput"><span class="identifier">a1</span></code>, <code class="computeroutput"><span class="identifier">a2</span></code>, |
| ..., <code class="computeroutput"><span class="identifier">a9</span></code> in the <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span></code> namespace. The attribute type |
| is the mapped type (the second template argument) of the map that is assigned to it. A |
| default value for an attribute can be specified in a semantic action with |
| the syntax <code class="computeroutput"><span class="special">(</span><span class="identifier">a1</span> |
| <span class="special">|</span> <em class="replaceable"><code>default-value</code></em><span class="special">)</span></code>. |
| </p> |
| <p> |
| Attributes are properly scoped, so you can do crazy things like: <code class="computeroutput"><span class="special">(</span> <span class="special">(</span><span class="identifier">a1</span><span class="special">=</span><span class="identifier">sym1</span><span class="special">)</span> |
| <span class="special">>></span> <span class="special">(</span><span class="identifier">a1</span><span class="special">=</span><span class="identifier">sym2</span><span class="special">)[</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">x</span><span class="special">)=</span><span class="identifier">a1</span><span class="special">]</span> <span class="special">)[</span><span class="identifier">ref</span><span class="special">(</span><span class="identifier">y</span><span class="special">)=</span><span class="identifier">a1</span><span class="special">]</span></code>. The |
| inner semantic action sees the inner <code class="computeroutput"><span class="identifier">a1</span></code>, |
| and the outer semantic action sees the outer one. They can even have different |
| types. |
| </p> |
| <div class="note"><table border="0" summary="Note"> |
| <tr> |
| <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../doc/src/images/note.png"></td> |
| <th align="left">Note</th> |
| </tr> |
| <tr><td align="left" valign="top"><p> |
| Xpressive builds a hidden ternary search trie from the map so it can search |
| quickly. If <code class="computeroutput"><span class="identifier">BOOST_DISABLE_THREADS</span></code> is defined, the hidden ternary search |
| trie "self adjusts", so after each search it restructures itself |
| to improve the efficiency of future searches based on the frequency of |
| previous searches. |
| </p></td></tr> |
| </table></div> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.localization_and_regex_traits"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits" title="Localization and Regex Traits">Localization |
| and Regex Traits</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.overview"></a><h3> |
| <a name="id3125887"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.overview">Overview</a> |
| </h3> |
| <p> |
| Matching a regular expression against a string often requires locale-dependent |
| information. For example, how are case-insensitive comparisons performed? |
| The locale-sensitive behavior is captured in a traits class. xpressive provides |
| three traits class templates: <code class="computeroutput"><span class="identifier">cpp_regex_traits</span><span class="special"><></span></code>, <code class="computeroutput"><span class="identifier">c_regex_traits</span><span class="special"><></span></code> and <code class="computeroutput"><span class="identifier">null_regex_traits</span><span class="special"><></span></code>. The first wraps a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>, |
| the second wraps the global C locale, and the third is a stub traits type |
| for use when searching non-character data. All traits templates conform to |
| the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.traits_requirements">Regex |
| Traits Concept</a>. |
| </p> |
| <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.setting_the_default_regex_trait"></a><h3> |
| <a name="id3125990"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.setting_the_default_regex_trait">Setting |
| the Default Regex Trait</a> |
| </h3> |
| <p> |
| By default, xpressive uses <code class="computeroutput"><span class="identifier">cpp_regex_traits</span><span class="special"><></span></code> for all patterns. This causes all |
| regex objects to use the global <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span></code>. |
| If you compile with <code class="computeroutput"><span class="identifier">BOOST_XPRESSIVE_USE_C_TRAITS</span></code> |
| defined, then xpressive will use <code class="computeroutput"><span class="identifier">c_regex_traits</span><span class="special"><></span></code> by default. |
| </p> |
| <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_dynamic_regexes"></a><h3> |
| <a name="id3126077"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_dynamic_regexes">Using |
| Custom Traits with Dynamic Regexes</a> |
| </h3> |
| <p> |
| To create a dynamic regex that uses a custom traits object, you must use |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_compiler.html" title="Struct template regex_compiler">regex_compiler<></a></code></code>. |
| The basic steps are shown in the following example: |
| </p> |
| <pre class="programlisting"><span class="comment">// Declare a regex_compiler that uses the global C locale |
| </span><span class="identifier">regex_compiler</span><span class="special"><</span><span class="keyword">char</span> <span class="keyword">const</span> <span class="special">*,</span> <span class="identifier">c_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="special">></span> <span class="identifier">crxcomp</span><span class="special">;</span> |
| <span class="identifier">cregex</span> <span class="identifier">crx</span> <span class="special">=</span> <span class="identifier">crxcomp</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"\\w+"</span> <span class="special">);</span> |
| |
| <span class="comment">// Declare a regex_compiler that uses a custom std::locale |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">loc</span> <span class="special">=</span> <span class="comment">/* ... create a locale here ... */</span><span class="special">;</span> |
| <span class="identifier">regex_compiler</span><span class="special"><</span><span class="keyword">char</span> <span class="keyword">const</span> <span class="special">*,</span> <span class="identifier">cpp_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="special">></span> <span class="identifier">cpprxcomp</span><span class="special">(</span><span class="identifier">loc</span><span class="special">);</span> |
| <span class="identifier">cregex</span> <span class="identifier">cpprx</span> <span class="special">=</span> <span class="identifier">cpprxcomp</span><span class="special">.</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"\\w+"</span> <span class="special">);</span> |
| </pre> |
| <p> |
| The <code class="computeroutput"><span class="identifier">regex_compiler</span></code> objects |
| act as regex factories. Once they have been imbued with a locale, every regex |
| object they create will use that locale. |
| </p> |
| <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_static_regexes"></a><h3> |
| <a name="id3126409"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.using_custom_traits_with_static_regexes">Using |
| Custom Traits with Static Regexes</a> |
| </h3> |
| <p> |
| If you want a particular static regex to use a different set of traits, you |
| can use the special <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> pattern modifier. For instance: |
| </p> |
| <pre class="programlisting"><span class="comment">// Define a regex that uses the global C locale |
| </span><span class="identifier">c_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="identifier">ctraits</span><span class="special">;</span> |
| <span class="identifier">sregex</span> <span class="identifier">crx</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">ctraits</span><span class="special">)(</span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">);</span> |
| |
| <span class="comment">// Define a regex that uses a customized std::locale |
| </span><span class="identifier">std</span><span class="special">::</span><span class="identifier">locale</span> <span class="identifier">loc</span> <span class="special">=</span> <span class="comment">/* ... create a locale here ... */</span><span class="special">;</span> |
| <span class="identifier">cpp_regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span> <span class="identifier">cpptraits</span><span class="special">(</span><span class="identifier">loc</span><span class="special">);</span> |
| <span class="identifier">sregex</span> <span class="identifier">cpprx1</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">cpptraits</span><span class="special">)(</span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">);</span> |
| |
| <span class="comment">// A shorthand for above |
| </span><span class="identifier">sregex</span> <span class="identifier">cpprx2</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">loc</span><span class="special">)(</span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">);</span> |
| </pre> |
| <p> |
| The <code class="computeroutput"><span class="identifier">imbue</span><span class="special">()</span></code> |
| pattern modifier must wrap the entire pattern. It is an error to <code class="computeroutput"><span class="identifier">imbue</span></code> only part of a static regex. For |
| example: |
| </p> |
| <pre class="programlisting"><span class="comment">// ERROR! Cannot imbue() only part of a regex |
| </span><span class="identifier">sregex</span> <span class="identifier">error</span> <span class="special">=</span> <span class="identifier">_w</span> <span class="special">>></span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">loc</span><span class="special">)(</span> <span class="identifier">_w</span> <span class="special">);</span> |
| </pre> |
| <a name="boost_xpressive.user_s_guide.localization_and_regex_traits.searching_non_character_data_with__literal_null_regex_traits__literal_"></a><h3> |
| <a name="id3126821"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.localization_and_regex_traits.searching_non_character_data_with__literal_null_regex_traits__literal_">Searching |
| Non-Character Data With <code class="literal">null_regex_traits</code></a> |
| </h3> |
| <p> |
| With xpressive static regexes, you are not limited to searching for patterns |
| in character sequences. You can search for patterns in raw bytes, integers, |
| or anything that conforms to the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.chart_requirements">Char |
| Concept</a>. The <code class="computeroutput"><span class="identifier">null_regex_traits</span><span class="special"><></span></code> makes it simple. It is a stub implementation |
| of the <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.traits_requirements">Regex |
| Traits Concept</a>. It recognizes no character classes and performs no case-insensitive |
| mappings. |
| </p> |
| <p> |
| For example, with <code class="computeroutput"><span class="identifier">null_regex_traits</span><span class="special"><></span></code>, you can write a static regex to |
| find a pattern in a sequence of integers as follows: |
| </p> |
| <pre class="programlisting"><span class="comment">// some integral data to search |
| </span><span class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">data</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span><span class="number">0</span><span class="special">,</span> <span class="number">1</span><span class="special">,</span> <span class="number">2</span><span class="special">,</span> <span class="number">3</span><span class="special">,</span> <span class="number">4</span><span class="special">,</span> <span class="number">5</span><span class="special">,</span> <span class="number">6</span><span class="special">};</span> |
| |
| <span class="comment">// create a null_regex_traits<> object for searching integers ... |
| </span><span class="identifier">null_regex_traits</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="identifier">nul</span><span class="special">;</span> |
| |
| <span class="comment">// imbue a regex object with the null_regex_traits ... |
| </span><span class="identifier">basic_regex</span><span class="special"><</span><span class="keyword">int</span> <span class="keyword">const</span> <span class="special">*></span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">imbue</span><span class="special">(</span><span class="identifier">nul</span><span class="special">)(</span><span class="number">1</span> <span class="special">>></span> <span class="special">+((</span><span class="identifier">set</span><span class="special">=</span> <span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">)</span> <span class="special">|</span> <span class="number">4</span><span class="special">)</span> <span class="special">>></span> <span class="number">5</span><span class="special">);</span> |
| <span class="identifier">match_results</span><span class="special"><</span><span class="keyword">int</span> <span class="keyword">const</span> <span class="special">*></span> <span class="identifier">what</span><span class="special">;</span> |
| |
| <span class="comment">// search for the pattern in the array of integers ... |
| </span><span class="identifier">regex_search</span><span class="special">(</span><span class="identifier">data</span><span class="special">,</span> <span class="identifier">data</span> <span class="special">+</span> <span class="number">7</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rex</span><span class="special">);</span> |
| |
| <span class="identifier">assert</span><span class="special">(</span><span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">].</span><span class="identifier">matched</span><span class="special">);</span> |
| <span class="identifier">assert</span><span class="special">(*</span><span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">].</span><span class="identifier">first</span> <span class="special">==</span> <span class="number">1</span><span class="special">);</span> |
| <span class="identifier">assert</span><span class="special">(*</span><span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">].</span><span class="identifier">second</span> <span class="special">==</span> <span class="number">6</span><span class="special">);</span> |
| </pre> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks" title="Tips 'N Tricks">Tips 'N Tricks</a> |
| </h3></div></div></div> |
| <p> |
| Squeeze the most performance out of xpressive with these tips and tricks. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.compile_patterns_once_and_reuse_them"></a><h3> |
| <a name="id3127440"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.compile_patterns_once_and_reuse_them">Compile |
| Patterns Once And Reuse Them</a> |
| </h3> |
| <p> |
| Compiling a regex (dynamic or static) is <span class="emphasis"><em>far</em></span> more expensive |
| than executing a match or search. If you have the option, prefer to compile |
| a pattern into a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| object once and reuse it rather than recreating it over and over. |
| </p> |
| <p> |
| Since <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| objects are not mutated by any of the regex algorithms, they are completely |
| thread-safe once their initialization (and that of any grammars of which |
| they are members) completes. The easiest way to reuse your patterns is to |
| simply make your <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code> |
| objects "static const". |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.reuse__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__objects"></a><h3> |
| <a name="id3127530"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.reuse__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__objects">Reuse |
| <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| Objects</a> |
| </h3> |
| <p> |
| The <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object caches dynamically allocated memory. For this reason, it is far better |
| to reuse the same <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object if you have to do many regex searches. |
| </p> |
| <p> |
| Caveat: <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| objects are not thread-safe, so don't go wild reusing them across threads. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_take_a__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__object"></a><h3> |
| <a name="id3127628"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_take_a__literal__classname_alt__boost__xpressive__match_results__match_results_lt__gt___classname___literal__object">Prefer |
| Algorithms That Take A <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| Object</a> |
| </h3> |
| <p> |
| This is a corollary to the previous tip. If you are doing multiple searches, |
| you should prefer the regex algorithms that accept a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object over the ones that don't, and you should reuse the same <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object each time. If you don't provide a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code> |
| object, a temporary one will be created for you and discarded when the algorithm |
| returns. Any memory cached in the object will be deallocated and will have |
| to be reallocated the next time. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_accept_iterator_ranges_over_null_terminated_strings"></a><h3> |
| <a name="id3127724"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.prefer_algorithms_that_accept_iterator_ranges_over_null_terminated_strings">Prefer |
| Algorithms That Accept Iterator Ranges Over Null-Terminated Strings</a> |
| </h3> |
| <p> |
| xpressive provides overloads of the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_match.html" title="Function regex_match">regex_match()</a></code></code> |
| and <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_search.html" title="Function regex_search">regex_search()</a></code></code> |
| algorithms that operate on C-style null-terminated strings. You should prefer |
| the overloads that take iterator ranges. When you pass a null-terminated |
| string to a regex algorithm, the end iterator is calculated immediately by |
| calling <code class="computeroutput"><span class="identifier">strlen</span></code>. If you already |
| know the length of the string, you can avoid this overhead by calling the |
| regex algorithms with a <code class="computeroutput"><span class="special">[</span><span class="identifier">begin</span><span class="special">,</span> <span class="identifier">end</span><span class="special">)</span></code> |
| pair. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.use_static_regexes"></a><h3> |
| <a name="id3127821"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.use_static_regexes">Use |
| Static Regexes</a> |
| </h3> |
| <p> |
| On average, static regexes execute about 10 to 15% faster than their dynamic |
| counterparts. It's worth familiarizing yourself with the static regex dialect. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.understand__literal_syntax_option_type__optimize__literal_"></a><h3> |
| <a name="id3127854"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.understand__literal_syntax_option_type__optimize__literal_">Understand |
| <code class="literal">syntax_option_type::optimize</code></a> |
| </h3> |
| <p> |
| The <code class="computeroutput"><span class="identifier">optimize</span></code> flag tells the |
| regex compiler to spend some extra time analyzing the pattern. It can cause |
| some patterns to execute faster, but it increases the time to compile the |
| pattern, and often increases the amount of memory consumed by the pattern. |
| If you plan to reuse your pattern, <code class="computeroutput"><span class="identifier">optimize</span></code> |
| is usually a win. If you will only use the pattern once, don't use <code class="computeroutput"><span class="identifier">optimize</span></code>. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.common_pitfalls"></a><h2> |
| <a name="id3127861"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.common_pitfalls">Common |
| Pitfalls</a> |
| </h2> |
| <p> |
| Keep the following tips in mind to avoid stepping in potholes with xpressive. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.create_grammars_on_a_single_thread"></a><h3> |
| <a name="id3127946"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.create_grammars_on_a_single_thread">Create |
| Grammars On A Single Thread</a> |
| </h3> |
| <p> |
| With static regexes, you can create grammars by nesting regexes inside one |
| another. When compiling the outer regex, both the outer and inner regex objects, |
| and all the regex objects to which they refer either directly or indirectly, |
| are modified. For this reason, it's dangerous for global regex objects to |
| participate in grammars. It's best to build regex grammars from a single |
| thread. Once built, the resulting regex grammar can be executed from multiple |
| threads without problems. |
| </p> |
| <a name="boost_xpressive.user_s_guide.tips_n_tricks.beware_nested_quantifiers"></a><h3> |
| <a name="id3127972"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks.beware_nested_quantifiers">Beware |
| Nested Quantifiers</a> |
| </h3> |
| <p> |
| This is a pitfall common to many regular expression engines. Some patterns |
| can cause exponentially bad performance. Often these patterns involve one |
| quantified term nested within another quantifier, such as <code class="computeroutput"><span class="string">"(a*)*"</span></code>, although in many cases, |
| the problem is harder to spot. Beware of patterns that have nested quantifiers. |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.concepts"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts" title="Concepts">Concepts</a> |
| </h3></div></div></div> |
| <a name="boost_xpressive.user_s_guide.concepts.chart_requirements"></a><h3> |
| <a name="id3128300"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.chart_requirements">CharT |
| requirements</a> |
| </h3> |
| <p> |
| If type <code class="computeroutput"><span class="identifier">BidiIterT</span></code> is used |
| as a template argument to <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>, |
| then <code class="computeroutput"><span class="identifier">CharT</span></code> is <code class="computeroutput"><span class="identifier">iterator_traits</span><span class="special"><</span><span class="identifier">BidiIterT</span><span class="special">>::</span><span class="identifier">value_type</span></code>. Type <code class="computeroutput"><span class="identifier">CharT</span></code> |
| must have a trivial default constructor, copy constructor, assignment operator, |
| and destructor. In addition, the following requirements must be met for objects |
| <code class="computeroutput"><span class="identifier">c</span></code> of type <code class="computeroutput"><span class="identifier">CharT</span></code>, |
| <code class="computeroutput"><span class="identifier">c1</span></code> and <code class="computeroutput"><span class="identifier">c2</span></code> |
| of type <code class="computeroutput"><span class="identifier">CharT</span> <span class="keyword">const</span></code>, |
| and <code class="computeroutput"><span class="identifier">i</span></code> of type <code class="computeroutput"><span class="keyword">int</span></code>: |
| </p> |
| <div class="table"> |
| <a name="id3128457"></a><p class="title"><b>Table 29.14. CharT Requirements</b></p> |
| <div class="table-contents"><table class="table" summary="CharT Requirements"> |
| <colgroup> |
| <col> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| <span class="bold"><strong>Expression</strong></span> |
| </p> |
| </th> |
| <th> |
| <p> |
| <span class="bold"><strong>Return type</strong></span> |
| </p> |
| </th> |
| <th> |
| <p> |
| <span class="bold"><strong>Assertion / Note / Pre- / Post-condition</strong></span> |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span> <span class="identifier">c</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Default constructor (must be trivial). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span> <span class="identifier">c</span><span class="special">(</span><span class="identifier">c1</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Copy constructor (must be trivial). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">c1</span> <span class="special">=</span> |
| <span class="identifier">c2</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Assignment operator (must be trivial). |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">c1</span> <span class="special">==</span> |
| <span class="identifier">c2</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> has the same value as <code class="computeroutput"><span class="identifier">c2</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">c1</span> <span class="special">!=</span> |
| <span class="identifier">c2</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> and <code class="computeroutput"><span class="identifier">c2</span></code> |
| are not equal. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">c1</span> <span class="special"><</span> |
| <span class="identifier">c2</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">true</span></code> if the value |
| of <code class="computeroutput"><span class="identifier">c1</span></code> is less than |
| <code class="computeroutput"><span class="identifier">c2</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">c1</span> <span class="special">></span> |
| <span class="identifier">c2</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">true</span></code> if the value |
| of <code class="computeroutput"><span class="identifier">c1</span></code> is greater |
| than <code class="computeroutput"><span class="identifier">c2</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">c1</span> <span class="special"><=</span> |
| <span class="identifier">c2</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> is less than or equal to |
| <code class="computeroutput"><span class="identifier">c2</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">c1</span> <span class="special">>=</span> |
| <span class="identifier">c2</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">c1</span></code> is greater than or equal to |
| <code class="computeroutput"><span class="identifier">c2</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">intmax_t</span> <span class="identifier">i</span> |
| <span class="special">=</span> <span class="identifier">c1</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">int</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> must be convertible |
| to an integral type. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span> <span class="identifier">c</span><span class="special">(</span><span class="identifier">i</span><span class="special">);</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> must be constructible |
| from an integral type. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
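| <p> |
| Some of the requirements in the table can be checked at compile time. The sketch below is not part of xpressive; the trait name <code class="computeroutput">meets_chart_basics</code> is hypothetical, it assumes C++11 <code class="computeroutput">type_traits</code>, and it covers only the triviality and integral-conversion rows, not the comparison operators. |
| </p> |

```cpp
#include <type_traits>

// Hypothetical trait: true when CharT has trivial special members and
// converts to and from an integral type, as the table above requires.
template<typename CharT>
struct meets_chart_basics
  : std::integral_constant<bool,
        std::is_trivially_default_constructible<CharT>::value &&
        std::is_trivially_copy_constructible<CharT>::value &&
        std::is_trivially_copy_assignable<CharT>::value &&
        std::is_trivially_destructible<CharT>::value &&
        std::is_convertible<CharT, int>::value &&
        std::is_constructible<CharT, int>::value>
{};

// The built-in character types are expected to pass.
static_assert(meets_chart_basics<char>::value, "char should qualify");
static_assert(meets_chart_basics<wchar_t>::value, "wchar_t should qualify");
```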
| <br class="table-break"><a name="boost_xpressive.user_s_guide.concepts.traits_requirements"></a><h3> |
| <a name="id3129286"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.traits_requirements">Traits |
| Requirements</a> |
| </h3> |
| <p> |
| In the following table <code class="computeroutput"><span class="identifier">X</span></code> |
| denotes a traits class defining types and functions for the character container |
| type <code class="computeroutput"><span class="identifier">CharT</span></code>; <code class="computeroutput"><span class="identifier">u</span></code> is an object of type <code class="computeroutput"><span class="identifier">X</span></code>; |
| <code class="computeroutput"><span class="identifier">v</span></code> is an object of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">X</span></code>; |
| <code class="computeroutput"><span class="identifier">p</span></code> is a value of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">CharT</span><span class="special">*</span></code>; <code class="computeroutput"><span class="identifier">I1</span></code> |
| and <code class="computeroutput"><span class="identifier">I2</span></code> are <code class="computeroutput"><span class="identifier">Input</span> <span class="identifier">Iterators</span></code>; |
| <code class="computeroutput"><span class="identifier">c</span></code> is a value of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">CharT</span></code>; |
| <code class="computeroutput"><span class="identifier">s</span></code> is an object of type <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code>; |
| <code class="computeroutput"><span class="identifier">cs</span></code> is an object of type |
| <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code>; |
| <code class="computeroutput"><span class="identifier">b</span></code> is a value of type <code class="computeroutput"><span class="keyword">bool</span></code>; <code class="computeroutput"><span class="identifier">i</span></code> |
| is a value of type <code class="computeroutput"><span class="keyword">int</span></code>; <code class="computeroutput"><span class="identifier">F1</span></code> and <code class="computeroutput"><span class="identifier">F2</span></code> |
| are values of type <code class="computeroutput"><span class="keyword">const</span> <span class="identifier">CharT</span><span class="special">*</span></code>; <code class="computeroutput"><span class="identifier">loc</span></code> |
| is an object of type <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code>; and <code class="computeroutput"><span class="identifier">ch</span></code> |
| is an object of type <code class="computeroutput"><span class="keyword">const</span> <span class="keyword">char</span></code>. |
| </p> |
| <div class="table"> |
| <a name="id3129631"></a><p class="title"><b>Table 29.15. Traits Requirements</b></p> |
| <div class="table-contents"><table class="table" summary="Traits Requirements"> |
| <colgroup> |
| <col> |
| <col> |
| <col> |
| </colgroup> |
| <thead><tr> |
| <th> |
| <p> |
| <span class="bold"><strong>Expression</strong></span> |
| </p> |
| </th> |
| <th> |
| <p> |
| <span class="bold"><strong>Return type</strong></span> |
| </p> |
| </th> |
| <th> |
| <p> |
| <span class="bold"><strong>Assertion / Note<br> Pre / Post condition</strong></span> |
| </p> |
| </th> |
| </tr></thead> |
| <tbody> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| The character container type used in the implementation of class |
| template <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/basic_regex.html" title="Struct template basic_regex">basic_regex<></a></code></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span><span class="special"><</span><span class="identifier">CharT</span><span class="special">></span></code> |
| or <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special"><</span><span class="identifier">CharT</span><span class="special">></span></code> |
| </p> |
| </td> |
| <td> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <span class="emphasis"><em>Implementation defined</em></span> |
| </p> |
| </td> |
| <td> |
| <p> |
| A copy constructible type that represents the locale used by the |
| traits class. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_class_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <span class="emphasis"><em>Implementation defined</em></span> |
| </p> |
| </td> |
| <td> |
| <p> |
| A bitmask type representing a particular character classification. |
| Multiple values of this type can be bitwise-or'ed together to obtain |
| a new valid value. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">hash</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">unsigned</span> <span class="keyword">char</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Yields a value between <code class="computeroutput"><span class="number">0</span></code> |
| and <code class="computeroutput"><span class="identifier">UCHAR_MAX</span></code> inclusive. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">widen</span><span class="special">(</span><span class="identifier">ch</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">CharT</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Widens the specified <code class="computeroutput"><span class="keyword">char</span></code> |
| and returns the resulting <code class="computeroutput"><span class="identifier">CharT</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">in_range</span><span class="special">(</span><span class="identifier">r1</span><span class="special">,</span> |
| <span class="identifier">r2</span><span class="special">,</span> |
| <span class="identifier">c</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| For any characters <code class="computeroutput"><span class="identifier">r1</span></code> |
| and <code class="computeroutput"><span class="identifier">r2</span></code>, returns |
| <code class="computeroutput"><span class="keyword">true</span></code> if <code class="computeroutput"><span class="identifier">r1</span> <span class="special"><=</span> |
| <span class="identifier">c</span> <span class="special">&&</span> |
| <span class="identifier">c</span> <span class="special"><=</span> |
| <span class="identifier">r2</span></code>. Requires that <code class="computeroutput"><span class="identifier">r1</span> <span class="special"><=</span> |
| <span class="identifier">r2</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">in_range_nocase</span><span class="special">(</span><span class="identifier">r1</span><span class="special">,</span> |
| <span class="identifier">r2</span><span class="special">,</span> |
| <span class="identifier">c</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| For characters <code class="computeroutput"><span class="identifier">r1</span></code> |
| and <code class="computeroutput"><span class="identifier">r2</span></code>, returns |
| <code class="computeroutput"><span class="keyword">true</span></code> if there is some |
| character <code class="computeroutput"><span class="identifier">d</span></code> for |
| which <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">d</span><span class="special">)</span> |
| <span class="special">==</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> and <code class="computeroutput"><span class="identifier">r1</span> |
| <span class="special"><=</span> <span class="identifier">d</span> |
| <span class="special">&&</span> <span class="identifier">d</span> |
| <span class="special"><=</span> <span class="identifier">r2</span></code>. |
| Requires that <code class="computeroutput"><span class="identifier">r1</span> <span class="special"><=</span> <span class="identifier">r2</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a character such that for any character <code class="computeroutput"><span class="identifier">d</span></code> |
| that is to be considered equivalent to <code class="computeroutput"><span class="identifier">c</span></code>, |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span> |
| <span class="special">==</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">translate</span><span class="special">(</span><span class="identifier">d</span><span class="special">)</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| For all characters <code class="computeroutput"><span class="identifier">C</span></code> |
| that are to be considered equivalent to <code class="computeroutput"><span class="identifier">c</span></code> |
| when comparisons are to be performed without regard to case, then |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">c</span><span class="special">)</span> |
| <span class="special">==</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">translate_nocase</span><span class="special">(</span><span class="identifier">C</span><span class="special">)</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> |
| <span class="identifier">F2</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a sort key for the character sequence designated by the |
| iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> such that if the character sequence |
| <code class="computeroutput"><span class="special">[</span><span class="identifier">G1</span><span class="special">,</span> <span class="identifier">G2</span><span class="special">)</span></code> sorts before the character sequence |
| <code class="computeroutput"><span class="special">[</span><span class="identifier">H1</span><span class="special">,</span> <span class="identifier">H2</span><span class="special">)</span></code> then <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform</span><span class="special">(</span><span class="identifier">G1</span><span class="special">,</span> <span class="identifier">G2</span><span class="special">)</span> <span class="special"><</span> |
| <span class="identifier">v</span><span class="special">.</span><span class="identifier">transform</span><span class="special">(</span><span class="identifier">H1</span><span class="special">,</span> |
| <span class="identifier">H2</span><span class="special">)</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform_primary</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> |
| <span class="identifier">F2</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a sort key for the character sequence designated by the |
| iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code> such that if the character sequence |
| <code class="computeroutput"><span class="special">[</span><span class="identifier">G1</span><span class="special">,</span> <span class="identifier">G2</span><span class="special">)</span></code> sorts before the character sequence |
| <code class="computeroutput"><span class="special">[</span><span class="identifier">H1</span><span class="special">,</span> <span class="identifier">H2</span><span class="special">)</span></code> when character case is not considered |
| then <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">transform_primary</span><span class="special">(</span><span class="identifier">G1</span><span class="special">,</span> |
| <span class="identifier">G2</span><span class="special">)</span> |
| <span class="special"><</span> <span class="identifier">v</span><span class="special">.</span><span class="identifier">transform_primary</span><span class="special">(</span><span class="identifier">H1</span><span class="special">,</span> <span class="identifier">H2</span><span class="special">)</span></code>. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">lookup_classname</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> |
| <span class="identifier">F2</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">char_class_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Converts the character sequence designated by the iterator range |
| <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span><span class="identifier">F2</span><span class="special">)</span></code> into a bitmask type that can subsequently |
| be passed to <code class="computeroutput"><span class="identifier">isctype</span></code>. |
| Values returned from <code class="computeroutput"><span class="identifier">lookup_classname</span></code> |
| can be safely bitwise OR'ed together. Returns <code class="computeroutput"><span class="number">0</span></code>
| if the character sequence is not the name of a character class |
| recognized by <code class="computeroutput"><span class="identifier">X</span></code>. |
| The value returned shall be independent of the case of the characters |
| in the sequence. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">lookup_collatename</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> |
| <span class="identifier">F2</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">string_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns a sequence of characters that represents the collating |
| element consisting of the character sequence designated by the |
| iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code>. Returns an empty string if the |
| character sequence is not a valid collating element. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">isctype</span><span class="special">(</span><span class="identifier">c</span><span class="special">,</span> |
| <span class="identifier">v</span><span class="special">.</span><span class="identifier">lookup_classname</span><span class="special">(</span><span class="identifier">F1</span><span class="special">,</span> |
| <span class="identifier">F2</span><span class="special">))</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">bool</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns <code class="computeroutput"><span class="keyword">true</span></code> if character |
| <code class="computeroutput"><span class="identifier">c</span></code> is a member of |
| the character class designated by the iterator range <code class="computeroutput"><span class="special">[</span><span class="identifier">F1</span><span class="special">,</span> <span class="identifier">F2</span><span class="special">)</span></code>, <code class="computeroutput"><span class="keyword">false</span></code> |
| otherwise. |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">value</span><span class="special">(</span><span class="identifier">c</span><span class="special">,</span> |
| <span class="identifier">i</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="keyword">int</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the value represented by the digit <code class="computeroutput"><span class="identifier">c</span></code> |
| in base <code class="computeroutput"><span class="identifier">i</span></code> if the |
| character <code class="computeroutput"><span class="identifier">c</span></code> is |
| a valid digit in base <code class="computeroutput"><span class="identifier">i</span></code>; |
| otherwise returns <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code>.<br> [Note: the value of <code class="computeroutput"><span class="identifier">i</span></code> will only be <code class="computeroutput"><span class="number">8</span></code>, <code class="computeroutput"><span class="number">10</span></code>, |
| or <code class="computeroutput"><span class="number">16</span></code>. -end note] |
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">u</span><span class="special">.</span><span class="identifier">imbue</span><span class="special">(</span><span class="identifier">loc</span><span class="special">)</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Imbues <code class="computeroutput"><span class="identifier">u</span></code> with the
| locale <code class="computeroutput"><span class="identifier">loc</span></code> and returns
| the previous locale used by <code class="computeroutput"><span class="identifier">u</span></code>.
| </p> |
| </td> |
| </tr> |
| <tr> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">v</span><span class="special">.</span><span class="identifier">getloc</span><span class="special">()</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| <code class="computeroutput"><span class="identifier">X</span><span class="special">::</span><span class="identifier">locale_type</span></code> |
| </p> |
| </td> |
| <td> |
| <p> |
| Returns the current locale used by <code class="computeroutput"><span class="identifier">v</span></code>. |
| </p> |
| </td> |
| </tr> |
| </tbody> |
| </table></div> |
| </div> |
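| <p>
| The requirements in the table above can be exercised with a short sketch.
| It assumes that <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">regex_traits</span><span class="special"><</span><span class="keyword">char</span><span class="special">></span></code>
| from the Standard Library, which offers members with the same names, is an
| acceptable stand-in for a traits type <code class="computeroutput"><span class="identifier">X</span></code>;
| xpressive's own traits classes are not exercised here. The helper names
| (<code class="computeroutput"><span class="identifier">in_class</span></code>, <code class="computeroutput"><span class="identifier">in_either_class</span></code>,
| <code class="computeroutput"><span class="identifier">digit_value</span></code>, <code class="computeroutput"><span class="identifier">sorts_before</span></code>)
| are hypothetical and exist only for illustration.
| </p>

```cpp
#include <locale>
#include <regex>
#include <string>

// Illustration only: std::regex_traits<char> stands in for a type X that
// models the traits requirements tabulated above. This is an assumption,
// not xpressive's own traits implementation.
typedef std::regex_traits<char> traits_type;

// True if c belongs to the character class called name (e.g. "digit").
// An unrecognized name yields an empty mask, so the result is false.
bool in_class(char c, std::string const &name)
{
    traits_type v;
    v.imbue(std::locale::classic()); // pin the locale so results are repeatable
    traits_type::char_class_type mask =
        v.lookup_classname(name.begin(), name.end());
    return v.isctype(c, mask);
}

// True if c belongs to either named class. Values returned from
// lookup_classname may be OR'ed together before calling isctype.
bool in_either_class(char c, std::string const &a, std::string const &b)
{
    traits_type v;
    v.imbue(std::locale::classic());
    traits_type::char_class_type mask =
        static_cast<traits_type::char_class_type>(
            v.lookup_classname(a.begin(), a.end()) |
            v.lookup_classname(b.begin(), b.end()));
    return v.isctype(c, mask);
}

// Numeric value of the digit c in the given base (8, 10 or 16), or -1
// if c is not a valid digit in that base.
int digit_value(char c, int base)
{
    traits_type v;
    return v.value(c, base);
}

// True if the sort key of a orders before the sort key of b.
bool sorts_before(std::string const &a, std::string const &b)
{
    traits_type v;
    v.imbue(std::locale::classic());
    return v.transform(a.begin(), a.end()) < v.transform(b.begin(), b.end());
}
```

| <p>
| For instance, <code class="computeroutput"><span class="identifier">digit_value</span><span class="special">(</span><span class="char">'f'</span><span class="special">,</span> <span class="number">16</span><span class="special">)</span></code>
| evaluates to <code class="computeroutput"><span class="number">15</span></code>, while
| <code class="computeroutput"><span class="identifier">digit_value</span><span class="special">(</span><span class="char">'8'</span><span class="special">,</span> <span class="number">8</span><span class="special">)</span></code>
| evaluates to <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code>
| because <code class="computeroutput"><span class="char">'8'</span></code> is not an octal digit.
| </p>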
| <br class="table-break"><a name="boost_xpressive.user_s_guide.concepts.acknowledgements"></a><h3> |
| <a name="id3132089"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.concepts.acknowledgements">Acknowledgements</a> |
| </h3> |
| <p> |
| This section is adapted from the equivalent page in the <a href="../../../libs/regex" target="_top">Boost.Regex</a> |
| documentation and from the <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1429.htm" target="_top">proposal</a> |
| to add regular expressions to the Standard Library. |
| </p> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="boost_xpressive.user_s_guide.examples"></a><a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">Examples</a> |
| </h3></div></div></div> |
| <p> |
| Below are several complete sample programs. <br>
| </p> |
| <p></p> |
| <a name="boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex"></a><h5> |
| <a name="id3132157"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_whole_string_matches_a_regex">See |
| if a whole string matches a regex</a> |
| </h5> |
| <p> |
| This is the example from the Introduction. It is reproduced here for your |
| convenience. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">hello</span><span class="special">(</span> <span class="string">"hello world!"</span> <span class="special">);</span> |
| |
| <span class="identifier">sregex</span> <span class="identifier">rex</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\w+) (\\w+)!"</span> <span class="special">);</span> |
| <span class="identifier">smatch</span> <span class="identifier">what</span><span class="special">;</span> |
| |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_match</span><span class="special">(</span> <span class="identifier">hello</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">rex</span> <span class="special">)</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// whole match |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">1</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// first capture |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">2</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// second capture |
| </span> <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program outputs the following: |
| </p> |
| <pre class="programlisting">hello world! |
| hello |
| world |
| </pre> |
| <p> |
| <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> |
| </p> |
| <p></p> |
| <a name="boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex"></a><h5> |
| <a name="id3132694"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.see_if_a_string_contains_a_sub_string_that_matches_a_regex">See |
| if a string contains a sub-string that matches a regex</a> |
| </h5> |
| <p> |
| Notice in this example how we use custom <code class="computeroutput"><span class="identifier">mark_tag</span></code>s |
| to make the pattern more readable. We can use the <code class="computeroutput"><span class="identifier">mark_tag</span></code>s |
| later to index into the <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/match_results.html" title="Struct template match_results">match_results<></a></code></code>. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="keyword">char</span> <span class="keyword">const</span> <span class="special">*</span><span class="identifier">str</span> <span class="special">=</span> <span class="string">"I was born on 5/30/1973 at 7am."</span><span class="special">;</span> |
| |
| <span class="comment">// define some custom mark_tags with names more meaningful than s1, s2, etc. |
| </span> <span class="identifier">mark_tag</span> <span class="identifier">day</span><span class="special">(</span><span class="number">1</span><span class="special">),</span> <span class="identifier">month</span><span class="special">(</span><span class="number">2</span><span class="special">),</span> <span class="identifier">year</span><span class="special">(</span><span class="number">3</span><span class="special">),</span> <span class="identifier">delim</span><span class="special">(</span><span class="number">4</span><span class="special">);</span> |
| |
| <span class="comment">// this regex finds a date |
| </span> <span class="identifier">cregex</span> <span class="identifier">date</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">month</span><span class="special">=</span> <span class="identifier">repeat</span><span class="special"><</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">>(</span><span class="identifier">_d</span><span class="special">))</span> <span class="comment">// find the month ... |
| </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">delim</span><span class="special">=</span> <span class="special">(</span><span class="identifier">set</span><span class="special">=</span> <span class="char">'/'</span><span class="special">,</span><span class="char">'-'</span><span class="special">))</span> <span class="comment">// followed by a delimiter ... |
| </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">day</span><span class="special">=</span> <span class="identifier">repeat</span><span class="special"><</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">>(</span><span class="identifier">_d</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">delim</span> <span class="comment">// and a day followed by the same delimiter ... |
| </span> <span class="special">>></span> <span class="special">(</span><span class="identifier">year</span><span class="special">=</span> <span class="identifier">repeat</span><span class="special"><</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">>(</span><span class="identifier">_d</span> <span class="special">>></span> <span class="identifier">_d</span><span class="special">));</span> <span class="comment">// and the year. |
| </span> |
| <span class="identifier">cmatch</span> <span class="identifier">what</span><span class="special">;</span> |
| |
| <span class="keyword">if</span><span class="special">(</span> <span class="identifier">regex_search</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">what</span><span class="special">,</span> <span class="identifier">date</span> <span class="special">)</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// whole match |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">day</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the day |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">month</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the month |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">year</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the year |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="identifier">delim</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> <span class="comment">// the delimiter |
| </span> <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program outputs the following: |
| </p> |
| <pre class="programlisting">5/30/1973 |
| 30 |
| 5 |
| 1973 |
| / |
| </pre> |
| <p> |
| <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> |
| </p> |
| <p></p> |
| <a name="boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex"></a><h5> |
| <a name="id3133704"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex">Replace |
| all sub-strings that match a regex</a> |
| </h5> |
| <p> |
| The following program finds dates in a string and marks them up with pseudo-HTML. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"I was born on 5/30/1973 at 7am."</span> <span class="special">);</span> |
| |
| <span class="comment">// essentially the same regex as in the previous example, but using a dynamic regex |
| </span> <span class="identifier">sregex</span> <span class="identifier">date</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\d{1,2})([/-])(\\d{1,2})\\2((?:\\d{2}){1,2})"</span> <span class="special">);</span> |
| |
| <span class="comment">// As in Perl, $& is a reference to the sub-string that matched the regex |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">format</span><span class="special">(</span> <span class="string">"<date>$&</date>"</span> <span class="special">);</span> |
| |
| <span class="identifier">str</span> <span class="special">=</span> <span class="identifier">regex_replace</span><span class="special">(</span> <span class="identifier">str</span><span class="special">,</span> <span class="identifier">date</span><span class="special">,</span> <span class="identifier">format</span> <span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">str</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program outputs the following: |
| </p> |
| <pre class="programlisting">I was born on <date>5/30/1973</date> at 7am. |
| </pre> |
| <p> |
| <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> |
| </p> |
| <p></p> |
| <a name="boost_xpressive.user_s_guide.examples.find_all_the_sub_strings_that_match_a_regex_and_step_through_them_one_at_a_time"></a><h5> |
| <a name="id3134127"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.find_all_the_sub_strings_that_match_a_regex_and_step_through_them_one_at_a_time">Find |
| all the sub-strings that match a regex and step through them one at a time</a> |
| </h5> |
| <p> |
| The following program finds the words in a wide-character string. It uses |
| <code class="computeroutput"><span class="identifier">wsregex_iterator</span></code>. Notice |
| that dereferencing a <code class="computeroutput"><span class="identifier">wsregex_iterator</span></code> |
| yields a <code class="computeroutput"><span class="identifier">wsmatch</span></code> object. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">wstring</span> <span class="identifier">str</span><span class="special">(</span> <span class="identifier">L</span><span class="string">"This is his face."</span> <span class="special">);</span> |
| |
| <span class="comment">// find a whole word |
| </span> <span class="identifier">wsregex</span> <span class="identifier">token</span> <span class="special">=</span> <span class="special">+</span><span class="identifier">alnum</span><span class="special">;</span> |
| |
| <span class="identifier">wsregex_iterator</span> <span class="identifier">cur</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">token</span> <span class="special">);</span> |
| <span class="identifier">wsregex_iterator</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="keyword">for</span><span class="special">(</span> <span class="special">;</span> <span class="identifier">cur</span> <span class="special">!=</span> <span class="identifier">end</span><span class="special">;</span> <span class="special">++</span><span class="identifier">cur</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">wsmatch</span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span> <span class="special">=</span> <span class="special">*</span><span class="identifier">cur</span><span class="special">;</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">wcout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="identifier">L</span><span class="char">'\n'</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program outputs the following: |
| </p> |
| <pre class="programlisting">This |
| is |
| his |
| face |
| </pre> |
| <p> |
| <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> |
| </p> |
| <p></p> |
| <a name="boost_xpressive.user_s_guide.examples.split_a_string_into_tokens_that_each_match_a_regex"></a><h5> |
| <a name="id3134666"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_into_tokens_that_each_match_a_regex">Split |
| a string into tokens that each match a regex</a> |
| </h5> |
| <p> |
| The following program finds race times in a string and displays first the |
| minutes and then the seconds. It uses <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code>. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"Eric: 4:40, Karl: 3:35, Francesca: 2:32"</span> <span class="special">);</span> |
| |
| <span class="comment">// find a race time |
| </span> <span class="identifier">sregex</span> <span class="identifier">time</span> <span class="special">=</span> <span class="identifier">sregex</span><span class="special">::</span><span class="identifier">compile</span><span class="special">(</span> <span class="string">"(\\d):(\\d\\d)"</span> <span class="special">);</span> |
| |
| <span class="comment">// for each match, the token iterator should first take the value of |
| </span> <span class="comment">// the first marked sub-expression followed by the value of the second |
| </span> <span class="comment">// marked sub-expression |
| </span> <span class="keyword">int</span> <span class="keyword">const</span> <span class="identifier">subs</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">1</span><span class="special">,</span> <span class="number">2</span> <span class="special">};</span> |
| |
| <span class="identifier">sregex_token_iterator</span> <span class="identifier">cur</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">time</span><span class="special">,</span> <span class="identifier">subs</span> <span class="special">);</span> |
| <span class="identifier">sregex_token_iterator</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="keyword">for</span><span class="special">(</span> <span class="special">;</span> <span class="identifier">cur</span> <span class="special">!=</span> <span class="identifier">end</span><span class="special">;</span> <span class="special">++</span><span class="identifier">cur</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="special">*</span><span class="identifier">cur</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program outputs the following: |
| </p> |
| <pre class="programlisting">4 |
| 40 |
| 3 |
| 35 |
| 2 |
| 32 |
| </pre> |
| <p> |
| <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> |
| </p> |
| <p></p> |
| <a name="boost_xpressive.user_s_guide.examples.split_a_string_using_a_regex_as_a_delimiter"></a><h5> |
| <a name="id3135229"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.split_a_string_using_a_regex_as_a_delimiter">Split |
| a string using a regex as a delimiter</a> |
| </h5> |
| <p> |
| The following program takes some text that has been marked up with HTML and |
| strips out the mark-up. It uses a regex that matches an HTML tag and a <code class="literal"><code class="computeroutput"><a class="link" href="../boost/xpressive/regex_token_iterator.html" title="Struct template regex_token_iterator">regex_token_iterator<></a></code></code> |
| that returns the parts of the string that do <span class="emphasis"><em>not</em></span> match |
| the regex. |
| </p> |
| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="keyword">int</span> <span class="identifier">main</span><span class="special">()</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">str</span><span class="special">(</span> <span class="string">"Now <bold>is the time <i>for all good men</i> to come to the aid of their</bold> country."</span> <span class="special">);</span> |
| |
| <span class="comment">// match an HTML tag |
| </span> <span class="identifier">sregex</span> <span class="identifier">html</span> <span class="special">=</span> <span class="char">'<'</span> <span class="special">>></span> <span class="identifier">optional</span><span class="special">(</span><span class="char">'/'</span><span class="special">)</span> <span class="special">>></span> <span class="special">+</span><span class="identifier">_w</span> <span class="special">>></span> <span class="char">'>'</span><span class="special">;</span> |
| |
| <span class="comment">// the -1 below directs the token iterator to return the parts of |
| </span> <span class="comment">// the string that did NOT match the regular expression. |
| </span> <span class="identifier">sregex_token_iterator</span> <span class="identifier">cur</span><span class="special">(</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">str</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">html</span><span class="special">,</span> <span class="special">-</span><span class="number">1</span> <span class="special">);</span> |
| <span class="identifier">sregex_token_iterator</span> <span class="identifier">end</span><span class="special">;</span> |
| |
| <span class="keyword">for</span><span class="special">(</span> <span class="special">;</span> <span class="identifier">cur</span> <span class="special">!=</span> <span class="identifier">end</span><span class="special">;</span> <span class="special">++</span><span class="identifier">cur</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="char">'{'</span> <span class="special"><<</span> <span class="special">*</span><span class="identifier">cur</span> <span class="special"><<</span> <span class="char">'}'</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| </pre> |
| <p> |
| This program outputs the following: |
| </p> |
| <pre class="programlisting">{Now }{is the time }{for all good men}{ to come to the aid of their}{ country.} |
| </pre> |
| <p> |
| <br> <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> |
| </p> |
| <p></p> |
| <a name="boost_xpressive.user_s_guide.examples.display_a_tree_of_nested_results"></a><h5> |
| <a name="id3135817"></a> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples.display_a_tree_of_nested_results">Display |
| a tree of nested results</a> |
| </h5> |
| <p> |
| Here is a helper class to demonstrate how you might display a tree of nested |
| results: |
| </p> |
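| <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">algorithm</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iostream</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">iterator</span><span class="special">></span> |
| <span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">/</span><span class="identifier">xpressive</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span> |
| |
| <span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">xpressive</span><span class="special">;</span> |
| |
| <span class="comment">// Displays nested results to std::cout with indenting |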
| </span><span class="keyword">struct</span> <span class="identifier">output_nested_results</span> |
| <span class="special">{</span> |
| <span class="keyword">int</span> <span class="identifier">tabs_</span><span class="special">;</span> |
| |
| <span class="identifier">output_nested_results</span><span class="special">(</span> <span class="keyword">int</span> <span class="identifier">tabs</span> <span class="special">=</span> <span class="number">0</span> <span class="special">)</span> |
| <span class="special">:</span> <span class="identifier">tabs_</span><span class="special">(</span> <span class="identifier">tabs</span> <span class="special">)</span> |
| <span class="special">{</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">template</span><span class="special"><</span> <span class="keyword">typename</span> <span class="identifier">BidiIterT</span> <span class="special">></span> |
| <span class="keyword">void</span> <span class="keyword">operator</span> <span class="special">()(</span> <span class="identifier">match_results</span><span class="special"><</span> <span class="identifier">BidiIterT</span> <span class="special">></span> <span class="keyword">const</span> <span class="special">&</span><span class="identifier">what</span> <span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="comment">// first, do some indenting |
| </span> <span class="keyword">typedef</span> <span class="keyword">typename</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">iterator_traits</span><span class="special"><</span> <span class="identifier">BidiIterT</span> <span class="special">>::</span><span class="identifier">value_type</span> <span class="identifier">char_type</span><span class="special">;</span> |
| <span class="identifier">char_type</span> <span class="identifier">space_ch</span> <span class="special">=</span> <span class="identifier">char_type</span><span class="special">(</span><span class="char">' '</span><span class="special">);</span> |
| <span class="identifier">std</span><span class="special">::</span><span class="identifier">fill_n</span><span class="special">(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">ostream_iterator</span><span class="special"><</span><span class="identifier">char_type</span><span class="special">>(</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special">),</span> <span class="identifier">tabs_</span> <span class="special">*</span> <span class="number">4</span><span class="special">,</span> <span class="identifier">space_ch</span> <span class="special">);</span> |
| |
| <span class="comment">// output the match |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">cout</span> <span class="special"><<</span> <span class="identifier">what</span><span class="special">[</span><span class="number">0</span><span class="special">]</span> <span class="special"><<</span> <span class="char">'\n'</span><span class="special">;</span> |
| |
| <span class="comment">// output any nested matches |
| </span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">for_each</span><span class="special">(</span> |
| <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">begin</span><span class="special">(),</span> |
| <span class="identifier">what</span><span class="special">.</span><span class="identifier">nested_results</span><span class="special">().</span><span class="identifier">end</span><span class="special">(),</span> |
| <span class="identifier">output_nested_results</span><span class="special">(</span> <span class="identifier">tabs_</span> <span class="special">+</span> <span class="number">1</span> <span class="special">)</span> <span class="special">);</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| <a class="link" href="user_s_guide.html#boost_xpressive.user_s_guide.examples" title="Examples">top</a> |
| </p> |
| </div> |
| <div class="footnotes"> |
| <br><hr width="100" align="left"> |
| <div class="footnote"><p><sup>[<a name="ftn.id3091425" href="#id3091425" class="para">4</a>] </sup> |
| See <a href="http://www.osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html" target="_top">Expression |
| Templates</a> |
| </p></div> |
| <div class="footnote"><p><sup>[<a name="ftn.id3125582" href="#id3125582" class="para">5</a>] </sup> |
| Many thanks to David Jenkins, who contributed this example. |
| </p></div> |
| </div> |
| </div> |
| <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> |
| <td align="left"></td> |
| <td align="right"><div class="copyright-footer">Copyright © 2007 Eric Niebler<p> |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) |
| </p> |
| </div></td> |
| </tr></table> |
| </body> |
| </html> |