| <html> |
| <head> |
| <title>In-depth: The Parser</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
| <link rel="stylesheet" href="theme/style.css" type="text/css"> |
| </head> |
| |
| <body> |
| <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
| <tr> |
| <td width="10"> |
| </td> |
| <td width="85%"> |
| <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>In-depth: The Parser</b></font> |
| </td> |
| <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
| </tr> |
| </table> |
| <br> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
| <td width="30"><a href="semantic_actions.html"><img src="theme/l_arr.gif" border="0"></a></td> |
| <td width="30"><a href="indepth_the_scanner.html"><img src="theme/r_arr.gif" border="0"></a></td> |
| </tr> |
| </table> |
| <p>What makes Spirit tick? Now on to some details... The parser class is the most |
| fundamental entity in the framework. A parser accepts a scanner comprising |
| a first-last iterator pair and returns a match object as its result. The iterators |
| delimit the data currently being parsed. The match object evaluates to true |
| if the parse succeeds, in which case the input is advanced accordingly. Each |
| parser can represent a specific pattern or algorithm, or it can be a more complex |
| parser formed as a composition of other parsers.</p> |
| <p>All parsers inherit from the base template class, parser:</p> |
| <pre> |
| <span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>> |
| </span><span class=keyword>struct </span><span class=identifier>parser |
| </span><span class=special>{ |
| </span><span class=comment>/*...*/ |
| |
| </span><span class=identifier>DerivedT</span><span class=special>& </span><span class=identifier>derived</span><span class=special>(); |
| </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>derived</span><span class=special>() </span><span class=keyword>const</span><span class=special>; |
| </span><span class=special>};</span></pre> |
| <p>This class is a protocol base class for all parsers. The parser class does |
| not really know how to parse anything but instead relies on the template parameter |
| <tt>DerivedT</tt> to do the actual parsing. This technique is known as the <a href="references.html#curious_recurring">"Curiously |
| Recurring Template Pattern"</a> in template meta-programming circles. This |
| inheritance strategy gives us the power of polymorphism without the virtual |
| function overhead. In essence this is a way to implement <a href="references.html#generic_patterns">compile |
| time polymorphism</a>.</p> |
| <h2> parser_category_t</h2> |
| <p> Each derived parser has a typedef <tt>parser_category_t</tt> that defines |
| its category. If a derived parser does not specify one, it inherits the typedef |
| from the base parser class, which defines its <tt>parser_category_t</tt> as <tt>plain_parser_category</tt>. |
| Some tag classes are provided to distinguish different types of parsers. |
| The following categories are the most generic. More specific types may inherit |
| from these.</p> |
| <table width="90%" border="0" align="center"> |
| <tr> |
| <td colspan="2" class="table_title">Parser categories</td> |
| </tr> |
| <tr> |
| <td class="table_cells" width="33%"><tt>plain_parser_category</tt></td> |
| <td class="table_cells" width="67%">Your plain vanilla parser</td> |
| </tr> |
| <tr> |
| <td class="table_cells" width="33%"><tt>binary_parser_category</tt></td> |
| <td class="table_cells" width="67%">A parser that has two subjects, a and b (e.g. |
| alternative)</td> |
| </tr> |
| <tr> |
| <td class="table_cells" width="33%"><tt>unary_parser_category</tt></td> |
| <td class="table_cells" width="67%">A parser that has a single subject (e.g. |
| kleene star)</td> |
| </tr> |
| <tr> |
| <td class="table_cells" width="33%"><tt>action_parser_category</tt></td> |
| <td class="table_cells" width="67%">A parser with an attached semantic action</td> |
| </tr> |
| </table> |
| <pre><span class=identifier> </span><span class=keyword>struct </span><span class=identifier>plain_parser_category </span><span class=special>{}; |
| </span><span class=keyword>struct </span><span class=identifier>binary_parser_category </span><span class=special>: </span><span class=identifier>plain_parser_category </span><span class=special>{}; |
| </span><span class=keyword>struct </span><span class=identifier>unary_parser_category </span><span class=special>: </span><span class=identifier>plain_parser_category </span><span class=special>{}; |
| </span><span class=keyword>struct </span><span class=identifier>action_parser_category </span><span class=special>: </span><span class=identifier>unary_parser_category </span><span class=special>{};</span></pre> |
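| <p>One plausible use of these categories is tag dispatching: overloads are selected at compile time from a parser's <tt>parser_category_t</tt>. The sketch below repeats the four category structs so that it is self-contained. The parser types and the <tt>count_subjects</tt> helper are hypothetical and only illustrate the idea:</p> |
```cpp
// The four category tags, as declared above.
struct plain_parser_category {};
struct binary_parser_category : plain_parser_category {};
struct unary_parser_category : plain_parser_category {};
struct action_parser_category : unary_parser_category {};

// Hypothetical parsers; only the category typedef matters here.
struct my_char_parser  { typedef plain_parser_category  parser_category_t; };
struct my_kleene_star  { typedef unary_parser_category  parser_category_t; };
struct my_alternative  { typedef binary_parser_category parser_category_t; };
struct my_action       { typedef action_parser_category parser_category_t; };

// One overload per generic category; each returns the number of subjects.
constexpr int count_subjects_impl(plain_parser_category)  { return 0; }
constexpr int count_subjects_impl(unary_parser_category)  { return 1; }
constexpr int count_subjects_impl(binary_parser_category) { return 2; }

// Hypothetical helper: dispatches on the parser's category tag.
template <typename ParserT>
constexpr int count_subjects()
{
    return count_subjects_impl(typename ParserT::parser_category_t());
}

static_assert(count_subjects<my_char_parser>() == 0, "plain parser");
static_assert(count_subjects<my_alternative>() == 2, "binary parser");
// action_parser_category has no overload of its own, so the overload of
// its nearest base class, unary_parser_category, is chosen.
static_assert(count_subjects<my_action>() == 1, "action parser is unary");
```
| <p>This is why the inheritance between the category structs matters: a more specific category is automatically handled by code written for the generic category it derives from.</p> |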
| <h2>embed_t</h2> |
| <p>Each parser has a typedef <tt>embed_t</tt>. This typedef specifies how a parser |
| is embedded in a composite. By default, if one is not specified, the parser |
| will be embedded by value. That is, a copy of the parser is placed as a member |
| variable of the composite. Most parsers are embedded by value. In certain situations |
| however, this is not desirable or possible. One particular example is the <a href="rule.html">rule</a>. |
| The rule, unlike other parsers, is embedded by reference.</p> |
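| <p>A rough sketch of how a composite might honor <tt>embed_t</tt> is shown below. All names (<tt>value_parser</tt>, <tt>reference_parser</tt>, <tt>composite_sketch</tt>) are hypothetical, and the use of a const reference for by-reference embedding is an assumption made for illustration:</p> |
```cpp
#include <type_traits>

// Hypothetical stand-ins; none of these names are part of Spirit.
struct value_parser
{
    typedef value_parser embed_t;            // a composite keeps a copy
};

struct reference_parser
{
    typedef reference_parser const& embed_t; // a composite keeps a reference
};

// A composite stores each operand exactly as the operand's embed_t says.
template <typename A, typename B>
struct composite_sketch
{
    composite_sketch(A const& a_, B const& b_) : a(a_), b(b_) {}

    typename A::embed_t a;
    typename B::embed_t b;
};

typedef composite_sketch<value_parser, reference_parser> composite_t;

static_assert(std::is_same<decltype(composite_t::a), value_parser>::value,
              "value_parser is embedded by value");
static_assert(std::is_same<decltype(composite_t::b),
                           reference_parser const&>::value,
              "reference_parser is embedded by reference");
```
| <p>With by-reference embedding, the composite refers to the original object, so the original must outlive every composite that embeds it.</p> |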
| <h2><a name="match"></a>The match</h2> |
| <p>The match holds the result of a parser. A match object evaluates to true when |
| a successful match is found, otherwise false. The length of the match is the |
| number of characters (or tokens) that are successfully matched. This can be queried |
| through its <tt>length()</tt> member function. A negative value means that the |
| match is unsuccessful. </p> |
| <p> Each parser may have an associated attribute. On a successful parse, this |
| attribute is returned to the client through the match object. We can get |
| this attribute via the match's <tt>value()</tt> member function. Be warned, though, |
| that the match's attribute may be invalid, in which case getting the attribute |
| will result in an exception. The member function <tt>has_valid_attribute()</tt> |
| can be queried to know if it is safe to get the match's attribute. The attribute |
| may be set at any time through the member function <tt>value(v)</tt>, where <tt>v</tt> |
| is the new attribute value.<br> |
| <br> |
| A match attribute is valid:</p> |
| <ul> |
| <li> on a successful match</li> |
| <li>when its value is set through the <tt>value(val)</tt> member function</li> |
| <li> if it is assigned or copied from a compatible match object (e.g. <tt>match<double></tt> |
| from <tt>match<int></tt>) with a valid attribute. A match object <tt>A</tt> |
| is compatible with another match object <tt>B</tt> if the attribute type of |
| <tt>A</tt> can be assigned from the attribute type of <tt>B</tt> |
| (i.e. <tt>a = b;</tt> must compile).</li> |
| </ul> |
| <p>The match attribute is undefined:</p> |
| <ul> |
| <li>on an unsuccessful match </li> |
| <li>when an attempt is made to copy or assign from another match object with an incompatible |
| attribute type (e.g. <tt>match<std::string></tt> from <tt>match<int></tt>).</li> |
| </ul> |
| <h3>The match class:</h3> |
| <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>T</span><span class=special>> |
| </span><span class=keyword> class </span><span class=identifier>match |
| </span><span class=keyword> </span><span class=special>{ |
| </span><span class=keyword> public</span><span class=special>: |
| |
| </span><span class=keyword> </span><span class=comment>/*...*/ |
| |
| </span><span class=special> </span><span class=keyword> typedef</span><span class="identifier"> T attr_t</span><span class="special">;<br> |
| </span><span class=keyword> </span><span class="special"> </span><span class=keyword>operator safe_bool</span><span class=special>() </span><span class=keyword>const</span>; <span class="comment">// convertible to a bool</span> |
| <span class=keyword> int </span><span class=identifier>length</span><span class=special>() </span><span class=keyword>const</span>; |
| <span class="keyword">bool</span> has_valid_attribute<span class="special">()</span> <span class="keyword">const</span><span class="special">;</span> |
| <span class=keyword> </span> <span class=keyword>void</span><span class=special> </span><span class=identifier>value</span><span class=special>(</span><span class="identifier">T </span><span class="keyword">const</span><span class=special>&); |
| </span><span class=identifier>T </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>value</span><span class=special>() </span><span class=keyword>const</span><span class=special>; |
| </span><span class=keyword> </span><span class=special>};</span></pre> |
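| <p>To make the validity rules concrete, here is a simplified stand-in for the match class. It is a sketch under stated assumptions, not Spirit's implementation: the name <tt>simple_match</tt>, the use of <tt>std::logic_error</tt> as the exception type, and the use of <tt>explicit operator bool</tt> in place of <tt>safe_bool</tt> are all choices made for this example only:</p> |
```cpp
#include <stdexcept>

// Hypothetical, simplified stand-in for the match class.
template <typename T>
class simple_match
{
public:
    typedef T attr_t;

    // Unsuccessful match: negative length, no valid attribute.
    constexpr simple_match() : len(-1), valid(false), val() {}

    // Successful match of n characters carrying the attribute v.
    constexpr simple_match(int n, T const& v) : len(n), valid(true), val(v) {}

    // Copy from a compatible match: T must be constructible from T2.
    // The attribute is valid only if the source attribute was valid.
    template <typename T2>
    constexpr simple_match(simple_match<T2> const& other)
        : len(other.length())
        , valid(other.has_valid_attribute())
        , val(other.has_valid_attribute() ? T(other.value()) : T()) {}

    constexpr explicit operator bool() const { return len >= 0; }
    constexpr int  length() const { return len; }
    constexpr bool has_valid_attribute() const { return valid; }

    // Setting the attribute makes it valid.
    void value(T const& v) { val = v; valid = true; }

    // Getting an invalid attribute throws (exception type is an assumption).
    constexpr T value() const
    {
        return valid ? val : throw std::logic_error("no valid attribute");
    }

private:
    int  len;
    bool valid;
    T    val;
};

static_assert(simple_match<int>(3, 42).value() == 42, "valid on success");
static_assert(!simple_match<int>().has_valid_attribute(), "invalid on failure");
// match<double> from match<int>: compatible, so the attribute carries over.
static_assert(simple_match<double>(simple_match<int>(1, 7)).value() == 7.0,
              "attribute copied from a compatible match");
```
| <p>Client code would typically check <tt>has_valid_attribute()</tt> before calling <tt>value()</tt>, exactly as the text above recommends.</p> |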
| <h2>match_result</h2> |
| <p>It has been mentioned repeatedly that the parser returns a match object as |
| its result. This is a simplification. Actually, for the sake of genericity, |
| parsers are really not hard-coded to return a match object. More accurately, |
| a parser returns an object that adheres to a conceptual interface, of which |
| the match is an example. Nevertheless, we shall call the result type of a parser |
| a match object regardless of whether it is actually a match class, a derivative or a |
| totally unrelated type.</p> |
| <table width="80%" border="0" align="center"> |
| <tr> |
| <td class="note_box"><img src="theme/lens.gif" width="15" height="16"> <b>Meta-functions</b><br> |
| <br> |
| What are meta-functions? We all know what functions look like. In simplest |
| terms, a function accepts some arguments and returns a result. Here is the |
| function we all love so much:<br> |
| <br> |
| <code><span class="keyword">int</span> identity_func<span class="special">(</span><span class="keyword">int</span> |
| arg<span class="special">)</span><br> |
| <span class="special">{</span> <span class="keyword">return</span> arg<span class="special">; |
| }</span> <span class="comment">// return the argument arg</span><br> |
| </code><br> |
| Meta-functions are essentially the same. These beasts also accept arguments |
| and return a result. However, while functions work at runtime on values, |
| meta-functions work at compile time on types (or constants, but we shall |
| deal only with types). The meta-function is a template class (or struct). |
| The template parameters are the arguments to the meta-function and a typedef |
| within the class is the meta-function's return type. Here is the corresponding |
| meta-function:<code><br> |
| <br> |
| <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> |
| ArgT<span class="special">></span><br> |
| <span class="keyword">struct</span> identity_meta_func<br> |
| <span class="special">{</span> <span class="keyword">typedef</span> ArgT |
| type<span class="special">; }; </span><span class="comment">// return the |
| argument ArgT</span><br> |
| <br> |
| </code>The meta-function above is invoked as:<br> |
| <br> |
| <code><span class="keyword">typename</span> identity_meta_func<span class="special"><</span>ArgT<span class="special">>::</span>type</code><br> |
| <br> |
| By convention, meta-functions return the result through the typedef <tt>type</tt>. |
| Take note that <tt>typename</tt> is only required within templates.</td> |
| </tr> |
| </table> |
| <p>The actual match type used by the parser depends on two types: the parser's |
| attribute type and the scanner type. <tt>match_result</tt> is the meta-function |
| that returns the desired match type given an attribute type and a scanner type. |
| </p> |
| <p>Usage:</p> |
| <pre> <span class=keyword>typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class=identifier>T</span><span class=special>>::</span><span class=identifier>type</span></pre> |
| <p>The meta-function basically answers the question "given a scanner type |
| <tt>ScannerT</tt> and an attribute type <tt>T</tt>, what is the desired match |
| type?" [<img src="theme/note.gif" width="16" height="16"> <tt>typename</tt> |
| is only required within templates ].</p> |
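| <p>As an illustration only, a meta-function of this shape could be written as below. Every name ending in <tt>_sketch</tt>, the two scanner types and the nested <tt>match_for</tt> template are hypothetical. How the real <tt>match_result</tt> obtains the match type from the scanner is not shown in this document, so the delegation used here is an assumption:</p> |
```cpp
#include <type_traits>

// Hypothetical stand-ins; the real match_result may be implemented
// differently.
template <typename T> struct match_sketch      { typedef T attr_t; };
template <typename T> struct tree_match_sketch { typedef T attr_t; };

// Two hypothetical scanner types. Each one publishes, through a nested
// meta-function, the match type it wants for a given attribute type.
struct plain_scanner
{
    template <typename T>
    struct match_for { typedef match_sketch<T> type; };
};

struct tree_scanner
{
    template <typename T>
    struct match_for { typedef tree_match_sketch<T> type; };
};

// The meta-function: given ScannerT and T, return the desired match type.
template <typename ScannerT, typename T>
struct match_result_sketch
{
    typedef typename ScannerT::template match_for<T>::type type;
};

// Outside a template, no typename is needed to invoke it.
typedef match_result_sketch<plain_scanner, int>::type plain_int_match;

// Inside a template, typename is required because the type is dependent.
template <typename ScannerT>
struct uses_match_result
{
    typedef typename match_result_sketch<ScannerT, double>::type type;
};

static_assert(std::is_same<plain_int_match, match_sketch<int> >::value,
              "plain scanner yields match_sketch<int>");
static_assert(std::is_same<uses_match_result<tree_scanner>::type,
                           tree_match_sketch<double> >::value,
              "tree scanner yields tree_match_sketch<double>");
```
| <p>The point of the indirection is that a parser never names a concrete match class; it asks the meta-function, and the answer can vary with the scanner type.</p> |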
| <h2>The parse member function</h2> |
| <p>Concrete sub-classes inheriting from parser must have a corresponding member |
| function <tt>parse(...)</tt> compatible with the conceptual interface:<br> |
| </p> |
| <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
| </span><span class=identifier>RT |
| </span><span class=identifier>parse</span><span class=special>(</span><span class=identifier>ScannerT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>scan</span><span class=special>) </span><span class=keyword>const</span><span class=special>;</span></pre> |
| <p>where <tt>RT</tt> is the desired return type of the parser. </p> |
| <h2>The parser result</h2> |
| <p>Concrete sub-classes inheriting from parser in most cases need to have a nested |
| meta-function <tt>result</tt> that returns the result <tt>type</tt> of the parser's |
| parse member function, given a scanner type. The meta-function has the form:</p> |
| <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
| </span><span class=keyword>struct </span><span class=identifier>result |
| </span><span class=special>{ |
| </span><span class=keyword>typedef </span><span class=identifier>RT </span><span class=identifier>type</span><span class=special>; |
| </span><span class=special>};</span></pre> |
| <p>where <tt>RT</tt> is the desired return type of the parser. This is usually, |
| but not always, dependent on the template parameter <tt>ScannerT</tt>. For example, |
| given an attribute type <tt>int</tt>, we can use the match_result metafunction:</p> |
| <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
| </span><span class=keyword>struct </span><span class=identifier>result |
| </span><span class=special>{ |
| </span><span class=keyword>typedef typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class="keyword">int</span><span class=special>>::</span><span class=identifier>type type</span><span class=special>; |
| };</span></pre> |
| <p>If a parser does not supply a result metafunction, a default is provided by |
| the base parser class. The default is declared as:</p> |
| <pre><span class=keyword> template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
| </span><span class=keyword>struct </span><span class=identifier>result |
| </span><span class=special>{ |
| </span><span class=keyword>typedef typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class="identifier">nil_t</span><span class=special>>::</span><span class=identifier>type type</span><span class=special>; |
| };</span></pre> |
| <p>Notice that, without a result metafunction, the parser's attribute defaults to |
| <tt>nil_t</tt> (i.e. the parser has no attribute).</p> |
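| <p>The difference between supplying a result metafunction and inheriting the default can be checked at compile time. In the sketch below, all names ending in <tt>_sketch</tt> and the <tt>dummy_scanner</tt> type are hypothetical stand-ins for the real classes:</p> |
```cpp
#include <type_traits>

// Hypothetical stand-ins; none of these names are part of Spirit.
struct nil_t_sketch {};
template <typename T> struct match_sketch { typedef T attr_t; };

template <typename ScannerT, typename T>
struct match_result_sketch { typedef match_sketch<T> type; };

// Base class providing the default result metafunction (attribute: nil).
template <typename DerivedT>
struct parser_sketch
{
    template <typename ScannerT>
    struct result
    {
        typedef typename match_result_sketch<ScannerT, nil_t_sketch>::type type;
    };
};

// Supplies its own result metafunction: the attribute type is int.
struct int_parser_sketch : parser_sketch<int_parser_sketch>
{
    template <typename ScannerT>
    struct result
    {
        typedef typename match_result_sketch<ScannerT, int>::type type;
    };
};

// Supplies nothing, so it inherits the default from the base class.
struct space_parser_sketch : parser_sketch<space_parser_sketch> {};

struct dummy_scanner {};

static_assert(std::is_same<
    int_parser_sketch::result<dummy_scanner>::type::attr_t, int>::value,
    "the parser's own result metafunction wins");
static_assert(std::is_same<
    space_parser_sketch::result<dummy_scanner>::type::attr_t,
    nil_t_sketch>::value,
    "the inherited default has no attribute");
```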
| <h2>parser_result</h2> |
| <p>Given a scanner type <tt>ScannerT</tt> and a parser type <tt>ParserT</tt>, |
| what will be the actual result of the parser? The answer to this question is |
| provided by the <tt>parser_result</tt> meta-function.</p> |
| <p>Usage:</p> |
| <pre> <span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special><</span><span class=identifier>ParserT, ScannerT</span><span class=special>>::</span><span class=identifier>type</span></pre> |
| <p>In general, the meta-function just forwards the invocation to the parser's |
| result meta-function:</p> |
| <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ParserT</span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
| </span><span class=keyword>struct </span><span class=identifier>parser_result |
| </span><span class=special>{ |
| </span><span class=keyword>typedef </span><span class=keyword>typename </span><span class=identifier>ParserT</span><span class=special>::</span><span class=keyword>template </span><span class=identifier>result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type </span><span class=identifier>type</span><span class=special>; |
| </span><span class=special>};</span></pre> |
| <p>This is similar to a global function calling a member function. Most of the |
| time, the usage above is equivalent to:</p> |
| <pre><span class=keyword> typename </span><span class=identifier>ParserT</span><span class=special>::</span><span class=keyword>template </span><span class=identifier>result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type</span></pre> |
| <p>Yet, this should not be relied upon to be true all the time because the parser_result |
| metafunction might be specialized for specific parser and/or scanner types.</p> |
| <p>The parser_result metafunction makes the signature of the required parse member |
| function almost canonical:</p> |
| <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>> |
| </span><span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special><</span><span class=identifier>self_t, ScannerT</span><span class=special>>::</span><span class=identifier>type |
| </span><span class=identifier>parse</span><span class=special>(</span><span class=identifier>ScannerT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>scan</span><span class=special>) </span><span class=keyword>const</span><span class=special>;</span></pre> |
| <p>where <tt>self_t</tt> is a typedef to the parser.</p> |
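| <p>Putting the pieces together, a toy parser with this signature might look as follows. This is only a sketch: <tt>match_sketch</tt>, <tt>char_scanner_sketch</tt>, <tt>parser_result_sketch</tt> and <tt>digit_value_parser</tt> are hypothetical, and the "scanner" exposes a single character instead of an iterator pair:</p> |
```cpp
#include <type_traits>

// Hypothetical stand-ins; none of these names are part of Spirit.
template <typename T>
struct match_sketch
{
    int len;   // match length; negative means the match failed
    T   val;   // attribute
};

// A toy "scanner" that exposes exactly one character of input.
struct char_scanner_sketch { char ch; };

// Forwards to the parser's nested result meta-function.
template <typename ParserT, typename ScannerT>
struct parser_result_sketch
{
    typedef typename ParserT::template result<ScannerT>::type type;
};

// Recognizes one decimal digit; its attribute is the digit's value.
struct digit_value_parser
{
    typedef digit_value_parser self_t;

    template <typename ScannerT>
    struct result { typedef match_sketch<int> type; };

    // The "almost canonical" signature described above.
    template <typename ScannerT>
    constexpr typename parser_result_sketch<self_t, ScannerT>::type
    parse(ScannerT const& scan) const
    {
        typedef typename parser_result_sketch<self_t, ScannerT>::type result_t;
        return (scan.ch >= '0' && scan.ch <= '9')
            ? result_t{1, scan.ch - '0'}
            : result_t{-1, 0};
    }
};

static_assert(std::is_same<
    parser_result_sketch<digit_value_parser, char_scanner_sketch>::type,
    match_sketch<int> >::value,
    "parser_result forwards to the nested result");
static_assert(digit_value_parser().parse(char_scanner_sketch{'7'}).val == 7,
              "the attribute is the digit's value");
```
| <p>Only the body of <tt>parse</tt> and the nested <tt>result</tt> differ from parser to parser; the declaration of <tt>parse</tt> itself can be copied verbatim.</p> |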
| <h2>parser class declaration</h2> |
| <pre><span class=identifier> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>> |
| </span><span class=keyword>struct </span><span class=identifier>parser |
| </span><span class=special>{ |
| </span><span class=keyword>typedef </span><span class=identifier>DerivedT embed_t</span><span class=special>; |
| </span><span class=keyword>typedef </span><span class=identifier>DerivedT derived_t</span><span class=special>; |
| </span><span class=keyword>typedef </span><span class=identifier>plain_parser_category parser_category_t</span><span class=special>; |
| |
| </span><span class=keyword>template </span><span class=special><</span><span class="keyword">typename</span> ScannerT<span class=special>> |
| </span><span class=keyword>struct </span><span class=identifier>result |
| </span><span class=special>{ |
| </span><span class=keyword>typedef typename </span><span class=identifier>match_result</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>, </span><span class=identifier>nil_t</span><span class=special>>::</span><span class=identifier>type type</span><span class=special>; |
| }; |
| |
| </span><span class=identifier>DerivedT</span><span class=special>& </span><span class=identifier>derived</span><span class=special>(); |
| </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>derived</span><span class=special>() </span><span class=keyword>const</span><span class=special>; |
| |
| </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ActionT</span><span class=special>> |
| </span><span class=identifier>action</span><span class=special><</span><span class=identifier>DerivedT</span><span class=special>, </span><span class=identifier>ActionT</span><span class=special>> |
| </span><span class=keyword>operator</span><span class=special>[](</span><span class=identifier>ActionT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>actor</span><span class=special>) </span><span class=keyword>const</span><span class=special>; |
| };</span></pre> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
| <td width="30"><a href="semantic_actions.html"><img src="theme/l_arr.gif" border="0"></a></td> |
| <td width="30"><a href="indepth_the_scanner.html"><img src="theme/r_arr.gif" border="0"></a></td> |
| </tr> |
| </table> |
| <br> |
| <hr size="1"> |
| <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
| <br> |
| <font size="2">Use, modification and distribution is subject to the Boost Software |
| License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
| http://www.boost.org/LICENSE_1_0.txt) </font> </p> |
| </body> |
| </html> |