| <html> |
| <head> |
| <title>Rationale</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
| <link rel="stylesheet" href="theme/style.css" type="text/css"> |
| </head> |
| |
| <body> |
| <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
| <tr> |
| <td width="10"> |
| </td> |
| <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Rationale</b></font></td> |
| <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
| </tr> |
| </table> |
| <br> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
| <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td> |
| <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td> |
| </tr> |
| </table> |
| <p><img src="theme/lens.gif" width="15" height="16"> <strong>Virtual functions: |
| From static to dynamic C++</strong></p> |
| <p>Rules straddle the border between static and dynamic C++. In effect, a rule |
| transforms compile-time polymorphism (using templates) into run-time polymorphism |
| (using virtual functions). This is necessary due to C++'s inability to automatically |
| declare a variable of a type deduced from an arbitrarily complex expression |
| in the right-hand side (rhs) of an assignment. Basically, we want to do something |
| like:</p> |
| <pre><code><font color="#000000"> <span class=identifier>T </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>an_arbitrarily_complex_expression</span><span class=special>;</span></font></code></pre> |
| <p>without having to know or care about the resulting type of the right-hand side |
| (rhs) of the assignment expression. Apart from this, we also need a facility |
| to forward declare an unknown type:</p> |
| <pre><code><font color="#000000"><span class=special> </span><span class=identifier>T </span><span class=identifier>rule</span><span class=special>; |
| </span><span class=special>... |
| </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></font></code></pre> |
| <p>These limitations lead us to this implementation of rules. This comes at the |
| expense of the overhead of a virtual function call, once through each invocation |
| of a rule.</p> |
| <p><img src="theme/lens.gif" width="15" height="16"> <strong>Multiple declaration |
| </strong> </p> |
| <p>Some BNF variants allow multiple declarations of a <tt>rule</tt>. The declarations |
| are taken as alternatives. Example:</p> |
| <pre> |
| <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a</span><span class=special>; </span><span class=identifier> |
| r </span><span class=special>= </span><span class=identifier>b</span><span class=special>;</span></code></pre> |
| <p> is equivalent to: </p> |
| <pre> |
| <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></code></pre> |
| <p>Spirit v1.3 allowed this behavior. However, the current version of Spirit <b>no |
| longer</b> allows this because experience shows that this behavior leads to |
| unwanted gotchas (for instance, it does not allow rules to be held in containers). |
| In the current release of Spirit, a second assignment to a rule will simply |
| redefine it. The old definition is destructed. This follows more closely C++ |
| semantics and is more in line with what the user expects the rule to behave.</p> |
| <p><img src="theme/lens.gif" width="15" height="16"> <b>Sequencing Syntax</b> |
| <br> |
| <br> |
| The comma operator as in a, b seems to be a better candidate, syntax-wise. But |
| then the problem is with its precedence. It has the lowest precedence in C/C++, |
| which makes it virtually useless. <br> |
| <br> |
| Bjarne Stroustrup, in his article <a href="references.html#generalized_overloading">"Generalizing |
| Overloading for C++2000"</a> talks about overloading whitespace. Such a |
| feature would allow juxtapositioning of parser objects exactly as we do in (E)BNF |
| (e.g. a b | c instead of a >> b | c). Unfortunately, the article was dated |
| April 1, 1998. Oh well.</p> |
| <p><img src="theme/lens.gif" width="15" height="16"> <b>Forward iterators</b><br> |
| <br> |
| In general, the scanner expects at least a standard conforming forward iterator. |
| Forward iterators are needed for backtracking where the iterator needs to be |
| saved and restored later. Generally speaking, Spirit is a backtracking parser. |
| The implication of this is that at some point, the iterator position will have |
| to be saved to allow the parser to backtrack to a previous point. Thus, for |
| backtracking to work, the framework requires at least a forward iterator.<br> |
| <br> |
| Some parsers might require more specialized iterators (bi-directional or even |
| random access). Perhaps in the future, deterministic parsers when added to the |
| framework, will perform no backtracking and will need just a single token lookahead, |
| hence will require input iterators only.</p> |
| <p><img src="theme/lens.gif" width="15" height="16"><b> Why are subrules important?</b><br> |
| <br> |
| Subrules open up the oportunity to do aggressive meta programming as well because |
| they do not rely on virtual functions. The virtual function is the meta-programmer's |
| hell. Not only does it slow down the program due to the virtual function indirect |
| call, it is also an opaque wall where no metaprogram can get past. It kills |
| all meta-information beyond the virtual function call. Worse, the virtual function |
| cannot be templated. Which means that its arguments have to be tied to a actual |
| types. Many problems stem from this limitation. <br> |
| <br> |
| While Spirit is a currently classified as a non-deterministic recursive-descent |
| parser, Doug Gregor first noted that other parsing techniques apart from top-down |
| recursive descent may be applied. For instance, apart from non-deterministic |
| recursive descent, deterministic LL(1) and LR(1) can theoretically be implemented |
| using the same expression template front end. Spirit rules use virtual functions |
| to encode the RHS parser expression in an opaque abstract parser type. While |
| it serves its purpose well, the rule's virtual functions are the stumbling blocks |
| to more advanced metaprogramming. Subrules are free from virtual functions.</p> |
| <p><img src="theme/lens.gif" width="15" height="16"><b> <a name="exhaustive_rd"></a>Exhaustive |
| backtracking and greedy RD</b></p> |
| <p>Spirit doesn't do exhaustive backtracking like regular expressions are expected |
| to. For example:</p> |
| <pre> <span class="special">*</span>chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">) >></span> chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">);</span></pre> |
| <p>will always fail to match because Spirit's Kleene star does not back off when |
| the rest of the rule fails to match. </p> |
| <p>Actually, there's a solution to this greedy RD problem. Such a scheme is discussed |
| in section 6.6.2 of <a |
| href="http://www.cs.vu.nl/%7Edick/PTAPG.html">Parsing Techniques: A Practical |
| Guide</a>. The trick involves passing a <em>tail</em> parser (in addition to |
| the scanner) to each parser. The start parser will then simply be: <tt>start |
| >> end_p;</tt> (end_p is the start's tail). </p> |
| <p>Spirit is greedy --using straight forward, naive RD. It is certainly possible |
| to implement the fully backtracking scheme presented above, but there will be |
| also certainly be a performance hit. The scheme will always try to match all |
| possible parser paths (full parser hierarchy traversal) until it reaches a point |
| of certainty, that the whole thing matches or fails to match. </p> |
| <table border="0" width="80%" align="center"> |
| <tr> |
| <td class="note_box"><p><img src="theme/note.gif" width="16" height="16"><b>Backtracking |
| and Greedy RD </b><br> |
| <br> |
| Spirit is quite consistent and intuitive about when it backtracks and |
| to where, although it may not be obvious to those coming from different |
| backgrounds. In general, any (sub)parser will, given the same input, always |
| match the same portion of the input (or fail to match the input at all). |
| This means that Spirit is inherently greedy. Spirit will only backtrack |
| when a (sub)parser fails to match the input, and it will always backtrack |
| to the next choice point upward (not backward) in the parser structure. |
| In other words abb|ab will match "ab", as will a(bb|b), but |
| (ab|a)b won't because the (ab|a) subparser will always match the 'b' after |
| the 'a' if it is available.</p> |
| <p>--Rainer Deyke</p> |
| </td> |
| </tr> |
| </table> |
| <p>There's a strong preference on "simplicity with all the knobs when you |
| need them" approach, right now. On the other hand, the flexibility of Spirit |
| makes it possible to have different optional schemes available. It might be |
| possible to implement an exhaustive backtracking RD scheme as an optional feature |
| in the future. </p> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
| <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td> |
| <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td> |
| </tr> |
| </table> |
| <br> |
| <hr size="1"> |
| <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br> |
| <br> |
| <font size="2">Use, modification and distribution is subject to the Boost Software |
| License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
| http://www.boost.org/LICENSE_1_0.txt)</font></p> |
| <p class="copyright"> </p> |
| </body> |
| </html> |