| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <html> |
| <head> |
| <title>Confix Parsers</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
| <link rel="stylesheet" href="theme/style.css" type="text/css"> |
| </head> |
| |
| <body> |
| <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
| <tr> |
| <td width="10"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b> </b></font></td> |
| <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Confix Parsers</b></font></td> |
| <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td> |
| </tr> |
| </table> |
| <br> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
| <td width="30"><a href="character_sets.html"><img src="theme/l_arr.gif" border="0"></a></td> |
| <td width="30"><a href="list_parsers.html"><img src="theme/r_arr.gif" border="0"></a></td> |
| </tr> |
| </table> |
| <p><a name="confix_parser"></a><b>Confix Parsers</b></p> |
| <p>Confix Parsers recognize a sequence of three independent elements: an |
| opening, an expression and a closing. A simple example is a C comment: |
| </p> |
| <pre><code class="comment"> /* This is a C comment */</code></pre> |
| <p>which could be parsed through the following rule definition:</p> |
| <pre><span class=identifier> </span><span class=identifier>rule</span><span class=special><> </span><span class=identifier>c_comment_rule |
| </span><span class=special>= </span><span class=identifier>confix_p</span><span class=special>(</span><span class=literal>"/*"</span><span class=special>, </span><span class=special>*</span><span class=identifier>anychar_p</span><span class=special>, </span><span class=literal>"*/"</span><span class=special>) |
| </span><span class=special>;</span></pre> |
| <p>Use the <tt>confix_p</tt> parser generator to create the required Confix |
| Parser. Each of the three parameters to <tt>confix_p</tt> can be a single |
| character, a string (as above) or, if more complex parsing logic is required, |
| an auxiliary parser. Each parameter is automatically converted to the |
| corresponding parser type.</p> |
| <p>The generated parser is equivalent to the following rule: </p> |
| <pre><code> <span class=identifier>open </span><span class=special>>> (</span><span class=identifier>expr </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
| <p>If the expr parser is an <tt>action_parser_category</tt> type parser (a parser |
| with an attached semantic action), it needs special handling. This is the case |
| when the user writes something like:</p> |
| <pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, </span><span class=identifier>expr</span><span class=special>[</span><span class=identifier>func</span><span class=special>], </span><span class=identifier>close</span><span class=special>)</span></code></pre> |
| <p>where <code>expr</code> is the parser matching the expression of the confix sequence |
| and <code>func</code> is a functor to be called after matching <code>expr</code>. |
| Without any refactoring, the resulting code would parse the sequence as follows:</p> |
| <pre><code> <span class=identifier>open </span><span class=special>>> (</span><span class=identifier>expr</span><span class=special>[</span><span class=identifier>func</span><span class=special>] - </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
| <p>which in most cases is not what the user expects. (If this <u>is</u> what you |
| expect, use the <tt>confix_p</tt> generator |
| function <tt>direct()</tt>, which inhibits the parser refactoring.) To make |
| the confix parser behave as expected:</p> |
| <pre><code><span class=identifier> open </span><span class=special>>> (</span><span class=identifier>expr </span><span class=special>- </span><span class=identifier>close</span><span class=special>)[</span><span class=identifier>func</span><span class=special>] >> </span><span class=identifier>close</span></code></pre> |
| <p>the actor attached to the <code>expr</code> parser has to be re-attached to |
| the <code>(expr - close)</code> parser construct, which makes the resulting |
| confix parser 'do the right thing'. This refactoring is done with the help of |
| the <a href="refactoring.html">Refactoring Parsers</a>. Special |
| care must also be taken if the expr parser is a <tt>unary_parser_category</tt> type |
| parser, as in </p> |
| <pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, *</span><span class=identifier>anychar_p</span><span class=special>, </span><span class=identifier>close</span><span class=special>)</span></code></pre> |
| <p>which without any refactoring would result in </p> |
| <pre><code> <span class=identifier>open</span> <span class=special>>> (*</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
| <p>and will not give the expected result (<tt>*anychar_p</tt> will consume all the input up |
| to the end of the input stream, leaving nothing for <tt>close</tt> to match). So this has to be refactored into:</p> |
| <pre><code><span class=identifier> open </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>) >> </span><span class=identifier>close</span></code></pre> |
| <p>which gives the correct result. </p> |
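| <p>The difference between the two forms can be illustrated with a small hand-written |
| model in plain C++. This is only a sketch of the matching behavior described above, not |
| Spirit's implementation: the names <tt>no_match</tt>, <tt>token_at</tt>, <tt>match_unrefactored</tt> |
| and <tt>match_refactored</tt> are hypothetical, and the tokens are assumed to be non-empty plain strings.</p> |

```cpp
#include <cstddef>
#include <string>

// Sentinel used by this sketch to signal "no match" (an assumption of the sketch).
const std::size_t no_match = std::string::npos;

// Hypothetical helper: true if `token` occurs in `in` exactly at position `pos`.
bool token_at(const std::string& in, std::size_t pos, const std::string& token) {
    return pos <= in.size() && in.compare(pos, token.size(), token) == 0;
}

// Models  open >> (*anychar_p - close) >> close  without refactoring.
// The star consumes the rest of the input, so the trailing `close`
// is tried at the end of the input, where nothing is left to match.
std::size_t match_unrefactored(const std::string& in,
                               const std::string& open,
                               const std::string& close) {
    if (!token_at(in, 0, open)) return no_match;
    std::size_t pos = in.size();               // *anychar_p swallows everything
    if (!token_at(in, pos, close)) return no_match;
    return pos + close.size();
}

// Models  open >> *(anychar_p - close) >> close  (the refactored form).
// One character is consumed at a time, and only while `close` does not
// start at the current position. Returns the length of the match.
std::size_t match_refactored(const std::string& in,
                             const std::string& open,
                             const std::string& close) {
    if (!token_at(in, 0, open)) return no_match;
    std::size_t pos = open.size();
    while (pos < in.size() && !token_at(in, pos, close)) ++pos;
    if (!token_at(in, pos, close)) return no_match;
    return pos + close.size();
}
```

| <p>With the input <tt>/* hi */ x</tt>, the refactored model stops right after the first |
| <tt>*/</tt>, while the unrefactored model fails because no input remains for the closing token.</p> |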
| <p>The case where the expr parser combines the two problems mentioned above |
| (i.e. the expr parser is a unary parser with an attached action) is handled |
| accordingly, so: </p> |
| <pre><code><span class=identifier> confix_p</span><span class=special>(</span><span class=identifier>open</span><span class=special>, (*</span><span class=identifier>anychar_p</span><span class=special>)[</span><span class=identifier>func</span><span class=special>], </span>close<span class=special>)</span></code></pre> |
| <p>will be parsed as expected: </p> |
| <pre><code> <span class=identifier>open</span> <span class=special>>> (*(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=identifier>close</span><span class=special>))[</span><span class=identifier>func</span><span class=special>] >> </span>close</code></pre> |
| <p>This refactoring is also implemented with the help of the <a href="refactoring.html">Refactoring |
| Parsers</a>.</p> |
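| <p>The effect of re-attaching the action to the whole body can be sketched in plain C++. |
| This is a hand-written model of <tt>open >> (*(anychar_p - close))[func] >> close</tt>, not |
| Spirit code: the function <tt>parse_confix_with_action</tt> and its signature are hypothetical, and |
| the callback receives the body as a <tt>std::string</tt> purely for illustration.</p> |

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Models  open >> (*(anychar_p - close))[func] >> close.
// `func` is attached to the whole body, so it is called once with
// everything between the opening and the closing token.
// Assumption of this sketch: open and close are non-empty plain strings.
bool parse_confix_with_action(const std::string& in,
                              const std::string& open,
                              const std::string& close,
                              const std::function<void(const std::string&)>& func) {
    // The opening token must match at the start of the input.
    if (in.compare(0, open.size(), open) != 0) return false;

    // *(anychar_p - close): advance while close does not start here.
    const std::size_t begin = open.size();
    std::size_t pos = begin;
    while (pos < in.size() && in.compare(pos, close.size(), close) != 0) ++pos;

    // In this model the action runs as soon as its subject (the body) has
    // matched, that is, before the closing token is checked.
    func(in.substr(begin, pos - begin));

    // Finally the closing token must match.
    return in.compare(pos, close.size(), close) == 0;
}
```

| <p>Called on <tt>/* body */</tt> with <tt>"/*"</tt> and <tt>"*/"</tt>, the callback is invoked |
| exactly once and receives the text between the two tokens, including the surrounding blanks.</p> |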
| <table width="90%" border="0" align="center"> |
| <tr> |
| <td colspan="2" class="table_title"><b>Summary of Confix Parser refactorings</b></td> |
| </tr> |
| <tr class="table_title"> |
| <td width="40%"><b>You write it as:</b></td> |
| <td width="60%"><b>It is refactored to:</b></td> |
| </tr> |
| <tr> |
| <td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,</span> |
| expr<span class="special">,</span> close<span class="special">)</span></code></td> |
| <td width="60%" class="table_cells"> <p><code>open <span class=special>>> |
| (</span>expr <span class=special>-</span> close<span class=special>)</span><font color="#0000FF"> |
| </font><span class=special>>></span> close</code></p> |
| </td> |
| </tr> |
| <tr> |
| <td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">,</span> |
| expr<span class="special">[</span>func<span class="special">],</span> close<span class="special">)</span></code></td> |
| <td width="60%" class="table_cells"> <p><code>open <span class=special>>> |
| (</span>expr <span class=special>-</span> close<span class="special">)[</span>func<span class="special">] |
| <font color="#0000FF" class="special">>></font></span> close</code></p> |
| </td> |
| </tr> |
| <tr> |
| <td width="40%" class="table_cells" height="9"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">, |
| *</span>expr<span class="special">,</span> close<span class="special">)</span></code></td> |
| <td width="60%" class="table_cells" height="9"> <p><code>open <font color="#0000FF"><span class="special">>></span></font> |
| <span class="special"><font color="#0000FF" class="special">*</font>(</span>expr |
| <font color="#0000FF" class="special">-</font> close<span class="special">) |
| <font color="#0000FF" class="special">>></font></span> close</code></p> |
| </td> |
| </tr> |
| <tr> |
| <td width="40%" class="table_cells"><code>confix_p<span class="special">(</span><span class=identifier>open</span><span class="special">, |
| (*</span>expr<span class="special">)[</span>func<span class="special">], |
| close</span><span class="special">)</span></code></td> |
| <td width="60%" class="table_cells"> <p><code>open <font color="#0000FF"><span class="special">>></span></font><span class="special"> |
| (<font color="#0000FF" class="special">*</font>(</span>expr <font color="#0000FF" class="special">-</font> |
| close<span class="special">))[</span>func<span class="special">] <font color="#0000FF" class="special">>></font></span> |
| close</code></p> |
| </td> |
| </tr> |
| </table> |
| <p><a name="comment_parsers"></a><b>Comment Parsers</b></p> |
| <p>The Comment Parser generator template <tt>comment_p</tt> |
| is a helper for generating a correct <a href="#confix_parser">Confix Parser</a> |
| from auxiliary parameters. The generated parser can parse comment constructs of the following form: |
| </p> |
| <pre><code> StartCommentToken <span class="special">>></span> Comment text <span class="special">>></span> EndCommentToken</code></pre> |
| <p>The following types are supported as parameters: parsers, single |
| characters and strings (see as_parser). If <tt>comment_p</tt> |
| is used with one parameter, it matches a comment starting with the given |
| parameter and extending to the end of the line. For instance, the following |
| parser matches C++ style comments:</p> |
| |
| <pre><code><span class=identifier> comment_p</span><span class=special>(</span><span class=string>"//"</span><span class=special>)</span></code></pre> |
| <p>If it is used with two parameters, it matches a comment starting with the first |
| parameter and ending with the second. For instance, a C style |
| comment parser could be constructed as:</p> |
| <pre><code> <span class=identifier>comment_p</span><span class=special>(</span><span class=string>"/*"</span><span class=special>, </span><span class=string>"*/"</span><span class=special>)</span></code></pre> |
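| <p>The one-parameter and two-parameter forms can be modelled in plain C++ as two overloads of |
| one function. This is only a sketch of the matching rules described above, not Spirit code: |
| <tt>match_comment</tt> and <tt>no_match</tt> are hypothetical names, the tokens are assumed to be |
| non-empty plain strings, and treating <tt>"\n"</tt> as the (required and consumed) end of line is a |
| simplification of this sketch.</p> |

```cpp
#include <cstddef>
#include <string>

// Sentinel used by this sketch to signal "no match" (an assumption of the sketch).
const std::size_t no_match = std::string::npos;

// Two-parameter form: a comment starts with `open` at the beginning of the
// input and runs up to and including the first `close` after it.
// Returns the length of the matched comment.
std::size_t match_comment(const std::string& in,
                          const std::string& open,
                          const std::string& close) {
    if (in.compare(0, open.size(), open) != 0) return no_match;
    const std::size_t end = in.find(close, open.size());
    if (end == std::string::npos) return no_match;
    return end + close.size();
}

// One-parameter form: a comment starts with `open` and runs to the end of
// the line. Sketch assumption: the line ends with "\n", which is consumed.
std::size_t match_comment(const std::string& in, const std::string& open) {
    return match_comment(in, open, "\n");
}
```

| <p>For example, <tt>match_comment(text, "//")</tt> models the C++ style comment and |
| <tt>match_comment(text, "/*", "*/")</tt> models the C style comment shown above.</p> |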
| <p>The <tt>comment_p</tt> parser generator generates parsers for matching |
| non-nested comments (such as C/C++ comments). Sometimes it is necessary to parse |
| nested comments, as allowed for instance in Pascal:</p> |
| <pre><code class="comment"> { This is a { nested } PASCAL-comment }</code></pre> |
| <p>Such nested comments can be parsed by parsers generated with the <tt>comment_nest_p</tt> generator |
| template functor. The following example shows a parser that can be used for |
| parsing the two different (nestable) Pascal comment styles:</p> |
| <pre><code> <span class=identifier>rule</span><span class=special><> </span><span class=identifier>pascal_comment |
| </span><span class=special>= </span><span class=identifier>comment_nest_p</span><span class=special>(</span><span class=string>"(*"</span><span class=special>, </span><span class=string>"*)"</span><span class=special>) |
| | </span><span class=identifier>comment_nest_p</span><span class=special>(</span><span class=literal>'{'</span><span class=special>, </span><span class=literal>'}'</span><span class=special>) |
| ;</span></code></pre> |
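| <p>What nesting means for the matcher can be sketched with a depth counter in plain C++. |
| This is a hand-written model of the behavior, not the way <tt>comment_nest_p</tt> is implemented: |
| <tt>match_nested_comment</tt> and <tt>no_match</tt> are hypothetical names, and the tokens are |
| assumed to be plain strings.</p> |

```cpp
#include <cstddef>
#include <string>

// Sentinel used by this sketch to signal "no match" (an assumption of the sketch).
const std::size_t no_match = std::string::npos;

// Matches a comment that starts with `open` at the beginning of the input
// and ends at the `close` that balances it. Inner open/close pairs are
// skipped. Returns the length of the whole comment.
std::size_t match_nested_comment(const std::string& in,
                                 const std::string& open,
                                 const std::string& close) {
    // Empty tokens would never advance the position, so reject them.
    if (open.empty() || close.empty()) return no_match;
    if (in.compare(0, open.size(), open) != 0) return no_match;

    std::size_t pos = open.size();
    int depth = 1;                               // one comment is open
    while (pos < in.size()) {
        if (in.compare(pos, close.size(), close) == 0) {
            pos += close.size();
            if (--depth == 0) return pos;        // outermost comment closed
        } else if (in.compare(pos, open.size(), open) == 0) {
            pos += open.size();
            ++depth;                             // a nested comment starts
        } else {
            ++pos;                               // ordinary comment text
        }
    }
    return no_match;                             // input ended while still nested
}
```

| <p>Applied to the Pascal example above with <tt>"{"</tt> and <tt>"}"</tt>, the model matches up to |
| the final brace instead of stopping at the first <tt>}</tt>, which is where a non-nested comment |
| matcher would stop.</p> |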
| <p>Note that a comment is parsed implicitly as if the whole <tt>comment_p(...)</tt> |
| statement were embedded into a <tt>lexeme_d[]</tt> directive, i.e. no token skipping |
| occurs while a comment is being parsed, even if you've defined a skip parser |
| for your whole parsing process.</p> |
| <p> <img height="16" width="15" src="theme/lens.gif"> <a href="../example/fundamental/comments.cpp">comments.cpp</a> demonstrates various comment parsing schemes: </p> |
| <ol> |
| <li>Parsing of different comment styles </li> |
| <ul> |
| <li>parsing C/C++-style comment</li> |
| <li>parsing C++-style comment</li> |
| <li>parsing PASCAL-style comment</li> |
| </ul> |
| <li>Parsing tagged data with the help of the confix_parser</li> |
| <li>Parsing tagged data with the help of the confix_parser but the semantic<br> |
| action is directly attached to the body sequence parser</li> |
| </ol> |
| <p>This is part of the Spirit distribution.</p> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td> |
| <td width="30"><a href="character_sets.html"><img src="theme/l_arr.gif" border="0"></a></td> |
| <td width="30"><a href="list_parsers.html"><img src="theme/r_arr.gif" border="0"></a></td> |
| </tr> |
| </table> |
| <br> |
| <hr size="1"> |
| <p class="copyright">Copyright © 2001-2002 Hartmut Kaiser<br> |
| <br> |
| <font size="2">Use, modification and distribution is subject to the Boost Software |
| License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
| http://www.boost.org/LICENSE_1_0.txt) </font> </p> |
| </body> |
| </html> |