| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <html> |
| <head> |
| <meta content= |
| "HTML Tidy for Windows (vers 1st February 2003), see www.w3.org" |
| name="generator"> |
| <title> |
| Quick Start |
| </title> |
| <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> |
| <link rel="stylesheet" href="theme/style.css" type="text/css"> |
| </head> |
| <body> |
| <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2"> |
| <tr> |
| <td width="10"></td> |
| <td width="85%"> |
| <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Quick |
| Start</b></font> |
| </td> |
| <td width="112"> |
| <a href="http://spirit.sf.net"><img src="theme/spirit.gif" |
| width="112" height="48" align="right" border="0"></a> |
| </td> |
| </tr> |
| </table><br> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"> |
| <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a> |
| </td> |
| <td width="30"> |
| <a href="introduction.html"><img src="theme/l_arr.gif" border="0"> |
| </a> |
| </td> |
| <td width="30"> |
| <a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0"> |
| </a> |
| </td> |
| </tr> |
| </table> |
| <h2> |
| <b>Why would you want to use Spirit?</b> |
| </h2> |
| <p> |
| Spirit is designed to be a practical parsing tool. At the very least, the |
| ability to generate a fully-working parser from a formal EBNF |
| specification inlined in C++ significantly reduces development time. |
| While it may be practical to use a full-blown, stand-alone parser such as |
| YACC or ANTLR when we want to develop a computer language such as C or |
| Pascal, it is certainly overkill to bring in the big guns when we wish to |
| write extremely small micro-parsers. At that end of the spectrum, |
| programmers typically approach the job at hand not as a formal parsing |
| task but through ad hoc hacks using primitive tools such as |
| <tt>scanf</tt>. True, there are tools such as regular-expression |
| libraries (such as <a href= |
| "http://www.boost.org/libs/regex/index.html">boost regex</a>) or scanners |
| (such as <a href="http://www.boost.org/libs/tokenizer/index.html">boost |
| tokenizer</a>), but these tools do not scale well when we need to write |
| more elaborate parsers. Attempting to write even a moderately-complex |
| parser using these tools leads to code that is hard to understand and |
| maintain. |
| </p> |
| <p> |
| One prime objective is to make the tool easy to use. When one thinks of a |
| parser generator, the usual reaction is "it must be big and complex with |
| a steep learning curve." Not so. Spirit is designed to be fully scalable. |
| The framework is structured in layers. This permits learning on an |
| as-needed basis, after only learning the minimal core and basic concepts. |
| </p> |
| <p> |
| For development simplicity and ease in deployment, the entire framework |
| consists of only header files, with no libraries to link against or |
| build. Just put the spirit distribution in your include path, compile and |
| run. Code size? -very tight. In the quick start example that we shall |
| present in a short while, the code size is dominated by the instantiation |
| of the <tt>std::vector</tt> and <tt>std::iostream</tt>. |
| </p> |
| <h2> |
| <b>Trivial Example #1</b></h2> |
| <p>Create a parser that will parse |
| a floating-point number. |
| </p> |
| <pre><code><font color="#000000"> </font></code><span class="identifier">real_p</span> |
| </pre> |
| <p> |
| (You've got to admit, that's trivial!) The above code actually generates |
| a Spirit <tt>real_parser</tt> (a built-in parser) which parses a floating |
| point number. Take note that parsers that are meant to be used directly |
| by the user end with "<tt>_p</tt>" in their names as a Spirit convention. |
| Spirit has many pre-defined parsers and consistent naming conventions |
| help you keep from going insane! |
| </p> |
| <h2> |
| <b>Trivial Example #2</b></h2> |
| <p> |
| Create a parser that will accept a line consisting of two floating-point |
| numbers. |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><span class= |
| "identifier">real_p</span> <span class= |
| "special">>></span> <span class="identifier">real_p</span></code> |
| </pre> |
| <p> |
| Here you see the familiar floating-point numeric parser |
| <code><tt>real_p</tt></code> used twice, once for each number. What's |
| that <tt class="operators">>></tt> operator doing in there? Well, |
| they had to be separated by something, and this was chosen as the |
| "followed by" sequence operator. The above program creates a parser from |
| two simpler parsers, glueing them together with the sequence operator. |
| The result is a parser that is a composition of smaller parsers. |
| Whitespace between numbers can implicitly be consumed depending on how |
| the parser is invoked (see below). |
| </p> |
| <p> |
| Note: when we combine parsers, we end up with a "bigger" parser, But it's |
| still a parser. Parsers can get bigger and bigger, nesting more and more, |
| but whenever you glue two parsers together, you end up with one bigger |
| parser. This is an important concept. |
| </p> |
| <h2> |
| <b>Trivial Example #3</b></h2> |
| <p> |
| Create a parser that will accept an arbitrary number of floating-point |
| numbers. (Arbitrary means anything from zero to infinity) |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><span class= |
| "special">*</span><span class="identifier">real_p</span></code> |
| </pre> |
| <p> |
| This is like a regular-expression Kleene Star, though the syntax might |
| look a bit odd for a C++ programmer not used to seeing the <tt class= |
| "operators">*</tt> operator overloaded like this. Actually, if you know |
| regular expressions it may look odd too since the star is <b>before</b> |
| the expression it modifies. C'est la vie. Blame it on the fact that we |
| must work with the syntax rules of C++. |
| </p> |
| <p> |
| Any expression that evaluates to a parser may be used with the Kleene |
| Star. Keep in mind, though, that due to C++ operator precedence rules you |
| may need to put the expression in parentheses for complex expressions. |
| The Kleene Star is also known as a Kleene Closure, but we call it the |
| Star in most places. |
| </p> |
| <h3> |
| <b><a name="list_of_numbers"></a> Example #4 [ A Just Slightly Less Trivial Example</b> |
| ] </h3> |
| <p> |
| This example will create a parser that accepts a comma-delimited list of numbers and put the numbers in a vector. |
| </p> |
| <h4><strong> Step 1. Create the parser</strong></h4> |
| <pre><code><font color="#000000"> </font></code><code><span class= |
| "identifier">real_p</span> <span class= |
| "special">>></span> <span class="special">*(</span><span class= |
| "identifier">ch_p</span><span class="special">(</span><span class= |
| "literal">','</span><span class="special">)</span> <span class= |
| "special">>></span> <span class= |
| "identifier">real_p</span><span class="special">)</span></code> |
| </pre> |
| <p> |
| Notice <tt>ch_p(',')</tt>. It is a literal character parser that can |
| recognize the comma <tt>','</tt>. In this case, the Kleene Star is |
| modifying a more complex parser, namely, the one generated by the |
| expression: |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><span class= |
| "special">(</span><span class="identifier">ch_p</span><span class= |
| "special">(</span><span class="literal">','</span><span class= |
| "special">)</span> <span class="special">>></span> <span class= |
| "identifier">real_p</span><span class="special">)</span></code> |
| </pre> |
| <p> |
| Note that this is a case where the parentheses are necessary. The Kleene |
| star encloses the complete expression above. |
| </p> |
| <h4> |
| <b><strong>Step 2. </strong>Using a Parser (now that it's created)</b></h4> |
| <p> |
| Now that we have created a parser, how do we use it? Like the result of |
| any C++ temporary object, we can either store it in a variable, or call |
| functions directly on it. |
| </p> |
| <p> |
| We'll gloss over some low-level C++ details and just get to the good |
| stuff. |
| </p> |
| <p> |
| If <b><tt>r</tt></b> is a rule (don't worry about what rules exactly are |
| for now. This will be discussed later. Suffice it to say that the rule is |
| a placeholder variable that can hold a parser), then we store the parser |
| as a rule like this: |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class= |
| "identifier">r</span> <span class="special">=</span> <span class= |
| "identifier">real_p</span> <span class= |
| "special">>> *(</span><span class= |
| "identifier">ch_p</span><span class="special">(</span><span class= |
| "literal">','</span><span class="special">) >></span> <span class= |
| "identifier">real_p</span><span class="special">);</span></font></code> |
| </pre> |
| <p> |
| Not too exciting, just an assignment like any other C++ expression you've |
| used for years. The cool thing about storing a parser in a rule is this: |
| rules are parsers, and now you can refer to it <b>by name</b>. (In this |
| case the name is <tt><b>r</b></tt>). Notice that this is now a full |
| assignment expression, thus we terminate it with a semicolon, |
| "<tt>;</tt>". |
| </p> |
| <p> |
| That's it. We're done with defining the parser. So the next step is now |
| invoking this parser to do its work. There are a couple of ways to do |
| this. For now, we shall use the free <tt>parse</tt> function that takes |
| in a <tt>char const*</tt>. The function accepts three arguments: |
| </p> |
| <blockquote> |
| <p> |
| <img src="theme/bullet.gif" width="12" height="12"> The null-terminated |
| <tt>const char*</tt> input<br> |
| <img src="theme/bullet.gif" width="12" height="12"> The parser |
| object<br> |
| <img src="theme/bullet.gif" width="12" height="12"> Another parser |
| called the <b>skip parser</b> |
| </p> |
| </blockquote> |
| <p> |
| In our example, we wish to skip spaces and tabs. Another parser named |
| <tt>space_p</tt> is included in Spirit's repertoire of predefined |
| parsers. It is a very simple parser that simply recognizes whitespace. We |
| shall use <tt>space_p</tt> as our skip parser. The skip parser is the one |
| responsible for skipping characters in between parser elements such as |
| the <tt>real_p</tt> and the <tt>ch_p</tt>. |
| </p> |
| <p> |
| Ok, so now let's parse! |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class= |
| "identifier">r</span> <span class="special">=</span> <span class= |
| "identifier">real_p</span> <span class= |
| "special">>></span> <span class="special">*(</span><span class= |
| "identifier">ch_p</span><span class="special">(</span><span class= |
| "literal">','</span><span class="special">)</span> <span class= |
| "special">>></span> <span class= |
| "identifier">real_p</span><span class="special">); |
| </span> <span class="identifier"> parse</span><span class= |
| "special">(</span><span class="identifier">str</span><span class= |
| "special">,</span> <span class="identifier">r</span><span class= |
| "special">,</span> <span class="identifier">space_p</span><span class= |
| "special">)</span> <span class= |
| "comment">// Not a full statement yet, patience...</span></font></code> |
| </pre> |
| <p> |
| The parse function returns an object (called <tt>parse_info</tt>) that |
| holds, among other things, the result of the parse. In this example, we |
| need to know: |
| </p> |
| <blockquote> |
| <p> |
| <img src="theme/bullet.gif" width="12" height="12"> Did the parser |
| successfully recognize the input <tt>str</tt>?<br> |
| <img src="theme/bullet.gif" width="12" height="12"> Did the parser |
| <b>fully</b> parse and consume the input up to its end? |
| </p> |
| </blockquote> |
| <p> |
| To get a complete picture of what we have so far, let us also wrap this |
| parser inside a function: |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class= |
| "keyword">bool |
| </span> <span class="identifier"> parse_numbers</span><span class= |
| "special">(</span><span class="keyword">char</span> <span class= |
| "keyword">const</span><span class="special">*</span> <span class= |
| "identifier">str</span><span class="special">) |
| { |
| </span> <span class="keyword"> return</span> <span class= |
| "identifier">parse</span><span class="special">(</span><span class= |
| "identifier">str</span><span class="special">,</span> <span class= |
| "identifier">real_p</span> <span class= |
| "special">>></span> <span class="special">*(</span><span class= |
| "literal">','</span> <span class="special">>></span> <span class= |
| "identifier">real_p</span><span class="special">),</span> <span class= |
| "identifier">space_p</span><span class="special">).</span><span class= |
| "identifier">full</span><span class="special">; |
| }</span></font></code> |
| </pre> |
| <p> |
| Note in this case we dropped the named rule and inlined the parser |
| directly in the call to parse. Upon calling parse, the expression |
| evaluates into a temporary, unnamed parser which is passed into the |
| parse() function, used, and then destroyed. |
| </p> |
| <table border="0" width="80%" align="center"> |
| <tr> |
| <td class="note_box"> |
| <img src="theme/note.gif" width="16" height="16"><b>char and wchar_t |
| operands</b><br> |
| <br> |
| The careful reader may notice that the parser expression has |
| <tt class="quotes">','</tt> instead of <tt>ch_p(',')</tt> as the |
| previous examples did. This is ok due to C++ syntax rules of |
| conversion. There are <tt>>></tt> operators that are overloaded |
| to accept a <tt>char</tt> or <tt>wchar_t</tt> argument on its left or |
| right (but not both). An operator may be overloaded if at least one |
| of its parameters is a user-defined type. In this case, the |
| <tt>real_p</tt> is the 2nd argument to <tt>operator<span class= |
| "operators">>></span></tt>, and so the proper overload of |
| <tt class="operators">>></tt> is used, converting |
| <tt class="quotes">','</tt> into a character literal parser.<br> |
| <br> |
| The problem with omiting the <tt>ch_p</tt> call should be obvious: |
| <tt>'a' >> 'b'</tt> is <b>not</b> a spirit parser, it is a |
| numeric expression, right-shifting the ASCII (or another encoding) |
| value of <tt class="quotes">'a'</tt> by the ASCII value of |
| <tt class="quotes">'b'</tt>. However, both <tt>ch_p('a') >> |
| 'b'</tt> and <tt>'a' >> ch_p('b')</tt> are Spirit sequence |
| parsers for the letter <tt class="quotes">'a'</tt> followed by |
| <tt class="quotes">'b'</tt>. You'll get used to it, sooner or |
| later. |
| </td> |
| </tr> |
| </table> |
| <p> |
| Take note that the object returned from the parse function has a member |
| called <tt>full</tt> which returns true if both of our requirements above |
| are met (i.e. the parser fully parsed the input). |
| </p> |
| <h4> |
| <b> Step 3. Semantic Actions</b></h4> |
| <p> |
| Our parser above is really nothing but a recognizer. It answers the |
| question <i class="quotes">"did the input match our grammar?"</i>, but it |
| does not remember any data, nor does it perform any side effects. |
| Remember: we want to put the parsed numbers into a vector. This is done |
| in an <b>action</b> that is linked to a particular parser. For example, |
| whenever we parse a real number, we wish to store the parsed number after |
| a successful match. We now wish to extract information from the parser. |
| Semantic actions do this. Semantic actions may be attached to any point |
| in the grammar specification. These actions are C++ functions or functors |
| that are called whenever a part of the parser successfully recognizes a |
| portion of the input. Say you have a parser <b>P</b>, and a C++ function |
| <b>F</b>, you can make the parser call <b>F</b> whenever it matches an |
| input by attaching <b>F</b>: |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class= |
| "identifier">P</span><span class="special">[&</span><span class= |
| "identifier">F</span><span class="special">]</span></font></code> |
| </pre> |
| <p> |
| Or if <b>F</b> is a function object (a functor): |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class= |
| "identifier">P</span><span class="special">[</span><span class= |
| "identifier">F</span><span class="special">]</span></font></code> |
| </pre> |
| <p> |
| The function/functor signature depends on the type of the parser to which |
| it is attached. The parser <tt>real_p</tt> passes a single argument: the |
| parsed number. Thus, if we were to attach a function <b>F</b> to |
| <tt>real_p</tt>, we need <b>F</b> to be declared as: |
| </p> |
| |
| <pre><code> </code><code><span class= |
| "keyword">void</span> <span class="identifier">F</span><span class= |
| "special">(</span><span class="keyword">double</span> <span class= |
| "identifier">n</span><span class="special">);</span></code></pre> |
| <p> |
| For our example however, again, we can take advantage of some predefined |
| semantic functors and functor generators (<img src="theme/lens.gif" |
| width="15" height="16"> A functor generator is a function that returns |
| a functor). For our purpose, Spirit has a functor generator |
| <tt>push_back_a(c)</tt>. In brief, this semantic action, when called, |
| <b>appends</b> the parsed value it receives from the parser it is |
| attached to, to the container <tt>c</tt>. |
| </p> |
| <p> |
| Finally, here is our complete comma-separated list parser: |
| </p> |
| |
| <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class= |
| "keyword">bool |
| </span> <span class="identifier">parse_numbers</span><span class= |
| "special">(</span><span class="keyword">char</span> <span class= |
| "keyword">const</span><span class="special">*</span> <span class= |
| "identifier">str</span><span class="special">,</span> <span class= |
| "identifier">vector</span><span class="special"><</span><span class= |
| "keyword">double</span><span class= |
| "special">>&</span> <span class="identifier">v</span><span class= |
| "special">) |
| { |
| </span> <span class="keyword">return</span> <span class= |
| "identifier">parse</span><span class="special">(</span><span class= |
| "identifier">str</span><span class="special">, |
| |
| </span> <span class="comment"> // Begin grammar |
| </span> <span class="special"> ( |
| </span> <span class="identifier">real_p</span><span class= |
| "special">[</span><span class="identifier">push_back_a</span><span class= |
| "special">(</span><span class="identifier">v</span><span class= |
| "special">)]</span> <span class="special">>></span> <span class= |
| "special">*(</span><span class="literal">','</span> <span class= |
| "special">>></span> <span class= |
| "identifier">real_p</span><span class="special">[</span><span class= |
| "identifier">push_back_a</span><span class="special">(</span><span class= |
| "identifier">v</span><span class="special">)]) |
| ) |
| </span> <span class="special"> , |
| </span> <span class="comment"> // End grammar |
| |
| </span> <span class="identifier"> space_p</span><span class= |
| "special">).</span><span class="identifier">full</span><span class="special">; |
| }</span></font></code> |
| </pre> |
| <p> |
| This is the same parser as above. This time with appropriate semantic |
| actions attached to strategic places to extract the parsed numbers and |
| stuff them in the vector <tt>v</tt>. The parse_numbers function returns |
| true when successful. |
| </p> |
| <p> |
| <img src="theme/lens.gif" width="15" height="16"> The full source code |
| can be <a href="../example/fundamental/number_list.cpp">viewed here</a>. |
| This is part of the Spirit distribution. |
| </p> |
| <table border="0"> |
| <tr> |
| <td width="10"></td> |
| <td width="30"> |
| <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a> |
| </td> |
| <td width="30"> |
| <a href="introduction.html"><img src="theme/l_arr.gif" border="0"> |
| </a> |
| </td> |
| <td width="30"> |
| <a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0"> |
| </a> |
| </td> |
| </tr> |
| </table><br> |
| <hr size="1"> |
| <p class="copyright"> |
| Copyright © 1998-2003 Joel de Guzman<br> |
| Copyright © 2002 Chris Uzdavinis<br> |
| <br> |
| <font size="2">Use, modification and distribution is subject to the |
| Boost Software License, Version 1.0. (See accompanying file |
| LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)</font> |
| </p> |
| <blockquote> |
| |
| </blockquote> |
| </body> |
| </html> |