| [/============================================================================== |
| Copyright (C) 2001-2010 Joel de Guzman |
| Copyright (C) 2001-2010 Hartmut Kaiser |
| Copyright (C) 2009 Andreas Haberstroh? |
| |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| ===============================================================================/] |
| |
| [section:indepth In Depth] |
| |
| [section:parsers_indepth Parsers in Depth] |
| |
| This section is not for the faint of heart. Distilled here are the inner |
| workings of __qi__ parsers, using real code from the __spirit__ library as |
| examples. There is no reason to fear reading on, though: we have tried to |
| explain things step by step while highlighting the important insights. |
| |
| The `__parser_concept__` class is the base class for all parsers. |
| |
| [import ../../../../boost/spirit/home/qi/parser.hpp] |
| [parser_base_parser] |
| |
| The `__parser_concept__` class does not really know how to parse anything but |
| instead relies on the template parameter `Derived` to do the actual parsing. |
| This technique is known as the "Curiously Recurring Template Pattern" in template |
| meta-programming circles. This inheritance strategy gives us the power of |
| polymorphism without the virtual function overhead. In essence this is a way to |
| implement compile time polymorphism. |
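| |
| The pattern can be sketched in isolation as follows. This is a minimal |
| illustration, not the library's code: `parser_base`, `char_is`, `run_parser` |
| and `crtp_demo` are hypothetical names, and the `parse` signature is reduced |
| to an iterator pair. |

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the parser base class: it holds no parsing logic
// and only knows how to get back to the most-derived type.
template <typename Derived>
struct parser_base
{
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

// A hypothetical derived parser that matches one specific character.
struct char_is : parser_base<char_is>
{
    explicit char_is(char c) : ch(c) {}

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        if (first != last && *first == ch)
        {
            ++first;
            return true;
        }
        return false;
    }

    char ch;
};

// Generic code accepts any parser through its base and dispatches statically:
// the call to parse() is resolved at compile time, with no virtual function.
template <typename Derived, typename Iterator>
bool run_parser(parser_base<Derived> const& p, Iterator& first, Iterator const& last)
{
    return p.derived().parse(first, last);
}

// Usage: the first call consumes 'a', the second fails on 'b'.
inline void crtp_demo()
{
    const std::string s = "ab";
    std::string::const_iterator f = s.begin();
    assert(run_parser(char_is('a'), f, s.end()));
    assert(f == s.begin() + 1);
    assert(!run_parser(char_is('a'), f, s.end()));
}
```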
| |
| The Derived parsers, `__primitive_parser_concept__`, `__unary_parser_concept__`, |
| `__binary_parser_concept__` and `__nary_parser_concept__` provide the necessary |
| facilities for parser detection, introspection, transformation and visitation. |
| |
| Derived parsers must support the following: |
| |
| [variablelist bool parse(f, l, context, skip, attr) |
| [[`f`, `l`] [first/last iterator pair]] |
| [[`context`] [enclosing rule context (can be unused_type)]] |
| [[`skip`] [skipper (can be unused_type)]] |
| [[`attr`] [attribute (can be unused_type)]] |
| ] |
| |
| /parse/ is the main parser entry point. /skipper/ can be an `unused_type`, |
| a type used everywhere in __spirit__ to signify "don't-care". There |
| is an overload of /skip/ for `unused_type` that is simply a no-op. |
| That way, we do not have to write multiple parse functions for |
| phrase and character level parsing. |
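| |
| The overload trick can be sketched as follows. The names here are hypothetical |
| stand-ins (the library ships its own `unused_type` and skipping function), |
| and `space_skipper` is an assumption made for the example. |

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the library's "don't-care" type.
struct unused_type {};

// A hypothetical skipper that consumes one whitespace character per call.
struct space_skipper
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        if (first != last && std::isspace(static_cast<unsigned char>(*first)))
        {
            ++first;
            return true;
        }
        return false;
    }
};

// Phrase level: apply the skipper until it no longer matches.
template <typename Iterator, typename Skipper>
void skip(Iterator& first, Iterator const& last, Skipper const& skipper)
{
    while (skipper.parse(first, last))
        ;
}

// Character level: no skipper was supplied, so there is nothing to do.
// This overload is more specialized and wins when unused_type is passed.
template <typename Iterator>
void skip(Iterator&, Iterator const&, unused_type)
{
}

// Usage: the same call site works with or without a real skipper.
inline void skip_demo()
{
    const std::string in = "   42";
    std::string::const_iterator f = in.begin();
    skip(f, in.end(), unused_type());     // no-op
    assert(f == in.begin());
    skip(f, in.end(), space_skipper());   // consumes the three blanks
    assert(*f == '4');
}
```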
| |
| Here are the basic rules for parsing: |
| |
| * The parser returns `true` if successful, `false` otherwise. |
| * If successful, `first` is advanced N times, where N |
| is the number of characters parsed. N can be zero: an empty (epsilon) |
| match. |
| * If successful, the parsed attribute is assigned to /attr/. |
| * If unsuccessful, `first` is reset to its position before entering |
| the parser function. /attr/ is untouched. |
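| |
| A parser that follows these rules might look like the sketch below. |
| `keyword_parser` is a hypothetical example, and its `parse` omits the |
| context and skipper arguments to keep the contract visible. |

```cpp
#include <cassert>
#include <string>

// Hypothetical parser for a fixed keyword, written to follow the rules above.
struct keyword_parser
{
    explicit keyword_parser(std::string const& k) : kw(k) {}

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, std::string& attr) const
    {
        Iterator it = first;   // work on a copy so failure leaves first alone
        for (std::string::const_iterator k = kw.begin(); k != kw.end(); ++k, ++it)
        {
            if (it == last || *it != *k)
                return false;  // unsuccessful: first and attr are untouched
        }
        first = it;            // successful: advance past the N matched chars
        attr = kw;             // successful: assign the parsed attribute
        return true;
    }

    std::string kw;
};

// Usage: a failed match, a successful match, and an empty (epsilon) match.
inline void contract_demo()
{
    const std::string in = "let x";
    std::string::const_iterator f = in.begin();
    std::string attr = "unset";

    assert(!keyword_parser("var").parse(f, in.end(), attr));
    assert(f == in.begin() && attr == "unset");

    assert(keyword_parser("let").parse(f, in.end(), attr));
    assert(f == in.begin() + 3 && attr == "let");

    assert(keyword_parser("").parse(f, in.end(), attr));   // N == 0
    assert(f == in.begin() + 3);
}
```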
| |
| [variablelist void what(context) |
| [[`context`] [enclosing rule context (can be `unused_type`)]] |
| ] |
| |
| The /what/ function should be obvious. It provides some information |
| about ["what] the parser is. It is used as a debugging aid, for |
| example. |
| |
| [variablelist P::template attribute<context>::type |
| [[`P`] [a parser type]] |
| [[`context`] [A context type (can be unused_type)]] |
| ] |
| |
| The /attribute/ metafunction returns the expected attribute type |
| of the parser. In some cases, this is context dependent. |
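| |
| The shape of such a nested metafunction can be sketched as follows. |
| `int_parser_sketch` and `attribute_of_sketch` are hypothetical names, and |
| `unused_type` is a stand-in for the library's type. |

```cpp
#include <type_traits>

// Hypothetical stand-in for the library's "don't-care" type.
struct unused_type {};

// A hypothetical parser type whose attribute is T for every context.
template <typename T>
struct int_parser_sketch
{
    template <typename Context>
    struct attribute
    {
        typedef T type;
    };
};

// In generic code P is a template parameter, so the nested template must be
// named with the `template` keyword, exactly as in the expression above.
template <typename P, typename Context>
struct attribute_of_sketch
{
    typedef typename P::template attribute<Context>::type type;
};

// Compile-time checks: the metafunction reports the expected attribute type.
static_assert(std::is_same<
    int_parser_sketch<short>::attribute<unused_type>::type, short>::value,
    "direct use of the nested metafunction");
static_assert(std::is_same<
    attribute_of_sketch<int_parser_sketch<long>, unused_type>::type, long>::value,
    "use through a generic helper");
```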
| |
| In this section, we will dissect two parser types: |
| |
| [variablelist Parsers |
| [[`__primitive_parser_concept__`] [A parser for primitive data (e.g. integer parsing).]] |
| [[`__unary_parser_concept__`] [A parser that has a single subject (e.g. kleene star).]] |
| ] |
| |
| [/------------------------------------------------------------------------------] |
| [heading Primitive Parsers] |
| |
| For our dissection study, we will use a __spirit__ primitive, the `int_parser` |
| in the boost::spirit::qi namespace. |
| |
| [import ../../../../boost/spirit/home/qi/numeric/int.hpp] |
| [primitive_parsers_int] |
| |
| The `int_parser` is derived from a `__primitive_parser_concept__<Derived>`, which |
| in turn derives from `parser<Derived>`. Therefore, it supports the following |
| requirements: |
| |
| * The `parse` member function |
| * The `what` member function |
| * The nested `attribute` metafunction |
| |
| /parse/ is the main entry point. For primitive parsers, the first thing to do is |
| call: |
| |
| `` |
| qi::skip(first, last, skipper); |
| `` |
| |
| to do a pre-skip. After pre-skipping, the parser proceeds to do its thing. The |
| actual parsing code is placed in `extract_int<T, Radix, MinDigits, |
| MaxDigits>::call(first, last, attr);` |
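| |
| The two-step protocol (pre-skip, then extract) can be sketched in isolation. |
| Everything here is hypothetical: `skip_spaces` stands in for the skipper |
| call and `extract_digits` for the extraction routine, handling only unsigned |
| decimal numbers without overflow checks. |

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical pre-skip: consume leading whitespace.
template <typename Iterator>
void skip_spaces(Iterator& first, Iterator const& last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
}

// Hypothetical extraction step: unsigned decimal, no overflow checking.
template <typename Iterator>
bool extract_digits(Iterator& first, Iterator const& last, int& attr)
{
    Iterator it = first;
    int value = 0;
    for (; it != last && std::isdigit(static_cast<unsigned char>(*it)); ++it)
        value = value * 10 + (*it - '0');

    if (it == first)
        return false;          // no digits: leave first and attr alone
    first = it;
    attr = value;
    return true;
}

// The primitive parser itself is just the two steps in sequence.
// Note that this sketch does not undo the pre-skip when extraction fails.
struct simple_int_parser
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, int& attr) const
    {
        skip_spaces(first, last);
        return extract_digits(first, last, attr);
    }
};

// Usage: leading blanks are skipped, then "123" is extracted.
inline void int_demo()
{
    const std::string in = "  123 x";
    std::string::const_iterator f = in.begin();
    int v = 0;
    assert(simple_int_parser().parse(f, in.end(), v));
    assert(v == 123 && f == in.begin() + 5);
    assert(!simple_int_parser().parse(f, in.end(), v));   // 'x' is not a digit
    assert(v == 123);                                     // attr untouched
}
```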
| |
| This simple no-frills protocol is one of the reasons why __spirit__ is |
| fast. If you know the internals of __classic__ and perhaps |
| even wrote some parsers with it, this simple __spirit__ mechanism |
| is a joy to work with. There are no scanners and all that crap. |
| |
| The /what/ function just tells us that it is an integer parser. Simple. |
| |
| The /attribute/ metafunction returns the T template parameter. We associate the |
| `int_parser` to some placeholders for `short_`, `int_`, `long_` and `long_long` |
| types. But, first, we enable these placeholders in namespace boost::spirit: |
| |
| [primitive_parsers_enable_short_] |
| [primitive_parsers_enable_int_] |
| [primitive_parsers_enable_long_] |
| [primitive_parsers_enable_long_long_] |
| |
| Notice that `int_parser` is placed in the namespace boost::spirit::qi |
| while these /enablers/ are in namespace boost::spirit. The reason is |
| that these placeholders are shared by other __spirit__ /domains/. __qi__, |
| the parser, is one domain. __karma__, the generator, is another domain. |
| Other parser technologies may be developed and placed in yet |
| another domain. Yet, all these can potentially share the same |
| placeholders for interoperability. The interpretation of these |
| placeholders is domain-specific. |
| |
| Now that we enabled the placeholders, we have to write generators |
| for them. The make_xxx stuff (in boost::spirit::qi namespace): |
| |
| [primitive_parsers_make_int] |
| |
| This one above is our main generator. It's a simple function object |
| with 2 (unused) arguments. These arguments are |
| |
| # The actual terminal value obtained by proto. In this case, either |
| a short_, int_, long_ or long_long. We don't care about this. |
| |
| # Modifiers. We also don't care about this. This allows directives |
| such as `no_case[p]` to pass information to inner parser nodes. |
| We'll see how that works later. |
| |
| Now: |
| |
| [primitive_parsers_short_] |
| [primitive_parsers_int_] |
| [primitive_parsers_long_] |
| [primitive_parsers_long_long_] |
| |
| These specialize `qi::make_primitive` for specific tags. They all |
| inherit from `make_int` which does the actual work. |
| |
| [heading Composite Parsers] |
| |
| Let me present the kleene star (also in namespace spirit::qi): |
| |
| [import ../../../../boost/spirit/home/qi/operator/kleene.hpp] |
| [composite_parsers_kleene] |
| |
| Looks similar in form to its primitive cousin, the `int_parser`. And, again, it |
| has the same basic ingredients required by `Derived`. |
| |
| * The nested attribute metafunction |
| * The parse member function |
| * The what member function |
| |
| kleene is a composite parser. It is a parser that composes another |
| parser, its ["subject]. It is a `__unary_parser_concept__` and subclasses from it. |
| Like `__primitive_parser_concept__`, `__unary_parser_concept__<Derived>` derives |
| from `parser<Derived>`. |
| |
| `unary_parser<Derived>` has these expression requirements on `Derived`: |
| |
| * p.subject -> subject parser ( ['p] is a __unary_parser_concept__ parser.) |
| * P::subject_type -> subject parser type ( ['P] is a __unary_parser_concept__ type.) |
| |
| /parse/ is the main parser entry point. Since this is not a primitive |
| parser, we do not need to call `qi::skip(first, last, skipper)`. The |
| ['subject], if it is a primitive, will do the pre-skip. If it is |
| another composite parser, it will eventually call a primitive parser |
| somewhere down the line which will do the pre-skip. This makes it a |
| lot more efficient than __classic__. __classic__ puts the skipping business |
| into the so-called "scanner" which blindly attempts a pre-skip |
| every time we increment the iterator. |
| |
| What is the /attribute/ of the kleene? In general, it is a `std::vector<T>` |
| where `T` is the attribute of the subject. There is a special case though. |
| If `T` is an `unused_type`, then the attribute of kleene is also `unused_type`. |
| `traits::build_std_vector` takes care of that minor detail. |
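| |
| The special case amounts to one template specialization. The sketch below |
| uses a hypothetical `build_vector_sketch` and a stand-in `unused_type`; it |
| illustrates the rule, not the library's actual definition. |

```cpp
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the library's "don't-care" type.
struct unused_type {};

// General case: the attribute of the kleene is a vector of the subject's
// attribute.
template <typename T>
struct build_vector_sketch
{
    typedef std::vector<T> type;
};

// Special case: an unused subject attribute yields an unused attribute.
template <>
struct build_vector_sketch<unused_type>
{
    typedef unused_type type;
};

// Compile-time checks of both cases.
static_assert(std::is_same<
    build_vector_sketch<int>::type, std::vector<int> >::value,
    "int subject gives std::vector<int>");
static_assert(std::is_same<
    build_vector_sketch<unused_type>::type, unused_type>::value,
    "unused subject gives unused_type");
```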
| |
| So, let's parse. First, we need to provide a local attribute for |
| the subject: |
| |
| `` |
| typename traits::attribute_of<Subject, Context>::type val; |
| `` |
| |
| `traits::attribute_of<Subject, Context>` simply calls the subject's |
| `struct attribute<Context>` nested metafunction. |
| |
| /val/ starts out default initialized. This val is the one we'll |
| pass to the subject's parse function. |
| |
| The kleene repeats indefinitely while the subject parser is |
| successful. On each successful parse, we `push_back` the parsed |
| attribute to the kleene's attribute, which is expected to be, |
| at the very least, compatible with a `std::vector`. In other words, |
| although we say that we want our attribute to be a `std::vector`, |
| we try to be more lenient than that. The caller of kleene's |
| parse may pass a different attribute type. For as long as it is |
| also a conforming STL container with `push_back`, we are ok. Here |
| is the kleene loop: |
| |
| `` |
| while (subject.parse(first, last, context, skipper, val)) |
| { |
| // push the parsed value into our attribute |
| traits::push_back(attr, val); |
| traits::clear(val); |
| } |
| return true; |
| `` |
| |
| Take note that we didn't call `attr.push_back(val)`. Instead, we |
| called a Spirit provided function: |
| |
| `` |
| traits::push_back(attr, val); |
| `` |
| |
| This is a recurring pattern. The reason why we do it this way is |
| because attr [*can] be `unused_type`. `traits::push_back` takes care |
| of that detail. The overload for unused_type is a no-op. Now, you |
| can imagine why __spirit__ is fast! The parsers are so simple and the |
| generated code is as efficient as a hand rolled loop. All these |
| parser compositions and recursive parse invocations are extensively |
| inlined by a modern C++ compiler. In the end, you get a tight loop |
| when you use the kleene. No more excess baggage. If the attribute |
| is unused, then there is no code generated for that. That's how |
| __spirit__ is designed. |
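| |
| The loop and the no-op overload can be tried out in a self-contained sketch. |
| All names are hypothetical stand-ins: `push_back` mimics the role of |
| `traits::push_back`, `digit_parser` is an invented subject, and the `parse` |
| signatures omit the context and skipper arguments. |

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's "don't-care" type.
struct unused_type {};

// General case: forward to the container's own push_back.
template <typename Container, typename T>
void push_back(Container& c, T const& val)
{
    c.push_back(val);
}

// Unused attribute: nothing to store, so this overload is a no-op.
template <typename T>
void push_back(unused_type&, T const&)
{
}

// A hypothetical subject parser: one decimal digit, attribute char.
struct digit_parser
{
    typedef char attribute_type;

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, char& attr) const
    {
        if (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        {
            attr = *first;
            ++first;
            return true;
        }
        return false;
    }
};

// The kleene loop: repeat while the subject succeeds, always return true.
template <typename Subject>
struct kleene_sketch
{
    explicit kleene_sketch(Subject const& s) : subject(s) {}

    template <typename Iterator, typename Attribute>
    bool parse(Iterator& first, Iterator const& last, Attribute& attr) const
    {
        typename Subject::attribute_type val = typename Subject::attribute_type();
        while (subject.parse(first, last, val))
        {
            push_back(attr, val);   // container or unused_type, same call
            val = typename Subject::attribute_type();
        }
        return true;
    }

    Subject subject;
};

// Usage: the same parser fills a string, a vector, or nothing at all.
inline void kleene_demo()
{
    const std::string in = "123abc";
    digit_parser d;
    kleene_sketch<digit_parser> k(d);

    std::string::const_iterator f = in.begin();
    std::string digits;
    assert(k.parse(f, in.end(), digits) && digits == "123");

    f = in.begin();
    std::vector<char> v;
    assert(k.parse(f, in.end(), v) && v.size() == 3);

    f = in.begin();
    unused_type u;
    assert(k.parse(f, in.end(), u) && f == in.begin() + 3);
}
```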
| |
| The /what/ function simply wraps the output of the subject in a |
| "kleene[" ... "]". |
| |
| Ok, now, like the `int_parser`, we have to hook our parser to the |
| __qi__ engine. Here's how we do it: |
| |
| First, we enable the prefix star operator. In proto, it's called |
| the "dereference": |
| |
| [composite_parsers_kleene_enable_] |
| |
| This is done in namespace `boost::spirit` like its friend, the `use_terminal` |
| specialization for our `int_parser`. Obviously, we use /use_operator/ to |
| enable the dereference for the qi::domain. |
| |
| Then, we need to write our generator (in namespace qi): |
| |
| [composite_parsers_kleene_generator] |
| |
| This essentially says: for all expressions of the form `*p`, build a kleene |
| parser. Elements is a __fusion__ sequence. For the kleene, which is a unary |
| operator, expect only one element in the sequence. That element is the subject |
| of the kleene. |
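| |
| The overall effect, turning `*p` into a kleene parser that wraps `p`, can be |
| imitated without proto. The sketch below swaps in a plain overloaded |
| `operator*` on a CRTP base, so it shows the result rather than the |
| library's mechanism. All names are hypothetical, `what` returns a plain |
| string for simplicity, and the subject appends directly to a `std::string`. |

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical CRTP base, used only so operator* can recognize parsers.
template <typename Derived>
struct parser_base
{
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

// A hypothetical subject: one decimal digit, appended to the attribute.
struct digit_parser : parser_base<digit_parser>
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, std::string& attr) const
    {
        if (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        {
            attr.push_back(*first);
            ++first;
            return true;
        }
        return false;
    }

    std::string what() const { return "digit"; }
};

// The unary composite: exposes subject_type and subject.
template <typename Subject>
struct kleene_sketch : parser_base<kleene_sketch<Subject> >
{
    typedef Subject subject_type;

    explicit kleene_sketch(Subject const& s) : subject(s) {}

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, std::string& attr) const
    {
        while (subject.parse(first, last, attr))
            ;
        return true;
    }

    // Wrap the subject's description in "kleene[" ... "]".
    std::string what() const { return "kleene[" + subject.what() + "]"; }

    Subject subject;
};

// The "generator": any expression *p, where p is a parser, builds a kleene
// parser whose single element is p.
template <typename Derived>
kleene_sketch<Derived> operator*(parser_base<Derived> const& p)
{
    return kleene_sketch<Derived>(p.derived());
}

// Usage: *digit_parser() parses a run of digits and describes itself.
inline void generator_demo()
{
    const std::string in = "2024-01";
    std::string::const_iterator f = in.begin();
    std::string out;
    assert((*digit_parser()).parse(f, in.end(), out));
    assert(out == "2024" && f == in.begin() + 4);
    assert((*digit_parser()).what() == "kleene[digit]");
}
```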
| |
| We still don't care about the Modifiers. We'll see what the modifiers are |
| all about when we get to directives. |
| |
| [endsect] |
| |
| [endsect] |