boost_1_45_0/libs/spirit/doc/introduction.qbk - nest-learning-thermostat/5.0/boost - Git at Google

 [/==============================================================================
     Copyright (C) 2001-2010 Joel de Guzman
     Copyright (C) 2001-2010 Hartmut Kaiser

     Distributed under the Boost Software License, Version 1.0. (See accompanying
     file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 ===============================================================================/]

 [section Introduction]

 Boost Spirit is an object-oriented, recursive-descent parser and
 output generation library for C++. It allows you to write grammars and
 format descriptions using a format similar to Extended Backus Naur
 Form (EBNF)[footnote [@http://www.cl.cam.ac.uk/%7Emgk25/iso-14977.pdf
 ISO-EBNF]] directly in C++. These inline grammar
 specifications can mix freely with other C++ code and, thanks to the
 generative power of C++ templates, are immediately executable.  In
 retrospect, conventional compiler-compilers or parser-generators have
 to perform an additional translation step from the source EBNF code to
 C or C++ code.

 The syntax and semantics of the libraries' API directly form domain-specific
 embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
 user:

 * one for creating parser grammars,
 * one for the specification of the required tokens to be used for parsing,
 * and one for the description of the required output formats.

 Since the target input grammars and output formats are written entirely in C++
 we do not need any separate tools to compile, preprocess or integrate those
 into the build process. __spirit__ allows seamless integration of the parsing
 and output generation process with other C++ code. This often allows for
 simpler and more efficient code.

 Both the created parsers and generators are fully attributed, which allows you
 to easily build and handle hierarchical data structures in memory. These data
 structures resemble the structure of the input data and can directly be used
 to generate arbitrarily-formatted output.

 The [link spirit.spiritstructure figure] below depicts the overall structure
 of the Boost Spirit library. The library consists of 4 major parts:

 * __classic__: This is the almost-unchanged code base taken from the
   former Boost Spirit V1.8 distribution. It has been moved into the namespace
   boost::spirit::classic. A special compatibility layer has been added to
   ensure complete compatibility with existing code using Spirit V1.8.
 * __qi__: This is the parser library allowing you to build recursive
   descent parsers. The exposed domain-specific language can be used to describe
   the grammars to implement, and the rules for storing the parsed information.
 * __lex__: This is the library usable to create tokenizers (lexers). The
   domain-specific language exposed by __lex__ allows you to define regular
   expressions used to match tokens (create token definitions), associate these
   regular expressions with code to be executed whenever they are matched, and
   to add the token definitions to the lexical analyzer.
 * __karma__: This is the generator library allowing you to create code for
   recursive descent, data type-driven output formatting. The exposed
   domain-specific language is almost equivalent to the parser description language
   used in __qi__, except that it is used to describe the required output
   format to generate from a given data structure.

 [fig spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]


 The three components, __qi__, __karma__ and __lex__, are designed to be used
 either stand alone, or together. The general methodology is to use the token
 sequence generated by __lex__ as the input for a parser generated by __qi__.
 On the opposite side of the equation, the hierarchical data structures generated
 by __qi__ are used for the output generators created using __karma__.
 However, there is nothing to stop you from using any of these components all
 by themselves.

 The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
 some input being converted to some internal representation. After some
 (optional) transformation these data are converted back into some different,
 external representation. The picture highlights Spirit's place in this data
 transformation flow.

 [fig spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]

 [heading A Quick Overview of Parsing with __qi__]

 __qi__ is Spirit's sublibrary dealing with generating parsers based on a given
 target grammar (essentially a format description of the input data to read).

 A simple EBNF grammar snippet:

     group       ::= '(' expression ')'
     factor      ::= integer | group
     term        ::= factor (('*' factor) | ('/' factor))*
     expression  ::= term (('+' term) | ('-' term))*

 is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
 code snippet:

     group       = '(' >> expression >> ')';
     factor      = integer | group;
     term        = factor >> *(('*' >> factor) | ('/' >> factor));
     expression  = term >> *(('+' >> term) | ('-' >> term));

 Through the magic of expression templates, this is perfectly valid and
 executable C++ code. The production rule `expression` is, in fact, an object that
 has a member function `parse` that does the work given a source code written in
 the grammar that we have just declared. Yes, it's a calculator. We shall
 simplify for now by skipping the type declarations and the definition of the
 rule `integer` invoked by `factor`. Now, the production rule `expression` in our
 grammar specification, traditionally called the `start` symbol, can recognize
 inputs such as:

     12345
     -12345
     +12345
     1 + 2
     1 * 2
     1/2 + 3/4
     1 + 2 + 3 + 4
     1 * 2 * 3 * 4
     (1 + 2) * (3 + 4)
     (-1 + 2) * (3 + -4)
     1 + ((6 * 200) - 20) / 6
     (1 + (2 + (3 + (4 + 5))))

 Certainly we have modified the original EBNF syntax. This is done to
 conform to C++ syntax rules. Most notably we see the abundance of
 shift >> operators. Since there are no 'empty' operators in C++, it is
 simply not possible to write something like:

     a b

 as seen in math syntax, for example, to mean multiplication or, in our case,
 as seen in EBNF syntax to mean sequencing (b should follow a). __qi__
 uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
 with arrows pointing to the right, to mean "is followed by". Thus we write:

     a >> b

 The alternative operator `|` and the parentheses `()` remain as is. The
 assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
 the Kleene star `*`, which in this case is a postfix operator in EBNF becomes a
 prefix. Instead of:

     a* //... in EBNF syntax,

 we write:

     *a //... in Spirit.

 since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
 rule with the ubiquitous semi-colon, `;`.


 [heading A Quick Overview of Output Generation with __karma__]

 Spirit not only allows you to describe the structure of the input, it also enables
 the specification of the output format for your data in a similar way, and based
 on a single syntax and compatible semantics.

 Let's assume we need to generate a textual representation from a simple data
 structure such as a `std::vector<int>`. Conventional code probably would look like:

     std::vector<int> v (initialize_and_fill());
     std::vector<int>::iterator end = v.end();
     for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
         std::cout << *it << std::endl;

 which is not very flexible and quite difficult to maintain when it comes to
 changing the required output format. Spirit's sublibrary /Karma/ allows you to
 specify output formats for arbitrary data structures in a very flexible way.
 The following snippet is the /Karma/ format description used to create the
 same output as the traditional code above:

     *(int_ << eol)

 Here are some more examples of format descriptions for different output
 representations of the same `std::vector<int>`:

 [table Different output formats for `std::vector<int>`
     [ [Format]                         [Example]          [Description] ]
     [ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`]      [Comma separated list of integers] ]
     [ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`]  [Comma separated list of integers in parenthesis] ]
     [ [`*hex`]                         [`18a`]            [A list of hexadecimal numbers] ]
     [ [`*(double_ << ',')`]            [`1.0,8.0,10.0,`]  [A list of floating point numbers] ]
 ]

 We will see later in this documentation how it is possible to avoid printing
 the trailing `','`.

 Overall, the syntax is similar to __qi__ with the exception that we use the `<<`
 operator for output concatenation. This should be easy to understand as it
 follows the conventions used in the Standard's I/O streams.

 Another important feature of __karma__ allows you to fully decouple the data
 type from the output format. You can use the same output format with different
 data types as long as these conform conceptually. The next table gives some
 related examples.

 [table Different data types usable with the output format `*(int_ << eol)`
     [ [Data type]                 [Description] ]
     [ [`int i[4]`]                [C style arrays] ]
     [ [`std::vector<int>`]        [Standard vector] ]
     [ [`std::list<int>`]          [Standard list] ]
     [ [`boost::array<long, 20>`]  [Boost array] ]
 ]

 [endsect]
	[/==============================================================================
	Copyright (C) 2001-2010 Joel de Guzman
	Copyright (C) 2001-2010 Hartmut Kaiser

	Distributed under the Boost Software License, Version 1.0. (See accompanying
	file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
	===============================================================================/]

	[section Introduction]

	Boost Spirit is an object-oriented, recursive-descent parser and
	output generation library for C++. It allows you to write grammars and
	format descriptions using a format similar to Extended Backus Naur
	Form (EBNF)[footnote [@http://www.cl.cam.ac.uk/%7Emgk25/iso-14977.pdf
	ISO-EBNF]] directly in C++. These inline grammar
	specifications can mix freely with other C++ code and, thanks to the
	generative power of C++ templates, are immediately executable. In
	retrospect, conventional compiler-compilers or parser-generators have
	to perform an additional translation step from the source EBNF code to
	C or C++ code.

	The syntax and semantics of the libraries' API directly form domain-specific
	embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
	user:

	* one for creating parser grammars,
	* one for the specification of the required tokens to be used for parsing,
	* and one for the description of the required output formats.

	Since the target input grammars and output formats are written entirely in C++
	we do not need any separate tools to compile, preprocess or integrate those
	into the build process. __spirit__ allows seamless integration of the parsing
	and output generation process with other C++ code. This often allows for
	simpler and more efficient code.

	Both the created parsers and generators are fully attributed, which allows you
	to easily build and handle hierarchical data structures in memory. These data
	structures resemble the structure of the input data and can directly be used
	to generate arbitrarily-formatted output.

	The [link spirit.spiritstructure figure] below depicts the overall structure
	of the Boost Spirit library. The library consists of 4 major parts:

	* __classic__: This is the almost-unchanged code base taken from the
	former Boost Spirit V1.8 distribution. It has been moved into the namespace
	boost::spirit::classic. A special compatibility layer has been added to
	ensure complete compatibility with existing code using Spirit V1.8.
	* __qi__: This is the parser library allowing you to build recursive
	descent parsers. The exposed domain-specific language can be used to describe
	the grammars to implement, and the rules for storing the parsed information.
	* __lex__: This is the library usable to create tokenizers (lexers). The
	domain-specific language exposed by __lex__ allows you to define regular
	expressions used to match tokens (create token definitions), associate these
	regular expressions with code to be executed whenever they are matched, and
	to add the token definitions to the lexical analyzer.
	* __karma__: This is the generator library allowing you to create code for
	recursive descent, data type-driven output formatting. The exposed
	domain-specific language is almost equivalent to the parser description language
	used in __qi__, except that it is used to describe the required output
	format to generate from a given data structure.

	[fig spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]


	The three components, __qi__, __karma__ and __lex__, are designed to be used
	either stand alone, or together. The general methodology is to use the token
	sequence generated by __lex__ as the input for a parser generated by __qi__.
	On the opposite side of the equation, the hierarchical data structures generated
	by __qi__ are used for the output generators created using __karma__.
	However, there is nothing to stop you from using any of these components all
	by themselves.

	The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
	some input being converted to some internal representation. After some
	(optional) transformation these data are converted back into some different,
	external representation. The picture highlights Spirit's place in this data
	transformation flow.

	[fig spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]

	[heading A Quick Overview of Parsing with __qi__]

	__qi__ is Spirit's sublibrary dealing with generating parsers based on a given
	target grammar (essentially a format description of the input data to read).

	A simple EBNF grammar snippet:

	group ::= '(' expression ')'
	factor ::= integer \| group
	term ::= factor (('' factor) \| ('/' factor))
	expression ::= term (('+' term) \| ('-' term))*

	is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
	code snippet:

	group = '(' >> expression >> ')';
	factor = integer \| group;
	term = factor >> (('' >> factor) \| ('/' >> factor));
	expression = term >> *(('+' >> term) \| ('-' >> term));

	Through the magic of expression templates, this is perfectly valid and
	executable C++ code. The production rule `expression` is, in fact, an object that
	has a member function `parse` that does the work given a source code written in
	the grammar that we have just declared. Yes, it's a calculator. We shall
	simplify for now by skipping the type declarations and the definition of the
	rule `integer` invoked by `factor`. Now, the production rule `expression` in our
	grammar specification, traditionally called the `start` symbol, can recognize
	inputs such as:

	12345
	-12345
	+12345
	1 + 2
	1 * 2
	1/2 + 3/4
	1 + 2 + 3 + 4
	1 * 2 * 3 * 4
	(1 + 2) * (3 + 4)
	(-1 + 2) * (3 + -4)
	1 + ((6 * 200) - 20) / 6
	(1 + (2 + (3 + (4 + 5))))

	Certainly we have modified the original EBNF syntax. This is done to
	conform to C++ syntax rules. Most notably we see the abundance of
	shift >> operators. Since there are no 'empty' operators in C++, it is
	simply not possible to write something like:

	a b

	as seen in math syntax, for example, to mean multiplication or, in our case,
	as seen in EBNF syntax to mean sequencing (b should follow a). __qi__
	uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
	with arrows pointing to the right, to mean "is followed by". Thus we write:

	a >> b

	The alternative operator `\|` and the parentheses `()` remain as is. The
	assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
	the Kleene star `*`, which in this case is a postfix operator in EBNF becomes a
	prefix. Instead of:

	a* //... in EBNF syntax,

	we write:

	*a //... in Spirit.

	since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
	rule with the ubiquitous semi-colon, `;`.


	[heading A Quick Overview of Output Generation with __karma__]

	Spirit not only allows you to describe the structure of the input, it also enables
	the specification of the output format for your data in a similar way, and based
	on a single syntax and compatible semantics.

	Let's assume we need to generate a textual representation from a simple data
	structure such as a `std::vector<int>`. Conventional code probably would look like:

	std::vector<int> v (initialize_and_fill());
	std::vector<int>::iterator end = v.end();
	for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
	std::cout << *it << std::endl;

	which is not very flexible and quite difficult to maintain when it comes to
	changing the required output format. Spirit's sublibrary /Karma/ allows you to
	specify output formats for arbitrary data structures in a very flexible way.
	The following snippet is the /Karma/ format description used to create the
	same output as the traditional code above:

	*(int_ << eol)

	Here are some more examples of format descriptions for different output
	representations of the same `std::vector<int>`:

	[table Different output formats for `std::vector<int>`
	[ [Format] [Example] [Description] ]
	[ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma separated list of integers] ]
	[ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`] [Comma separated list of integers in parenthesis] ]
	[ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
	[ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
	]

	We will see later in this documentation how it is possible to avoid printing
	the trailing `','`.

	Overall, the syntax is similar to __qi__ with the exception that we use the `<<`
	operator for output concatenation. This should be easy to understand as it
	follows the conventions used in the Standard's I/O streams.

	Another important feature of __karma__ allows you to fully decouple the data
	type from the output format. You can use the same output format with different
	data types as long as these conform conceptually. The next table gives some
	related examples.

	[table Different data types usable with the output format `*(int_ << eol)`
	[ [Data type] [Description] ]
	[ [`int i[4]`] [C style arrays] ]
	[ [`std::vector<int>`] [Standard vector] ]
	[ [`std::list<int>`] [Standard list] ]
	[ [`boost::array<long, 20>`] [Boost array] ]
	]

	[endsect]