| [/============================================================================== |
| Copyright (C) 2001-2010 Joel de Guzman |
| Copyright (C) 2001-2010 Hartmut Kaiser |
| |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| ===============================================================================/] |
| |
| [section String] |
| |
| This module includes parsers for strings. Currently, this module |
| includes the literal and string parsers and the symbol table. |
| |
| [heading Module Header] |
| |
| // forwards to <boost/spirit/home/qi/string.hpp> |
| #include <boost/spirit/include/qi_string.hpp> |
| |
| Also, see __include_structure__. |
| |
| [/------------------------------------------------------------------------------] |
| [section:string String (`string`, `lit`)] |
| |
| [heading Description] |
| |
| The `string` parser matches a string of characters. The `string` parser |
| is an implicit lexeme: the `skip` parser is not applied in between |
| characters of the string. The `string` parser has an associated |
| __char_encoding_namespace__. This is needed when doing basic operations |
| such as inhibiting case sensitivity. Examples: |
| |
| string("Hello") |
| string(L"Hello") |
| string(s) // s is a std::string |
| |
| `lit`, like `string`, also matches a string of characters. The main |
| difference is that `lit` does not synthesize an attribute. A plain |
| string like `"hello"` or a `std::basic_string` is equivalent to a `lit`. |
| Examples: |
| |
| "Hello" |
| lit("Hello") |
| lit(L"Hello") |
| lit(s) // s is a std::string |
| |
| [heading Header] |
| |
| // forwards to <boost/spirit/home/qi/string/lit.hpp> |
| #include <boost/spirit/include/qi_lit.hpp> |
| |
| [heading Namespace] |
| |
| [table |
| [[Name]] |
| [[`boost::spirit::lit // alias: boost::spirit::qi::lit`]] |
| [[`ns::string`]] |
| ] |
| |
| In the table above, `ns` represents a __char_encoding_namespace__. |
| |
| [heading Model of] |
| |
| [:__primitive_parser_concept__] |
| |
| [variablelist Notation |
| [[`s`] [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__.]] |
| [[`ns`] [A __char_encoding_namespace__.]]] |
| |
| [heading Expression Semantics] |
| |
| Semantics of an expression is defined only where it differs from, or is |
| not defined in __primitive_parser_concept__. |
| |
| [table |
| [[Expression] [Semantics]] |
| [[`s`] [Create string parser |
| from a string, `s`.]] |
| [[`lit(s)`] [Create a string parser |
| from a string, `s`.]] |
| [[`ns::string(s)`] [Create a string parser with `ns` encoding |
| from a string, `s`.]] |
| ] |
| |
| [heading Attributes] |
| |
| [table |
| [[Expression] [Attribute]] |
| [[`s`] [__unused__]] |
| [[`lit(s)`] [__unused__]] |
| [[`ns::string(s)`] [`std::basic_string<T>` where `T` |
| is the underlying character type |
| of `s`.]] |
| ] |
| |
| [heading Complexity] |
| |
| [:O(N)] |
| |
| where `N` is the number of characters in the string to be parsed. |
| |
| [heading Example] |
| |
| [note The test harness for the example(s) below is presented in the |
| __qi_basics_examples__ section.] |
| |
| Some using declarations: |
| |
| [reference_using_declarations_lit_string] |
| |
| Basic literals: |
| |
| [reference_string_literals] |
| |
| From a `std::string` |
| |
| [reference_string_std_string] |
| |
| Lazy strings using __phoenix__ |
| |
| [reference_string_phoenix] |
| |
| [endsect] [/ lit/string] |
| |
| |
| [/------------------------------------------------------------------------------] |
| [section:symbols Symbols (`symbols`)] |
| |
| [heading Description] |
| |
| The class `symbols` implements a symbol table: an associative container |
| (or map) of key-value pairs where the keys are strings. The `symbols` |
| class can work efficiently with 8, 16, 32 and even 64 bit characters. |
| |
| Traditionally, symbol table management is maintained separately outside |
| the grammar through semantic actions. Contrary to standard practice, the |
| Spirit symbol table class `symbols` is-a parser, an instance of which may |
| be used anywhere in the grammar specification. It is an example of a |
| dynamic parser. A dynamic parser is characterized by its ability to |
| modify its behavior at run time. Initially, an empty symbols object |
| matches nothing. At any time, symbols may be added, thus, dynamically |
| altering its behavior. |
| |
| [heading Header] |
| |
| // forwards to <boost/spirit/home/qi/string/symbols.hpp> |
| #include <boost/spirit/include/qi_symbols.hpp> |
| |
| Also, see __include_structure__. |
| |
| [heading Namespace] |
| |
| [table |
| [[Name]] |
| [[`boost::spirit::qi::symbols`]] |
| [[`boost::spirit::qi::tst`]] |
| [[`boost::spirit::qi::tst_map`]] |
| ] |
| |
| [heading Synopsis] |
| |
| template <typename Char, typename T, typename Lookup> |
| struct symbols; |
| |
| [heading Template parameters] |
| |
| [table |
| [[Parameter] [Description] [Default]] |
| [[`Char`] [The character type |
| of the symbol strings.] [`char`]] |
| [[`T`] [The data type associated |
| with each symbol.] [__unused_type__]] |
| [[`Lookup`] [The symbol search |
| implementation] [`tst<Char, T>`]] |
| ] |
| |
| [heading Model of] |
| |
| [:__primitive_parser_concept__] |
| |
| [variablelist Notation |
| [[`Sym`] [A `symbols` type.]] |
| [[`Char`] [A character type.]] |
| [[`T`] [A data type.]] |
| [[`sym`, `sym2`][`symbols` objects.]] |
| [[`sseq`] [An __stl__ container of strings.]] |
| [[`dseq`] [An __stl__ container of data with `value_type` `T`.]] |
| [[`s1`...`sN`] [A __string__.]] |
| [[`d1`...`dN`] [Objects of type `T`.]] |
| [[`f`] [A callable function or function object.]] |
| [[`f`, `l`] [`ForwardIterator` first/last pair.]] |
| ] |
| |
| [heading Expression Semantics] |
| |
| Semantics of an expression is defined only where it differs from, or is not |
| defined in __primitive_parser_concept__. |
| |
| [table |
| [[Expression] [Semantics]] |
| [[`Sym()`] [Construct an empty symbols.]] |
| [[`Sym(sym2)`] [Copy construct a symbols from `sym2` (Another `symbols` object).]] |
| [[`Sym(sseq)`] [Construct symbols from `sseq` (An __stl__ container of strings).]] |
| [[`Sym(sseq, dseq)`] [Construct symbols from `sseq` and `dseq` |
| (An __stl__ container of strings and an __stl__ container of |
| data with `value_type` `T`).]] |
| [[`sym = sym2`] [Assign `sym2` to `sym`.]] |
| [[`sym = s1, s2, ..., sN`] [Assign one or more symbols (`s1`...`sN`) to `sym`.]] |
| [[`sym += s1, s2, ..., sN`] [Add one or more symbols (`s1`...`sN`) to `sym`.]] |
| [[`sym.add(s1)(s2)...(sN)`] [Add one or more symbols (`s1`...`sN`) to `sym`.]] |
| [[`sym.add(s1, d1)(s2, d2)...(sN, dN)`] |
| [Add one or more symbols (`s1`...`sN`) |
| with associated data (`d1`...`dN`) to `sym`.]] |
| [[`sym -= s1, s2, ..., sN`] [Remove one or more symbols (`s1`...`sN`) from `sym`.]] |
| [[`sym.remove(s1)(s2)...(sN)`] [Remove one or more symbols (`s1`...`sN`) from `sym`.]] |
| [[`sym.clear()`] [Erase all of the symbols in `sym`.]] |
| [[`sym.at(s)`] [Return a reference to the object associated |
| with symbol, `s`. If `sym` does not already |
| contain such an object, `at` inserts the default |
| object `T()`.]] |
| [[`sym.find(s)`] [Return a pointer to the object associated |
| with symbol, `s`. If `sym` does not already |
| contain such an object, `find` returns a null |
| pointer.]] |
| [[`sym.prefix_find(f, l)`] [Return a pointer to the object associated |
| with longest symbol that matches the beginning |
| of the range `[f, l)`, and updates `f` to point |
| to one past the end of that match. If no symbol matches, |
| then return a null pointer, and `f` is unchanged.]] |
| [[`sym.for_each(f)`] [For each symbol in `sym`, `s`, a |
| `std::basic_string<Char>` with associated data, |
| `d`, an object of type `T`, invoke `f(s, d)`.]] |
| ] |
| |
| [heading Attributes] |
| |
| The attribute of `symbol<Char, T>` is `T`. |
| |
| [heading Complexity] |
| |
| The default implementation uses a Ternary Search Tree (TST) with |
| complexity: |
| |
| [:O(log n+k)] |
| |
| Where k is the length of the string to be searched in a TST with n |
| strings. |
| |
| TSTs are faster than hashing for many typical search problems especially |
| when the search interface is iterator based. TSTs are many times faster |
| than hash tables for unsuccessful searches since mismatches are |
| discovered earlier after examining only a few characters. Hash tables |
| always examine an entire key when searching. |
| |
| An alternative implementation uses a hybrid hash-map front end (for the |
| first character) plus a TST: `tst_map`. This gives us a complexity of |
| |
| [:O(1 + log n+k-1)] |
| |
| This is found to be significantly faster than plain TST, albeit with a |
| bit more memory usage requirements (each slot in the hash-map is a TST |
| node). If you require a lot of symbols to be searched, use the `tst_map` |
| implementation. This can be done by using `tst_map` as the third |
| template parameter to the symbols class: |
| |
| symbols<Char, T, tst_map<Char, T> > sym; |
| |
| [heading Example] |
| |
| [note The test harness for the example(s) below is presented in the |
| __qi_basics_examples__ section.] |
| |
| Some using declarations: |
| |
| [reference_using_declarations_symbols] |
| |
| Symbols with data: |
| |
| [reference_symbols_with_data] |
| |
| When `symbols` is used for case-insensitive parsing (in a __qi_no_case__ |
| directive), added symbol strings should be in lowercase. Symbol strings |
| containing one or more uppercase characters will not match any input |
| when symbols is used in a `no_case` directive. |
| |
| [reference_symbols_with_no_case] |
| |
| |
| [endsect] [/ symbols] |
| |
| [endsect] [/ String] |