| [/============================================================================== |
| Copyright (C) 2001-2010 Joel de Guzman |
| Copyright (C) 2001-2010 Hartmut Kaiser |
| |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| ===============================================================================/] |
| [section Char] |
| |
| This module provides parsers for single characters. Currently, it |
| includes literal chars (e.g. `'x'`, `L'x'`), `char_` (single |
| characters, ranges and character sets) and the encoding-specific |
| character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.). |
| |
| [heading Module Header] |
| |
| // forwards to <boost/spirit/home/qi/char.hpp> |
| #include <boost/spirit/include/qi_char.hpp> |
| |
| Also, see __include_structure__. |
| |
| [/------------------------------------------------------------------------------] |
| [section:char Char (`char_`, `lit`)] |
| |
| [heading Description] |
| |
| The `char_` parser matches single characters. The `char_` parser has an |
| associated __char_encoding_namespace__. This is needed when doing basic |
| operations such as inhibiting case sensitivity and dealing with |
| character ranges. |
| |
| There are various forms of `char_`. |
| |
| [heading char_] |
| |
| The no argument form of `char_` matches any character in the associated |
| __char_encoding_namespace__. |
| |
| char_ // matches any character |
| |
| [heading char_(ch)] |
| |
| The single argument form of `char_` (with a character argument) matches |
| the supplied character. |
| |
| char_('x') // matches 'x' |
| char_(L'x') // matches L'x' |
| char_(x) // matches x (a char) |
| |
| [heading char_(first, last)] |
| |
| The two argument form of `char_` matches a range of characters. |
| |
| char_('a','z') // lower case alphabetic characters |
| char_(L'0',L'9') // digits |
| |
| A range of characters is created from a low-high character pair. Such a |
| parser matches a single character that is in the range, including both |
| endpoints. Note, the first character must be /before/ the second, |
| according to the underlying __char_encoding_namespace__. |
| |
| Character mapping is inherently platform dependent. The standard does |
| not guarantee, for example, that `'A' < 'Z'`. For this reason, Spirit2 |
| purposely attaches a specific __char_encoding_namespace__ (such as ASCII |
| or ISO-8859-1) to the `char_` parser to eliminate such ambiguities. |
| |
| [note *Sparse bit vectors* |
| |
| To accommodate 16, 32 and 64 bit characters, the char-set statically |
| selects one of two implementations: a `std::bitset` when the character |
| type is not greater than 8 bits, and otherwise a sparse bit/boolean set |
| which uses a sorted vector of disjoint ranges (`range_run`). The set is |
| constructed from ranges such that adjacent or overlapping ranges are |
| coalesced. |
| |
| `range_runs` are very space-economical in situations where there are lots |
| of ranges and a few individual disjoint values. Searching is O(log n) |
| where n is the number of ranges.] |
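To make the idea of a sorted vector of disjoint ranges concrete, here is a minimal standalone sketch. It is not Spirit's actual `range_run` implementation. The class name `range_set_sketch` and its members are hypothetical, and the adjacency test assumes the character type is at most 32 bits wide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative only: a set of characters stored as sorted, disjoint,
// inclusive ranges. Not the library's range_run.
template <typename Char>
class range_set_sketch
{
    typedef std::pair<Char, Char> range;   // (low, high), both inclusive

public:
    // Add [lo, hi], coalescing with any overlapping or adjacent range.
    void set(Char lo, Char hi)
    {
        assert(lo <= hi);
        std::vector<range> merged;
        merged.reserve(ranges_.size() + 1);
        range cur(lo, hi);
        bool placed = false;
        for (std::size_t i = 0; i < ranges_.size(); ++i)
        {
            range const& r = ranges_[i];
            // Widen to long long so that "+ 1" cannot overflow
            // (assumes Char is at most 32 bits wide).
            long long rl = r.first, rh = r.second;
            long long cl = cur.first, ch = cur.second;
            if (rl <= ch + 1 && cl <= rh + 1)
            {
                // r overlaps or touches cur: absorb it
                cur.first = std::min(cur.first, r.first);
                cur.second = std::max(cur.second, r.second);
            }
            else if (rh < cl)
            {
                merged.push_back(r);       // r lies entirely before cur
            }
            else
            {
                if (!placed) { merged.push_back(cur); placed = true; }
                merged.push_back(r);       // r lies entirely after cur
            }
        }
        if (!placed) merged.push_back(cur);
        ranges_.swap(merged);
    }

    // Binary search: O(log n) in the number of stored ranges.
    bool test(Char c) const
    {
        typename std::vector<range>::const_iterator it = std::upper_bound(
            ranges_.begin(), ranges_.end(), c,
            [](Char v, range const& r) { return v < r.first; });
        if (it == ranges_.begin()) return false;
        --it;                              // last range starting at or before c
        return c <= it->second;
    }

    std::size_t size() const { return ranges_.size(); }

private:
    std::vector<range> ranges_;
};
```

In this sketch, adding `L'a'`..`L'f'`, `L'0'`..`L'9'` and then `L'g'`..`L'z'` leaves only two stored ranges, because the third range is adjacent to the first and is coalesced with it. Insertion here is linear for simplicity. Only the lookup reflects the O(log n) search mentioned in the note.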
| |
| [heading char_(def)] |
| |
| Lastly, when given a string (a plain C string, a `std::basic_string`, |
| etc.), the string is regarded as a char-set definition string following |
| a syntax that resembles POSIX style regular expression character sets |
| (except that double quotes delimit the set elements instead of square |
| brackets and there is no special negation `^` character). Examples: |
| |
| char_("a-zA-Z") // alphabetic characters |
| char_("0-9a-fA-F") // hexadecimal characters |
| char_("actgACTG") // DNA identifiers |
| char_("\x7f\x7e") // Hexadecimal 0x7F and 0x7E |
| |
| [heading lit(ch)] |
| |
| `lit`, when passed a single character, behaves like the single argument |
| `char_` except that `lit` does not synthesize an attribute. A plain |
| `char` or `wchar_t` is equivalent to a `lit`. |
| |
| [note `lit` is reused by both the [qi_lit_string string parsers] and the |
| char parsers. In general, a char parser is created when you pass in a |
| character and a string parser is created when you pass in a string. The |
| exception is when you pass a single element literal string, e.g. |
| `lit("x")`. In this case, we optimize this to create a char parser |
| instead of a string parser.] |
| |
| Examples: |
| |
| 'x' |
| lit('x') |
| lit(L'x') |
| lit(c) // c is a char |
| |
| [heading Header] |
| |
| // forwards to <boost/spirit/home/qi/char/char.hpp> |
| #include <boost/spirit/include/qi_char_.hpp> |
| |
| Also, see __include_structure__. |
| |
| [heading Namespace] |
| |
| [table |
| [[Name]] |
| [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]] |
| [[`ns::char_`]] |
| ] |
| |
| In the table above, `ns` represents a __char_encoding_namespace__. |
| |
| [heading Model of] |
| |
| [:__primitive_parser_concept__] |
| |
| [variablelist Notation |
| [[`c`, `f`, `l`] [A literal char, e.g. `'x'`, `L'x'` or anything that can be |
| converted to a `char` or `wchar_t`, or a __qi_lazy_argument__ |
| that evaluates to anything that can be converted to a `char` |
| or `wchar_t`.]] |
| [[`ns`] [A __char_encoding_namespace__.]] |
| [[`cs`] [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__ |
| that specifies a char-set definition string following a syntax |
| that resembles POSIX style regular expression character sets |
| (without the square brackets and the negation `^` character).]] |
| [[`cp`] [A char parser, a char range parser or a char set parser.]] |
| ] |
| |
| [heading Expression Semantics] |
| |
| Semantics of an expression is defined only where it differs from, or is |
| not defined in __primitive_parser_concept__. |
| |
| [table |
| [[Expression] [Semantics]] |
| [[`c`] [Create a char parser from a char, `c`.]] |
| [[`lit(c)`] [Create a char parser from a char, `c`.]] |
| [[`ns::char_`] [Create a char parser that matches any character in the |
| `ns` encoding.]] |
| [[`ns::char_(c)`] [Create a char parser with `ns` encoding from a char, `c`.]] |
| [[`ns::char_(f, l)`][Create a char-range parser that matches characters from |
| range (`f` to `l`, inclusive) with `ns` encoding.]] |
| [[`ns::char_(cs)`] [Create a char-set parser with `ns` encoding from a char-set |
| definition string, `cs`.]] |
| [[`~cp`] [Negate `cp`. The result is a negated char parser that |
| matches any character in the `ns` encoding except the |
| characters matched by `cp`.]] |
| ] |
| |
| [heading Attributes] |
| |
| [table |
| [[Expression] [Attribute]] |
| [[`c`] [__unused__ or if `c` is a __qi_lazy_argument__, the character |
| type returned by invoking it.]] |
| [[`lit(c)`] [__unused__ or if `c` is a __qi_lazy_argument__, the character |
| type returned by invoking it.]] |
| [[`ns::char_`] [The character type of the __char_encoding_namespace__, `ns`.]] |
| [[`ns::char_(c)`] [The character type of the __char_encoding_namespace__, `ns`.]] |
| [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]] |
| [[`ns::char_(cs)`] [The character type of the __char_encoding_namespace__, `ns`.]] |
| [[`~cp`] [The attribute of `cp`.]] |
| ] |
| |
| [heading Complexity] |
| |
| [:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g. |
| `wchar_t`). These have *O(log N)* complexity, where N is the number of |
| distinct character ranges in the set.] |
| |
| [heading Example] |
| |
| [note The test harness for the example(s) below is presented in the |
| __qi_basics_examples__ section.] |
| |
| Some using declarations: |
| |
| [reference_using_declarations_lit_char] |
| |
| Basic literals: |
| |
| [reference_char_literals] |
| |
| Range: |
| |
| [reference_char_range] |
| |
| Character set: |
| |
| [reference_char_set] |
| |
| Lazy char_ using __phoenix__: |
| |
| [reference_char_phoenix] |
| |
| [endsect] [/ Char] |
| |
| [/------------------------------------------------------------------------------] |
| [section:char_class Char Classification (`alnum`, `digit`, etc.)] |
| |
| [heading Description] |
| |
| The library has the full repertoire of single character parsers for |
| character classification. This includes the usual `alnum`, `alpha`, |
| `digit`, `xdigit`, etc. parsers. These parsers have an associated |
| __char_encoding_namespace__. This is needed when doing basic operations |
| such as inhibiting case sensitivity. |
| |
| [heading Header] |
| |
| // forwards to <boost/spirit/home/qi/char/char_class.hpp> |
| #include <boost/spirit/include/qi_char_class.hpp> |
| |
| Also, see __include_structure__. |
| |
| [heading Namespace] |
| |
| [table |
| [[Name]] |
| [[`ns::alnum`]] |
| [[`ns::alpha`]] |
| [[`ns::blank`]] |
| [[`ns::cntrl`]] |
| [[`ns::digit`]] |
| [[`ns::graph`]] |
| [[`ns::lower`]] |
| [[`ns::print`]] |
| [[`ns::punct`]] |
| [[`ns::space`]] |
| [[`ns::upper`]] |
| [[`ns::xdigit`]] |
| ] |
| |
| In the table above, `ns` represents a __char_encoding_namespace__. |
| |
| [heading Model of] |
| |
| [:__primitive_parser_concept__] |
| |
| [variablelist Notation |
| [[`ns`] [A __char_encoding_namespace__.]] |
| ] |
| |
| [heading Expression Semantics] |
| |
| Semantics of an expression is defined only where it differs from, or is |
| not defined in __primitive_parser_concept__. |
| |
| [table |
| [[Expression] [Semantics]] |
| [[`ns::alnum`] [Matches alpha-numeric characters]] |
| [[`ns::alpha`] [Matches alphabetic characters]] |
| [[`ns::blank`] [Matches spaces or tabs]] |
| [[`ns::cntrl`] [Matches control characters]] |
| [[`ns::digit`] [Matches numeric digits]] |
| [[`ns::graph`] [Matches non-space printing characters]] |
| [[`ns::lower`] [Matches lower case letters]] |
| [[`ns::print`] [Matches printable characters]] |
| [[`ns::punct`] [Matches punctuation symbols]] |
| [[`ns::space`] [Matches spaces, tabs, returns, and newlines]] |
| [[`ns::upper`] [Matches upper case letters]] |
| [[`ns::xdigit`] [Matches hexadecimal digits]] |
| ] |
| |
| [heading Attributes] |
| |
| [:The character type of the __char_encoding_namespace__, `ns`.] |
| |
| [heading Complexity] |
| |
| [:O(N)] |
| |
| [heading Example] |
| |
| [note The test harness for the example(s) below is presented in the |
| __qi_basics_examples__ section.] |
| |
| Some using declarations: |
| |
| [reference_using_declarations_char_class] |
| |
| Basic usage: |
| |
| [reference_char_class] |
| |
| [endsect] [/ Char Classification] |
| |
| [endsect] |