| [/ |
| / Copyright (c) 2008 Eric Niebler |
| / |
| / Distributed under the Boost Software License, Version 1.0. (See accompanying |
| / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| /] |
| |
| [section Introduction] |
| |
| [h2 What is xpressive?] |
| |
| xpressive is a regular expression template library. Regular expressions |
| (regexes) can be written as strings that are parsed dynamically at runtime |
| (dynamic regexes), or as ['expression templates][footnote See |
| [@http://www.osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html |
| Expression Templates]] that are parsed at compile-time (static regexes). |
| Dynamic regexes have the advantage that they can be accepted from the user |
| as input at runtime or read from an initialization file. Static regexes |
| have several advantages. Since they are C++ expressions instead of |
| strings, they can be syntax-checked at compile-time. Also, they can naturally |
| refer to code and data elsewhere in your program, giving you the ability to call |
| back into your code from within a regex match. Finally, since they are statically |
| bound, the compiler can generate faster code for static regexes. |
| |
| xpressive's dual nature is unique and powerful. Static xpressive is a bit |
| like the _spirit_fx_. Like _spirit_, you can build grammars with |
| static regexes using expression templates. (Unlike _spirit_, xpressive does |
| exhaustive backtracking, trying every possibility to find a match for your |
| pattern.) Dynamic xpressive is a bit like _regexpp_. In fact, |
| xpressive's interface should be familiar to anyone who has used _regexpp_. |
| xpressive's innovation comes from allowing you to mix and match static and |
| dynamic regexes in the same program, and even in the same expression! You |
| can embed a dynamic regex in a static regex, or ['vice versa], and the embedded |
| regex will participate fully in the search, back-tracking as needed to make |
| the match succeed. |
| |
| [h2 Hello, world!] |
| |
| Enough theory. Let's have a look at ['Hello World], xpressive style: |
| |
| #include <iostream> |
| #include <string> |
| #include <boost/xpressive/xpressive.hpp> |
| |
| using namespace boost::xpressive; |
| |
| int main() |
| { |
| std::string hello( "hello world!" ); |
| |
| sregex rex = sregex::compile( "(\\w+) (\\w+)!" ); |
| smatch what; |
| |
| if( regex_match( hello, what, rex ) ) |
| { |
| std::cout << what[0] << '\n'; // whole match |
| std::cout << what[1] << '\n'; // first capture |
| std::cout << what[2] << '\n'; // second capture |
| } |
| |
| return 0; |
| } |
| |
| This program outputs the following: |
| |
| [pre |
| hello world! |
| hello |
| world |
| ] |
| |
| The first thing you'll notice about the code is that all the types in xpressive live in |
| the `boost::xpressive` namespace. |
| |
| [note Most of the rest of the examples in this document will leave off the |
| `using namespace boost::xpressive;` directive. Just pretend it's there.] |
| |
| Next, you'll notice the type of the regular expression object is `sregex`. If you are familiar |
| with _regexpp_, this is different from what you are used to. The "`s`" in "`sregex`" stands for |
| "`string`", indicating that this regex can be used to find patterns in `std::string` objects. |
| I'll discuss this difference and its implications in detail later. |
| |
| Notice how the regex object is initialized: |
| |
| sregex rex = sregex::compile( "(\\w+) (\\w+)!" ); |
| |
| To create a regular expression object from a string, you must call a factory method such as |
| _regex_compile_. This is another area in which xpressive differs from |
| other object-oriented regular expression libraries. Other libraries encourage you to think of |
| a regular expression as a kind of string on steroids. In xpressive, regular expressions are not |
| strings; they are little programs in a domain-specific language. Strings are only one ['representation] |
| of that language. Another representation is an expression template. For example, the above line of code |
| is equivalent to the following: |
| |
| sregex rex = (s1= +_w) >> ' ' >> (s2= +_w) >> '!'; |
| |
| This describes the same regular expression, except it uses the domain-specific embedded language |
| defined by static xpressive. |
| |
| As you can see, static regexes have a syntax that is noticeably different from standard Perl |
| syntax. That is because we are constrained by C++'s syntax. The biggest difference is the use |
| of `>>` to mean "followed by". For instance, in Perl you can just put sub-expressions next |
| to each other: |
| |
| abc |
| |
| But in C++, there must be an operator separating sub-expressions: |
| |
| a >> b >> c |
| |
| In Perl, parentheses `()` have special meaning. They group, but as a side-effect they also create |
| back-references like [^$1] and [^$2]. In C++, grouping parentheses cannot be overloaded, so there is no |
| way to give them side-effects. To get the same effect, we use the special `s1`, `s2`, etc. tokens. Assign |
| an expression to one to create a back-reference (known as a sub-match in xpressive). |
| |
| You'll also notice that the one-or-more repetition operator `+` has moved from postfix |
| to prefix position. That's because C++ doesn't have a postfix `+` operator. So: |
| |
| "\\w+" |
| |
| is the same as: |
| |
| +_w |
| |
| We'll cover all the other differences [link boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes later]. |
| |
| [endsect] |