| [/ |
| Copyright 2006-2007 John Maddock. |
| Distributed under the Boost Software License, Version 1.0. |
| (See accompanying file LICENSE_1_0.txt or copy at |
| http://www.boost.org/LICENSE_1_0.txt). |
| ] |
| |
| [section:faq FAQ] |
| |
| [*Q.] I can't get regex++ to work with escape characters, what's going on? |
| |
| [*A.] If you embed regular expressions in C++ code, then remember that escape |
| characters are processed twice: once by the C++ compiler, and once by the |
| Boost.Regex expression compiler, so to pass the regular expression \d+ |
| to Boost.Regex, you need to embed "\\d+" in your code. Likewise to match a |
| literal backslash you will need to embed "\\\\" in your code. |
| |
| [*Q.] No matter what I do regex_match always returns false, what's going on? |
| |
| [*A.] The algorithm regex_match only succeeds if the expression matches *all* |
| of the text, if you want to *find* a sub-string within the text that matches |
| the expression then use regex_search instead. |
| |
| [*Q.] Why does using parenthesis in a POSIX regular expression change the |
| result of a match? |
| |
| [*A.] For POSIX (extended and basic) regular expressions, but not for perl regexes, |
| parentheses don't only mark; they determine what the best match is as well. |
| When the expression is compiled as a POSIX basic or extended regex then Boost.Regex |
| follows the POSIX standard leftmost longest rule for determining what matched. |
| So if there is more than one possible match after considering the whole expression, |
| it looks next at the first sub-expression and then the second sub-expression |
| and so on. So... |
| |
| "'''(0*)([0-9]*)'''" against "00123" would produce |
| $1 = "00" |
| $2 = "123" |
| |
| where as |
| |
| "0*([0-9])*" against "00123" would produce |
| $1 = "00123" |
| |
| If you think about it, had $1 only matched the "123", this would be "less good" |
| than the match "00123" which is both further to the left and longer. If you |
| want $1 to match only the "123" part, then you need to use something like: |
| |
| "0*([1-9][0-9]*)" |
| |
| as the expression. |
| |
| [*Q.] Why don't character ranges work properly (POSIX mode only)? |
| |
| [*A.] The POSIX standard specifies that character range expressions are |
| locale sensitive - so for example the expression [A-Z] will match any |
| collating element that collates between 'A' and 'Z'. That means that for |
| most locales other than "C" or "POSIX", [A-Z] would match the single |
| character 't' for example, which is not what most people expect - or |
| at least not what most people have come to expect from regular |
| expression engines. For this reason, the default behaviour of Boost.Regex |
| (perl mode) is to turn locale sensitive collation off by not setting the |
| `regex_constants::collate` compile time flag. However if you set a non-default |
| compile time flag - for example `regex_constants::extended` or |
| `regex_constants::basic`, then locale dependent collation will be enabled, |
| this also applies to the POSIX API functions which use either |
| `regex_constants::extended` or `regex_constants::basic` internally. |
| [Note - when `regex_constants::nocollate` in effect, the library behaves |
| "as if" the LC_COLLATE locale category were always "C", regardless of what |
| its actually set to - end note]. |
| |
| [*Q.] Why are there no throw specifications on any of the functions? |
| What exceptions can the library throw? |
| |
| [*A.] Not all compilers support (or honor) throw specifications, others |
| support them but with reduced efficiency. Throw specifications may be added |
| at a later date as compilers begin to handle this better. The library |
| should throw only three types of exception: [boost::regex_error] can be |
| thrown by [basic_regex] when compiling a regular expression, `std::runtime_error` |
| can be thrown when a call to `basic_regex::imbue` tries to open a message |
| catalogue that doesn't exist, or when a call to [regex_search] or [regex_match] |
| results in an "everlasting" search, or when a call to `RegEx::GrepFiles` or |
| `RegEx::FindFiles` tries to open a file that cannot be opened, finally |
| `std::bad_alloc` can be thrown by just about any of the functions in this library. |
| |
| [*Q.] Why can't I use the "convenience" versions of regex_match / |
| regex_search / regex_grep / regex_format / regex_merge? |
| |
| [*A.] These versions may or may not be available depending upon the |
| capabilities of your compiler, the rules determining the format of |
| these functions are quite complex - and only the versions visible to |
| a standard compliant compiler are given in the help. To find out |
| what your compiler supports, run <boost/regex.hpp> through your |
| C++ pre-processor, and search the output file for the function |
| that you are interested in. Note however, that very few current |
| compilers still have problems with these overloaded functions. |
| |
| [endsect] |
| |