| [/============================================================================== |
| Copyright (C) 2001-2010 Joel de Guzman |
| Copyright (C) 2001-2010 Hartmut Kaiser |
| |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| ===============================================================================/] |
| |
| [section:lexer_quickstart1 Quickstart 1 - A word counter using __lex__] |
| |
| __lex__ is very modular, which follows the general building principle of the |
| __spirit__ libraries. You never pay for features you don't use. It is nicely |
| integrated with the other parts of __spirit__ but nevertheless can be used |
| separately to build stand alone lexical analyzers. |
| The first quick start example describes a stand alone application: |
| counting characters, words, and lines in a file, very similar to what the well |
| known Unix command `wc` is doing (for the full example code see here: |
| [@../../example/lex/word_count_functor.cpp word_count_functor.cpp]). |
| |
| [import ../example/lex/word_count_functor.cpp] |
| |
| |
| [heading Prerequisites] |
| |
| The only required `#include` specific to /Spirit.Lex/ follows. It is a wrapper |
| for all necessary definitions to use /Spirit.Lex/ in a stand alone fashion, and |
| on top of the __lexertl__ library. Additionally we `#include` two of the Boost |
| headers to define `boost::bind()` and `boost::ref()`. |
| |
| [wcf_includes] |
| |
| To make all the code below more readable we introduce the following namespaces. |
| |
| [wcf_namespaces] |
| |
| |
| [heading Defining Tokens] |
| |
| The most important step while creating a lexer using __lex__ is to define the |
| tokens to be recognized in the input sequence. This is normally done by |
| defining the regular expressions describing the matching character sequences, |
| and optionally their corresponding token ids. Additionally the defined tokens |
| need to be associated with an instance of a lexer object as provided by the |
| library. The following code snippet shows how this can be done using __lex__. |
| |
| [wcf_token_definition] |
| |
| |
| [heading Doing the Useful Work] |
| |
| We will use a setup, where we want the __lex__ library to invoke a given |
| function after any of of the generated tokens is recognized. For this reason |
| we need to implement a functor taking at least the generated token as an |
| argument and returning a boolean value allowing to stop the tokenization |
| process. The default token type used in this example carries a token value of |
| the type __boost_iterator_range__`<BaseIterator>` pointing to the matched |
| range in the underlying input sequence. |
| |
| [wcf_functor] |
| |
| All what is left is to write some boilerplate code helping to tie together the |
| pieces described so far. To simplify this example we call the `lex::tokenize()` |
| function implemented in __lex__ (for a more detailed description of this |
| function see here: __fixme__), even if we could have written a loop to iterate |
| over the lexer iterators [`first`, `last`) as well. |
| |
| |
| [heading Pulling Everything Together] |
| |
| [wcf_main] |
| |
| |
| [heading Comparing __lex__ with __flex__] |
| |
| This example was deliberately chosen to be as much as possible similar to the |
| equivalent __flex__ program (see below), which isn't too different from what |
| has to be written when using __lex__. |
| |
| [note Interestingly enough, performance comparisons of lexical analyzers |
| written using __lex__ with equivalent programs generated by |
| __flex__ show that both have comparable execution speeds! |
| Generally, thanks to the highly optimized __lexertl__ library and |
| due its carefully designed integration with __spirit__ the |
| abstraction penalty to be paid for using __lex__ is negligible. |
| ] |
| |
| The remaining examples in this tutorial will use more sophisticated features |
| of __lex__, mainly to allow further simplification of the code to be written, |
| while maintaining the similarity with corresponding features of __flex__. |
| __lex__ has been designed to be as similar to __flex__ as possible. That |
| is why this documentation will provide the corresponding __flex__ code for the |
| shown __lex__ examples almost everywhere. So consequently, here is the __flex__ |
| code corresponding to the example as shown above. |
| |
| [wcf_flex_version] |
| |
| [endsect] |