| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> |
| <title>Boyer-Moore-Horspool Search</title> |
| <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css"> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.78.1"> |
| <link rel="home" href="../../index.html" title="The Boost Algorithm Library"> |
| <link rel="up" href="../../algorithm/Searching.html" title="Searching Algorithms"> |
| <link rel="prev" href="../../algorithm/Searching.html" title="Searching Algorithms"> |
| <link rel="next" href="KnuthMorrisPratt.html" title="Knuth-Morris-Pratt Search"> |
| </head> |
| <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| <table cellpadding="2" width="100%"><tr> |
| <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td> |
| <td align="center"><a href="../../../../../../index.html">Home</a></td> |
| <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td> |
| <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> |
| <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> |
| <td align="center"><a href="../../../../../../more/index.htm">More</a></td> |
| </tr></table> |
| <hr> |
| <div class="spirit-nav"> |
| <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool"></a><a class="link" href="BoyerMooreHorspool.html" title="Boyer-Moore-Horspool Search">Boyer-Moore-Horspool |
| Search</a> |
| </h3></div></div></div> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h0"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview">Overview</a> |
| </h5> |
| <p> |
| The header file 'boyer_moore_horspool.hpp' contains an implementation of |
| the Boyer-Moore-Horspool algorithm for searching sequences of values. |
| </p> |
| <p> |
| The Boyer-Moore-Horspool search algorithm was published by Nigel Horspool |
| in 1980. It is a refinement of the Boyer-Moore algorithm that trades space |
| for time. It uses less space for internal tables than Boyer-Moore, and has |
| poorer worst-case performance. |
| </p> |
| <p> |
| The Boyer-Moore-Horspool algorithm cannot be used with comparison predicates |
| like <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>. |
| </p> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h1"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface">Interface</a> |
| </h5> |
| <p> |
| Nomenclature: I refer to the sequence being searched for as the "pattern", |
| and the sequence being searched in as the "corpus". |
| </p> |
| <p> |
| For flexibility, the Boyer-Moore-Horspool algorithm has two interfaces; an |
| object-based interface and a procedural one. The object-based interface builds |
| the tables in the constructor, and uses operator () to perform the search. |
| The procedural interface builds the table and does the search all in one |
| step. If you are going to be searching for the same pattern in multiple corpora, |
| then you should use the object interface, and only build the tables once. |
| </p> |
| <p> |
| Here is the object interface: |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">></span> |
| <span class="keyword">class</span> <span class="identifier">boyer_moore_horspool</span> <span class="special">{</span> |
| <span class="keyword">public</span><span class="special">:</span> |
| <span class="identifier">boyer_moore_horspool</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span> |
| <span class="special">~</span><span class="identifier">boyer_moore_horspool</span> <span class="special">();</span> |
| |
| <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">></span> |
| <span class="identifier">corpusIter</span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| and here is the corresponding procedural interface: |
| </p> |
| <p> |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">></span> |
| <span class="identifier">corpusIter</span> <span class="identifier">boyer_moore_horspool_search</span> <span class="special">(</span> |
| <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span> |
| <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| Each of the functions is passed two pairs of iterators. The first two define |
| the corpus and the second two define the pattern. Note that the two pairs |
| need not be of the same type, but they do need to "point" at the |
| same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code> |
| and <code class="computeroutput"><span class="identifier">curpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type. |
| </p> |
| <p> |
| The return value of the function is an iterator pointing to the start of |
| the pattern in the corpus. If the pattern is not found, it returns the end |
| of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>). |
| </p> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h2"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance">Performance</a> |
| </h5> |
| <p> |
| The execution time of the Boyer-Moore-Horspool algorithm is linear in the |
| size of the string being searched; it can have a significantly lower constant |
| factor than many other search algorithms: it doesn't need to check every |
| character of the string to be searched, but rather skips over some of them. |
| Generally the algorithm gets faster as the pattern being searched for becomes |
| longer. Its efficiency derives from the fact that with each unsuccessful |
| attempt to find a match between the search string and the text it is searching, |
| it uses the information gained from that attempt to rule out as many positions |
| of the text as possible where the string cannot match. |
| </p> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h3"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use">Memory |
| Use</a> |
| </h5> |
| <p> |
| The algorithm an internal table that has one entry for each member of the |
| "alphabet" in the pattern. For (8-bit) character types, this table |
| contains 256 entries. |
| </p> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h4"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity">Complexity</a> |
| </h5> |
| <p> |
| The worst-case performance is <span class="emphasis"><em>O(m x n)</em></span>, where <span class="emphasis"><em>m</em></span> |
| is the length of the pattern and <span class="emphasis"><em>n</em></span> is the length of |
| the corpus. The average time is <span class="emphasis"><em>O(n)</em></span>. The best case |
| performance is sub-linear, and is, in fact, identical to Boyer-Moore, but |
| the initialization is quicker and the internal loop is simpler than Boyer-Moore. |
| </p> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h5"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety">Exception |
| Safety</a> |
| </h5> |
| <p> |
| Both the object-oriented and procedural versions of the Boyer-Moore-Horspool |
| algorithm take their parameters by value and do not use any information other |
| than what is passed in. Therefore, both interfaces provide the strong exception |
| guarantee. |
| </p> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h6"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes">Notes</a> |
| </h5> |
| <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| <li class="listitem"> |
| When using the object-based interface, the pattern must remain unchanged |
| for during the searches; i.e, from the time the object is constructed |
| until the final call to operator () returns. |
| </li> |
| <li class="listitem"> |
| The Boyer-Moore-Horspool algorithm requires random-access iterators for |
| both the pattern and the corpus. |
| </li> |
| </ul></div> |
| <h5> |
| <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h7"></a> |
| <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points">Customization |
| points</a> |
| </h5> |
| <p> |
| The Boyer-Moore-Horspool object takes a traits template parameter which enables |
| the caller to customize how the precomputed table is stored. This table, |
| called the skip table, contains (logically) one entry for every possible |
| value that the pattern can contain. When searching 8-bit character data, |
| this table contains 256 elements. The traits class defines the table to be |
| used. |
| </p> |
| <p> |
| The default traits class uses a <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">array</span></code> |
| for small 'alphabets' and a <code class="computeroutput"><span class="identifier">tr1</span><span class="special">::</span><span class="identifier">unordered_map</span></code> |
| for larger ones. The array-based skip table gives excellent performance, |
| but could be prohibitively large when the 'alphabet' of elements to be searched |
| grows. The unordered_map based version only grows as the number of unique |
| elements in the pattern, but makes many more heap allocations, and gives |
| slower lookup performance. |
| </p> |
| <p> |
| To use a different skip table, you should define your own skip table object |
| and your own traits class, and use them to instantiate the Boyer-Moore-Horspool |
| object. The interface to these objects is described TBD. |
| </p> |
| </div> |
| <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> |
| <td align="left"></td> |
| <td align="right"><div class="copyright-footer">Copyright © 2010-2012 Marshall Clow<p> |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) |
| </p> |
| </div></td> |
| </tr></table> |
| <hr> |
| <div class="spirit-nav"> |
| <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> |
| </div> |
| </body> |
| </html> |