| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> |
| <title>String Sort</title> |
| <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css"> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.78.1"> |
| <link rel="home" href="../../index.html" title="Boost.Sort"> |
| <link rel="up" href="../sort_hpp.html" title="Spreadsort"> |
| <link rel="prev" href="float_sort.html" title="Float Sort"> |
| <link rel="next" href="rationale.html" title="Rationale"> |
| </head> |
| <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| <table cellpadding="2" width="100%"><tr> |
| <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td> |
| <td align="center"><a href="../../../../../../index.html">Home</a></td> |
| <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td> |
| <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> |
| <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> |
| <td align="center"><a href="../../../../../../more/index.htm">More</a></td> |
| </tr></table> |
| <hr> |
| <div class="spirit-nav"> |
| <a accesskey="p" href="float_sort.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="sort.sort_hpp.string_sort"></a><a class="link" href="string_sort.html" title="String Sort">String Sort</a> |
| </h3></div></div></div> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code> |
| is a hybrid radix-based/comparison-based algorithm that sorts strings of |
| characters (or arrays of binary data) in ascending order. |
| </p> |
| <p> |
| The simplest version (no functors) sorts strings of items that can cast to |
| an unsigned data type (such as an unsigned char), have a &lt; operator, |
| have a size function, and have a data() function that returns a pointer to |
| an array of characters, such as a std::string. The functor version can sort |
| any data type that has a strict weak ordering, via templating, but requires |
| definitions of a get_char (acts like x[offset] on a string or a byte array), |
| get_length (returns length of the string being sorted), and a comparison |
| functor. Individual characters returned by get_char must support the != operator |
| and have an unsigned value that defines their lexicographical order. |
| </p> |
| <p> |
| This algorithm is not efficient for character types larger than 2 bytes each, |
| and is optimized for one-byte character strings. For this reason, <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a> will |
| be called instead if the character type is of size > 2. |
| </p> |
| <p> |
| <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code> |
| has a special optimization for identical substrings. This adds some overhead |
| on random data, but identical substrings are common in real strings. |
| </p> |
| <p> |
| reverse_string_sort sorts strings in reverse (decending) order, but is otherwise |
| identical. <code class="literal"><code class="computeroutput"><a class="link" href="../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code> |
| is sufficiently flexible that it should sort any data type that <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a> |
| can, assuming the user provides appropriate functors that index into a key. |
| </p> |
| <p> |
| <a href="../../../../doc/graph/windows_string_sort.htm" target="_top">Windows String Sort</a> |
| </p> |
| <p> |
| <a href="../../../../doc/graph/osx_string_sort.htm" target="_top">OSX String Sort</a> |
| </p> |
| <div class="section"> |
| <div class="titlepage"><div><div><h4 class="title"> |
| <a name="sort.sort_hpp.string_sort.stringsort_examples"></a><a class="link" href="string_sort.html#sort.sort_hpp.string_sort.stringsort_examples" title="String Sort Examples">String |
| Sort Examples</a> |
| </h4></div></div></div> |
| <p> |
| See <a href="../../../../example/stringfunctorsample.cpp" target="_top">stringfunctorsample.cpp</a> |
| for an example of how to sort structs using a string key and all functors: |
| </p> |
| <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span> |
| <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">a</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span> |
| <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">[</span><span class="identifier">offset</span><span class="special">];</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span> |
| <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span><span class="special">{</span> <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">a</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| and these functors are used thus: |
| </p> |
| <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span> |
| </pre> |
| <p> |
| See <a href="../../../../example/generalizedstruct.cpp" target="_top">generalizedstruct.cpp</a> |
| for a working example of a generalized approach to sort structs by a sequence |
| of integer, float, and multiple string keys using string_sort: |
| </p> |
| <pre class="programlisting"><span class="keyword">struct</span> <span class="identifier">DATA_TYPE</span> <span class="special">{</span> |
| <span class="identifier">time_t</span> <span class="identifier">birth</span><span class="special">;</span> |
| <span class="keyword">float</span> <span class="identifier">net_worth</span><span class="special">;</span> |
| <span class="identifier">string</span> <span class="identifier">first_name</span><span class="special">;</span> |
| <span class="identifier">string</span> <span class="identifier">last_name</span><span class="special">;</span> |
| <span class="special">};</span> |
| |
| <span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">birth_size</span> <span class="special">=</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">time_t</span><span class="special">);</span> |
| <span class="keyword">static</span> <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">first_name_offset</span> <span class="special">=</span> <span class="identifier">birth_size</span> <span class="special">+</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="keyword">float</span><span class="special">);</span> |
| <span class="keyword">static</span> <span class="keyword">const</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">uint64_t</span> <span class="identifier">base_mask</span> <span class="special">=</span> <span class="number">0xff</span><span class="special">;</span> |
| |
| <span class="keyword">struct</span> <span class="identifier">lessthan</span> <span class="special">{</span> |
| <span class="keyword">inline</span> <span class="keyword">bool</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">y</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">birth</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special">!=</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span> <span class="special"><</span> <span class="identifier">y</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| |
| <span class="keyword">struct</span> <span class="identifier">bracket</span> <span class="special">{</span> |
| <span class="keyword">inline</span> <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">,</span> <span class="identifier">size_t</span> <span class="identifier">offset</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> |
| <span class="comment">// Sort date as a signed int, returning the appropriate byte.</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">birth_size</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span> |
| <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">birth</span> <span class="special">&</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special"><<</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">bit_shift</span><span class="special">;</span> |
| <span class="comment">// Handling the sign bit. Unnecessary if the data is always positive.</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">result</span> <span class="special">^</span> <span class="number">128</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="comment">// Sort a signed float. This requires reversing the order of negatives</span> |
| <span class="comment">// because of the way floats are represented in bits.</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">first_name_offset</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">const</span> <span class="keyword">int</span> <span class="identifier">bit_shift</span> <span class="special">=</span> <span class="number">8</span> <span class="special">*</span> <span class="special">(</span><span class="identifier">first_name_offset</span> <span class="special">-</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span> |
| <span class="keyword">unsigned</span> <span class="identifier">key</span> <span class="special">=</span> <span class="identifier">float_mem_cast</span><span class="special"><</span><span class="keyword">float</span><span class="special">,</span> <span class="keyword">unsigned</span><span class="special">>(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span><span class="special">);</span> |
| <span class="keyword">unsigned</span> <span class="keyword">char</span> <span class="identifier">result</span> <span class="special">=</span> <span class="special">(</span><span class="identifier">key</span> <span class="special">&</span> <span class="special">(</span><span class="identifier">base_mask</span> <span class="special"><<</span> <span class="identifier">bit_shift</span><span class="special">))</span> <span class="special">>></span> <span class="identifier">bit_shift</span><span class="special">;</span> |
| <span class="comment">// Handling the sign.</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">x</span><span class="special">.</span><span class="identifier">net_worth</span> <span class="special"><</span> <span class="number">0</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="number">255</span> <span class="special">-</span> <span class="identifier">result</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="comment">// Increasing positives so they are higher than negatives.</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">birth_size</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="number">128</span> <span class="special">+</span> <span class="identifier">result</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">return</span> <span class="identifier">result</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="comment">// Sort a string that is before the end. This approach supports embedded</span> |
| <span class="comment">// nulls. If embedded nulls are not required, then just delete the "* 2"</span> |
| <span class="comment">// and the inside of the following if just becomes:</span> |
| <span class="comment">// return x.first_name[offset - first_name_offset];</span> |
| <span class="keyword">const</span> <span class="keyword">unsigned</span> <span class="identifier">first_name_end_offset</span> <span class="special">=</span> |
| <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span><span class="special">;</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special"><</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">int</span> <span class="identifier">char_offset</span> <span class="special">=</span> <span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_offset</span><span class="special">;</span> |
| <span class="comment">// This signals that the string continues.</span> |
| <span class="keyword">if</span> <span class="special">(!(</span><span class="identifier">char_offset</span> <span class="special">&</span> <span class="number">1</span><span class="special">))</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="number">1</span><span class="special">;</span> |
| <span class="special">}</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">[</span><span class="identifier">char_offset</span> <span class="special">>></span> <span class="number">1</span><span class="special">];</span> |
| <span class="special">}</span> |
| |
| <span class="comment">// This signals that the string has ended, so that shorter strings come</span> |
| <span class="comment">// before longer ones.</span> |
| <span class="keyword">if</span> <span class="special">(</span><span class="identifier">offset</span> <span class="special">==</span> <span class="identifier">first_name_end_offset</span><span class="special">)</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="number">0</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="comment">// The final string needs no special consideration.</span> |
| <span class="keyword">return</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">[</span><span class="identifier">offset</span> <span class="special">-</span> <span class="identifier">first_name_end_offset</span> <span class="special">-</span> <span class="number">1</span><span class="special">];</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| |
| <span class="keyword">struct</span> <span class="identifier">getsize</span> <span class="special">{</span> |
| <span class="keyword">inline</span> <span class="identifier">size_t</span> <span class="keyword">operator</span><span class="special">()(</span><span class="keyword">const</span> <span class="identifier">DATA_TYPE</span> <span class="special">&</span><span class="identifier">x</span><span class="special">)</span> <span class="keyword">const</span> <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">first_name_offset</span> <span class="special">+</span> <span class="identifier">x</span><span class="special">.</span><span class="identifier">first_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">()</span> <span class="special">*</span> <span class="number">2</span> <span class="special">+</span> <span class="number">1</span> <span class="special">+</span> |
| <span class="identifier">x</span><span class="special">.</span><span class="identifier">last_name</span><span class="special">.</span><span class="identifier">size</span><span class="special">();</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <pre class="programlisting"><span class="identifier">string_sort</span><span class="special">(</span><span class="identifier">array</span><span class="special">.</span><span class="identifier">begin</span><span class="special">(),</span> <span class="identifier">array</span><span class="special">.</span><span class="identifier">end</span><span class="special">(),</span> <span class="identifier">bracket</span><span class="special">(),</span> <span class="identifier">getsize</span><span class="special">(),</span> <span class="identifier">lessthan</span><span class="special">());</span> |
| </pre> |
| <p> |
| Other examples: |
| </p> |
| <p> |
| <a href="../../../../example/stringsample.cpp" target="_top">String sort.</a> |
| </p> |
| <p> |
| <a href="../../../../example/reversestringsample.cpp" target="_top">Reverse string sort.</a> |
| </p> |
| <p> |
| <a href="../../../../example/wstringsample.cpp" target="_top">Wide character string sort.</a> |
| </p> |
| <p> |
| <a href="../../../../example/caseinsensitive.cpp" target="_top">Case insensitive string |
| sort.</a> |
| </p> |
| <p> |
| <a href="../../../../example/charstringsample.cpp" target="_top">Sort structs using a string |
| key and indexing functors.</a> |
| </p> |
| <p> |
| <a href="../../../../example/reversestringfunctorsample.cpp" target="_top">Sort structs |
| using a string keynd all functors in reverse order.</a> |
| </p> |
| </div> |
| </div> |
| <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> |
| <td align="left"></td> |
| <td align="right"><div class="copyright-footer">Copyright © 2014 Steven Ross<p> |
| Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost |
| Software License, Version 1.0</a>. |
| </p> |
| </div></td> |
| </tr></table> |
| <hr> |
| <div class="spirit-nav"> |
| <a accesskey="p" href="float_sort.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../sort_hpp.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="rationale.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a> |
| </div> |
| </body> |
| </html> |