boost/libs/sort/doc/html/sort/sort_hpp/rationale/why_spreadsort.html - nest-cam/4320010/boost - Git at Google

 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
 <title>Why spreadsort?</title>
 <link rel="stylesheet" href="../../../../../../../doc/src/boostbook.css" type="text/css">
 <meta name="generator" content="DocBook XSL Stylesheets V1.78.1">
 <link rel="home" href="../../../index.html" title="Boost.Sort">
 <link rel="up" href="../rationale.html" title="Rationale">
 <link rel="prev" href="hybrid_radix.html" title="Hybrid Radix">
 <link rel="next" href="unstable_sort.html" title="Unstable Sorting">
 </head>
 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
 <table cellpadding="2" width="100%"><tr>
 <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../boost.png"></td>
 <td align="center"><a href="../../../../../../../index.html">Home</a></td>
 <td align="center"><a href="../../../../../../../libs/libraries.htm">Libraries</a></td>
 <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
 <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
 <td align="center"><a href="../../../../../../../more/index.htm">More</a></td>
 </tr></table>
 <hr>
 <div class="spirit-nav">
 <a accesskey="p" href="hybrid_radix.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../rationale.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unstable_sort.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
 </div>
 <div class="section">
 <div class="titlepage"><div><div><h4 class="title">
 <a name="sort.sort_hpp.rationale.why_spreadsort"></a><a class="link" href="why_spreadsort.html" title="Why spreadsort?">Why spreadsort?</a>
 </h4></div></div></div>
 <p>
           The <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/spreadsort_idp47957744.html" title="Function template spreadsort">spreadsort</a></code></code>
           algorithm used in this library is designed to provide best possible worst-case
           performance, while still being cache-friendly. It provides the better of
           <span class="emphasis"><em>&#119926;(N*log(K/S + S))</em></span> and <span class="emphasis"><em>&#119926;(N*log(N))</em></span>
           worst-case time, where <span class="emphasis"><em>K</em></span> is the log of the range.
           The log of the range is normally the length in bits of the data type; 32
           for a 32-bit integer.
         </p>
 <p>
           <code class="computeroutput"><span class="identifier">flash_sort</span></code> (another hybrid
           algorithm), by comparison is <span class="emphasis"><em>&#119926;(N)</em></span> for evenly distributed
           lists. The problem is, <code class="computeroutput"><span class="identifier">flash_sort</span></code>
           is merely an MSD <a href="http://en.wikipedia.org/wiki/Radix_sort" target="_top">radix
           sort</a> combined with <span class="emphasis"><em>&#119926;(N*N)</em></span> insertion sort to
           deal with small subsets where the MSD Radix Sort is inefficient, so it
           is inefficient with chunks of data around the size at which it switches
           to <code class="computeroutput"><span class="identifier">insertion_sort</span></code>, and
           ends up operating as an enhanced MSD Radix Sort. For uneven distributions
           this makes it especially inefficient.
         </p>
 <p>
           <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/integer_sort_idp41299456.html" title="Function template integer_sort">integer_sort</a></code></code>
           and <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/float_sort_idp47034528.html" title="Function template float_sort">float_sort</a></code></code>
           use <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
           instead, which provides <span class="emphasis"><em>&#119926;(N*log(N))</em></span> performance for
           these medium-sized pieces. Also, <code class="computeroutput"><span class="identifier">flash_sort</span></code>'s
           <span class="emphasis"><em>&#119926;(N)</em></span> performance for even distributions comes at the
           cost of cache misses, which on modern architectures are extremely expensive,
           and in testing on modern systems ends up being slower than cutting up the
           data in multiple, cache-friendly steps. Also worth noting is that on most
           modern computers, <code class="computeroutput"><span class="identifier">log2</span><span class="special">(</span><span class="identifier">available</span>
           <span class="identifier">RAM</span><span class="special">)/</span><span class="identifier">log2</span><span class="special">(</span><span class="identifier">L1</span> <span class="identifier">cache</span>
           <span class="identifier">size</span><span class="special">)</span></code>
           is around 3, where a cache miss takes more than 3 times as long as an in-cache
           random-access, and the size of <span class="emphasis"><em>max_splits</em></span> is tuned
           to the size of the cache. On a computer where cache misses aren't this
           expensive, <span class="emphasis"><em>max_splits</em></span> could be increased to a large
           value, or eliminated entirely, and <code class="computeroutput"><span class="identifier">integer_sort</span><span class="special">/</span><span class="identifier">float_sort</span></code>
           would have the same <span class="emphasis"><em>&#119926;(N)</em></span> performance on even distributions.
         </p>
 <p>
           Adaptive Left Radix (ALR) is similar to <code class="computeroutput"><span class="identifier">flash_sort</span></code>,
           but more cache-friendly. It still uses insertion_sort. Because ALR uses
           <span class="emphasis"><em>&#119926;(N*N)</em></span> <code class="computeroutput"><span class="identifier">insertion_sort</span></code>,
           it isn't efficient to use the comparison-based fallback sort on large lists,
           and if the data is clustered in small chunks just over the fallback size
           with a few outliers, radix-based sorting iterates many times doing little
           sorting with high overhead. Asymptotically, ALR is still <span class="emphasis"><em>&#119926;(N*log(K/S
           + S))</em></span>, but with a very small <span class="emphasis"><em>S</em></span> (about 2
           in the worst case), which compares unfavorably with the 11 default value
           of <span class="emphasis"><em>max_splits</em></span> for Spreadsort.
         </p>
 <p>
           ALR also does not have the <span class="emphasis"><em>&#119926;(N*log(N))</em></span> fallback, so
           for small lists that are not evenly distributed it is extremely inefficient.
           See the <code class="computeroutput"><span class="identifier">alrbreaker</span></code> and
           <code class="computeroutput"><span class="identifier">binaryalrbreaker</span></code> testcases
           for examples; either replace the call to sort with a call to ALR and update
           the ALR_THRESHOLD at the top, or as a quick comparison make <code class="computeroutput"><span class="identifier">get_max_count</span> <span class="keyword">return</span>
           <span class="identifier">ALR_THRESHOLD</span></code> (20 by default
           based upon the paper). These small tests take 4-10 times as long with ALR
           as <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
           in the author's testing, depending on the test system, because they are
           trying to sort a highly uneven distribution. Normal Spreadsort does much
           better with them, because <code class="computeroutput"><span class="identifier">get_max_count</span></code>
           is designed around minimizing worst-case runtime.
         </p>
 <p>
           <code class="computeroutput"><span class="identifier">burst_sort</span></code> is an efficient
           hybrid algorithm for strings that uses substantial additional memory.
         </p>
 <p>
           <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
           uses minimal additional memory by comparison. Speed comparisons between
           the two haven't been made, but the better memory efficiency makes <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
           more general.
         </p>
 <p>
           <code class="computeroutput"><span class="identifier">postal_sort</span></code> and <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
           are similar. A direct performance comparison would be welcome, but an efficient
           version of <code class="computeroutput"><span class="identifier">postal_sort</span></code>
           was not found in a search for source.
         </p>
 <p>
           <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
           is most similar to the <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
           flag sort</a> algorithm. The main difference is that it doesn't bother
           trying to optimize how empty buckets/piles are handled, instead just checking
           to see if all characters at the current index are equal. Other differences
           are using <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
           as the fallback algorithm, and a larger fallback size (256 vs. 16), which
           makes empty pile handling less important.
         </p>
 <p>
           Another difference is not applying the stack-size restriction. Because
           of the equality check in <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>,
           it would take <span class="emphasis"><em>m*m</em></span> memory worth of strings to force
           <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
           to create a stack of depth <span class="emphasis"><em>m</em></span>. This problem isn't a
           realistic one on modern systems with multi-megabyte stacksize limits, where
           main memory would be exhausted holding the long strings necessary to exceed
           the stacksize limit. <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
           can be thought of as modernizing <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
           flag sort</a> to take advantage of <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
           as a fallback algorithm. In the author's testing, <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
           flag sort</a> (on <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">strings</span></code>)
           had comparable runtime to <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>,
           but making a hybrid of the two allows reduced overhead and substantially
           superior performance.
         </p>
 </div>
 <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
 <td align="left"></td>
 <td align="right"><div class="copyright-footer">Copyright &#169; 2014 Steven Ross<p>
         Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost
         Software License, Version 1.0</a>.
       </p>
 </div></td>
 </tr></table>
 <hr>
 <div class="spirit-nav">
 <a accesskey="p" href="hybrid_radix.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../rationale.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unstable_sort.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
 </div>
 </body>
 </html>
	<html>
	<head>
	<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
	<title>Why spreadsort?</title>
	<link rel="stylesheet" href="../../../../../../../doc/src/boostbook.css" type="text/css">
	<meta name="generator" content="DocBook XSL Stylesheets V1.78.1">
	<link rel="home" href="../../../index.html" title="Boost.Sort">
	<link rel="up" href="../rationale.html" title="Rationale">
	<link rel="prev" href="hybrid_radix.html" title="Hybrid Radix">
	<link rel="next" href="unstable_sort.html" title="Unstable Sorting">
	</head>
	<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
	<table cellpadding="2" width="100%"><tr>
	<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../boost.png"></td>
	<td align="center"><a href="../../../../../../../index.html">Home</a></td>
	<td align="center"><a href="../../../../../../../libs/libraries.htm">Libraries</a></td>
	<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
	<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
	<td align="center"><a href="../../../../../../../more/index.htm">More</a></td>
	</tr></table>
	<hr>
	<div class="spirit-nav">
	<a accesskey="p" href="hybrid_radix.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../rationale.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unstable_sort.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
	</div>
	<div class="section">
	<div class="titlepage"><div><div><h4 class="title">
	<a name="sort.sort_hpp.rationale.why_spreadsort"></a><a class="link" href="why_spreadsort.html" title="Why spreadsort?">Why spreadsort?</a>
	</h4></div></div></div>
	<p>
	The <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/spreadsort_idp47957744.html" title="Function template spreadsort">spreadsort</a></code></code>
	algorithm used in this library is designed to provide best possible worst-case
	performance, while still being cache-friendly. It provides the better of
	<span class="emphasis"><em>𝑶(Nlog(K/S + S))</em></span> and <span class="emphasis"><em>𝑶(Nlog(N))</em></span>
	worst-case time, where <span class="emphasis"><em>K</em></span> is the log of the range.
	The log of the range is normally the length in bits of the data type; 32
	for a 32-bit integer.
	</p>
	<p>
	<code class="computeroutput"><span class="identifier">flash_sort</span></code> (another hybrid
	algorithm), by comparison is <span class="emphasis"><em>𝑶(N)</em></span> for evenly distributed
	lists. The problem is, <code class="computeroutput"><span class="identifier">flash_sort</span></code>
	is merely an MSD <a href="http://en.wikipedia.org/wiki/Radix_sort" target="_top">radix
	sort</a> combined with <span class="emphasis"><em>𝑶(N*N)</em></span> insertion sort to
	deal with small subsets where the MSD Radix Sort is inefficient, so it
	is inefficient with chunks of data around the size at which it switches
	to <code class="computeroutput"><span class="identifier">insertion_sort</span></code>, and
	ends up operating as an enhanced MSD Radix Sort. For uneven distributions
	this makes it especially inefficient.
	</p>
	<p>
	<code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/integer_sort_idp41299456.html" title="Function template integer_sort">integer_sort</a></code></code>
	and <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/float_sort_idp47034528.html" title="Function template float_sort">float_sort</a></code></code>
	use <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
	instead, which provides <span class="emphasis"><em>𝑶(N*log(N))</em></span> performance for
	these medium-sized pieces. Also, <code class="computeroutput"><span class="identifier">flash_sort</span></code>'s
	<span class="emphasis"><em>𝑶(N)</em></span> performance for even distributions comes at the
	cost of cache misses, which on modern architectures are extremely expensive,
	and in testing on modern systems ends up being slower than cutting up the
	data in multiple, cache-friendly steps. Also worth noting is that on most
	modern computers, <code class="computeroutput"><span class="identifier">log2</span><span class="special">(</span><span class="identifier">available</span>
	<span class="identifier">RAM</span><span class="special">)/</span><span class="identifier">log2</span><span class="special">(</span><span class="identifier">L1</span> <span class="identifier">cache</span>
	<span class="identifier">size</span><span class="special">)</span></code>
	is around 3, where a cache miss takes more than 3 times as long as an in-cache
	random-access, and the size of <span class="emphasis"><em>max_splits</em></span> is tuned
	to the size of the cache. On a computer where cache misses aren't this
	expensive, <span class="emphasis"><em>max_splits</em></span> could be increased to a large
	value, or eliminated entirely, and <code class="computeroutput"><span class="identifier">integer_sort</span><span class="special">/</span><span class="identifier">float_sort</span></code>
	would have the same <span class="emphasis"><em>𝑶(N)</em></span> performance on even distributions.
	</p>
	<p>
	Adaptive Left Radix (ALR) is similar to <code class="computeroutput"><span class="identifier">flash_sort</span></code>,
	but more cache-friendly. It still uses insertion_sort. Because ALR uses
	<span class="emphasis"><em>𝑶(N*N)</em></span> <code class="computeroutput"><span class="identifier">insertion_sort</span></code>,
	it isn't efficient to use the comparison-based fallback sort on large lists,
	and if the data is clustered in small chunks just over the fallback size
	with a few outliers, radix-based sorting iterates many times doing little
	sorting with high overhead. Asymptotically, ALR is still <span class="emphasis"><em>𝑶(N*log(K/S
	+ S))</em></span>, but with a very small <span class="emphasis"><em>S</em></span> (about 2
	in the worst case), which compares unfavorably with the 11 default value
	of <span class="emphasis"><em>max_splits</em></span> for Spreadsort.
	</p>
	<p>
	ALR also does not have the <span class="emphasis"><em>𝑶(N*log(N))</em></span> fallback, so
	for small lists that are not evenly distributed it is extremely inefficient.
	See the <code class="computeroutput"><span class="identifier">alrbreaker</span></code> and
	<code class="computeroutput"><span class="identifier">binaryalrbreaker</span></code> testcases
	for examples; either replace the call to sort with a call to ALR and update
	the ALR_THRESHOLD at the top, or as a quick comparison make <code class="computeroutput"><span class="identifier">get_max_count</span> <span class="keyword">return</span>
	<span class="identifier">ALR_THRESHOLD</span></code> (20 by default
	based upon the paper). These small tests take 4-10 times as long with ALR
	as <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
	in the author's testing, depending on the test system, because they are
	trying to sort a highly uneven distribution. Normal Spreadsort does much
	better with them, because <code class="computeroutput"><span class="identifier">get_max_count</span></code>
	is designed around minimizing worst-case runtime.
	</p>
	<p>
	<code class="computeroutput"><span class="identifier">burst_sort</span></code> is an efficient
	hybrid algorithm for strings that uses substantial additional memory.
	</p>
	<p>
	<code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
	uses minimal additional memory by comparison. Speed comparisons between
	the two haven't been made, but the better memory efficiency makes <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
	more general.
	</p>
	<p>
	<code class="computeroutput"><span class="identifier">postal_sort</span></code> and <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
	are similar. A direct performance comparison would be welcome, but an efficient
	version of <code class="computeroutput"><span class="identifier">postal_sort</span></code>
	was not found in a search for source.
	</p>
	<p>
	<code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
	is most similar to the <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
	flag sort</a> algorithm. The main difference is that it doesn't bother
	trying to optimize how empty buckets/piles are handled, instead just checking
	to see if all characters at the current index are equal. Other differences
	are using <a href="http://en.cppreference.com/w/cpp/algorithm/sort" target="_top">std::sort</a>
	as the fallback algorithm, and a larger fallback size (256 vs. 16), which
	makes empty pile handling less important.
	</p>
	<p>
	Another difference is not applying the stack-size restriction. Because
	of the equality check in <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>,
	it would take <span class="emphasis"><em>m*m</em></span> memory worth of strings to force
	<code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
	to create a stack of depth <span class="emphasis"><em>m</em></span>. This problem isn't a
	realistic one on modern systems with multi-megabyte stacksize limits, where
	main memory would be exhausted holding the long strings necessary to exceed
	the stacksize limit. <code class="literal"><code class="computeroutput"><a class="link" href="../../../boost/sort/spreadsort/string_sort_idp48004640.html" title="Function template string_sort">string_sort</a></code></code>
	can be thought of as modernizing <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
	flag sort</a> to take advantage of <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>
	as a fallback algorithm. In the author's testing, <a href="http://en.wikipedia.org/wiki/American_flag_sort" target="_top">American
	flag sort</a> (on <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">strings</span></code>)
	had comparable runtime to <a href="http://en.wikipedia.org/wiki/Introsort" target="_top">introsort</a>,
	but making a hybrid of the two allows reduced overhead and substantially
	superior performance.
	</p>
	</div>
	<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
	<td align="left"></td>
	<td align="right"><div class="copyright-footer">Copyright © 2014 Steven Ross<p>
	Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost
	Software License, Version 1.0</a>.
	</p>
	</div></td>
	</tr></table>
	<hr>
	<div class="spirit-nav">
	<a accesskey="p" href="hybrid_radix.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../rationale.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unstable_sort.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
	</div>
	</body>
	</html>