blob: 9d1a96f5b939b6fa3f7eabdb9cae2c0ed3782381 [file] [log] [blame]
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Confidence Intervals on the Standard Deviation</title>
<link rel="stylesheet" href="../../../../../../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.74.0">
<link rel="home" href="../../../../../index.html" title="Math Toolkit">
<link rel="up" href="../cs_eg.html" title="Chi Squared Distribution Examples">
<link rel="prev" href="../cs_eg.html" title="Chi Squared Distribution Examples">
<link rel="next" href="chi_sq_test.html" title="Chi-Square Test for the Standard Deviation">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../cs_eg.html"><img src="../../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../cs_eg.html"><img src="../../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="chi_sq_test.html"><img src="../../../../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section" lang="en">
<div class="titlepage"><div><div><h6 class="title">
<a name="math_toolkit.dist.stat_tut.weg.cs_eg.chi_sq_intervals"></a><a class="link" href="chi_sq_intervals.html" title="Confidence Intervals on the Standard Deviation">
Confidence Intervals on the Standard Deviation</a>
</h6></div></div></div>
<p>
Once you have calculated the standard deviation for your data, a legitimate
question to ask is "How reliable is the calculated standard deviation?".
For this situation the Chi Squared distribution can be used to calculate
confidence intervals for the standard deviation.
</p>
<p>
The full example code &amp; sample output is in <a href="../../../../../../../../example/chi_square_std_dev_test.cpp" target="_top">chi_square_std_deviation_test.cpp</a>.
</p>
<p>
We'll begin by defining the procedure that will calculate and print
out the confidence intervals:
</p>
<pre class="programlisting"><span class="keyword">void</span> <span class="identifier">confidence_limits_on_std_deviation</span><span class="special">(</span>
<span class="keyword">double</span> <span class="identifier">Sd</span><span class="special">,</span> <span class="comment">// Sample Standard Deviation
</span> <span class="keyword">unsigned</span> <span class="identifier">N</span><span class="special">)</span> <span class="comment">// Sample size
</span><span class="special">{</span>
</pre>
<p>
We'll begin by printing out some general information:
</p>
<pre class="programlisting"><span class="identifier">cout</span> <span class="special">&lt;&lt;</span>
<span class="string">"________________________________________________\n"</span>
<span class="string">"2-Sided Confidence Limits For Standard Deviation\n"</span>
<span class="string">"________________________________________________\n\n"</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">7</span><span class="special">);</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Number of Observations"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">N</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Standard Deviation"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sd</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
</pre>
<p>
and then define a table of significance levels for which we'll calculate
intervals:
</p>
<pre class="programlisting"><span class="keyword">double</span> <span class="identifier">alpha</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">0.5</span><span class="special">,</span> <span class="number">0.25</span><span class="special">,</span> <span class="number">0.1</span><span class="special">,</span> <span class="number">0.05</span><span class="special">,</span> <span class="number">0.01</span><span class="special">,</span> <span class="number">0.001</span><span class="special">,</span> <span class="number">0.0001</span><span class="special">,</span> <span class="number">0.00001</span> <span class="special">};</span>
</pre>
<p>
The distribution we'll need to calculate the confidence intervals is
a Chi Squared distribution, with N-1 degrees of freedom:
</p>
<pre class="programlisting"><span class="identifier">chi_squared</span> <span class="identifier">dist</span><span class="special">(</span><span class="identifier">N</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
</pre>
<p>
For each value of alpha, the formula for the confidence interval is
given by:
</p>
<p>
<span class="inlinemediaobject"><img src="../../../../../../equations/chi_squ_tut1.png"></span>
</p>
<p>
Where <span class="inlinemediaobject"><img src="../../../../../../equations/chi_squ_tut2.png"></span> is the upper critical value, and <span class="inlinemediaobject"><img src="../../../../../../equations/chi_squ_tut3.png"></span> is
the lower critical value of the Chi Squared distribution.
</p>
<p>
In code we begin by printing out a table header:
</p>
<pre class="programlisting"><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"\n\n"</span>
<span class="string">"_____________________________________________\n"</span>
<span class="string">"Confidence Lower Upper\n"</span>
<span class="string">" Value (%) Limit Limit\n"</span>
<span class="string">"_____________________________________________\n"</span><span class="special">;</span>
</pre>
<p>
and then loop over the values of alpha and calculate the intervals
for each: remember that the lower critical value is the same as the
quantile, and the upper critical value is the same as the quantile
from the complement of the probability:
</p>
<pre class="programlisting"><span class="keyword">for</span><span class="special">(</span><span class="keyword">unsigned</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">alpha</span><span class="special">)/</span><span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">alpha</span><span class="special">[</span><span class="number">0</span><span class="special">]);</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
<span class="special">{</span>
<span class="comment">// Confidence value:
</span> <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">fixed</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">3</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">10</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">right</span> <span class="special">&lt;&lt;</span> <span class="number">100</span> <span class="special">*</span> <span class="special">(</span><span class="number">1</span><span class="special">-</span><span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">]);</span>
<span class="comment">// Calculate limits:
</span> <span class="keyword">double</span> <span class="identifier">lower_limit</span> <span class="special">=</span> <span class="identifier">sqrt</span><span class="special">((</span><span class="identifier">N</span> <span class="special">-</span> <span class="number">1</span><span class="special">)</span> <span class="special">*</span> <span class="identifier">Sd</span> <span class="special">*</span> <span class="identifier">Sd</span> <span class="special">/</span> <span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">/</span> <span class="number">2</span><span class="special">)));</span>
<span class="keyword">double</span> <span class="identifier">upper_limit</span> <span class="special">=</span> <span class="identifier">sqrt</span><span class="special">((</span><span class="identifier">N</span> <span class="special">-</span> <span class="number">1</span><span class="special">)</span> <span class="special">*</span> <span class="identifier">Sd</span> <span class="special">*</span> <span class="identifier">Sd</span> <span class="special">/</span> <span class="identifier">quantile</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">/</span> <span class="number">2</span><span class="special">));</span>
<span class="comment">// Print Limits:
</span> <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">fixed</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">5</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">15</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">right</span> <span class="special">&lt;&lt;</span> <span class="identifier">lower_limit</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">fixed</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">5</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">15</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">right</span> <span class="special">&lt;&lt;</span> <span class="identifier">upper_limit</span> <span class="special">&lt;&lt;</span> <span class="identifier">endl</span><span class="special">;</span>
<span class="special">}</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">endl</span><span class="special">;</span>
</pre>
<p>
To see some example output we'll use the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3581.htm" target="_top">gear
data</a> from the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
e-Handbook of Statistical Methods.</a>. The data represents measurements
of gear diameter from a manufacturing process.
</p>
<pre class="programlisting">________________________________________________
2-Sided Confidence Limits For Standard Deviation
________________________________________________
Number of Observations = 100
Standard Deviation = 0.006278908
_____________________________________________
Confidence Lower Upper
Value (%) Limit Limit
_____________________________________________
50.000 0.00601 0.00662
75.000 0.00582 0.00685
90.000 0.00563 0.00712
95.000 0.00551 0.00729
99.000 0.00530 0.00766
99.900 0.00507 0.00812
99.990 0.00489 0.00855
99.999 0.00474 0.00895
</pre>
<p>
So at the 95% confidence level we conclude that the standard deviation
is between 0.00551 and 0.00729.
</p>
<a name="math_toolkit.dist.stat_tut.weg.cs_eg.chi_sq_intervals.confidence_intervals_as_a_function_of_the_number_of_observations"></a><h5>
<a name="id938955"></a>
<a class="link" href="chi_sq_intervals.html#math_toolkit.dist.stat_tut.weg.cs_eg.chi_sq_intervals.confidence_intervals_as_a_function_of_the_number_of_observations">Confidence
intervals as a function of the number of observations</a>
</h5>
<p>
Similarly, we can also list the confidence intervals for the standard
deviation for the common confidence levels 95%, for increasing numbers
of observations.
</p>
<p>
The standard deviation used to compute these values is unity, so the
limits listed are <span class="bold"><strong>multipliers</strong></span> for
any particular standard deviation. For example, given a standard deviation
of 0.0062789 as in the example above; for 100 observations the multiplier
is 0.8780 giving the lower confidence limit of 0.8780 * 0.006728 =
0.00551.
</p>
<pre class="programlisting">____________________________________________________
Confidence level (two-sided) = 0.0500000
Standard Deviation = 1.0000000
________________________________________
Observations Lower Upper
Limit Limit
________________________________________
2 0.4461 31.9102
3 0.5207 6.2847
4 0.5665 3.7285
5 0.5991 2.8736
6 0.6242 2.4526
7 0.6444 2.2021
8 0.6612 2.0353
9 0.6755 1.9158
10 0.6878 1.8256
15 0.7321 1.5771
20 0.7605 1.4606
30 0.7964 1.3443
40 0.8192 1.2840
50 0.8353 1.2461
60 0.8476 1.2197
100 0.8780 1.1617
120 0.8875 1.1454
1000 0.9580 1.0459
10000 0.9863 1.0141
50000 0.9938 1.0062
100000 0.9956 1.0044
1000000 0.9986 1.0014
</pre>
<p>
With just 2 observations the limits are from <span class="bold"><strong>0.445</strong></span>
up to to <span class="bold"><strong>31.9</strong></span>, so the standard deviation
might be about <span class="bold"><strong>half</strong></span> the observed value
up to <span class="bold"><strong>30 times</strong></span> the observed value!
</p>
<p>
Estimating a standard deviation with just a handful of values leaves
a very great uncertainty, especially the upper limit. Note especially
how far the upper limit is skewed from the most likely standard deviation.
</p>
<p>
Even for 10 observations, normally considered a reasonable number,
the range is still from 0.69 to 1.8, about a range of 0.7 to 2, and
is still highly skewed with an upper limit <span class="bold"><strong>twice</strong></span>
the median.
</p>
<p>
When we have 1000 observations, the estimate of the standard deviation
is starting to look convincing, with a range from 0.95 to 1.05 - now
near symmetrical, but still about + or - 5%.
</p>
<p>
Only when we have 10000 or more repeated observations can we start
to be reasonably confident (provided we are sure that other factors
like drift are not creeping in).
</p>
<p>
For 10000 observations, the interval is 0.99 to 1.1 - finally a really
convincing + or -1% confidence.
</p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 2006 , 2007, 2008, 2009, 2010 John Maddock, Paul A. Bristow,
Hubert Holin, Xiaogang Zhang, Bruno Lalande, Johan R&#229;de, Gautam Sewani and
Thijs van den Berg<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../cs_eg.html"><img src="../../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../cs_eg.html"><img src="../../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="chi_sq_test.html"><img src="../../../../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>