blob: f243eb429ca254618b480b0423cd3a87c86fb82d [file] [log] [blame]
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Calculating confidence intervals on the mean with the Students-t distribution</title>
<link rel="stylesheet" href="../../../../../../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.74.0">
<link rel="home" href="../../../../../index.html" title="Math Toolkit">
<link rel="up" href="../st_eg.html" title="Student's t Distribution Examples">
<link rel="prev" href="../st_eg.html" title="Student's t Distribution Examples">
<link rel="next" href="tut_mean_test.html" title='Testing a sample mean for difference from a "true" mean'>
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../st_eg.html"><img src="../../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../st_eg.html"><img src="../../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tut_mean_test.html"><img src="../../../../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section" lang="en">
<div class="titlepage"><div><div><h6 class="title">
<a name="math_toolkit.dist.stat_tut.weg.st_eg.tut_mean_intervals"></a><a class="link" href="tut_mean_intervals.html" title="Calculating confidence intervals on the mean with the Students-t distribution">
Calculating confidence intervals on the mean with the Students-t distribution</a>
</h6></div></div></div>
<p>
Let's say you have a sample mean, you may wish to know what confidence
intervals you can place on that mean. Colloquially: "I want an
interval that I can be P% sure contains the true mean". (On a
technical point, note that the interval either contains the true mean
or it does not: the meaning of the confidence level is subtly different
from this colloquialism. More background information can be found on
the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm" target="_top">NIST
site</a>).
</p>
<p>
The formula for the interval can be expressed as:
</p>
<p>
<span class="inlinemediaobject"><img src="../../../../../../equations/dist_tutorial4.png"></span>
</p>
<p>
Where, <span class="emphasis"><em>Y<sub>s</sub></em></span> is the sample mean, <span class="emphasis"><em>s</em></span>
is the sample standard deviation, <span class="emphasis"><em>N</em></span> is the sample
size, <span class="emphasis"><em>[alpha]</em></span> is the desired significance level
and <span class="emphasis"><em>t<sub>(&#945;/2,N-1)</sub></em></span> is the upper critical value of the
Students-t distribution with <span class="emphasis"><em>N-1</em></span> degrees of freedom.
</p>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top">
<p>
The quantity &#945; &#8203; is the maximum acceptable risk of falsely rejecting
the null-hypothesis. The smaller the value of &#945; the greater the strength
of the test.
</p>
<p>
The confidence level of the test is defined as 1 - &#945;, and often expressed
as a percentage. So for example a significance level of 0.05, is
equivalent to a 95% confidence level. Refer to <a href="http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm" target="_top">"What
are confidence intervals?"</a> in <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
e-Handbook of Statistical Methods.</a> for more information.
</p>
</td></tr>
</table></div>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>
The usual assumptions of <a href="http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables" target="_top">independent
and identically distributed (i.i.d.)</a> variables and <a href="http://en.wikipedia.org/wiki/Normal_distribution" target="_top">normal distribution</a>
of course apply here, as they do in other examples.
</p></td></tr>
</table></div>
<p>
From the formula, it should be clear that:
</p>
<div class="itemizedlist"><ul type="disc">
<li>
The width of the confidence interval decreases as the sample size
increases.
</li>
<li>
The width increases as the standard deviation increases.
</li>
<li>
The width increases as the <span class="emphasis"><em>confidence level increases</em></span>
(0.5 towards 0.99999 - stronger).
</li>
<li>
The width increases as the <span class="emphasis"><em>significance level decreases</em></span>
(0.5 towards 0.00000...01 - stronger).
</li>
</ul></div>
<p>
The following example code is taken from the example program <a href="../../../../../../../../example/students_t_single_sample.cpp" target="_top">students_t_single_sample.cpp</a>.
</p>
<p>
We'll begin by defining a procedure to calculate intervals for various
confidence levels; the procedure will print these out as a table:
</p>
<pre class="programlisting"><span class="comment">// Needed includes:
</span><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">distributions</span><span class="special">/</span><span class="identifier">students_t</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iostream</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">iomanip</span><span class="special">&gt;</span>
<span class="comment">// Bring everything into global namespace for ease of use:
</span><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
<span class="keyword">void</span> <span class="identifier">confidence_limits_on_mean</span><span class="special">(</span>
<span class="keyword">double</span> <span class="identifier">Sm</span><span class="special">,</span> <span class="comment">// Sm = Sample Mean.
</span> <span class="keyword">double</span> <span class="identifier">Sd</span><span class="special">,</span> <span class="comment">// Sd = Sample Standard Deviation.
</span> <span class="keyword">unsigned</span> <span class="identifier">Sn</span><span class="special">)</span> <span class="comment">// Sn = Sample Size.
</span><span class="special">{</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>
<span class="comment">// Print out general info:
</span> <span class="identifier">cout</span> <span class="special">&lt;&lt;</span>
<span class="string">"__________________________________\n"</span>
<span class="string">"2-Sided Confidence Limits For Mean\n"</span>
<span class="string">"__________________________________\n\n"</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">7</span><span class="special">);</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Number of Observations"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sn</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Mean"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sm</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Standard Deviation"</span> <span class="special">&lt;&lt;</span> <span class="string">"= "</span> <span class="special">&lt;&lt;</span> <span class="identifier">Sd</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
</pre>
<p>
We'll define a table of significance/risk levels for which we'll compute
intervals:
</p>
<pre class="programlisting"><span class="keyword">double</span> <span class="identifier">alpha</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">0.5</span><span class="special">,</span> <span class="number">0.25</span><span class="special">,</span> <span class="number">0.1</span><span class="special">,</span> <span class="number">0.05</span><span class="special">,</span> <span class="number">0.01</span><span class="special">,</span> <span class="number">0.001</span><span class="special">,</span> <span class="number">0.0001</span><span class="special">,</span> <span class="number">0.00001</span> <span class="special">};</span>
</pre>
<p>
Note that these are the complements of the confidence/probability levels:
0.5, 0.75, 0.9 .. 0.99999).
</p>
<p>
Next we'll declare the distribution object we'll need, note that the
<span class="emphasis"><em>degrees of freedom</em></span> parameter is the sample size
less one:
</p>
<pre class="programlisting"><span class="identifier">students_t</span> <span class="identifier">dist</span><span class="special">(</span><span class="identifier">Sn</span> <span class="special">-</span> <span class="number">1</span><span class="special">);</span>
</pre>
<p>
Most of what follows in the program is pretty printing, so let's focus
on the calculation of the interval. First we need the t-statistic,
computed using the <span class="emphasis"><em>quantile</em></span> function and our significance
level. Note that since the significance levels are the complement of
the probability, we have to wrap the arguments in a call to <span class="emphasis"><em>complement(...)</em></span>:
</p>
<pre class="programlisting"><span class="keyword">double</span> <span class="identifier">T</span> <span class="special">=</span> <span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">]</span> <span class="special">/</span> <span class="number">2</span><span class="special">));</span>
</pre>
<p>
Note that alpha was divided by two, since we'll be calculating both
the upper and lower bounds: had we been interested in a single sided
interval then we would have omitted this step.
</p>
<p>
Now to complete the picture, we'll get the (one-sided) width of the
interval from the t-statistic by multiplying by the standard deviation,
and dividing by the square root of the sample size:
</p>
<pre class="programlisting"><span class="keyword">double</span> <span class="identifier">w</span> <span class="special">=</span> <span class="identifier">T</span> <span class="special">*</span> <span class="identifier">Sd</span> <span class="special">/</span> <span class="identifier">sqrt</span><span class="special">(</span><span class="keyword">double</span><span class="special">(</span><span class="identifier">Sn</span><span class="special">));</span>
</pre>
<p>
The two-sided interval is then the sample mean plus and minus this
width.
</p>
<p>
And apart from some more pretty-printing that completes the procedure.
</p>
<p>
Let's take a look at some sample output, first using the <a href="http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm" target="_top">Heat
flow data</a> from the NIST site. The data set was collected by
Bob Zarr of NIST in January, 1990 from a heat flow meter calibration
and stability analysis. The corresponding dataplot output for this
test can be found in <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm" target="_top">section
3.5.2</a> of the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
e-Handbook of Statistical Methods.</a>.
</p>
<pre class="programlisting"> __________________________________
2-Sided Confidence Limits For Mean
__________________________________
Number of Observations = 195
Mean = 9.26146
Standard Deviation = 0.02278881
___________________________________________________________________
Confidence T Interval Lower Upper
Value (%) Value Width Limit Limit
___________________________________________________________________
50.000 0.676 1.103e-003 9.26036 9.26256
75.000 1.154 1.883e-003 9.25958 9.26334
90.000 1.653 2.697e-003 9.25876 9.26416
95.000 1.972 3.219e-003 9.25824 9.26468
99.000 2.601 4.245e-003 9.25721 9.26571
99.900 3.341 5.453e-003 9.25601 9.26691
99.990 3.973 6.484e-003 9.25498 9.26794
99.999 4.537 7.404e-003 9.25406 9.26886
</pre>
<p>
As you can see the large sample size (195) and small standard deviation
(0.023) have combined to give very small intervals, indeed we can be
very confident that the true mean is 9.2.
</p>
<p>
For comparison the next example data output is taken from <span class="emphasis"><em>P.K.Hou,
O. W. Lau &amp; M.C. Wong, Analyst (1983) vol. 108, p 64. and from
Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55 J. C.
Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.</em></span>
The values result from the determination of mercury by cold-vapour
atomic absorption.
</p>
<pre class="programlisting"> __________________________________
2-Sided Confidence Limits For Mean
__________________________________
Number of Observations = 3
Mean = 37.8000000
Standard Deviation = 0.9643650
___________________________________________________________________
Confidence T Interval Lower Upper
Value (%) Value Width Limit Limit
___________________________________________________________________
50.000 0.816 0.455 37.34539 38.25461
75.000 1.604 0.893 36.90717 38.69283
90.000 2.920 1.626 36.17422 39.42578
95.000 4.303 2.396 35.40438 40.19562
99.000 9.925 5.526 32.27408 43.32592
99.900 31.599 17.594 20.20639 55.39361
99.990 99.992 55.673 -17.87346 93.47346
99.999 316.225 176.067 -138.26683 213.86683
</pre>
<p>
This time the fact that there are only three measurements leads to
much wider intervals, indeed such large intervals that it's hard to
be very confident in the location of the mean.
</p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 2006 , 2007, 2008, 2009, 2010 John Maddock, Paul A. Bristow,
Hubert Holin, Xiaogang Zhang, Bruno Lalande, Johan R&#229;de, Gautam Sewani and
Thijs van den Berg<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../st_eg.html"><img src="../../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../st_eg.html"><img src="../../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="tut_mean_test.html"><img src="../../../../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>