During a national health campaign, we see reports in journals that state confidence intervals in terms of proportions or percentages. For example, a study on a particular physical activity intervention might report that 40% of participants met the recommended weekly exercise guidelines, with a margin of error of three percentage points (if the sample is large enough). Often, these health polls are calculated with 95% confidence; so, the researchers would be 95% confident that the true proportion of the population who met the guidelines is between 0.37 and 0.43: \((0.40-0.03,\ 0.40+0.03)\).
Kinesiologists are interested in the true proportion of athletes who experience over-training symptoms during a competitive season. Nutritionists are interested in the proportion of households in the United States that meet the daily fiber intake requirements. Confidence intervals can be calculated for the true proportion of athletes who experience those symptoms each week and for the true proportion of households in the United States that meet nutritional standards.
The procedure to find the confidence interval for a population proportion is similar to that for the population mean. The formulas differ, but they are conceptually identical and rest on the same mathematical foundation given to us by the Central Limit Theorem. Because of this, we will see the same basic format using the same three pieces of information: the sample value of the parameter in question, the standard deviation of the relevant sampling distribution, and the number of standard deviations we need to have the confidence in our estimate that we desire.
How do you know you are dealing with a proportion problem? First, the underlying distribution has a binary random variable and therefore is a binomial distribution. (There is no mention of a mean or average.) If \(X\) is a binomial random variable, then \(X \sim B(n, p)\) where \(n\) is the number of trials and \(p\) is the probability of a success. To form a sample proportion, take \(X\), the random variable for the number of successes, and divide it by \(n\), the number of trials (or the sample size). The random variable \(P^{\prime}\) (read "P prime") is the sample proportion,
\[P^{\prime}=\dfrac{X}{n}\]
(Sometimes the random variable is denoted as \(\widehat{P}\), read "P hat".)
\(p^{\prime}=\) the estimated proportion of successes, or sample proportion of successes (\(p^{\prime}\) is a point estimate for \(p\), the true population proportion; \(q=1-p\) is the probability of a failure in any one trial, and \(q^{\prime}=1-p^{\prime}\) is its point estimate)
\(x=\) the number of successes in the sample
\(n=\) the size of the sample
The formula for the confidence interval for a population proportion follows the same format as that for an estimate of a population mean. Remembering the sampling distribution for the proportion from The Central Limit Theorem, the standard deviation was found to be:
\[\sigma_{p^{\prime}}=\sqrt{\dfrac{p(1-p)}{n}}\]
The confidence interval for a population proportion, therefore, becomes:
\[p=p^{\prime} \pm\left[Z_{\left(\dfrac{\alpha}{2}\right)} \sqrt{\dfrac{p^{\prime}\left(1-p^{\prime}\right)}{n}}\right]\]
\(Z_{\left(\dfrac{\alpha}{2}\right)}\) is set according to our desired degree of confidence and \(\sqrt{\dfrac{p^{\prime}\left(1-p^{\prime}\right)}{n}}\) is the standard deviation of the sampling distribution.
The sample proportions \(p^{\prime}\) and \(q^{\prime}\) are estimates of the unknown population proportions \(p\) and \(q\). The estimated proportions \(p^{\prime}\) and \(q^{\prime}\) are used because \(p\) and \(q\) are not known.
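The interval formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `proportion_ci` and the sample of 40 successes in 100 trials are hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion.

    x: number of successes in the sample, n: sample size.
    (Hypothetical helper written for illustration.)
    """
    p_hat = x / n                              # sample proportion p'
    q_hat = 1 - p_hat                          # sample proportion of failures q'
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value Z_(alpha/2)
    margin = z * sqrt(p_hat * q_hat / n)       # Z_(alpha/2) * sqrt(p'q'/n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 40 successes out of 100 trials
low, high = proportion_ci(40, 100)
print(f"{low:.3f} <= p <= {high:.3f}")
```

Because \(p\) and \(q\) are unknown, the function plugs in \(p^{\prime}\) and \(q^{\prime}\), exactly as the formula does.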
Remember that as \(p\) moves further from 0.5, the binomial distribution becomes less symmetrical. Because we are estimating the binomial with the symmetrical normal distribution, the further from symmetrical the binomial becomes, the less confidence we have in the estimate.
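One way to see this loss of symmetry is to compute the skewness (third standardized moment) of a binomial distribution directly from its probabilities. The sketch below is illustrative only: the helper name `binomial_skewness`, the choice \(n=50\), and the listed values of \(p\) are assumptions, not part of the text.

```python
from math import comb, sqrt


def binomial_skewness(n, p):
    """Third standardized moment of B(n, p), summed directly from the pmf."""
    q = 1 - p
    mean = n * p
    sd = sqrt(n * p * q)
    third_moment = sum(
        comb(n, k) * p**k * q ** (n - k) * (k - mean) ** 3
        for k in range(n + 1)
    )
    return third_moment / sd**3


# Skewness is essentially zero at p = 0.5 and grows as p moves away from it
for p in (0.5, 0.3, 0.1, 0.02):
    print(f"p = {p:<4}  skewness = {binomial_skewness(50, p):.3f}")
```

A skewness near zero means the distribution is close to symmetrical, which is the situation where the normal curve is a good stand-in for the binomial.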
This conclusion can be demonstrated through the following analysis. Proportions are based upon the binomial probability distribution. The possible outcomes are binary, either "success" or "failure". This gives rise to a proportion, meaning the percentage of the outcomes that are "successes". It was shown that the binomial distribution could be fully understood if we knew only the probability of a success in any one trial, called \(p\). The mean and the standard deviation of the binomial were found to be:
\[\begin{array}{c}
\mu=\mathrm{np} \\
\sigma=\sqrt{\mathrm{np} q}
\end{array}\]
It was also shown that the binomial could be estimated by the normal distribution if BOTH \(np\) AND \(nq\) were greater than 5. From the discussion above, it was found that the standardizing formula for the binomial distribution is:
\[Z=\dfrac{\mathrm{p}^{\prime}-p}{\sqrt{\left(\dfrac{p q}{n}\right)}}\]
which is nothing more than a restatement of the general standardizing formula with appropriate substitutions for \(\mu\) and \(\sigma\) from the binomial. We can use the standard normal distribution, the reason \(Z\) is in the equation, because the normal distribution is the limiting distribution of the binomial. This is another example of the Central Limit Theorem. We have already seen that the sampling distribution of means is normally distributed. Recall the extended discussion in The Central Limit Theorem concerning the sampling distribution of proportions and the conclusions of the Central Limit Theorem.
We can now manipulate this formula in just the same way we did for finding the confidence intervals for a mean, but to find the confidence interval for the binomial population parameter, p.
\[p^{\prime}-Z_{\dfrac{\alpha}{2}} \sqrt{\dfrac{p^{\prime} q^{\prime}}{n}} \leq p \leq p^{\prime}+Z_{\dfrac{\alpha}{2}} \sqrt{\dfrac{p^{\prime} q^{\prime}}{n}}\]
where \(p^{\prime}=x / n\) is the point estimate of \(p\) taken from the sample and \(q^{\prime}=1-p^{\prime}\). Notice that \(p^{\prime}\) has replaced \(p\) in the formula. This is because we do not know \(p\); indeed, it is exactly what we are trying to estimate.
Unfortunately, there is no correction factor for cases where the sample size is small so \(\mathrm{np}^{\prime}\) and \(\mathrm{nq}^{\prime}\) must always be greater than 5 to develop an interval estimate for \(p\).
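Since \(np^{\prime}=x\) and \(nq^{\prime}=n-x\), the condition amounts to requiring more than 5 successes and more than 5 failures in the sample. A minimal check might look like the following; the function name `normal_approximation_ok` and the second and third samples are hypothetical.

```python
def normal_approximation_ok(x, n, threshold=5):
    """Return True when both np' and nq' exceed the threshold.

    np' = n * (x / n) = x  and  nq' = n * (1 - x / n) = n - x,
    so the counts are compared directly to avoid rounding issues.
    """
    successes = x          # np'
    failures = n - x       # nq'
    return successes > threshold and failures > threshold


print(normal_approximation_ok(421, 500))  # many successes and many failures
print(normal_approximation_ok(3, 40))     # too few successes
print(normal_approximation_ok(5, 100))    # exactly 5 is not "greater than 5"
```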
Suppose that a public health research firm is hired to estimate the percentage of adults living in a large city who use wearable fitness trackers to monitor their daily physical activity. Five hundred randomly selected adult residents in this city are surveyed to determine whether they use these devices. Of the 500 people sampled, 421 responded yes—they do own and use a fitness tracker.
Problem
Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who use wearable fitness trackers.
Answer
The solution step-by-step.
Let \(X=\) the number of people in the sample who use a fitness tracker. \(X\) is binomial: the random variable is binary, since people either use a fitness tracker or they do not.
To calculate the confidence interval, we must find \(p^{\prime}, q^{\prime}\).
\(n=500\)
\(x=\) the number of successes in the sample \(=421\)
\(p^{\prime}=\dfrac{x}{n}=\dfrac{421}{500}=0.842\)
\(p^{\prime}=0.842\) is the sample proportion; this is the point estimate of the population proportion.
\(q^{\prime}=1-p^{\prime}=1-0.842=0.158\)
Since the requested confidence level is \(CL=0.95\), then \(\alpha=1-CL=1-0.95=0.05\) and \(\dfrac{\alpha}{2}=0.025\).
Then \(z_{\dfrac{\alpha}{2}}=z_{0.025}=1.96\)
This can be found using the Standard Normal probability table in Appendix A Statistical Tables. It can also be found in the Student's \(t\) table at the 0.025 column and infinite degrees of freedom, because at infinite degrees of freedom the Student's \(t\)-distribution becomes the standard normal distribution, \(Z\).
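As an alternative to the printed tables, the same critical value can be obtained in software. The sketch below assumes Python's standard library `statistics.NormalDist`, whose `inv_cdf` method returns the value of \(Z\) with a given area to its left; the variable names are illustrative.

```python
from statistics import NormalDist

confidence_level = 0.95
alpha = 1 - confidence_level

# Z_(alpha/2) leaves an area of alpha/2 in the upper tail,
# so the area to its left is 1 - alpha/2 = 0.975.
z = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z, 2))  # 1.96
```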
The confidence interval for the true binomial population proportion is
\[p^{\prime}-Z_{\dfrac{\alpha}{2}} \sqrt{\dfrac{p^{\prime} q^{\prime}}{n}} \leq p \leq p^{\prime}+Z_{\dfrac{\alpha}{2}} \sqrt{\dfrac{p^{\prime} q^{\prime}}{n}}\]
Substituting in the values from above, we find the confidence interval is: \(0.810 \leq p \leq 0.874\)
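The substitution can be checked with a short script. This is a sketch that reuses the numbers from this example (\(n=500\), \(x=421\), \(Z=1.96\)); the variable names are illustrative.

```python
from math import sqrt

n = 500          # sample size
x = 421          # respondents who use a fitness tracker
z = 1.96         # critical value for 95% confidence

p_hat = x / n                          # p' = 0.842
q_hat = 1 - p_hat                      # q' = 0.158
margin = z * sqrt(p_hat * q_hat / n)   # Z * sqrt(p'q'/n)

lower = p_hat - margin
upper = p_hat + margin
print(f"{lower:.3f} <= p <= {upper:.3f}")  # 0.810 <= p <= 0.874
```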
Interpretation: We estimate with 95\% confidence that between 81\% and 87.4\% of all adult residents of this city use wearable fitness trackers.
Explanation of 95\% Confidence Level: Ninety-five percent of the confidence intervals constructed in this way would contain the true value for the population proportion of all adult residents of this city who use wearable fitness trackers.
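This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a hypothetical true proportion of 0.84 (in practice \(p\) is unknown), draws many samples of 500, builds a 95% interval from each, and counts how often the interval contains the assumed true value. The seed and the number of repetitions are arbitrary choices.

```python
import random
from math import sqrt

random.seed(1)     # arbitrary seed so the run is repeatable

TRUE_P = 0.84      # hypothetical true proportion (unknown in practice)
N = 500            # sample size, as in the example
Z = 1.96           # critical value for 95% confidence
REPS = 1000        # number of simulated samples

covered = 0
for _ in range(REPS):
    # Each respondent is a "success" with probability TRUE_P
    x = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = x / N
    margin = Z * sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        covered += 1

coverage = covered / REPS
print(f"{coverage:.1%} of the intervals contain the true proportion")
```

The share of intervals that contain the true value should come out close to 95%, though it varies from run to run with the random samples.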
The Dundee Sports Performance Academy tracks the progress of its athletes into elite ranks. To understand the broader landscape, a study examines the proportion of athletes from 150 different performance academies who successfully transition to competing in professional-level sporting events.
A confidence interval for the population proportion of these elite athletes is constructed based on the study. The lower limit is determined to be 0.08 and the upper limit is determined to be 0.16.
Determine the level of confidence used to construct the interval for the population proportion of athletes who compete in professional events.
Answer
We begin with the formula for a confidence interval for a proportion because the random variable is binary: either the athlete competes in professional sporting events or they don't.
\[p=p^{\prime} \pm\left[Z_{\left(\dfrac{\alpha}{2}\right)} \sqrt{\dfrac{p^{\prime}\left(1-p^{\prime}\right)}{n}}\right]\]
Next we find the sample proportion:
\[p^{\prime}=\dfrac{0.08+0.16}{2}=0.12\]
The \(\pm\) term that makes up the confidence interval is thus \(0.04\), since \(0.12+0.04=0.16\) and \(0.12-0.04=0.08\) are the boundaries of the confidence interval. Finally, we solve for \(Z\).
\[\left[Z \cdot \sqrt{\dfrac{0.12(1-0.12)}{150}}\right]=0.04, \text { therefore } Z=1.51\]
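We then look up the area between the mean and 1.51 standard deviations on the standard normal table and double it, because the interval extends 1.51 standard deviations on both sides of \(p^{\prime}\).
\[P(0 \leq Z \leq 1.51)=0.4345, \quad 2 \cdot 0.4345=0.8690 \text { or } 86.90 \%\]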
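The same steps can be retraced in code. This sketch uses the limits and sample size from this example and assumes the standard library's `statistics.NormalDist` in place of the printed table. Because it keeps \(Z\) unrounded, its result differs slightly in the third decimal place from the table-based 86.90%.

```python
from math import sqrt
from statistics import NormalDist

lower, upper, n = 0.08, 0.16, 150

p_hat = (lower + upper) / 2      # midpoint of the interval, p' = 0.12
margin = (upper - lower) / 2     # half-width of the interval, 0.04

# Solve  Z * sqrt(p'(1 - p')/n) = margin  for Z
z = margin / sqrt(p_hat * (1 - p_hat) / n)

# Area between -Z and +Z under the standard normal curve
confidence = 2 * NormalDist().cdf(z) - 1

print(f"Z = {z:.2f}, confidence level = {confidence:.4f}")
```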


