Medicine LibreTexts

8.1: Binomial Distribution


    There are four characteristics of a binomial experiment.

    1. There are a fixed number of trials. Think of trials as repetitions of an experiment. The letter n denotes the number of trials.
    2. The random variable, X, the number of successes, is discrete.
    3. There are only two possible outcomes, called "success" and "failure," for each trial. The letter p denotes the probability of a success on any one trial, and q denotes the probability of a failure on any one trial. p + q = 1.
    4. The n trials are independent and are repeated using identical conditions. Think of this as drawing WITH replacement. Because the n trials are independent, the outcome of one trial does not help in predicting the outcome of another trial. Another way of saying this is that for each individual trial, the probability, p, of a success and the probability, q, of a failure remain the same. For example, randomly guessing at a true-false statistics question has only two outcomes. If a success is guessing correctly, then a failure is guessing incorrectly. Similarly, suppose a specific rapid diagnostic test correctly identifies the presence of a biomarker with a probability of p = 0.6 (success). Then the probability of the test failing to identify the biomarker is q = 0.4 (failure).

    The outcomes of a binomial experiment fit a binomial probability distribution. The random variable \(X=\) the number of successes obtained in the \(n\) independent trials.

    The mean, \(\mu\), and variance, \(\sigma^2\), for the binomial probability distribution are \(\mu=n p\) and \(\sigma^2=n p q\). The standard deviation, \(\sigma\), is then \(\sigma=\sqrt{n p q}\).

    Any experiment that has characteristics three and four and where \(n=1\) is called a Bernoulli Trial (named after Jacob Bernoulli who, in the late 1600s, studied them extensively). A binomial experiment takes place when the number of successes is counted in one or more Bernoulli Trials.
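To make the link between Bernoulli trials and the binomial count concrete, the following Python sketch simulates it. The values n = 10 and p = 0.6, the seed, and the helper names `bernoulli_trial` and `binomial_count` are illustrative assumptions, not part of the text. The simulated mean and standard deviation should land close to the formula values np and sqrt(npq).

```python
import random
import statistics

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial_count(n, p, rng):
    """Count the successes in n independent Bernoulli trials."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

rng = random.Random(2024)   # fixed seed (assumption) so the run is repeatable
n, p = 10, 0.6              # illustrative values, not taken from the text
counts = [binomial_count(n, p, rng) for _ in range(20_000)]

sim_mean = statistics.fmean(counts)
sim_sd = statistics.pstdev(counts)
print(f"simulated mean = {sim_mean:.3f}, formula np = {n * p:.3f}")
print(f"simulated sd   = {sim_sd:.3f}, formula sqrt(npq) = {(n * p * (1 - p)) ** 0.5:.3f}")
```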

    Example \(\PageIndex{1}\)

    In a specialized community health program for hypertension management, the attrition rate (the rate at which participants drop out before completion) is 30% for any given cohort. This implies that, for any given cohort, 70% of the participants remain in the program for the entire duration.

    In this study, a "success" (the event we are counting) is defined as an individual who drops out of the program. The random variable X = the number of participants who drop out from a randomly selected hypertension management cohort.

    Example \(\PageIndex{2}\)

    Suppose you play a game that you can only either win or lose. The probability that you win any game is \(55 \%\), and the probability that you lose is \(45 \%\). Each game you play is independent. If you play the game 20 times, write the function that describes the probability that you win 15 of the 20 times. Here, if you define \(X\) as the number of wins, then \(X\) takes on the values \(0,1,2,3, \ldots, 20\). The probability of a success is \(p=0.55\). The probability of a failure is \(q=0.45\). The number of trials is \(n=20\). The probability question can be stated mathematically as \(P(x=15)\).
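The requested function is the binomial probability formula evaluated at x = 15 with n = 20 and p = 0.55. A minimal Python sketch, assuming only the standard library (`math.comb`); the function name `binom_pmf` is our own choice, not something the text defines:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Probability of winning exactly 15 of the 20 games
p_15_wins = binom_pmf(15, n=20, p=0.55)
print(f"P(X = 15) = {p_15_wins:.4f}")
```

Evaluated this way, the probability of exactly 15 wins comes out to roughly 0.036.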

    Example \(\PageIndex{3}\)

    A public health researcher is studying a group of 5 patients who are receiving a specialized experimental treatment. Based on historical clinical data, the probability that an individual patient will show a "positive response" to this specific dosage is 0.25 (p=0.25), and the probability of no response is 0.75 (q=0.75). Each patient's response is independent.

    What is the probability that more than 3 patients in this group of five will show a positive response?

    Let X= the number of positive responders in 5 patients. X can take on the values {0,1,2,3,4,5}. The number of trials is n=5.

    State the probability question mathematically: P(X>3)

    Solution

    First, we develop the probability density function (PDF). With the fully developed PDF, we can simply sum the individual probabilities for x=4 and x=5 to find the solution for P(X>3). \(P(x>3)=P(x=4)+P(x=5)=0.0146+0.0010=0.0156\). We have added the two individual probabilities because of the addition rule from Probability Topics.

    Figure \(\PageIndex{1}\) also allows us to see the link between the probability density function and probability and area. We also see in Figure \(\PageIndex{1}\) the skew of the binomial distribution when p is not equal to 0.5. In Figure \(\PageIndex{1}\) the distribution is skewed right because \(p=0.25\), which places the mean at \(\mu=n p=1.25\), toward the low end of the possible values 0 through 5.

    Histogram of probability density function of the given data.
    Figure \(\PageIndex{1}\)

    \[\begin{aligned}
    P\left(x=x_0\right) & =\binom{n}{x_0} p^{x_0}(1-p)^{n-x_0} \\
    & =\binom{5}{x_0}(0.25)^{x_0}(0.75)^{5-x_0} \\
    \mu & =n p=(5)(0.25)=1.25
    \end{aligned}\]
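The full PDF for this example can be tabulated directly from the formula. The sketch below is one way to do it in Python with the standard library; the variable names are our own. It evaluates the formula at each value from 0 to 5 and then adds the two terms needed for P(X > 3).

```python
from math import comb

n, p = 5, 0.25   # values from the example

# Probability of each possible count of positive responders
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

for x, prob in pmf.items():
    print(f"P(X = {x}) = {prob:.4f}")

# "More than 3" means 4 or 5 responders
p_more_than_3 = pmf[4] + pmf[5]
print(f"P(X > 3) = {p_more_than_3:.4f}")
```

With these values the sum is exactly 1/64, about 0.0156.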

    Notation for the Binomial: \(B=\) Binomial Probability Distribution Function

    \(X \sim B(n, p)\)

    Read this as "\(X\) is a random variable with a binomial distribution." The parameters are \(n\) and \(p\): \(n=\) number of trials, \(p=\) probability of a success on each trial.

    Example \(\PageIndex{4}\)

    In a study on social determinants of health, a public health researcher finds that approximately 41% of adult workers in a specific region have a high school diploma as their highest level of education. If 20 adult workers are randomly selected from this region for a health literacy survey, find the probability that at most 12 of them have only a high school diploma. Additionally, how many workers from this sample would you expect to fall into this category?

    Solution:

    1. Identify the Parameters: Let X= the number of workers whose highest education is a high school diploma.

    • X takes on the values {0,1,2,…,20}
    • n=20 (number of workers sampled)
    • p=0.41 (probability of "success")
    • q=1−0.41=0.59 (probability of "failure")
    • Distribution: X∼B(20,0.41)

    2. Find the Probability P(X≤12): To find the probability of "at most 12," we calculate the cumulative probability from x=0 through x=12:

    • P(X≤12)=0.9738 (calculated using the binomial CDF function)

    3. Find the Expected Value (μ): The "expected" number of workers is the mean of the binomial distribution:

    • μ=n⋅p
    • μ=20⋅0.41=8.2
    • Interpretation: Out of 20 randomly selected workers, you would expect about 8.2 of them to have only a high school diploma.

    USING THE TI-83, 83+, 84, 84+ CALCULATOR

    • Go into \(2^{\text {nd }}\) DISTR. The syntax for the instructions is as follows:
    • To calculate \(P(x=\) value): binompdf(\(n, p\), number). If "number" is left out, the result is the binomial probability table.
    • To calculate \(P(x \leq\) value): binomcdf(\(n, p\), number). If "number" is left out, the result is the cumulative binomial probability table.
    • For this problem: After you are in \(2^{\text {nd }}\) DISTR, arrow down to binomcdf. Press ENTER. Enter 20,0.41,12). The result is \(P(x \leq 12)=0.9738\).

    Note

    If you want to find \(P(x=12)\), use the pdf (binompdf). If you want to find \(P(x>12)\), use 1 − binomcdf(20,0.41,12).
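For readers without a TI calculator, the same three quantities can be computed in a few lines of Python. This is a sketch under stated assumptions: the function names `binompdf` and `binomcdf` are our own and are chosen only to mirror the calculator's argument order (n, p, number); they are not a real library API.

```python
from math import comb

def binompdf(n, p, x):
    """P(X = x) for X ~ B(n, p); mirrors the calculator's argument order."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n, p, x):
    """P(X <= x): sum of the individual probabilities from 0 through x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

at_most_12 = binomcdf(20, 0.41, 12)
exactly_12 = binompdf(20, 0.41, 12)
more_than_12 = 1 - at_most_12   # complement rule

print(f"P(X <= 12) = {at_most_12:.4f}")
print(f"P(X = 12)  = {exactly_12:.4f}")
print(f"P(X > 12)  = {more_than_12:.4f}")
```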

    The probability that at most 12 workers have a high school diploma but do not pursue any further education is 0.9738.

    The graph of \(X \sim B(20,0.41)\) is as follows:

    This histogram shows a binomial probability distribution. It is made up of bars that are fairly normally distributed. The x-axis shows values from 0 to 20. The y-axis shows values from 0 to 0.2 in increments of 0.05.
    Figure \(\PageIndex{2}\)

    The \(y\)-axis contains the probability of \(x\), where \(X=\) the number of workers who have only a high school diploma.

    The number of adult workers that you expect to have a high school diploma but not pursue any further education is the mean, \(\mu=n p=(20)(0.41)=8.2\).

    The formula for the variance is \(\sigma^2=n p q\). The standard deviation is \(\sigma=\sqrt{n p q}\).

    \[\sigma=\sqrt{(20)(0.41)(0.59)}=2.20 \]

    Exercise \(\PageIndex{2}\)

    A large municipal hospital has a registry of 560 patient files from the past year. Within these records, eight files contain a rare "signature biomarker" that qualifies the patient for a specialized longitudinal study. A public health researcher randomly samples 100 files to audit. Let X= the number of sampled files that feature the signature biomarker.
    a. What values does \(x\) take on?
    b. What is the probability distribution? Find the following probabilities:
    i. The probability that exactly two files feature the biomarker
    ii. The probability that at most six files feature the biomarker
    iii. The probability that more than three files feature the biomarker
    c. Using the formulas, calculate the (i) mean and (ii) standard deviation.

    Answer

    a. \(x=0,1,2,3,4,5,6,7,8\)
    b. \(X \sim B\left(100, \frac{8}{560}\right)\)
    i. \(P(x=2)=\) binompdf \(\left(100, \frac{8}{560}, 2\right)=0.2466\)
    ii. \(P(x \leq 6)=\operatorname{binomcdf}\left(100, \frac{8}{560}, 6\right)=0.9994\)
    iii. \(P(x>3)=1-P(x \leq 3)=1-\operatorname{binomcdf}\left(100, \frac{8}{560}, 3\right)=1-0.9443=0.0557\)
    c.
    i. Mean \(=n p=(100)\left(\frac{8}{560}\right)=\frac{800}{560} \approx 1.4286\)
    ii. Standard Deviation \(=\sqrt{n p q}=\sqrt{(100)\left(\frac{8}{560}\right)\left(\frac{552}{560}\right)} \approx 1.1867\)
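The answers above can be cross-checked without a calculator. The following Python sketch uses the same binomial model, \(X \sim B\left(100, \frac{8}{560}\right)\); the helper names are our own and only imitate the calculator functions.

```python
from math import comb, sqrt

n, p = 100, 8 / 560   # sample size and proportion of files with the biomarker
q = 1 - p

def binompdf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n, p, x):
    """P(X <= x), summed term by term."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

exactly_two = binompdf(n, p, 2)
at_most_six = binomcdf(n, p, 6)
more_than_three = 1 - binomcdf(n, p, 3)
mean = n * p
sd = sqrt(n * p * q)

print(f"P(x = 2)  = {exactly_two:.4f}")
print(f"P(x <= 6) = {at_most_six:.4f}")
print(f"P(x > 3)  = {more_than_three:.4f}")
print(f"mean = {mean:.4f}, sd = {sd:.4f}")
```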

    Exercise \(\PageIndex{3}\)

    The lifetime risk of developing cancer is about one in 67 (1.5%). Suppose we randomly sample 200 people. Let X = the number of people who will develop cancer.

    Problem

    a. What is the probability distribution for X?
    b. Using the formulas, calculate the (i) mean and (ii) standard deviation of X.
    c. Use your calculator to find the probability that at most eight people develop cancer.
    d. Is it more likely that five or six people will develop cancer? Justify your answer numerically.
    Answer

    a. \(X \sim B(200,0.015)\)
    b.

    \[\begin{array}{l}
    \text { Mean }=n p=200(0.015)=3 \\
    \text { Standard Deviation }=\sqrt{n p q}=\sqrt{200(0.015)(0.985)}=1.719
    \end{array}\]

    c. \(P(x \leq 8)=0.9965\)
    d. The probability that five people develop cancer is 0.1011. The probability that six people develop cancer is 0.0500. It is therefore more likely that five people will develop cancer than six.
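One way to see the comparison in part d without computing either probability in full is to take the ratio of the two consecutive binomial terms. This worked step is our own addition; it uses only the binomial formula with \(n=200\), \(p=0.015\), and \(q=0.985\):

\[\frac{P(x=6)}{P(x=5)}=\frac{\binom{200}{6} p^{6} q^{194}}{\binom{200}{5} p^{5} q^{195}}=\frac{200-5}{6} \cdot \frac{p}{q}=\frac{195}{6} \cdot \frac{0.015}{0.985} \approx 0.495\]

Because the ratio is less than 1, six cases are about half as likely as five, which agrees with the probabilities 0.1011 and 0.0500.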

    Example \(\PageIndex{5}\)

    The following example illustrates a scenario that is not binomial because it violates the condition of independence.

    A hospital's Clinical Safety Committee is composed of ten senior physicians and six nursing staff members. The committee needs to select a Chairperson and a Recording Secretary for the upcoming year. What is the probability that both the Chairperson and the Recording Secretary are nursing staff members?

    To ensure fairness, the names of all 16 committee members are placed in a pool, and two names are drawn without replacement. The first name drawn becomes the Chairperson, and the second name becomes the Recording Secretary.

    Why this is not a Binomial Distribution:

    There are two trials (two names drawn), but the trials are not independent because the outcome of the first draw changes the probabilities for the second draw.

    1. First Trial: The probability of selecting a nursing staff member on the first draw is:

      \(P(\text{Nurse}_1)=\frac{6}{16}\)

    2. Second Trial (Dependent):
      • If the first person drawn was a nurse, only 5 nurses and 15 total people remain. The probability of the second person being a nurse is:

        \(P(\text{Nurse}_2 \mid \text{Nurse}_1)=\frac{5}{15}\)

      • If the first person drawn was a physician, all 6 nurses still remain, but only 15 total people remain. The probability of the second person being a nurse is:

        \(P(\text{Nurse}_2 \mid \text{Physician}_1)=\frac{6}{15}\)

    Because the probability of selecting a nursing staff member changes based on the first result, the condition of independence is violated. This type of problem is actually solved using the Hypergeometric Distribution.

    The Probability of Both being Nurses:

    To find the probability that both are nursing staff members, we multiply the dependent probabilities:

    \[P(\text{Both Nurses})=\frac{6}{16} \times \frac{5}{15}=\frac{30}{240}=0.125 \text{ (or } 12.5\% \text{)}\]
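The calculation can be checked with exact fractions. The Python sketch below is illustrative only and the variable names are our own. It computes the answer two ways (multiplying the dependent probabilities, and counting combinations as the hypergeometric approach does) and, for contrast, shows what an incorrect binomial model that assumes independent draws would give.

```python
from fractions import Fraction
from math import comb

nurses, physicians = 6, 10
total = nurses + physicians

# Sequential draws without replacement: P(Nurse1) * P(Nurse2 | Nurse1)
p_sequential = Fraction(nurses, total) * Fraction(nurses - 1, total - 1)

# Counting approach: ways to pick 2 nurses and 0 physicians / ways to pick any 2 members
p_hypergeometric = Fraction(comb(nurses, 2) * comb(physicians, 0), comb(total, 2))

# For contrast only: treating the draws as independent (with replacement) is wrong here
p_if_independent = Fraction(nurses, total) ** 2

print(p_sequential, float(p_sequential))
print(p_hypergeometric)
print(p_if_independent, float(p_if_independent))
```

Both correct approaches give 1/8, while the independence assumption would give 9/64 (about 0.1406).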


    This page titled 8.1: Binomial Distribution is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.