Medicine LibreTexts

8.2: Geometric Distribution


    There are four main characteristics of a Geometric Experiment, which we often use to model how many attempts a patient needs before they successfully master a new clinical task.

    1. A trial is repeated until a success occurs. You can think of this as a series of Bernoulli trials where every attempt is a "failure" (the task was not completed correctly) until the very last one, which is a "success." Then you stop.

    • Clinical Example: A patient in cardiac rehab is learning to monitor their own pulse. They repeat the process until they successfully find their pulse for the first time. If it takes six tries, the sequence is: failure, failure, failure, failure, failure, success, STOP. In theory, the number of trials could go on forever.

    2. The repeated trials are independent of each other. In this mathematical model, we assume the patient's likelihood of success on the current attempt is not influenced by the previous attempt.

    3. The probability of success (p) and failure (q) is the same for each trial. The relationship is defined as p+q=1 and q=1−p.

    • Example: Suppose a patient has a 1/6 (approx. 16.7%) probability of completing a difficult balance exercise correctly on any given try. If we want to find the probability that their first success happens on the fifth try, they must fail the first four times and succeed on the fifth.
    • The Calculation: \(\left(\dfrac{5}{6}\right)^4\left(\dfrac{1}{6}\right) \approx 0.0804\)

    4. The random variable X represents the number of the trial in which the first success occurs. Specifically, X= the number of independent trials required until the first successful completion of the task.
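    The fifth-try calculation in the balance-exercise example can be reproduced in a few lines. This is a minimal Python sketch; the function name `first_success_pmf` is illustrative and does not come from the original text.

```python
from fractions import Fraction


def first_success_pmf(p, x):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # x - 1 failures, each with probability (1 - p), followed by one success.
    return (1 - p) ** (x - 1) * p


# Balance-exercise example: p = 1/6, first success on the fifth try.
exact = first_success_pmf(Fraction(1, 6), 5)
print(exact)                   # 625/7776
print(round(float(exact), 4))  # 0.0804
```

    Using `Fraction` keeps the result exact, so the rounding to 0.0804 happens only at the final step.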

    The following are additional attributes of the geometric distribution:

    1. The random variable can be defined in two ways, depending on what the clinician or researcher is interested in measuring. In our balance-exercise example:
      Method A: "What is the probability that the first success will be on the fifth try?"
      Method B: "What is the probability that it takes four failures before a success occurs?"
      While these describe the same event, each way of framing the question slightly alters the mathematical form of the probability density function and changes the resulting mean.

    2. Implicit in this model is that the probability of success remains constant. In real-world rehabilitation, a patient usually "learns," meaning their probability of success (p) should theoretically improve with each attempt. However, the geometric distribution cannot capture learning. It assumes the patient is just as likely to succeed on the first attempt as they are on the twentieth. We assume a "historical" success rate that remains fixed throughout the session.
    3. The geometric distribution is “memoryless.” Very few probability density functions are memoryless, and the geometric distribution is the only one with a discrete random variable that is. As an example, Major League Baseball player Jones has historically hit the ball well enough to reach at least first base with a probability of 0.20. Jones has not had a hit in his last 10 times at bat. What is the probability that Jones will get his next hit on his third time at bat from now? The answer ignores his 10 previous failures. All events prior to the current trial are irrelevant to the calculation, which is why the distribution is called “memoryless.”

    Formally: \(P(x=n+k \mid x \geq k+1)=P(x=n)\), where \(\mathrm{k}=\) number of previous failures

    Jones’s probability of a hit begins anew each time he comes to bat. This feature of the geometric distribution has a curious consequence: when parts are drawn from a manufacturing process to test for defects, the geometric distribution begins with a clean slate each time the tests begin, with no consideration of previous test results. More on this when we get to the exponential probability density function.
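    The memoryless property can be checked numerically for the Jones example. The sketch below is illustrative only; the helper names are assumptions, and it takes n = 3 to mean the third at-bat counted from now, after k = 10 hitless at-bats.

```python
def first_success_pmf(p, x):
    """P(X = x): first success on trial x."""
    return (1 - p) ** (x - 1) * p


def prob_more_than(p, k):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k


p = 0.20   # Jones's historical probability of a hit
k = 10     # hitless at-bats already observed
n = 3      # at-bat of interest, counted from now

# P(X = n + k | X >= k + 1) is the joint probability divided by P(X > k).
conditional = first_success_pmf(p, n + k) / prob_more_than(p, k)
unconditional = first_success_pmf(p, n)

print(round(conditional, 4), round(unconditional, 4))  # 0.128 0.128
```

    Both numbers agree: the ten earlier failures cancel out of the conditional probability.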

    Exercise \(\PageIndex{1}\)

    You play a game of chance that you can either win or lose (there are no other possibilities) until you lose. Your probability of losing is p = 0.57. What is the probability that it takes five games until you lose? Let X = the number of games you play until you lose (includes the losing game). Then X takes on the values 1, 2, 3, ... (could go on indefinitely). The probability question is P(x = 5).

    Exercise \(\PageIndex{2}\)

    A hospital safety officer believes that 35% of all patient safety incidents in the facility are caused by a failure to follow standard surgical checklists. They decide to randomly audit incident reports (replacing each report after reading to maintain independence) until they find one that specifically cites a failure to follow the checklist.

    Problem Setup

    Let X = the number of reports the safety officer must examine until they find a report showing an incident caused by a failure to follow the checklist. X can take on the values 1, 2, 3, … with no upper limit.

    a. On average, how many reports would the safety officer expect to review until they find one that shows a failure to follow the checklist? This asks for the expected value (mean). Even though the reports are discrete (you can't read half a report), the average doesn't have to be a whole number.

    Using the formula for the mean of a geometric distribution: \(\mu=E(x)=\dfrac{1}{p}=\dfrac{1}{0.35} \approx 2.86\) reports

    b. What is the probability that the safety officer will have to examine at least three reports until they find a report showing a failure to follow the checklist? This question asks for P(x≥3). Unlike the binomial distribution, we cannot add up every individual probability because the geometric distribution goes to infinity. Instead, we use a complement or a shortcut formula.

    Solution Using Shortcuts

    In the geometric distribution, we have specific "more than" and "at most" shortcut formulas:

    • To find the probability it takes MORE THAN n trials to get the first success: \(P(x>n)=(1-p)^n\)
    • To find the probability it takes n OR FEWER trials for success: \(P(x \leq n)=1-(1-p)^n\)

    Applying the formula to our audit: Since "at least 3" (x≥3) is the same as saying "more than 2" (x>2), we can use the first shortcut where n=2: \(P(x \geq 3)=P(x>2)=(1-0.35)^2=(0.65)^2=0.4225\)

    Conclusion: There is a 42.25% chance that the safety officer will have to look at 3 or more reports before finding one involving a checklist failure.
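    As a cross-check on the audit answer, the tail probability can be computed two ways: with the "more than n" shortcut and by subtracting the first two individual probabilities from 1. This is a small Python sketch; the function names are hypothetical.

```python
def prob_more_than(p, n):
    """P(X > n) = (1 - p)^n: the first n trials all fail."""
    return (1 - p) ** n


def prob_at_most(p, n):
    """P(X <= n) = 1 - (1 - p)^n: complement of the tail above."""
    return 1 - prob_more_than(p, n)


p = 0.35
at_least_three = prob_more_than(p, 2)   # P(X >= 3) = P(X > 2)

# Cross-check with the complement: 1 - P(X = 1) - P(X = 2).
by_complement = 1 - (p + (1 - p) * p)

print(round(at_least_three, 4))  # 0.4225
print(round(by_complement, 4))   # 0.4225
```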

    Exercise \(\PageIndex{3}\)

    Suppose you are a public health worker conducting a localized study on environmental health. You are looking for a resident in a specific district who lives within five miles of a local industrial site. You know from census data that 55% of the 25,000 residents in this district live within that five-mile radius.

    You randomly contact residents until one confirms they live within five miles of the site. What is the probability that you need to contact exactly four people to find your first resident?

    Problem

    a. Let X = the number of ____________ you must ask ____________ one says yes.

    b. What values does X take on?

    c. What are p and q?

    d. The probability question is P(_______).

    Answer

    a. Let X = the number of residents you must ask until one says yes.

    b. 1, 2, 3, …, (total number of residents)

    c. p = 0.55; q = 0.45

    d. P(x = 4)
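    The exercise stops at setting up P(x = 4). One way to evaluate it, and to sanity-check the result by simulation, is sketched below. It assumes each contact is an independent trial with p = 0.55, which is a reasonable approximation when only a handful of the 25,000 residents are contacted. The function name and the seed are arbitrary choices.

```python
import random

p = 0.55                      # probability a contacted resident says yes
exact = (1 - p) ** 3 * p      # three "no" answers, then one "yes"
print(round(exact, 4))        # 0.0501


def contacts_until_yes(rng, p):
    """Simulate one survey run; return the number of residents contacted."""
    count = 1
    while rng.random() >= p:  # a "no" answer: contact another resident
        count += 1
    return count


rng = random.Random(0)        # fixed seed so the run is repeatable
runs = 100_000
hits = sum(contacts_until_yes(rng, p) == 4 for _ in range(runs))
estimate = hits / runs        # should land close to the exact value
print(estimate)
```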

    Notation for the Geometric: G = Geometric Probability Distribution Function

    \(X \sim G(p)\)

    Read this as "X is a random variable with a geometric distribution." The parameter is p; p = the probability of a success for each trial.

    CASE I: Random Variable X Is the Trial of the First Success

    In this case we ask, “What is the probability that the first success occurs on trial number x?”

    The geometric pdf tells us the probability that the first occurrence of success requires x independent trials, each with success probability p and failure probability (1-p). If the probability of success on each trial is p, then the probability that the xth trial (out of x trials) is the first success is:

    \[P(X=x)=(1-p)^{x-1} p\]

    for \(x=1,2,3, \ldots\).

    Like the binomial distribution, the geometric distribution has a mean and a standard deviation that follow from its parameter. The expected value of X, the mean for Case I, is \(\mu=\dfrac{1}{p}\). This tells us how many trials to expect until we get the first success, counting the trial that results in success. This form of the geometric distribution is used for modeling the number of trials until the first success: \(x=\) all trials including the one that is a success. This can be seen in the form of the formula. If \(X=\) number of trials including the success, then there are \(x-1\) failures before it, so the probability of failure, \((1-p)\), is raised to the power \(x-1\) and multiplied by the probability of success, \(p\). The standard deviation of Case I of the geometric distribution is:

    \[\sigma=\sqrt{\left(\dfrac{1}{p}\right)\left(\dfrac{1}{p}-1\right)}\]

    CASE II: Random Variable X Is Number of Failures BEFORE a Success

    In contrast to Case I, the following form of the geometric distribution models the number of failures before the first success:

    \[P(X=x)=(1-p)^x p\]

    for \(x=0,1,2,3, \ldots\)

    In this case the trial that is the success is not counted as a trial in the formula: \(x=\) number of failures. The expected value, the mean, of this distribution is \(\mu=\dfrac{1-p}{p}\). This tells us how many failures to expect before we have a success. In either case, the sequence of probabilities is a geometric sequence. Because Case II only shifts the Case I variable down by one, the standard deviation is the same as in Case I:

    \[\sigma=\sqrt{\left(\dfrac{1}{p}\right)\left(\dfrac{1}{p}-1\right)}=\sqrt{\dfrac{1-p}{p^2}}\]
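    The relationship between the two cases can be checked numerically by summing the probabilities directly. This is a sketch under stated assumptions: the function names are illustrative, p = 0.35 is borrowed from the audit exercise, and the infinite sums are truncated at 500 terms, where the remaining tail is negligible.

```python
import math


def trials_pmf(p, x):
    """Case I: first success on trial x, x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p


def failures_pmf(p, x):
    """Case II: x failures before the first success, x = 0, 1, 2, ..."""
    return (1 - p) ** x * p


def summary(pmf, p, start, terms=500):
    """Approximate the mean and standard deviation from the first `terms` values."""
    xs = range(start, start + terms)
    mean = sum(x * pmf(p, x) for x in xs)
    var = sum((x - mean) ** 2 * pmf(p, x) for x in xs)
    return mean, math.sqrt(var)


p = 0.35
mean_1, sd_1 = summary(trials_pmf, p, start=1)
mean_2, sd_2 = summary(failures_pmf, p, start=0)

print(round(mean_1, 4), round(1 / p, 4))        # Case I mean vs 1/p
print(round(mean_2, 4), round((1 - p) / p, 4))  # Case II mean vs (1-p)/p
print(round(sd_1, 4), round(sd_2, 4))           # the two spreads agree
```

    The means differ by exactly one trial (the success itself), while the spread is unchanged by the shift.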

    Bar graph of probability distribution of the given data.
    Figure \(\PageIndex{1}\): Geometric Distribution: \(p=0.02\), Common Ratio \(r=0.98\)

    The \(y\)-axis in Figure \(\PageIndex{1}\) contains the probability of \(x\), and the \(x\)-axis is the number of components tested until the first defective one is found. For example, at \(x=1\) the probability that the first component tested is defective is 0.02. The probability that the second component tested is the first defective one is graphed at 0.0196 at \(x=2\) on the \(x\)-axis. For the probability that the third component is the first defective one we find \(P(X=3)=0.019208\).

    Notice in Figure \(\PageIndex{1}\) that the probabilities decline by the same proportion with each unit increase in \(x\). This constant multiplier is called the common ratio, and it is characteristic of the geometric probability distribution. The common ratio, called \(r\), can be calculated by dividing any value by the previous value, e.g., \(\dfrac{P(x=5)}{P(x=4)}=\dfrac{0.018447}{0.018823}=0.98\)

    For this set of data, therefore, the common ratio is 0.98, which equals \(1-p\). Multiplying any probability value by the common ratio gives the next probability value in the sequence. For example, the probability that the sixth component tested is the first defective one is 0.018078. Check this using the formula from Case I. Knowing \(P(x=6)\) and the common ratio, we can calculate \(P(x=7)\) by simple multiplication:
    \(P(x=7)=(0.018078) \times 0.98=0.017716\), which agrees with the Case I formula up to rounding. Because the ratio between every pair of consecutive probabilities is the same, the probabilities form a geometric progression, and thus the name for this probability density function. Once the common ratio is calculated, any \(P\left(x=x_a\right)\) one desires to know can be easily found.
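    The common-ratio idea translates directly into a loop: start from P(X = 1) = p and multiply by r = 1 − p to get each later probability. The sketch below uses the Case I definition with p = 0.02; the variable names are illustrative.

```python
p = 0.02
r = 1 - p            # common ratio, 0.98

probs = [p]          # P(X = 1) = p
for _ in range(6):   # extend the sequence through P(X = 7)
    probs.append(probs[-1] * r)

# Print x alongside P(X = x), rounded to six decimal places.
for x, prob in enumerate(probs, start=1):
    print(x, round(prob, 6))
```

    Because the loop carries full floating-point precision forward, the last printed digit can differ slightly from hand calculations that round each intermediate value.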

    The number of components that you would expect to test until you find the first defective component is the mean, \(\mu=50\) for this case of defective components. The formula for the mean of the geometric distribution for the random variable defined as the number of trials until the first success (Case I) is \(\mu=\dfrac{1}{p}=\dfrac{1}{0.02}=50\).

    See Exercise \(\PageIndex{4}\) for a worked example where the geometric random variable is defined as the number of trials until the first success. The expected value under that definition is one more than under the number-of-failures definition of Case II. The probability formula for Case I is:

    \[P(X=x)=(1-p)^{x-1} p\]

    for \(x=1,2,3, \ldots\)

    The formula for the variance is

    \[\sigma^2=\left(\dfrac{1}{p}\right)\left(\dfrac{1}{p}-1\right)=\left(\dfrac{1}{0.02}\right)\left(\dfrac{1}{0.02}-1\right)=2,450\]

    The standard deviation is

    \[\sigma=\sqrt{\left(\dfrac{1}{p}\right)\left(\dfrac{1}{p}-1\right)}=\sqrt{\left(\dfrac{1}{0.02}\right)\left(\dfrac{1}{0.02}-1\right)}=49.5\]

    Exercise \(\PageIndex{4}\)

    Assume that the probability of a defective pacemaker component is 0.02. Components are randomly selected. Find the probability that the first defect is found in the seventh component tested. How many components do you expect to test until one is found to be defective?

    Let \(X=\) the number of pacemaker components tested until the first defect is found.

    \(X\) takes on the values \(1,2,3, \ldots\) where \(p=0.02\).

    Find \(P(x=7)\). Answer: \(P(x=7)=(1-0.02)^{7-1} \times 0.02=0.0177\).

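    To find the probability that \(x \leq 7\), use the cumulative form of the distribution (the geometcdf( function on a TI-83/84 calculator): \(P(x \leq 7)=1-(1-0.02)^7 \approx 0.1319\).

    The probability that the seventh component is the first defect is 0.0177. The expected number of components tested until one is found to be defective is \(\mu=\dfrac{1}{p}=\dfrac{1}{0.02}=50\).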

    The formula for the variance is \(\sigma^2=\left(\frac{1}{p}\right)\left(\frac{1}{p}-1\right)=\left(\frac{1}{0.02}\right)\left(\frac{1}{0.02}-1\right)=2,450\)

    The standard deviation is \(\sigma=\sqrt{\left(\frac{1}{p}\right)\left(\frac{1}{p}-1\right)}=\sqrt{\left(\frac{1}{0.02}\right)\left(\frac{1}{0.02}-1\right)}=49.5\)
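    The pacemaker figures can be verified together in one short script. This is a minimal sketch using only the formulas quoted in this section; the variable names are assumptions for illustration.

```python
import math

p = 0.02

p_seventh = (1 - p) ** 6 * p        # six good components, then a defect
p_within_seven = 1 - (1 - p) ** 7   # complement of "first seven all good"

mean = 1 / p
variance = (1 / p) * (1 / p - 1)
sd = math.sqrt(variance)

print(round(p_seventh, 4))       # 0.0177
print(round(p_within_seven, 4))  # 0.1319
print(round(mean), round(variance), round(sd, 1))  # 50 2450 49.5
```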

    Exercise \(\PageIndex{5}\)

    The lifetime risk of developing cancer is about one in 67 (1.5%). Let \(X=\) the number of people you ask until one says they have cancer. Then \(X\) is a discrete random variable with a geometric distribution: \(X \sim G\left(\frac{1}{67}\right)\) or \(X \sim G(0.015)\)

    a. What is the probability that the tenth person you ask is the first to say they have cancer?
    b. What is the probability that you must ask 20 people?
    c. Find the (i) mean and (ii) standard deviation of \(X\).

    Answer

    a. \(P(x=10)=(1-0.015)^{9}(0.015)=0.0131\)
    b. \(P(x=20)=(1-0.015)^{19}(0.015)=0.0113\)
    c. Mean \(=\mu=\frac{1}{p}=\frac{1}{0.015}=66.67\)

    Standard Deviation \(=\sigma=\sqrt{\frac{1-p}{p^2}}=\sqrt{\frac{1-0.015}{[0.015]^2}}=66.16\)
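    For readers without a graphing calculator, the four answers above can be reproduced as follows. This sketch assumes the rounded value p = 0.015 used in the answer key rather than the exact 1/67, and the helper name is illustrative.

```python
import math

p = 0.015  # rounded lifetime risk used in the answer key


def first_success_pmf(p, x):
    """P(X = x): the x-th person asked is the first to say yes."""
    return (1 - p) ** (x - 1) * p


p10 = first_success_pmf(p, 10)
p20 = first_success_pmf(p, 20)
mean = 1 / p
sd = math.sqrt(1 - p) / p   # same as sqrt((1 - p) / p**2)

print(round(p10, 4))   # 0.0131
print(round(p20, 4))   # 0.0113
print(round(mean, 2))  # 66.67
print(round(sd, 2))    # 66.16
```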


    This page titled 8.2: Geometric Distribution is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.