Medicine LibreTexts

13.5: Test of Independence

    Tests of independence involve using a contingency table of observed (data) values.

    The test statistic for a test of independence is similar to that of a goodness-of-fit test:

    \[\sum_{(i \cdot j)} \dfrac{(O-E)^2}{E}\]

    where:

    • \(O=\) observed values
    • \(E=\) expected values
    • \(i=\) the number of rows in the table
    • \(j=\) the number of columns in the table

    There are \(i \cdot j\) terms of the form \(\dfrac{(O-E)^2}{E}\).
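As a rough illustration of the sum above, the following Python sketch adds one \(\dfrac{(O-E)^2}{E}\) term per cell. The function name `chi_square_statistic` and the small 2 x 2 tables are hypothetical, chosen only to show the arithmetic; they are not data from this section.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two tables of the same shape."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical 2 x 2 tables (illustration only, not data from this section).
observed = [[12, 8], [8, 12]]
expected = [[10, 10], [10, 10]]

# Four cells, so four terms, each equal to (2)^2 / 10 = 0.4.
statistic = chi_square_statistic(observed, expected)
print(statistic)
```

Because every term is squared and divided by a positive expected value, the statistic can never be negative, and it grows as the observed counts move away from the expected counts.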

    A test of independence determines whether two factors are independent or not. You first encountered the term independence in Independent and Mutually Exclusive Events earlier. As a review, consider the following example.

    Note

    The expected value inside each cell needs to be at least five in order for you to use this test.

    Exercise \(\PageIndex{1}\)

    Suppose Event A = a patient has high blood pressure (hypertension) and Event B = a patient consumes more than the recommended daily limit of sodium. If \(A\) and \(B\) are independent, then \(P(A \cap B)=P(A) P(B)\). \(A \cap B\) represents a person who both has hypertension and follows a high-sodium diet.

    In a public health study of 755 individuals, researchers collected the following data:

    • Hypertension (Event A): 70 people had hypertension; 685 did not.
    • High-Sodium Diet (Event B): 305 followed a high-sodium diet; 450 did not.
    Answer

    Let \(y=\) the expected number of people who both have hypertension and follow a high-sodium diet.

    If \(A\) and \(B\) are independent, then \(P(A \cap B)=P(A) P(B)\). By substitution,

    \[\dfrac{y}{755}=\left(\dfrac{70}{755}\right)\left(\dfrac{305}{755}\right)\]

    Solve for \(y\): \(y=\dfrac{(70)(305)}{755} \approx 28.3\)

    About 28 people from the sample are expected to have both hypertension and a high-sodium diet if the two factors are completely unrelated.
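The substitution above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are made up for illustration, while the counts (755, 70, 305) come from the study described above.

```python
n = 755              # individuals in the study
hypertension = 70    # people with hypertension (Event A)
high_sodium = 305    # people on a high-sodium diet (Event B)

p_a = hypertension / n   # P(A)
p_b = high_sodium / n    # P(B)

# Under independence, P(A and B) = P(A) * P(B), so the expected count is n * P(A) * P(B).
expected_both = n * p_a * p_b
print(round(expected_both, 1))  # 28.3
```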

    In a test of independence, we state the null and alternative hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternative hypothesis states that they are not independent (dependent). If we do a test of independence using the example, then the null hypothesis is:

    \(H_0\) : Having hypertension and following a high-sodium diet are independent events; in other words, they have no effect on each other.

    If the null hypothesis were true, we would expect about 28 people to fall into both categories.

    The test of independence is always right-tailed because of the calculation of the test statistic. If the expected and observed values are not close together, then the test statistic is very large and falls far out in the right tail of the chi-square curve, as it does in a goodness-of-fit test.

    The number of degrees of freedom for the test of independence is:

    \(df=(\text{number of columns}-1)(\text{number of rows}-1)\)

    The following formula calculates the expected number \((E)\) :

    \[E=\dfrac{(\text { row total })(\text { column total })}{\text { total number surveyed }}\]

    Exercise \(\PageIndex{1}\)

    A volunteer group provides a fitness and mobility program for disabled senior citizens, and each volunteer contributes from one to nine hours per week. The program recruits among community college students, four-year college students, and nonstudents. Table \(\PageIndex{1}\) shows a sample of the adult volunteers and the number of hours they volunteer per week.

    Table \(\PageIndex{1}\): Number of Hours Worked Per Week by Volunteer Type (Observed). The table contains observed (O) values (data).
    Type of volunteer            1–3 Hours   4–6 Hours   7–9 Hours   Row total
    Community college students   111         96          48          255
    Four-year college students   96          133         61          290
    Nonstudents                  91          150         53          294
    Column total                 298         379         162         839

    The observed table and the question at the end of the problem, "Is the number of hours volunteered independent of the type of volunteer?" tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.

    \(H_0\): The number of hours volunteered is independent of the type of volunteer.

    \(H_a\): The number of hours volunteered is dependent on the type of volunteer.

    The expected results are in Table \(\PageIndex{2}\).

    Table \(\PageIndex{2}\): Number of Hours Worked Per Week by Volunteer Type (Expected). The table contains expected (E) values, calculated under the assumption that the null hypothesis is true.
    Type of volunteer            1–3 Hours   4–6 Hours   7–9 Hours
    Community college students   90.57       115.19      49.24
    Four-year college students   103.00      131.00      56.00
    Nonstudents                  104.42      132.81      56.77

    For example, the calculation for the expected frequency for the top left cell is

    \[E=\dfrac{(\text { row total })(\text { column total })}{\text { total number surveyed }}=\dfrac{(255)(298)}{839}=90.57\]
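The same calculation can be repeated for every cell. The following Python sketch rebuilds the expected table from the observed counts in Table \(\PageIndex{1}\); the helper name `expected_counts` is an assumption made for this illustration, not something defined in the text.

```python
# Observed counts from Table 1 (rows: volunteer type, columns: hours per week).
observed = [
    [111, 96, 48],    # community college students
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # nonstudents
]


def expected_counts(table):
    """Return E = (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Rounded to two decimal places, the printed rows should agree with the expected table above; for example, the top left cell is \((255)(298)/839 \approx 90.57\).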

    Calculate the test statistic: \(\chi^2=12.99\) (calculator or computer)

    Distribution for the test: \(\chi_4^2\)

    \(df=(3 \text{ columns}-1)(3 \text{ rows}-1)=(2)(2)=4\)
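The text obtains the test statistic from a calculator or computer. One possible way to reproduce it is the plain Python sketch below, which uses only the observed counts from Table \(\PageIndex{1}\); the variable names are illustrative assumptions.

```python
# Observed counts from Table 1.
observed = [
    [111, 96, 48],    # community college students
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # nonstudents
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# Add (O - E)^2 / E over all nine cells, with E = (row total)(column total) / n.
statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * column_totals[j] / n
        statistic += (o - e) ** 2 / e

# Degrees of freedom: (number of columns - 1)(number of rows - 1).
df = (len(column_totals) - 1) * (len(row_totals) - 1)

print(round(statistic, 2), df)  # 12.99 4
```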

    Nonsymmetrical chi-square curve with values of 0 and 12.99 on the x-axis representing the test statistic of number of hours worked by volunteers of different types. A vertical upward line extends from 12.99 to the curve and the area to the right of this is equal to the p-value.
    Figure \(\PageIndex{1}\)

    The graph of the chi-square distribution with four degrees of freedom marks the critical value at the \(95 \%\) level of confidence (\(\alpha=0.05\)), which is 9.488. The graph also marks the calculated test statistic, \(\chi_c^2=12.99\). Comparing the test statistic with the critical value, as we have done with all other hypothesis tests, we reach the conclusion.

    Make a decision: Because the calculated test statistic of 12.99 is greater than the critical value of 9.488, it falls in the right tail (the rejection region), so we reject \(H_0\). This means that the factors are not independent.

    Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.
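The decision can also be sketched in code. The example below compares the test statistic with the critical value and, as an extra check, computes the p-value (the area to the right of the test statistic). It relies on a closed-form right-tail area that holds only for four degrees of freedom, \(P(X>x)=e^{-x / 2}(1+x / 2)\); this formula is not given in the text and is brought in here as an assumption of the sketch, as is the function name `chi_square_sf_df4`.

```python
import math


def chi_square_sf_df4(x):
    """Right-tail area P(X > x) for a chi-square variable with exactly 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


statistic = 12.99        # test statistic reported in the text
critical_value = 9.488   # critical value for df = 4 and alpha = 0.05, from the text
alpha = 0.05

p_value = chi_square_sf_df4(statistic)

# Both rules lead to the same decision for a right-tailed test.
reject_by_critical_value = statistic > critical_value
reject_by_p_value = p_value < alpha

print(round(p_value, 4), reject_by_critical_value, reject_by_p_value)
```

The p-value comes out near 0.011, which is below \(\alpha=0.05\), so both rules point to rejecting \(H_0\). For other degrees of freedom, a statistics library or a chi-square table would be needed instead of this special-case formula.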

    For the example in Table \(\PageIndex{1}\), if there had been another type of volunteer, teenagers, what would the degrees of freedom be?

    Exercise \(\PageIndex{2}\)

    De Anza College is interested in the relationship between anxiety level and the need to succeed in school. A random sample of 400 students took a test that measured anxiety level and need to succeed in school. Table \(\PageIndex{4}\) shows the results. De Anza College wants to know if anxiety level and need to succeed in school are independent events.

    Table \(\PageIndex{4}\): Need to Succeed in School vs. Anxiety Level
    Need to Succeed in School   High Anxiety   Med-high Anxiety   Medium Anxiety   Med-low Anxiety   Low Anxiety   Row Total
    High Need                   35             42                 53               15                10            155
    Medium Need                 18             48                 63               33                31            193
    Low Need                    4              5                  11               15                17            52
    Column Total                57             95                 127              63                58            400

    Problem

    a. How many high anxiety level students are expected to have a high need to succeed in school?

    b. If the two variables are independent, how many students do you expect to have a low need to succeed in school and a med-low level of anxiety?

    Answer

    a. The column total for a high anxiety level is 57. The row total for high need to succeed in school is 155. The sample size or total surveyed is 400.

    \[E=\dfrac{(\text { row total })(\text { column total })}{\text { total surveyed }}=\dfrac{155 \cdot 57}{400}=22.09\]

    The expected number of students who have a high anxiety level and a high need to succeed in school is about 22.

    b. The column total for a med-low anxiety level is 63. The row total for a low need to succeed in school is 52. The sample size or total surveyed is 400.

    \[E=\dfrac{(\text { row total })(\text { column total })}{\text { total surveyed }}=\dfrac{52 \cdot 63}{400}=8.19\]

    The expected number of students who have a low need to succeed in school and a med-low level of anxiety is about 8.
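Both answers can be checked with a short Python sketch built from the counts in Table \(\PageIndex{4}\). The helper `expected_count` and the row and column index choices are assumptions made for this illustration. The sketch also finds the smallest expected count, since the note earlier in this section requires every expected value to be at least five.

```python
# Observed counts from Table 4 (columns: high, med-high, medium, med-low, low anxiety).
observed = [
    [35, 42, 53, 15, 10],   # high need
    [18, 48, 63, 33, 31],   # medium need
    [4, 5, 11, 15, 17],     # low need
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)


def expected_count(row_index, column_index):
    """E = (row total)(column total) / (total surveyed)."""
    return row_totals[row_index] * column_totals[column_index] / n


part_a = expected_count(0, 0)   # high need, high anxiety
part_b = expected_count(2, 3)   # low need, med-low anxiety

# Smallest expected count in the whole table, to check the "at least five" condition.
smallest = min(
    expected_count(i, j)
    for i in range(len(row_totals))
    for j in range(len(column_totals))
)

print(part_a, part_b, smallest)
```

The smallest expected count belongs to the low need, high anxiety cell, \((52)(57)/400=7.41\), so every cell meets the requirement.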


    This page titled 13.5: Test of Independence is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.