12.12: Matched or Paired Samples
In most cases of epidemiological or community health data, we have little or no control over how the data are gathered. In this sense, the data are not the result of a planned, controlled experiment. In some cases, however, we can develop data that are part of a controlled clinical or physiological experiment. This situation occurs frequently in rehabilitation and athletic training settings. Imagine that the recovery rates of two different treatment protocols are being tested for differences in a performance metric, such as range of motion or muscle strength. The test has the same format as the tests we have already seen, but here we can use matched pairs and test whether differences exist between them. In this design, each observation has its matched pair (such as a "pre-treatment" and "post-treatment" measurement on the same patient, or a "left-leg" vs. "right-leg" comparison) against which differences are calculated. First, the difference in the metric being tested is calculated for each pair of observations; this difference is typically labeled with the letter "d."
Then the average of these matched differences, \(\bar{X}_d=\dfrac{\sum\left(x_1-x_2\right)}{n}\), is calculated, as is their standard deviation, \(s_d\). The order of subtraction is a matter of convention, but it must be the same for every pair. We expect the standard deviation of the differences of matched pairs to be smaller than that of unmatched pairs, because the two measurements in each pair are typically positively correlated, so much of the subject-to-subject variation cancels out in the difference.
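One way to see why pairing reduces the spread is the standard identity for the variance of a difference. The symbols \(\sigma_1\), \(\sigma_2\) (population standard deviations of the two measurements) and \(\rho\) (their correlation) are introduced here only for illustration and are not used elsewhere in this section:

\[\sigma_d^2=\sigma_1^2+\sigma_2^2-2 \rho \sigma_1 \sigma_2\]

When the paired measurements are positively correlated (\(\rho>0\)), the last term makes \(\sigma_d\) smaller than the value \(\sqrt{\sigma_1^2+\sigma_2^2}\) that would apply to two independent samples.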
When using a hypothesis test for matched or paired samples, the following characteristics may be present:
- Simple random sampling is used.
- Sample sizes are often small.
- Two measurements (samples) are drawn from the same pair of individuals or objects.
- Differences are calculated from the matched or paired samples.
- The differences form the sample that is used for the hypothesis test.
- Either the matched pairs have differences that come from a population that is normal, or the number of differences is sufficiently large that the distribution of the sample mean of differences is approximately normal, as implied by the Central Limit Theorem.
In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are calculated. The differences are the data. The population mean of the differences, \(\mu_d\), is then tested using a Student's \(t\)-test for a single population mean with \(n-1\) degrees of freedom, where \(n\) is the number of differences, that is, the number of pairs, not the number of observations.
The null and alternative hypotheses for this test are:
\[\begin{array}{l}
H_0: \mu_d=0 \\
H_a: \mu_d \neq 0
\end{array}\]
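The hypotheses above are the two-tailed form. When the claim being tested is directional, the same test is stated one-tailed; which tail applies depends on the order of subtraction chosen for the differences:

\[\begin{array}{ll}
\text{Right-tailed:} & H_0: \mu_d \leq 0, \quad H_a: \mu_d>0 \\
\text{Left-tailed:} & H_0: \mu_d \geq 0, \quad H_a: \mu_d<0
\end{array}\]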
The test statistic is:
\[t_c=\dfrac{\bar{x}_d-\mu_d}{\left(\dfrac{s_d}{\sqrt{n}}\right)}\]
A physical therapy clinic has developed a functional mobility program for patients recovering from knee surgery because the staff is concerned with the results of the standard six-month recovery assessment. They hope that this targeted program will result in better assessment scores.
Each patient constitutes a “pair”: the initial mobility score they received when first entering the clinic and the score given at their six-month follow-up. The difference in the two scores was calculated for each patient.
The Data:
- Sample size (n): 20 patients.
- Mean score before the program (\(\bar{x}_1\)): 20.4
- Mean score after the program (\(\bar{x}_2\)): 23.9
- Standard deviation of the differences (\(s_d\)): 3.8 points.
Test at the 10% significance level the null hypothesis that the two population means are equal against the alternative that the mobility program helps improve the patients’ scores.
Answer
The first step is to identify this as a two-sample case: before the program and after the program. This differentiates the problem from simple one-sample problems. Second, we determine that the two samples are "paired": each observation in the first sample has a paired observation in the second sample. This information tells us that the null and alternative hypotheses should be:
\[\begin{array}{l}
H_0: \mu_d \leq 0 \\
H_a: \mu_d>0
\end{array}\]
Here \(\mu_d\) is the population mean of the after-minus-before differences. This form reflects the implied claim that the mobility program improves scores; the test is one-tailed and the claim is in the alternative hypothesis. Because the experiment was conducted as a matched-pairs design rather than by simply comparing scores from people who took the mobility program with scores from those who didn't, we use the matched-pairs test statistic:
\[\text { Test Statistic: } t_c=\dfrac{\bar{X}_d-\mu_d}{\dfrac{S_d}{\sqrt{n}}}=\dfrac{(23.9-20.4)-0}{\left(\dfrac{3.8}{\sqrt{20}}\right)}=4.12\]
To evaluate this statistic from raw data, each patient's pre-program score \(x_{1 i}\) and post-program score \(x_{2 i}\) are used to calculate an individual difference, here post-program minus pre-program. These differences are then averaged. Because the average of the differences equals the difference of the averages, the numerator above can use \(23.9-20.4\):
\[\bar{X}_d=\dfrac{\sum\left(x_{2 i}-x_{1 i}\right)}{n}\]
From these differences we can calculate the standard deviation across the individual differences:
\[S_d=\sqrt{\dfrac{\sum\left(d_i-\bar{X}_d\right)^2}{n-1}} \text { where } d_i=x_{2 i}-x_{1 i}\]
We can now compare the calculated value of the test statistic, 4.12, with the critical value. The critical value is a Student's \(t\) with degrees of freedom equal to the number of pairs, not observations, minus 1. In this case there are 20 pairs, so \(d f=20-1=19\), and for a one-tailed test at the \(10 \%\) significance level the critical value is \(t_\alpha=1.328\). The calculated test statistic lies far in the right tail of the distribution, so we reject the null hypothesis that the mobility program produces no improvement. The evidence indicates that the mobility program improves patient outcomes.
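The arithmetic in this example can be checked with a short Python sketch. It uses only the summary statistics given above; the variable names are illustrative, and the critical values are hard-coded from a standard Student's \(t\) table for 19 degrees of freedom rather than computed.

```python
import math

# Summary statistics from the mobility program example.
n = 20                 # number of pairs (patients)
mean_before = 20.4     # mean score before the program
mean_after = 23.9      # mean score after the program
sd_diff = 3.8          # standard deviation of the individual differences
mu_d_null = 0.0        # hypothesized mean difference under H0

# The mean of the differences equals the difference of the means.
mean_diff = mean_after - mean_before

# Standard error of the mean difference and the test statistic.
std_error = sd_diff / math.sqrt(n)
t_stat = (mean_diff - mu_d_null) / std_error
df = n - 1

# Assumption: one-tailed critical values read from a t table for df = 19.
t_critical = {0.10: 1.328, 0.05: 1.729}

# Right-tailed test: reject H0 when the statistic exceeds the critical value.
reject_null = {alpha: t_stat > crit for alpha, crit in t_critical.items()}

print(f"t = {t_stat:.2f}, df = {df}")
for alpha, decision in reject_null.items():
    print(f"alpha = {alpha:.2f}: reject H0? {decision}")
```

The statistic exceeds the tabulated critical value at both levels shown, so the decision to reject the null hypothesis does not depend on which of the two is used.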
A college softball coach was interested in whether the college's strength development class increased their players' maximum lift (in pounds). Four players were asked to participate in the study. The amount of weight they could each lift was recorded before they took the strength development class. After completing the class, the amount of weight they could each lift was again measured. The data are as follows:
| Weight (in pounds) | Player 1 | Player 2 | Player 3 | Player 4 |
|---|---|---|---|---|
| Amount of weight lifted prior to the class | 205 | 241 | 338 | 368 |
| Amount of weight lifted after the class | 295 | 252 | 330 | 360 |
The coach wants to know if the strength development class makes the players stronger, on average.
Answer
Record the differences data. Calculate the differences by subtracting the amount of weight lifted prior to the class from the weight lifted after completing the class. The data for the differences are: \(\{90,11,-8,-8\}\).
\[\bar{x}_d=21.3, s_d=46.7\]
Using the difference data, this becomes a test of a single mean.
Define the random variable: \(\bar{X}_d\) = the mean difference in the maximum lift per player.
The distribution for the hypothesis test is a Student's \(t\) with 3 degrees of freedom.
\[H_0: \mu_d \leq 0, H_a: \mu_d>0\]
Calculate the test statistic and look up the critical value: the calculated value of the test statistic is 0.91. The critical value of the Student's \(t\) at the \(5 \%\) level of significance with 3 degrees of freedom is 2.353.
Decision: If the level of significance is \(5 \%\), we cannot reject the null hypothesis, because the calculated value of the test statistic is not in the tail.
What is the conclusion?
At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the strength development class helped to make the players stronger, on average. Be sure to note in your conclusion that a sample of only 4 pairs leaves only 3 degrees of freedom, which makes the critical value very large. In short, a sample this small has little ability to detect an effect, so its conclusions should be treated with caution.
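The steps of this example, from raw data to decision, can be sketched in Python using only the standard library. The function name `paired_t` is hypothetical, and the critical value 2.353 is taken from the text rather than computed.

```python
import math
import statistics

# Maximum lift (in pounds) for the four players, from the table above.
before = [205, 241, 338, 368]
after = [295, 252, 330, 360]


def paired_t(before, after, mu_d=0.0):
    """Return (differences, mean, sample sd, t statistic, df) for paired data.

    Differences are computed as after minus before.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    t_stat = (mean_d - mu_d) / (sd_d / math.sqrt(n))
    return diffs, mean_d, sd_d, t_stat, n - 1


diffs, mean_d, sd_d, t_stat, df = paired_t(before, after)

# Critical value for a right-tailed test at the 5% level with 3 degrees
# of freedom, as quoted in the text.
t_critical = 2.353
reject_null = t_stat > t_critical

print("differences:", diffs)
print(f"mean = {mean_d:.2f}, sd = {sd_d:.1f}, t = {t_stat:.2f}, df = {df}")
print("reject H0 at the 5% level?", reject_null)
```

Writing the calculation as a function of the raw pairs makes the order of subtraction explicit, which matters here because the alternative hypothesis \(\mu_d>0\) assumes differences are taken as after minus before.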

