2.5: Sampling

    Ideally, one would select a representative sample of the target population for study. For example, the accuracy of pollsters in predicting the outcome of presidential elections depends on their getting a representative sample of voters.

    Representative samples can be hard to get. In 1987, the extent of HIV infection in the U.S. was to be estimated by randomly and anonymously testing 50,000 people. But a preliminary survey showed that 30% of the population would refuse to participate, and there was no way to know if the infection rate in this 30% was the same as in the other 70%. Similarly, the nation’s infection rate can’t be accurately estimated based on select groups like blood donors and Armed Forces applicants. It’s easy to see why estimates of HIV infection varied from one news report to the next.
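
To see why the refusals matter, the population rate can be written as a weighted average of the rate among the 70% who would participate and the unknown rate among the 30% who would refuse. The sketch below is only an illustration: the measured rate of 0.4% and the guesses for the refusers are hypothetical numbers, not figures from the 1987 survey.

```python
def overall_rate(rate_participants, rate_refusers, refusal_fraction=0.30):
    """Population rate as a weighted average of participants and refusers."""
    return (1 - refusal_fraction) * rate_participants + refusal_fraction * rate_refusers


# Hypothetical infection rate measured among the 70% who agree to be tested.
measured = 0.004

# The refusers' rate is unknown, so try several guesses (all hypothetical).
for multiplier in (0.5, 1, 3, 10):
    guess = measured * multiplier
    estimate = overall_rate(measured, guess)
    print(f"refusers at {guess:.2%} -> population at {estimate:.2%}")
```

Only when the refusers happen to have the same rate as the participants does the measured rate equal the population rate; every other guess moves the estimate.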

    Sometimes, it’s impractical to use representative samples. Most human studies of drugs and nutrients are very selective, eliminating those who are unreliable or are more vulnerable to harm.

    The subjects of a study showing that aspirin helped prevent heart attacks were selected from a group of male physicians (see below*), hardly a representative sample of the U.S. population. Also, note that those who might have adverse effects from aspirin were eliminated as subjects.

    The size of the sample also is important. The number of subjects needed to test a hypothesis is based mainly on how dramatic an outcome is expected. To test a powerful insecticide, you only need a few insects. It’s enough to show that 8 of 8 treated insects die, compared to 0 of 8 untreated controls.
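
One way to check that 8 of 8 versus 0 of 8 is convincing is Fisher's exact test, which asks how often chance alone would put that many deaths in the treated group if the insecticide did nothing. The text does not name a test, so this choice is an assumption; the function name and the second, weaker result are hypothetical.

```python
from math import comb


def fisher_one_sided(dead_treated, n_treated, dead_control, n_control):
    """One-sided Fisher's exact test: probability of at least this many
    deaths in the treated group if treatment had no effect."""
    total = n_treated + n_control
    total_dead = dead_treated + dead_control
    denominator = comb(total, total_dead)
    max_dead_treated = min(n_treated, total_dead)
    p = 0.0
    # Sum over every split of the deaths that is as extreme or more extreme.
    for k in range(dead_treated, max_dead_treated + 1):
        p += comb(n_treated, k) * comb(n_control, total_dead - k) / denominator
    return p


print(fisher_one_sided(8, 8, 0, 8))  # 8 of 8 treated die, 0 of 8 controls
print(fisher_one_sided(5, 8, 3, 8))  # a weaker, hypothetical outcome
```

For the insect example the probability is 1 in 12,870, so 16 insects are enough. The weaker outcome (5 of 8 versus 3 of 8) gives a probability of about 0.31, far too large to rule out chance with so few subjects.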

    When expected differences between groups are small, or there’s a lot of variation in response, huge samples are needed to provide the statistical power to document differences. How many subjects are needed to test the hypothesis that a low-fat diet lowers the risk of breast cancer?

    To get enough breast cancers for comparison, 10,000 women would have to follow a low-fat diet for 10 years, and 22,000 more would be needed as controls, according to a proposal in 1988. The study wasn’t funded, in part because of concern that the low-fat group might not stick to a low-fat diet for 10 years, and the control group might lower their fat intake on their own.
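
The link between expected difference and sample size can be sketched with the standard normal-approximation formula for comparing two proportions. This is not the calculation behind the 1988 proposal: it assumes equal group sizes, a two-sided 5% significance level, and 80% power, and the event rates passed in are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate subjects needed in each of two equal groups to detect
    a difference between two event rates (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    return ceil((z_alpha + z_power) ** 2 * variance / (rate_a - rate_b) ** 2)


# Dramatic difference (hypothetical): 90% versus 10%.
print(subjects_per_group(0.90, 0.10))

# Small difference (hypothetical): 3.0% versus 2.5% over the study period.
print(subjects_per_group(0.030, 0.025))
```

Under these assumptions the dramatic difference needs only a handful of subjects per group, while the small difference needs more than 16,000 per group, which is the same order of magnitude as the numbers in the proposal.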

    The study was proposed again as part of a $625 million study of 63,000 postmenopausal women called the Women’s Health Initiative.** From 1993 to 1998, 48,835 postmenopausal women with no previous history of breast cancer were randomized into two groups: a control group that stayed on their usual diet (32% of calories as fat) and a low-fat group. The groups stayed on these diets for 8.5 years. The low-fat group lowered their fat intake to about 24.5% of calories (the goal was 20%), but then drifted up to 29%.

    After an average follow-up of 8 years, the group assigned to a low-fat diet didn’t have significantly fewer breast cancers than the group on their usual diet. However, after a follow-up of nearly 20 years, the survival rate among those who got breast cancer was higher in the low-fat group. This result could also be attributed to the low-fat group eating more fruit, vegetables, and grains than the control group.

    Sometimes, the number of subjects needed for a study is achieved by combining subjects and results from several already-published studies and then statistically analyzing the combined data, a process called meta‑analysis. Say there are 25 published studies of whether a high-fat diet raises the risk of prostate cancer, but none shows statistically significant differences; 12 show a higher risk with a high‑fat diet, 5 show no difference, and 8 show a lower risk. By combining the data, all subjects and prostate cancers can be simultaneously compared, possibly showing a slightly higher risk that reaches statistical significance.
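
A minimal sketch of how pooling can turn several inconclusive results into a significant one, assuming a fixed-effect, inverse-variance meta-analysis (one common method; the text does not say which is used). The five studies are hypothetical: each is a log relative risk with its standard error, and none is real data.

```python
from math import sqrt

# Hypothetical studies: (log relative risk, standard error). Not real data.
studies = [(0.12, 0.09), (0.08, 0.08), (0.15, 0.10), (0.05, 0.07), (0.10, 0.08)]


def pool_fixed_effect(studies):
    """Inverse-variance weighted average: precise studies count for more."""
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# z above 1.96 corresponds to statistical significance at the 5% level.
individual_z = [est / se for est, se in studies]
pooled, pooled_se = pool_fixed_effect(studies)
pooled_z = pooled / pooled_se

print("individual z:", [round(z, 2) for z in individual_z])
print("pooled estimate:", round(pooled, 3), "z:", round(pooled_z, 2))
```

In this made-up data every individual z falls below 1.96, yet the pooled z is about 2.5, because combining the studies shrinks the standard error.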

    There are drawbacks to meta-analysis, such as variation in the quality of the studies. Also, studies are less likely to be published if they show negative results (e.g., a study showing vitamin E doesn’t lower the risk of heart disease), thereby skewing the combined data. This can especially be a problem when a study is privately funded. If you sell garlic pills and fund a study to see if your pills lower blood-cholesterol, you’d be quick to publish the results if your pills worked. But if your pills didn’t work any better than the placebo...
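
The skew from unpublished negative results can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the treatment truly has no effect, each study's result is a standard normal z-score, favorable results (z above 1.645) are always published, and other results are published only 20% of the time.

```python
import random
from statistics import mean


def simulate(n_studies=20000, publish_other_prob=0.2, seed=0):
    """Return (mean z of all studies, mean z of published studies)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    all_results, published = [], []
    for _ in range(n_studies):
        z = rng.gauss(0.0, 1.0)  # true effect is zero
        all_results.append(z)
        favorable = z > 1.645
        # Favorable studies are always published; the rest only sometimes.
        if favorable or rng.random() < publish_other_prob:
            published.append(z)
    return mean(all_results), mean(published)


all_mean, published_mean = simulate()
print("mean of all studies:      ", round(all_mean, 3))
print("mean of published studies:", round(published_mean, 3))
```

The full set of studies averages close to zero, as it should for a treatment that does nothing, while the published subset averages clearly above zero. A meta-analysis that sees only the published studies inherits that bias.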


    *[Image not reproduced: Screen Shot 2021-11-03 at 4.52.50 PM.png]

    **Most such studies are funded by our tax dollars allocated by Congress to the National Institutes of Health (NIH) in Bethesda, MD. A lot of research is done at NIH, but NIH gives most of its money to scientists elsewhere (mainly at academic centers). Scientists submit their proposals to NIH. After extensive scientific review, the proposals are given priority numbers. Funding is limited, so they’re funded in order of priority number. Congress can instruct NIH to fund certain studies, even those with low priority numbers, sometimes on the basis of political considerations.


    This page titled 2.5: Sampling is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Judi S. Morrill via source content that was edited to the style and standards of the LibreTexts platform.