
2.13: Evaluating Data


    All of the information collected in a study is called data. Scientists scrutinize data when assessing the conclusions of others’ research. A favorite saying among scientists is, “In God we trust; everyone else has to show their data.”

    Scientific data typically are reported in metric units* and evaluated by statistical analysis. Suppose you measure the heights of girls attending a summer camp and find that, on average, the girls from California are taller than the girls from Ohio. Is this difference “for real”? Scientists decide by using the variation in heights and the number of girls measured to see if the difference is statistically significant (probably not due to chance). The greater the difference, the smaller the variation, and the larger the number of girls, the more likely the difference will be statistically significant.

    Statistical analysis gives a measure of probability (P). The difference (e.g., in height) between study groups is said to be statistically significant when calculations show that the probability that the difference is due to chance is less than 5% (less than 5 in 100, expressed as P<0.05). Conversely, if the probability of a chance finding is 5% or more, the difference is “not statistically significant.” In other words, scientists can say that they found a difference between groups only if they’re at least 95% sure (the odds are less than 5 out of 100 that the difference was a fluke).
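The idea of a "chance" difference can be illustrated with a small simulation. The sketch below (illustrative only; the heights are invented numbers, not data from this chapter) runs a simple permutation test: it repeatedly shuffles the pooled heights into two random groups and counts how often a difference at least as large as the observed one appears by chance. That proportion is an estimate of P.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate the probability (P) that the observed difference in group
    means could arise by chance, by randomly re-shuffling the pooled data."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical camp heights in centimeters (made-up for illustration).
california = [165, 168, 170, 172, 169, 171, 167, 173]
ohio       = [158, 160, 159, 161, 157, 162, 160, 158]

p = permutation_p_value(california, ohio)
print(f"P = {p:.4f}")  # P < 0.05 would be called statistically significant
```

With these made-up heights the two groups barely overlap, so almost no random shuffle reproduces a gap that large and the estimated P comes out well below 0.05.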

    The “scientific sentence” on page 1 says that the drug cholestyramine lowered the level of lipoproteins by doubling (P<0.01) its rate of removal from the blood. P<0.01 means that the probability of getting this result purely by chance was less than 1 in 100 (<1%).

    Many kinds of statistical analyses exist, including measuring the strength of a link between two factors (e.g., alcohol consumption and liver cancer). Factors that increase the risk of disease are called risk factors. The extent of risk often is given as Relative Risk.\(^{†}\) For example, a study of thousands of men working in the production of either beer or mineral water found that “excess” beer-drinking was linked to a higher risk of cancer of the liver and esophagus.

    The study (1975‑1978) took place in Denmark where workers producing beer belonged to the Danish Brewery Workers Union. The Union had a long-standing agreement with the breweries whereby each worker could drink up to 6 bottles (about 2 quarts) of free beer each day while working at the brewery. Workers producing mineral water weren’t allotted free beer (or mineral water); they drank about as much beer as the general population. The brewery workers drank about 4 times more.

    The Relative Risk of this excess beer was determined by tallying cancers in the brewery workers and a matched group not given this free beer.** The brewery workers had Relative Risks of 2.1 for esophageal cancer, 1.5 for liver cancer, and 1.1 for colon cancer. In other words, “excess beer” was linked to a 110% higher risk (more than double) of esophageal cancer and a 50% higher risk of liver cancer, but the risk of colon cancer wasn’t significantly higher.
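Using the footnoted definition (RR = cancers found ÷ cancers expected), the "percent higher risk" is just (RR − 1) × 100. A minimal sketch, plugging in the Relative Risks reported above:

```python
def excess_risk_percent(relative_risk):
    """Convert a Relative Risk into 'percent higher risk'.
    RR = 1.0 means no excess risk; RR = 2.0 means the risk is doubled
    (100% higher)."""
    return (relative_risk - 1.0) * 100.0

# Relative Risks from the Danish brewery-worker study described above.
for cancer, rr in [("esophageal", 2.1), ("liver", 1.5), ("colon", 1.1)]:
    print(f"{cancer}: RR {rr} -> {excess_risk_percent(rr):.0f}% higher risk")
```

So an RR of 2.1 corresponds to a 110% higher risk, and an RR of 1.5 to a 50% higher risk; whether an excess like the 10% for colon cancer is "real" still depends on statistical significance.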

    Statistical analyses can be impressively complex, but no matter how fancy the measuring devices, how complex the statistics, or how precise the answers, the results are only as good as the data or sample selected (garbage in, garbage out). Fanciness can give an illusion of accuracy. A digital bathroom scale that displays the weight to three decimal places could be off by a pound.

    Be wary of how statistics are presented. Suppose you’re told that the number of Americans dying of cancer has steadily increased. Before getting alarmed, ask if the size of our population has also gone up. If told that the cancer death rate (e.g., cancer deaths per 10,000 people) has steadily increased, ask if the population has changed in other ways; e.g., do older people make up a bigger portion of the population?
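The distinction between a raw count and a rate can be checked with simple arithmetic. The sketch below uses invented figures: total cancer deaths double, yet the death rate per 10,000 people is unchanged once population growth is accounted for.

```python
def deaths_per_10000(deaths, population):
    """Cancer death *rate*: deaths per 10,000 people."""
    return deaths / population * 10_000

# Hypothetical figures: population and deaths both double over time.
early_rate = deaths_per_10000(deaths=200_000, population=100_000_000)
later_rate = deaths_per_10000(deaths=400_000, population=200_000_000)
print(early_rate, later_rate)  # both 20.0: more deaths, but the same rate
```

Even a steady rate can mislead, as the text notes, if the population's makeup shifts (e.g., toward older people, who have higher cancer rates).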

    Assessment of data, conclusions, and statistics must be objective. Statements such as “this year 483,000 people will die of cancer—that’s 1,323 people every day” may elicit donations to cancer research, but they don’t provide much useful information.


    *See Appendix A‑1 for conversion factors. The United States, Myanmar (Burma), and Liberia are the only countries in the world that don’t use the metric system in daily affairs, such as in recipes, measuring height and weight, and dispensing gasoline.

    \(^{†}\)Relative Risk RR = [number of (cancers) found] divided by [number of (cancers) expected].

    **In this study, RR = [# of cancers in the brewery workers] / [# of cancers in the mineral-water workers].


    This page titled 2.13: Evaluating Data is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Judi S. Morrill via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.
