The common measures of location are quartiles and percentiles.
Quartiles divide an ordered data set into four equal parts. The three quartiles of a data set are labeled as Q1, Q2, and Q3.
- About one-fourth of the data falls on or below the first quartile Q1.
- About one-half of the data falls on or below the second quartile Q2.
- About three-fourths of the data falls on or below the third quartile Q3.
In the same way, percentiles divide a data set into 100 equal parts.
To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths.
To score in the 90th percentile for a VO2 max fitness test does not mean, necessarily, that you received a score of 90 on the test. It means that 90% of the test scores are the same or less than your score, and 10% of the test scores are the same or greater than your test score.
Percentiles are useful for comparing values. For this reason, health and fitness professionals use percentiles extensively. One instance in which kinesiologists and exercise physiologists use percentiles is when interpreting standardized fitness assessment results to determine an individual's relative cardiovascular health. For example, suppose a pediatric fitness program sets the 75th percentile as the benchmark for "Excellent" physical endurance for a specific age group. That translates into a score of at least a specific threshold value (e.g., 50 mL/kg/min).
Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the fitness scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.
The median is a number that measures the "center" of the data. You can think of the median as the "middle value," but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger. For example, consider the following data.
1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1
Ordered from smallest to largest: 1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
Since there are 14 observations, the median is between the seventh value, 6.8, and the eighth value, 7.2. To find the median, add the two values together and divide by two.
\[\begin{array}{c}
6.8+7.2=14 \\
14 \div 2=7
\end{array}\]
The median is seven. Half of the values are smaller than seven and half of the values are larger than seven.
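The calculation above can be checked with a short Python sketch. The variable names are illustrative only; `statistics.median` from the standard library is used as a cross-check on the manual steps.

```python
from statistics import median

# The 14 observations from the example, in their original order.
data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]

ordered = sorted(data)
n = len(ordered)

# With an even number of observations, the median is the mean of
# the two middle values (here the seventh and eighth).
lower_middle = ordered[n // 2 - 1]
upper_middle = ordered[n // 2]
manual_median = (lower_middle + upper_middle) / 2

print(ordered)
print(lower_middle, upper_middle, manual_median)
print(median(data))  # library cross-check
```

Both the manual steps and the library call should agree with the value of seven found above.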
Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first find the median or second quartile. The first quartile, Q1, is the middle value of the lower half of the data, and the third quartile, Q3, is the middle value, or median, of the upper half of the data. To get the idea, consider the same data set:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
The median or second quartile is seven. The lower half of the data is 1, 1, 2, 2, 4, 6, 6.8. The middle value of the lower half is two.
The number two, which is part of the data, is the first quartile. One-fourth of the entire set of values are the same as or less than two and three-fourths of the values are more than two.
The upper half of the data is 7.2, 8, 8.3, 9, 10, 10, 11.5. The middle value of the upper half is nine.
The third quartile, Q3, is nine. Three-fourths (75%) of the ordered data set are less than nine. One-fourth (25%) of the ordered data set are greater than nine. The third quartile is part of the data set in this example.
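The median-of-halves procedure can be sketched in Python as follows. The function name `quartiles_by_halves` is hypothetical, and the sketch assumes the convention used in this section's worked examples: when the number of observations is odd, the median itself is left out of both halves.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: with an odd count, the overall median is excluded
    from both the lower half and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3)
```

Library routines for quantiles use a range of interpolation conventions, so they may return slightly different quartiles than this method for the same data.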
The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data. It is the difference between the third quartile (Q3) and the first quartile (Q1).
\[IQR = Q_3 - Q_1 \]
The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it lies more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile, that is, if it is less than Q1 – (1.5)(IQR) or greater than Q3 + (1.5)(IQR). Potential outliers always require further investigation.
A potential outlier is a data point that is significantly different from the other data points. These special data points may be errors or some kind of abnormality or they may be a key to understanding the data.
For the following 13 annual healthcare costs for patients with a severe chronic condition, calculate the IQR and determine if any costs are potential outliers. Identifying extremely high-cost patients (outliers) is a major focus for resource allocation and intervention in public health and health policy. Costs are in dollars.
389,950; 230,500; 158,000; 479,000; 639,000; 114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,800; 1,095,000
Answer
Order the data from smallest to largest.
114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,800; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000
\(M=488,800\)
\(Q_1=\dfrac{230,500+387,000}{2}=308,750\)
\(Q_3=\dfrac{639,000+659,000}{2}=649,000\)
\(I Q R=649,000-308,750=340,250\)
\((1.5)(I Q R)=(1.5)(340,250)=510,375\)
\(Q_1-(1.5)(I Q R)=308,750-510,375=-201,625 \)
\(Q_3+(1.5)(I Q R)=649,000+510,375=1,159,375\)
No healthcare cost is less than \(-201,625\). However, \(5,500,000\) is more than \(1,159,375\). Therefore, \(5,500,000\) is a potential outlier.
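The same outlier check can be written as a small Python sketch. The helper name `iqr_fences` is an assumption made for illustration, and the quartiles are computed with the median-of-halves convention used in this answer (the median is excluded from both halves when the count is odd).

```python
from statistics import median

def iqr_fences(values):
    """Return (lower_fence, upper_fence) for the 1.5 x IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Annual healthcare costs in dollars, from the example.
costs = [389950, 230500, 158000, 479000, 639000, 114950, 5500000,
         387000, 659000, 529000, 575000, 488800, 1095000]

low_fence, high_fence = iqr_fences(costs)
potential_outliers = [c for c in costs if c < low_fence or c > high_fence]
print(low_fence, high_fence, potential_outliers)
```

A value flagged this way is only a candidate for further investigation, not proof of an error.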
VO2max cardiorespiratory fitness test percentile scores for assessments held during the day are:
99; 56; 78; 55.5; 32; 90; 80; 81; 56; 59; 45; 77; 84.5; 84; 70; 72; 68; 32; 79; 90
VO2max cardiorespiratory fitness test percentile scores for assessments held during the evening are:
98; 78; 68; 83; 81; 89; 88; 76; 65; 45; 98; 90; 80; 84.5; 85; 79; 78; 98; 90; 79; 81; 25.5
For the two data sets, find the following:
- The interquartile range. Compare the two interquartile ranges.
- Any outliers in either set.
Answer
The five-number summary for the day and evening scores is

| | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|
| Day | 32 | 56 | 74.5 | 82.5 | 99 |
| Evening | 25.5 | 78 | 81 | 89 | 98 |

Table 2.23

The IQR for the day group is Q3 – Q1 = 82.5 – 56 = 26.5
The IQR for the evening group is Q3 – Q1 = 89 – 78 = 11
The interquartile range (the spread or variability) for the day assessments is larger than the IQR for the evening assessments. This suggests more variation will be found in the day assessment scores.
Day assessment outliers are found using the IQR times 1.5 rule. So,
- Q1 – IQR(1.5) = 56 – 26.5(1.5) = 16.25
- Q3 + IQR(1.5) = 82.5 + 26.5(1.5) = 122.25
Since the minimum and maximum values for the day assessments are greater than 16.25 and less than 122.25, there are no outliers.
Evening assessment outliers are calculated as:
- Q1 – IQR(1.5) = 78 – 11(1.5) = 61.5
- Q3 + IQR(1.5) = 89 + 11(1.5) = 105.5
For the evening scores, any assessment score less than 61.5 is an outlier. Therefore, the scores of 45 and 25.5 are outliers. Since no assessment score is greater than 105.5, there is no upper end outlier.
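As a cross-check, the comparison of the two groups can be reproduced with a brief Python sketch. The function name `five_number_summary` is hypothetical, and quartiles again follow the median-of-halves convention assumed in this section.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) via median-of-halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

day = [99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59,
       45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90]
evening = [98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98,
           90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5]

for label, scores in (("day", day), ("evening", evening)):
    low, q1, med, q3, high = five_number_summary(scores)
    iqr = q3 - q1
    # Flag scores outside the 1.5 x IQR fences.
    outliers = sorted(s for s in scores
                      if s < q1 - 1.5 * iqr or s > q3 + 1.5 * iqr)
    print(label, (low, q1, med, q3, high), iqr, outliers)
```

Running the loop prints one line per group, so the two interquartile ranges and any flagged scores can be compared side by side.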
Fifty participants in a sleep research study were asked how much sleep they get per night (rounded to the nearest hour). The results were:
| AMOUNT OF SLEEP PER NIGHT (HOURS) | FREQUENCY | RELATIVE FREQUENCY | CUMULATIVE RELATIVE FREQUENCY |
|---|---|---|---|
| 4 | 2 | 0.04 | 0.04 |
| 5 | 5 | 0.10 | 0.14 |
| 6 | 7 | 0.14 | 0.28 |
| 7 | 12 | 0.24 | 0.52 |
| 8 | 14 | 0.28 | 0.80 |
| 9 | 7 | 0.14 | 0.94 |
| 10 | 3 | 0.06 | 1.00 |
Find the 28th percentile. Notice the 0.28 in the "cumulative relative frequency" column. Twenty-eight percent of 50 data values is 14 values. There are 14 values less than the 28th percentile. They include the two 4s, the five 5s, and the seven 6s. The 28th percentile is between the last six and the first seven. The 28th percentile is 6.5.
Find the median. Look again at the "cumulative relative frequency" column and find 0.52. The median is the 50th percentile or the second quartile. 50% of 50 is 25. There are 25 values less than the median. They include the two 4s, the five 5s, the seven 6s, and eleven of the 7s. The median or 50th percentile is between the 25th, or seven, and 26th, or seven, values. The median is seven.
Find the third quartile. The third quartile is the same as the 75th percentile. You can "eyeball" this answer. If you look at the "cumulative relative frequency" column, you find 0.52 and 0.80. When you have all the fours, fives, sixes and sevens, you have 52% of the data. When you include all the 8s, you have 80% of the data. The 75th percentile, then, must be an eight. Another way to look at the problem is to find 75% of 50, which is 37.5, and round up to 38. The third quartile, Q3, is the 38th value, which is an eight. You can check this answer by counting the values. (There are 37 values below the third quartile and 12 values above.)
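The readings taken from the frequency table can be verified with a short Python sketch. It rebuilds the cumulative relative frequency column and expands the table into its 50 ordered values; all variable names are illustrative.

```python
from itertools import accumulate

hours = [4, 5, 6, 7, 8, 9, 10]
frequency = [2, 5, 7, 12, 14, 7, 3]

n = sum(frequency)
cumulative_counts = list(accumulate(frequency))
cumulative_relative = [c / n for c in cumulative_counts]

# Expand the table into the ordered list of individual responses.
ordered = [h for h, f in zip(hours, frequency) for _ in range(f)]

# Positions are 1-based in the text, so the 14th value is ordered[13].
percentile_28 = (ordered[13] + ordered[14]) / 2   # 14th and 15th values
median_value = (ordered[24] + ordered[25]) / 2    # 25th and 26th values
third_quartile = ordered[37]                      # 38th value

print([round(c, 2) for c in cumulative_relative])
print(percentile_28, median_value, third_quartile)
```

Expanding the table is practical here because there are only 50 values; for large tables, working from the cumulative counts alone avoids building the full list.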
A Formula for Finding the kth Percentile
If you were to do a little research, you would find several formulas for calculating the kth percentile. Here is one of them.
k = the kth percentile. It may or may not be part of the data.
i = the index (ranking or position of a data value)
n = the total number of data
- Order the data from smallest to largest.
- Calculate \(i = \dfrac{k}{100}(n+1)\).
- If i is an integer, then the kth percentile is the data value in the ith position in the ordered set of data.
- If i is not an integer, then round i up and round i down to the nearest integers. Average the two data values in these two positions in the ordered data set. This is easier to understand in an example.
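The steps above can be sketched in Python. The function name `kth_percentile` is hypothetical, and the sketch assumes k is a whole number so that the index can be computed exactly with integer arithmetic. The sample list is the set of 29 patient ages used in this section.

```python
def kth_percentile(ordered, k):
    """Find the k-th percentile using the index i = k/100 * (n + 1).

    Assumes `ordered` is sorted ascending and k is an integer.
    """
    n = len(ordered)
    # whole is i rounded down; remainder is nonzero when i is not an integer.
    whole, remainder = divmod(k * (n + 1), 100)
    if whole < 1 or whole > n or (remainder and whole == n):
        raise ValueError("index falls outside the data")
    if remainder == 0:
        return ordered[whole - 1]          # i is an integer
    # Otherwise average the values at i rounded down and i rounded up.
    return (ordered[whole - 1] + ordered[whole]) / 2

ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47,
        52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

print(kth_percentile(ages, 70))
print(kth_percentile(ages, 83))
```

Because several percentile formulas are in common use, other software may report slightly different values for the same data.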
Listed are 29 ages (in years) for patients diagnosed with Type 2 Diabetes in a recent clinic study, in order from smallest to largest. This data is used by researchers to determine the median age of disease onset and variability in diagnosis timing.
18; 21; 22; 25; 26; 27; 29; 30; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77
- Find the 70th percentile.
- Find the 83rd percentile.
Answer
a.
- \(k=70\)
- \(i=\) the index
- \(n=29\)

\(i=\dfrac{k}{100}(n+1)=\left(\dfrac{70}{100}\right)(29+1)=21\). Twenty-one is an integer, and the data value in the \(21^{\text{st}}\) position in the ordered data set is 64. The \(70^{\text{th}}\) percentile is 64 years.
b.
- \(k=83\)
- \(i=\) the index
- \(n=29\)

\(i=\dfrac{k}{100}(n+1)=\left(\dfrac{83}{100}\right)(29+1)=24.9\), which is NOT an integer. Round it down to 24 and up to 25. The age in the \(24^{\text{th}}\) position is 71 and the age in the \(25^{\text{th}}\) position is 72. Average 71 and 72. The \(83^{\text{rd}}\) percentile is 71.5 years.
Using the frequency table of sleep amounts above:
- Find the 80th percentile.
- Find the 90th percentile.
- Find the first quartile. What is another name for the first quartile?
Answer
Using the data from the frequency table, we have:
a. Notice there are 50 data values in the table, so \(n=50\).
Calculate the index \(i\) as follows:
\[i=\dfrac{80}{100}(50+1)=40.8\]
Since \(i=40.8\), calculate the mean of the \(40^{\text{th}}\) and \(41^{\text{st}}\) data values. The \(40^{\text{th}}\) data value is 8, the \(41^{\text{st}}\) data value is 9, and the mean of these two data values is 8.5. Thus, the \(80^{\text{th}}\) percentile is 8.5.
b. Calculate the index \(i\) as follows:
\[i=\dfrac{90}{100}(50+1)=45.9\]
Since \(i=45.9\), calculate the mean of the \(45^{\text{th}}\) and \(46^{\text{th}}\) data values. The \(45^{\text{th}}\) data value is 9, the \(46^{\text{th}}\) data value is 9, and the mean of these two data values is 9. Thus, the \(90^{\text{th}}\) percentile is 9.
c. Another name for the first quartile is the \(25^{\text{th}}\) percentile. Proceed to calculate the \(25^{\text{th}}\) percentile. Calculate the index \(i\) as follows:
\[i=\dfrac{25}{100}(50+1)=12.75\]
Since \(i=12.75\), calculate the mean of the \(12^{\text{th}}\) and \(13^{\text{th}}\) data values. The \(12^{\text{th}}\) data value is 6, the \(13^{\text{th}}\) data value is 6, and the mean of these two data values is 6. Thus, the first quartile is 6.
A Formula for Finding the Percentile of a Value in a Data Set
- Order the data from smallest to largest.
- x = the number of data values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile.
- y = the number of data values equal to the data value for which you want to find the percentile.
- n = the total number of data.
- Calculate \(\dfrac{x+0.5y}{n}(100)\). Then round to the nearest integer.
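A minimal Python sketch of this formula follows. The function name `percentile_of_value` is an assumption for illustration, and the sample data are the 29 patient ages used elsewhere in this section.

```python
def percentile_of_value(ordered, value):
    """Percentile rank of `value`: 100 * (x + 0.5 * y) / n.

    x counts the data values below `value`; y counts those equal to it.
    """
    n = len(ordered)
    x = sum(1 for v in ordered if v < value)
    y = sum(1 for v in ordered if v == value)
    return 100 * (x + 0.5 * y) / n

ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47,
        52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

for age in (58, 25):
    rank = percentile_of_value(ages, age)
    print(age, rank, round(rank))
```

The 0.5y term counts half of the tied values as falling below the value in question, which keeps the result sensible when a value is repeated in the data.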
Listed are 29 ages (in years) for patients diagnosed with Type 2 Diabetes in a recent clinic study, in order from smallest to largest. This data is used by researchers to determine the median age of disease onset and variability in diagnosis timing.
18; 21; 22; 25; 26; 27; 29; 30; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77
- Find the percentile for 58.
- Find the percentile for 25.
Answer
a. Counting from the bottom of the list, there are 18 data values less than 58. There is one value of 58.
\(x=18\) and \(y=1\). \(\dfrac{x+0.5y}{n}(100)=\dfrac{18+0.5(1)}{29}(100)=63.79\). 58 is the \(64^{\text{th}}\) percentile.
b. Counting from the bottom of the list, there are three data values less than 25. There is one value of 25.
\(x=3\) and \(y=1\). \(\dfrac{x+0.5y}{n}(100)=\dfrac{3+0.5(1)}{29}(100)=12.07\). Twenty-five is the \(12^{\text{th}}\) percentile.
Interpreting Percentiles, Quartiles, and Median
A percentile indicates the relative standing of a data value when data are sorted into numerical order from smallest to largest. In general, p percent of data values are less than or equal to the pth percentile. For example, 15% of data values are less than or equal to the 15th percentile.
- Low percentiles always correspond to lower data values.
- High percentiles always correspond to higher data values.
A percentile may or may not correspond to a value judgment about whether it is "good" or "bad." The interpretation of whether a certain percentile is "good" or "bad" depends on the context of the situation to which the data applies. In some situations, a low percentile would be considered "good;" in other contexts a high percentile might be considered "good". In many situations, there is no value judgment that applies.
Understanding how to interpret percentiles properly is important not only when describing data, but also when calculating probabilities in later chapters of this text.
When writing the interpretation of a percentile in the context of the given data, the sentence should contain the following information.
- information about the context of the situation being considered
- the data value (value of the variable) that represents the percentile
- the percent of individuals or items with data values below the percentile
- the percent of individuals or items with data values above the percentile.
On a timed 400-meter run assessment for a college physical fitness class, the first quartile (Q1) for the time it took to finish the run was 90 seconds. Interpret the first quartile in the context of this situation.
Answer
Twenty-five percent of students finished the 400-meter run in 90 seconds or less. Seventy-five percent of students finished the run in 90 seconds or more. A low percentile (and therefore a low quartile) could be considered a favorable result in this fitness assessment, as finishing more quickly on a timed physical test is desirable. The lower the time, the higher the fitness level relative to the group.
Sharpe Middle School is applying for a grant that will be used to add fitness equipment to the gym. The principal surveyed 15 anonymous students to determine how many minutes a day the students spend exercising. The results from the 15 anonymous students are shown.
0 minutes; 40 minutes; 60 minutes; 30 minutes; 60 minutes
10 minutes; 45 minutes; 30 minutes; 300 minutes; 90 minutes;
30 minutes; 120 minutes; 60 minutes; 0 minutes; 20 minutes
Determine the following five values.
- Min = 0
- Q1 = 20
- Med = 40
- Q3 = 60
- Max = 300
If you were the principal, would you be justified in purchasing new fitness equipment? Since 75% of the students exercise for 60 minutes or less daily, and since the IQR is 40 minutes (60 – 20 = 40), we know that half of the students surveyed exercise between 20 minutes and 60 minutes daily. This seems a reasonable amount of time spent exercising, so the principal would be justified in purchasing the new equipment.
However, the principal needs to be careful. The value 300 appears to be a potential outlier.
Q3 + 1.5(IQR) = 60 + (1.5)(40) = 120.
The value 300 is greater than 120 so it is a potential outlier. If we delete it and calculate the five values, we get the following values:
- Min = 0
- Q1 = 20
- Med = 35
- Q3 = 60
- Max = 120
We still have 75% of the students exercising for 60 minutes or less daily and half of the students exercising between 20 and 60 minutes a day. However, 15 students is a small sample and the principal should survey more students to be sure of his survey results.


