5.4: Size to give adequate power


The alternative approach to setting trial size is based upon selecting the trial size to achieve a specified power. In order to do this, the following must be specified:

What size of difference, D, between the two groups would be of clinical or public health importance? The trial size will be chosen so it would have a good chance of detecting this size of true difference, i.e. there would be a good chance of obtaining a statistically significant result, thus concluding that there is a real difference between the two trial arms. D is the true difference between the two groups, not the estimated difference as measured in the trial. Very small differences are generally of no public health importance, and it would not be of concern if they were not detected in the trial. The general principle, in most cases, is to choose D to be the minimum difference which would be of public health relevance and therefore be important to detect in a trial. Note that ‘detecting’ D means that a significant difference is obtained, indicating that there is some difference between the two groups. This does not mean that the difference is estimated precisely. To ensure a precise estimate is obtained, the approach of Section 3 should be used.
Having specified D, the investigators must decide how confident they wish to be of obtaining a significant result if this were the true difference between the groups. In other words, the power is set for this value of D. Note that, if the true difference between the groups is actually larger than D, the power of the trial will be larger than the value set. The required power is specified in the calculations by choosing the corresponding value of \(z_2\), as shown in Table 5.1. Commonly chosen values for the power are 80%, 90%, and 95%, the corresponding values of \(z_2\) being 0.84, 1.28, and 1.64. It would generally be regarded as unsatisfactory to proceed with a trial with a power of less than 70% for the primary outcome, because that means that one would have a more than 30% chance of ‘missing’ a true difference of D.
The significance level must also be specified for the comparison of the two groups under study. This is entered into the calculations in terms of the parameter \(z_1\). The commonest choice for the required p-value is 0.05, corresponding to a \(z_1\) of 1.96. Alternative values might be 0.01 or 0.001, corresponding to \(z_1\) values of 2.58 or 3.29, respectively. It is assumed throughout this chapter that two-sided significance tests are to be used (see Chapter 21, Section 2.3). A significance level of 0.05 is assumed in the numerical examples, unless otherwise stated.
Finally, further information must be specified, which varies according to the type of measure being examined. This may be a rough estimate of the rates or proportions that are expected, or an estimate of the standard deviation for a quantitative variable. Note that, if these quantities were known exactly, no trial would be needed! Only rough estimates are required.

Having specified these values, the formulae or tables given in Sections 4.1 to 4.3 can be used to calculate the required trial size.

It is often useful, however, to proceed in the opposite direction, i.e. to explore the power that would be achieved for a range of possible trial sizes and for a range of possible values of the true difference D. This enables the construction of power curves, as illustrated in Figure 5.1. Formulae for this approach are also given in Sections 4.1 to 4.3.

Table 5.1 Relationship between \(z_2\) and % power (numbers in the body of the table show power corresponding to each value of \(z_2\))

First decimal place of \(z_2\)
\(z_2\)	0.0	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
−3.0	0.1	0.1	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0
−2.0	2.3	1.8	1.4	1.1	0.8	0.6	0.5	0.3	0.3	0.2
−1.0	15.9	13.6	11.5	9.7	8.1	6.7	5.5	4.5	3.6	2.9
−0.0	50.0	46.0	42.1	38.2	34.5	30.9	27.4	24.2	21.2	18.4
+0.0	50.0	54.0	57.9	61.8	65.5	69.1	72.6	75.8	78.8	81.6
+1.0	84.1	86.4	88.5	90.3	91.9	93.3	94.5	95.5	96.4	97.1
+2.0	97.7	98.2	98.6	98.9	99.2	99.4	99.5	99.7	99.7	99.8
+3.0	99.9	99.9	99.9	100.0	100.0	100.0	100.0	100.0	100.0	100.0

Note: for example, \(z_2=−0.7\) corresponds to a power of 24.2%.
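The entries of Table 5.1 agree with the standard normal cumulative distribution function evaluated at \(z_2\), so the lookup can also be done in code. The following is a minimal sketch under that assumption, using Python's standard library; the function names are illustrative and are not part of the original text.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1); assumed to underlie Table 5.1.
_STD_NORMAL = NormalDist()


def power_from_z2(z2: float) -> float:
    """Return the power, as a percentage, corresponding to a value of z2."""
    return 100.0 * _STD_NORMAL.cdf(z2)


def z2_from_power(power_pct: float) -> float:
    """Return the z2 value corresponding to a power given as a percentage."""
    return _STD_NORMAL.inv_cdf(power_pct / 100.0)


def z1_from_alpha(alpha: float) -> float:
    """Return z1 for a two-sided significance test at level alpha."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


print(round(power_from_z2(-0.7), 1))   # 24.2, as in the note to Table 5.1
print(round(z2_from_power(90), 2))     # 1.28
print(round(z1_from_alpha(0.05), 2))   # 1.96
```

Working from the distribution rather than the table avoids interpolating between tabulated values of \(z_2\).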

4.1 Comparison of proportions

The trial size required in each group to detect a specified difference \(D=p_1−p_2\), with power specified by \(z_2\) and significance level specified by \(z_1\), is given by:

\[n=[(z_1+z_2)^2\,2p(1-p)]/(p_1-p_2)^2\]

where p is the average of \(p_1\) and \(p_2\). For 90% power and significance at \(p<0.05\), this simplifies to:

\[n=[21p(1−p)]/(p_1−p_2)^2.\]

Table 5.2 shows the required trial size for a range of values of \(p_1\) and \(p_2\) for 80%, 90%, or 95% power.

To calculate the power of a trial of specified size, calculate \(z_2\) as follows, and refer the value of \(z_2\) to Table 5.1.

\[z_2=\sqrt{n/[2p(1-p)]}\,|p_1-p_2|-z_1.\]

Example: assume that the spleen rate in the control group of the mosquito-net trial is around 40%. To have very high power (say 95%) of detecting a significant effect if the intervention reduces the spleen rate to 30% (so that \(p=0.35\)), the number of children required in each group is given by:

\[n=[(1.96+1.64)^2(2×0.35×0.65)]/(0.3−0.4)^2=590.\]
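The two formulae of this section can be sketched in code as follows. This is an illustrative sketch only: the function names are hypothetical, and the conversion of \(z_2\) to a power assumes that Table 5.1 is the standard normal cumulative distribution.

```python
import math
from statistics import NormalDist


def n_per_group_proportions(p1, p2, z1=1.96, z2=1.28):
    """Number required in each group to detect p1 versus p2."""
    p = (p1 + p2) / 2  # average of the two proportions
    return (z1 + z2) ** 2 * 2 * p * (1 - p) / (p1 - p2) ** 2


def power_proportions(n, p1, p2, z1=1.96):
    """Power (%) of a trial with n subjects in each group."""
    p = (p1 + p2) / 2
    z2 = math.sqrt(n / (2 * p * (1 - p))) * abs(p1 - p2) - z1
    # Assumption: Table 5.1 lookup replaced by the standard normal CDF.
    return 100 * NormalDist().cdf(z2)


# Spleen-rate example: 40% in controls, 30% with intervention, 95% power.
n = n_per_group_proportions(0.30, 0.40, z2=1.64)
print(round(n))  # 590
```

The text rounds to the nearest whole number; in practice one might prefer to round up so that the target power is not undershot.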

If the true risk ratio is R and we wish to power the trial such that the lower confidence limit on the risk ratio will be greater than or equal to \(R_L\), where \(R_L\) is the lowest acceptable efficacy (say, for deciding whether or not to implement the intervention in a public health system, i.e. we need to be sure that the efficacy is at least \(R_L\)), the required sample size is:

\[n=(z_1+z_2)^2[(1-p_1)/p_1+(1-p_2)/p_2]/[\log_e(R/R_L)]^2.\]

Table 5.2 Sample size requirements for comparison of proportions

Smaller prop. \(p_1\)	Difference \(D=p_2−p_1\)
Smaller prop. \(p_1\)	0.05	0.10	0.15	0.20	0.25	0.30	0.35	0.40	0.45	0.50	0.55	0.60
0.05	435	141	76	50	36	28	22	18	15	13	11	10
	583	189	102	67	48	37	30	25	21	18	15	13
	719	233	126	83	60	46	37	30	26	22	19	16
0.10	686	200	101	63	44	33	26	21	17	14	12	10
	919	268	135	84	59	44	34	28	23	19	16	14
	1134	330	166	104	72	54	42	34	28	24	20	17
0.15	906	251	122	74	50	37	28	22	18	15	13	10
	1212	336	163	98	67	49	38	30	24	20	17	14
	1497	415	201	122	83	60	46	37	30	25	21	18
0.20	1094	294	139	82	55	40	30	24	19	16	13	11
	1464	394	186	110	74	53	40	31	25	21	17	15
	1808	486	230	136	91	66	50	39	31	26	21	18
0.25	1250	329	153	89	59	42	31	24	19	16	13	11
	1674	441	205	119	79	56	42	32	26	21	17	14
	2067	544	253	147	97	69	52	40	32	26	21	18
0.30	1376	357	163	94	61	43	32	24	19	16	13	10
	1842	478	219	126	82	58	43	33	26	21	17	14
	2274	590	270	156	101	71	53	40	32	26	21	17
0.35	1470	376	170	97	63	44	32	24	19	15	12	10
	1968	504	228	130	84	58	43	32	25	20	16	13
	2430	622	282	160	103	72	53	40	31	25	20	16
0.40	1533	388	174	98	63	43	31	24	18	14	11
	2052	520	233	131	84	58	42	31	24	19	15
	2534	642	287	162	103	71	52	39	30	24	19
0.45	1564	392	174	97	61	42	30	22	17	13
	2094	525	233	130	82	56	40	30	23	18
	2586	648	287	160	101	69	50	37	28	22
0.50	1564	388	170	94	59	40	28	21	15
	2094	520	228	126	79	53	38	28	21
	2586	642	282	156	97	66	46	34	26
0.55	1533	376	163	89	55	37	26	18
	2052	504	219	119	74	49	34	25
	2534	622	270	147	91	60	42	30
0.60	1470	357	153	82	50	33	22
	1968	478	205	110	67	44	30
	2430	590	253	136	83	54	37
0.65	1376	329	139	73	44	28
	1842	441	186	98	59	37
	2274	544	230	121	72	46
0.70	1250	294	122	63	36
	1674	394	163	84	48
	2067	486	201	104	60
0.75	1094	251	101	50
	1464	336	135	67
	1808	415	166	83
0.80	906	200	76
	1212	268	102
	1497	330	126
0.85	686	141
	919	189
	1134	233
0.90	435
	583
	719

Shown in the body of the table are the sample sizes required in each group to give the specified power. Upper figure: power, 80%; middle figure: power, 90%; lower figure: power, 95%. Using a two-sided significance test with \(p<0.05\). The two groups are assumed to be of equal size.

4.2 Comparison of incidence rates

For a specified difference \(D=r_1−r_2\) and values of \(z_1\) and \(z_2\) representing the required significance level and power, the required number of person-years in each group is given by:

\[y=[(z_1+z_2)^2(r_1+r_2)]/(r_1−r_2)^2\]

where \(r_1\) and \(r_2\) are the expected rates per person-year in the two groups. A rough estimate of the average of the two rates is therefore required, i.e. \((r_1+r_2)/2\). For 90% power and significance at \(p<0.05\), this formula simplifies to:

\[y=[10.5(r_1+r_2)]/(r_1−r_2)^2.\]

An alternative, but equivalent, formula gives the number of events required in group 2, the control group, in terms of the rate ratio R, for which the specified power is required:

\[e_2=[(z_1+z_2)^2(1+R)]/(1−R)^2.\]

This formula was used to construct Table 5.3, which shows the number of events needed in group 2 to detect a rate ratio of R with 80%, 90%, or 95% power. The total number of events needed in both groups can be calculated as \(e_2(1+R)\).

Since this can be computed without specifying the assumed rates in the two trial groups, this provides a particularly helpful approach when the rates are uncertain. Thus, in an endpoint-driven trial, we can specify the number of events that need to be observed to reach the required power, after which recruitment or follow-up may be terminated.
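As a rough illustration, the events formula can be evaluated directly for any rate ratio. The sketch below uses a hypothetical function name and reproduces a few of the 80% power entries of Table 5.3, together with the total number of events in both groups.

```python
def events_group2(R, z1=1.96, z2=1.28):
    """Expected events needed in group 2 (control) to detect rate ratio R."""
    return (z1 + z2) ** 2 * (1 + R) / (1 - R) ** 2


# 80% power (z2 = 0.84) for a few rate ratios.
for R in (0.3, 0.5, 2.0):
    e2 = events_group2(R, z2=0.84)
    total = e2 * (1 + R)  # total events across both groups
    print(f"R={R}: e2={e2:.1f}, total events={total:.1f}")
```

Note that no rates are passed to the function: only the rate ratio, the significance level, and the power enter the calculation.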

To calculate the power for a given trial size, compute:

\[z_2=\sqrt{y/(r_1+r_2)}\,|r_1-r_2|-z_1\]

where y is the number of person-years in each group and \(|r_1-r_2|\) is the absolute value of the difference between the two rates.

Refer the resulting value of \(z_2\) to Table 5.1 to determine the power of the trial.

Example: Assume, in the mosquito-net trial, that the death rate from malaria in the control group is 10/1000 child-years, so that \(r_2=0.010\). Eighty per cent power is wanted to detect a significant effect if the true rate in children with bed-nets is reduced by 70% to \(r_1=0.003\). The number of child-years of observation required in each group is given by:

\[y=[(1.96+0.84)^2(0.003+0.010)]/(−0.007)^2=2080.\]

The power curves shown in Figure 5.1 were constructed using the same assumption concerning the death rate in controls. For example, with \(y=2000\) child-years in each group and a rate ratio of \(R=0.7\) (corresponding to a death rate of 7 per 1000 child-years in the intervention group), the power is about 18% (Table 5.1):

\[z_2=\sqrt{2000/(0.007+0.010)}\times|0.007-0.010|-1.96=-0.93.\]
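The calculations for rates, including the points needed to draw power curves of the kind described above, can be sketched as follows. The function names are illustrative, the grid of trial sizes and rate ratios is an arbitrary choice for demonstration, and the power is obtained from the standard normal distribution on the assumption that this is what Table 5.1 tabulates.

```python
import math
from statistics import NormalDist


def person_years_per_group(r1, r2, z1=1.96, z2=1.28):
    """Person-years required in each group to detect rates r1 versus r2."""
    return (z1 + z2) ** 2 * (r1 + r2) / (r1 - r2) ** 2


def power_rates(y, r1, r2, z1=1.96):
    """Return (z2, power %) for y person-years in each group."""
    z2 = math.sqrt(y / (r1 + r2)) * abs(r1 - r2) - z1
    return z2, 100 * NormalDist().cdf(z2)


# Mosquito-net example: control rate 0.010, intervention rate 0.003, 80% power.
print(round(person_years_per_group(0.003, 0.010, z2=0.84)))  # 2080

# Power with 2000 child-years per group and a rate ratio of 0.7.
z2, power = power_rates(2000, 0.007, 0.010)
print(round(z2, 2), round(power))  # -0.93 18

# Points for power curves: control rate fixed at 0.010 (illustrative grid).
r2 = 0.010
for R in (0.3, 0.5, 0.7):
    row = [round(power_rates(y, R * r2, r2)[1]) for y in (1000, 2000, 5000)]
    print(f"R={R}: power (%) at y=1000, 2000, 5000 -> {row}")
```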

These formulae are used to ensure that there is a high probability of rejecting the null hypothesis if the true effect is of the assumed size. However, this may still mean that the lower confidence limit for the effect size is close to the null, and this may provide insufficient evidence to recommend widespread adoption of the intervention. A larger sample size will be needed to ensure that the lower confidence limit exceeds a given value. Suppose the assumed value of the rate ratio is R and that we wish to power the trial so that there is a high probability that the CI excludes a value \(R_L\) corresponding to the lower limit of efficacy desired. Then the required sample size is given by the formula:

\[y=(z_1+z_2)^2(1/r_1+1/r_2)/[\log_e(R/R_L)]^2.\]

Example: In the mosquito-net trial, we found that 2080 child-years were required in each trial group to reject the null hypothesis with 80% power if the true rate ratio R was 0.3, corresponding to an efficacy of 70%. Now suppose we wish to ensure that there is an 80% chance that the lower limit of the 95% CI for the efficacy exceeds 30%, corresponding to \(R_L=0.7\). Applying the formula, we obtain the following, demonstrating the substantial increase in sample size that this would necessitate:

\[y=(1.96+0.84)^2(1/0.010+1/0.003)/[\log_e(0.3/0.7)]^2=4732.\]
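The same calculation can be sketched in code, which makes it easy to see how the requirement changes with the chosen limit \(R_L\). The function name is hypothetical, and the rate ratio is taken as \(R=r_1/r_2\), following the definition used in Table 5.3.

```python
import math


def person_years_for_limit(r1, r2, R_L, z1=1.96, z2=1.28):
    """Person-years per group so that the CI is likely to exclude R_L."""
    R = r1 / r2  # assumed true rate ratio
    return (z1 + z2) ** 2 * (1 / r1 + 1 / r2) / math.log(R / R_L) ** 2


# Mosquito-net example: r1 = 0.003, r2 = 0.010, R_L = 0.7, 80% power.
print(round(person_years_for_limit(0.003, 0.010, 0.7, z2=0.84)))  # 4732
```

As \(R_L\) approaches R, the logarithm in the denominator approaches zero and the required number of person-years grows without bound.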

Table 5.3 Sample size requirements for comparison of rates

Relative rate R *	Expected events in group 2 to give+
Relative rate R *	80% power	90% power	95% power
0.1	10.6	14.3	17.6
0.2	14.7	19.7	24.3
0.3	20.8	27.9	34.4
0.4	30.5	40.8	50.4
0.5	47.0	63.0	77.8
0.6	78.4	105.0	129.6
0.7	148.1	198.3	244.8
0.8	352.8	472.4	583.2
0.9	1489.6	1994.5	2462.4
1.1	1646.4	2204.5	2721.6
1.2	431.2	577.4	712.8
1.4	117.6	157.5	194.4
1.6	56.6	75.8	93.6
1.8	34.3	45.9	56.7
2.0	23.5	31.5	38.9
2.5	12.2	16.3	20.2
3.0	7.8	10.5	13.0
5.0	2.9	3.9	4.9
10.0	1.1	1.4	1.8

Numbers in the body of the table are expected number of events required in group 2 to give specified power if relative rate in group 1 is R.

*R, ratio of incidence rate in group 1 to incidence rate in group 2.

+ Using a two-sided significance test with \(p<0.05\). The two groups are assumed to be of equal size.

4.3 Comparison of means

The trial size required in each group to detect a specified difference \(D=μ_1−μ_2\), with power specified by \(z_2\) and the significance level specified by \(z_1\), is given by:

\[n=[(z_1+z_2)^2(σ_1^2+σ_2^2)]/(μ_1−μ_2)^2\]

where \(σ_1\) and \(σ_2\) are the standard deviations of the outcome variable in groups 1 and 2, respectively. For 90% power and significance at \(p<0.05\), this simplifies to:

\[n=10.5(σ_1^2+σ_2^2)/(μ_1−μ_2)^2.\]

To calculate the power of a trial of specified size, calculate the following, and refer the value of \(z_2\) to Table 5.1:

\[z_2=\sqrt{n/(\sigma_1^2+\sigma_2^2)}\,|\mu_1-\mu_2|-z_1.\]

Estimates of \(σ_1\) and \(σ_2\) may be obtained from previous studies or from a pilot study. If appropriate values cannot be determined, an alternative is to dichotomize the continuous outcome variable and use the sample size formulae for comparison of proportions given in Section 4.1. This will give a conservative estimate of sample size, as it ignores some of the information, but will ensure an adequate sample size in the face of uncertainty regarding the standard deviations.

Example: In the mosquito-net trial, the mean PCV in the control group at the end of the trial is expected to be 33.0, with a standard deviation of 5.0. To have 90% power of detecting a significant effect if the intervention increases the mean PCV by 1.5, the number of children required in each group is given by:

\[n = \left[ ( 1.96 + 1.28 ) ^ { 2 } \left( 5.0 ^ { 2 } + 5.0 ^ { 2 } \right) \right] / ( 1.5 ) ^ { 2 } = 233\]

Suppose it turns out that only 150 children are available for study in each group. The value of \(z_2\) in these circumstances is given by the following, corresponding to a power of about 74%:

\[z_2=\sqrt{150/(5.0^2+5.0^2)}\times|1.5|-1.96=0.64.\]
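Both PCV calculations can be checked with a short sketch. The function names are illustrative, and the power is again taken from the standard normal distribution on the assumption that it matches Table 5.1.

```python
import math
from statistics import NormalDist


def n_per_group_means(diff, sd1, sd2, z1=1.96, z2=1.28):
    """Number required in each group to detect a difference in means."""
    return (z1 + z2) ** 2 * (sd1 ** 2 + sd2 ** 2) / diff ** 2


def power_means(n, diff, sd1, sd2, z1=1.96):
    """Return (z2, power %) for n subjects in each group."""
    z2 = math.sqrt(n / (sd1 ** 2 + sd2 ** 2)) * abs(diff) - z1
    return z2, 100 * NormalDist().cdf(z2)


# PCV example: SD 5.0 in both groups, difference of 1.5, 90% power.
print(round(n_per_group_means(1.5, 5.0, 5.0)))  # 233

# Power if only 150 children are available in each group.
z2, power = power_means(150, 1.5, 5.0, 5.0)
print(round(z2, 2), round(power))  # 0.64 74
```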

A summary of the various formulae that have been given for calculating the trial size requirements for the comparison of two groups of equal size is given in Table 5.4.

Table 5.4 Summary of formulae for calculating trial size requirements for comparison of two groups of equal size

Type of outcome	Formula	Notation	Section in text
A: Choosing trial size to achieve adequate precision
Proportions:	\(n = \left( 1.96 / \log _ { e } f \right) ^ { 2 } \left\{ \left[ ( R + 1 ) / \left( R p _ { 2 } \right) \right] - 2 \right\}\)	n = number in each group; R = prop. in group 1/prop. in group 2; gives 95% CI from R/f to Rf	3.1
Rates:	\(e _ { 2 } = \left( 1.96 / \log _ { \mathrm { e } } f \right) ^ { 2 } [ ( R + 1 ) / R ]\)	\(e_2\) = expected events in group 2; R = rate in group 1/rate in group 2; gives 95% CI from R/f to Rf	3.2
Means:	\(n = ( 1.96 / f ) ^ { 2 } \left( \sigma _ { 1 } ^ { 2 } + \sigma _ { 2 } ^ { 2 } \right)\)	n = number in each group; \(σ_i\) = SD in group i; D = mean in group 1 − mean in group 2; gives 95% CI of \(D±f\)	3.3
B: Choosing trial size to achieve adequate power
Proportions:	\(n = \left[ \left( z _ { 1 } + z _ { 2 } \right) ^ { 2 } 2 p ( 1 - p ) \right] / \left( p _ { 1 } - p _ { 2 } \right) ^ { 2 }\)	n = number in each group; \(p_i\) = proportion in group i; p = average of \(p_1\) and \(p_2\)	4.1
Rates:	\(y = \left[ \left( z _ { 1 } + z _ { 2 } \right) ^ { 2 } \left( r _ { 1 } + r _ { 2 } \right) \right] / \left( r _ { 1 } - r _ { 2 } \right) ^ { 2 }\)	y = person-years in each group; \(r_i\) = rate in group i	4.2
Means:	\(n = \left[ \left( z _ { 1 } + z _ { 2 } \right) ^ { 2 } \left( \sigma _ { 1 } ^ { 2 } + \sigma _ { 2 } ^ { 2 } \right) \right] / \left( \mu _ { 1 } - \mu _ { 2 } \right) ^ { 2 }\)	n = number in each group; \(σ_i\) = SD in group i; \(μ_i\) = mean in group i	4.3

\(z_1=1.96\) for significance at \(p<0.05\)

Power 80%, 90%, 95%

\(z_2=0.84, 1.28, 1.64\)