Date: 2018-02-20

We introduce two case studies with large cluster sizes. These case studies raise a number of important issues, many of which are common to large pragmatic CRTs. We discuss each of these case studies in turn, considering for each the ethical implications of including excessive cluster sizes. We consider in particular the risks and burdens for participants, and whether these can be justified when it is known that some participants provide little information to the primary outcome.

When interventions are low risk and study participation poses little or no burden, for example, when data are routinely collected and non-identifiable, 12 13 there may be no ethical implications, or the implications may be inconsequential. However, excessive cluster sizes will have ethical implications when the intervention poses some risk to participants or trial participation imposes some burden. The objective of this paper is to raise awareness of these issues, prompting funders, researchers and ethics committees to ensure maximum social and scientific value of cluster trials while minimising the number of participants who are exposed to the burdens and risks of trial participation.

In this paper, we argue that this issue is of particular importance in a CRT. This is because, uniquely to a CRT, some participants in a cluster may contribute little information to the study. 7–10 The statistical implications of these excessive cluster sizes have been outlined elsewhere, including how simple changes to the trial design can deliver close to the desired statistical power without compromising the value of the trial. 11 However, the ethical implications of excessive cluster sizes, which may vary according to the type of CRT, have not yet been fully articulated.

As with all research involving human participants, CRTs should be conducted in accordance with appropriate scientific and ethical principles. One scientific and ethical standard is that an appropriate power calculation should be carried out to ensure the study has enough participants to detect the target difference. However, it has been argued that trials that accrue needlessly large numbers of participants may violate ethical principles by exposing excessive numbers of participants to the burdens and risks of study participation. 5 6 We use the term ‘excessively large cluster sizes’ to notionally represent cluster sizes in which some observations make a negligible contribution to the study’s primary aims.

The cluster randomised trial (CRT) is increasingly being used to evaluate interventions that cannot be evaluated using the conventional individually randomised trial. CRTs proceed by randomising clusters of individuals to intervention or control conditions, with typical examples of clusters being hospitals, general practices and schools. These designs are used across a breadth of contexts, and evaluate a diverse range of interventions from health services or policy interventions 1 to drug therapies. 2–4

It is important to bear in mind that these calculations depend on estimated parameters such as the ICC. Ideally, power and precision curves should take into account the uncertainty in these estimates and should be considered across a range of scenarios, as with any power calculation. All our calculations have assumed equal cluster sizes and large sample theory. Specific recommendations have been provided for inflating the sample size to accommodate varying cluster sizes using the coefficient of cluster size variation, and this equates to increasing the number of clusters per arm by 2 or 3. 16 17 Other CRT designs are statistically more efficient, that is, they deliver the same power for a substantially reduced total sample size. These include the cluster randomised cross-over trial (a feasible design choice only when the intervention can be withdrawn) and some trials with premeasurements in a baseline period. 18

An additional complication is that while participants’ contribution to the power may be negligible, participants might still make a valuable contribution to the study should the minimal important effect be smaller than the target effect assumed in the sample size calculation. For example, in figure 1, when the ICC is 0.01 the precision (ie, 1/variance of the treatment effect) continues to increase up to cluster sizes of about 1000. This means that while cluster sizes of about 100 would be sufficient to detect an effect size of 0.25, the CI width around smaller effect sizes will continue to decrease up to cluster sizes of about 1000. This may be valuable if the actual effect size is smaller than the target difference but still clinically important.
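The precision argument above can be checked with a minimal sketch. It assumes a standardised continuous outcome (total variance 1), equal cluster sizes, a two-arm parallel design and large sample theory; the function names `precision` and `precision_ceiling` are illustrative and are not taken from the cited calculations.

```python
from math import sqrt

def precision(cluster_size, clusters_per_arm=12, icc=0.01):
    """Precision (1/variance) of the estimated difference in means.

    Assumes a standardised outcome, equal cluster sizes and a
    two-arm parallel cluster randomised trial.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return clusters_per_arm * cluster_size / (2 * design_effect)

def precision_ceiling(clusters_per_arm=12, icc=0.01):
    """Limit of the precision as the cluster size grows without bound."""
    return clusters_per_arm / (2 * icc)

ceiling = precision_ceiling()
for m in (100, 250, 500, 1000, 5000):
    p = precision(m)
    half_width = 1.96 / sqrt(p)  # approximate 95% CI half-width
    print(f"cluster size {m:>4}: precision = {p:6.1f} "
          f"({p / ceiling:.0%} of ceiling), CI half-width = {half_width:.3f}")
```

Under these assumptions, a cluster size of 100 reaches only about half of the attainable precision, whereas a cluster size of 1000 reaches roughly nine-tenths of it, which is in line with the behaviour described for figure 1.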

Figure 1 Illustration of diminishing returns in precision as cluster size increases, for typical intracluster correlation coefficient (ICC) values. Curves show increases in power (blue line) and precision (red line) as cluster size increases. All power curves correspond to a cluster randomised trial with 12 clusters per arm, designed to detect a standardised effect size of 0.25. At a significance level of 5%, 338 individuals per arm are required to yield 90% power under individual randomisation. Precision curves are independent of effect size (the outcome is assumed to be continuous). Dashed lines represent required sample size per arm under individual randomisation.

How the increase in power begins to level off as the cluster sizes increase is illustrated in figure 1 using a series of power curves (for full details of calculations, see Hemming et al 11 ). For example, when the ICC is 0.01 the power to detect a standardised effect size of 0.25 with 12 clusters per arm plateaus when the cluster size is about 100. This means that, when the standardised effect size of 0.25 is truly the minimally important clinical difference, participants above a threshold of 100 add little value to the analysis of the primary outcome of the study.
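The plateau in power can be reproduced approximately with a short sketch. It uses a normal approximation with equal cluster sizes and a standardised continuous outcome; the calculations behind figure 1 may differ in detail (for example, by using a t-distribution), and the function name `crt_power` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def crt_power(cluster_size, clusters_per_arm=12, icc=0.01,
              effect_size=0.25, alpha=0.05):
    """Approximate power of a two-arm parallel cluster randomised trial.

    Normal approximation, equal cluster sizes, standardised outcome.
    The negligible probability of rejecting in the wrong direction
    is ignored.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    # Standard error of the difference in means between the two arms
    se = sqrt(2 * design_effect / (clusters_per_arm * cluster_size))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size / se - z_crit)

for m in (10, 25, 50, 100, 500, 1000):
    print(f"cluster size {m:>4}: power = {crt_power(m):.3f}")
```

With these assumed parameters, the approximate power passes 90% at a cluster size of about 40 and changes by less than one percentage point between cluster sizes of 100 and 1000.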

It has been known for some time that the power in a CRT begins to reach a plateau as the cluster size is increased, 7–10 and a set of practical recommendations for designing cluster trials so that they make efficient use of their data has recently been published. 11 At the core of these recommendations is the fact that, in cluster trials, increasing cluster sizes, even by a vast amount, may result in negligible gains in power. This levelling-off is dependent on several design parameters, including the intracluster correlation coefficient (ICC) (which measures the extent to which observations within a cluster are correlated) as well as the target effect size. Target effect sizes should be clearly justified and should ideally be based on the minimally important difference. 14 15
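The levelling-off can be seen from the large-sample variance of the estimated treatment effect. The notation here is assumed for illustration and does not appear in the text: $k$ clusters per arm, $m$ individuals per cluster, ICC $\rho$, total outcome variance $\sigma^2$, target difference $\delta$ and two-sided significance level $\alpha$, with equal cluster sizes and a two-arm parallel design.

```latex
\operatorname{Var}(\hat{\delta})
  = \frac{2\sigma^{2}}{km}\,\bigl[1 + (m-1)\rho\bigr]
  \;\longrightarrow\; \frac{2\sigma^{2}\rho}{k}
  \quad \text{as } m \to \infty,
\qquad
\text{power} \approx
  \Phi\!\left(\frac{\delta}{\sqrt{\operatorname{Var}(\hat{\delta})}} - z_{1-\alpha/2}\right).
```

Because the variance cannot fall below $2\sigma^{2}\rho/k$ however large $m$ becomes, both precision and power are bounded for a fixed number of clusters, and the bound is approached more quickly when $\rho$ is larger.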

Ethical issues in studies with excessively large cluster size

A cornerstone of the ethics of medical research is that the risks to study participants must be reasonable when considering the potential benefits to them and to society. Researchers and research ethics committees must (1) minimise risks to participants consistent with sound scientific design and (2) ensure that knowledge benefits outweigh risks.19 20 Studies that accrue needlessly large numbers of participants may violate one or both requirements by exposing some participants to research burdens and risks needlessly.

Risks to research participants may include exposure to a treatment or service that is inferior (eg, when a trial continues beyond the point at which the inferiority of an intervention could have been detected, thereby needlessly exposing intervention participants to an inferior treatment); being denied access in a timely manner to treatment (eg, when a trial continues beyond the point at which the superiority of an intervention could have been detected, thereby needlessly exposing control participants to an inferior treatment); or exposure to the burdens associated with data collection or other non-therapeutic study procedures.

Burdens and risks to society also include being denied access to treatments or services when a trial continues beyond the point at which the superiority of an intervention could have been detected. Other burdens and risks to society include a financial cost and delay in answering the study question.

Appropriate identification of research participants is essential to conducting a proper analysis of benefits and harms: only those individuals who are research participants logically fall under the remit of the research ethics committee. Therefore, identification of research participants can help determine if excessive cluster sizes are consequential. The Ottawa Statement provides explicit criteria for identification of research participants in a CRT20: in brief, individuals are research participants when they are targeted by a study intervention and/or have their identifiable private information collected for research.

Implications in lower risk or low burden settings

Many CRTs evaluate interventions such as health promotion or knowledge translation interventions, which pose little burden or risk to trial participants. Burdens are minimal because outcomes might be obtained in de-identified form using routinely collected data sources. In this situation, the data might be considered available for ‘free’. That is, once data linkage has been established, the number of observations included per cluster, whether from 500 or 5000 participants, is largely immaterial. If the intervention is low risk and routinely collected outcome data are available, there are few ethical implications of large cluster sizes.

Some CRTs evaluate interventions targeted at health professionals but assess outcomes on patients. In such trials, if patients are not interacted with to elicit outcome data, they will not be research participants. However, healthcare professionals might still be research participants, although they are frequently not identified as such.21

In case study 1, the ASCEND trial,22 cluster sizes were set by the average number of invitations that each screening region posted each day. Figure 2 shows that a similar level of power could have been achieved with a much smaller cluster size (while retaining 50 clusters) for the anticipated ICC of 0.0002 (see online supplementary appendix 1 for more details on the calculations). However, the figure also demonstrates that all observations would make a material contribution to the precision of the treatment effect, if the effects of the intervention were smaller than that allowed for in the power calculation. Small absolute changes might be materially important to the screening programme, but if so should be specified in advance.14

Case study 1 The ASCEND trial: evaluation of a low-risk intervention with excessively large sizes

Background
A cluster randomised trial published in The Lancet in 2016 evaluated an intervention to increase colorectal screening uptake and reduce the socioeconomic differences between uptake rates.22 The intervention consisted of a leaflet designed specifically to communicate complex information to readers with low literacy (trial 1 in paper). The trial was powered for a primary analysis of testing whether the effect of the intervention differed across different socioeconomic status groups (ie, an interaction). This trial lasted 10 consecutive working days and was run across five UK screening regions. The outcome data were routinely collected. The unit of randomisation was a region-day (the clusters); thus, the trial included 50 clusters: 25 randomised to receive the intervention and 25 randomised to current practice. The cluster size was fixed by the number of invitations sent out (per region) in any given day.

Power calculation
The trial was designed with 90% power and 5% significance and required 13 500 observations per arm under individual randomisation. This was inflated to allow for clustering, assuming an intracluster correlation coefficient of 0.0002. The trial protocol states that the sample size was further inflated to increase generalisability and reduce internal validity issues associated with trials with a small number of clusters. The final number of clusters was 5 regions over 10 days (ie, 50 clusters).

Number of participants included
The trial enrolled 50 clusters and a total of 163 255 people contributed to the primary outcome, equating to an average cluster size of 3265.

Supplementary data [bmjqs-2017-007164-SP1.pdf]

Figure 2 Power and precision curves for ASCEND trial. Curves show increases in power (blue line) and precision (red line) as cluster size increases, assuming a cluster randomised trial with 25 clusters per arm, designed to detect a standardised effect size of 0.04, at a significance level of 5% (which requires a sample size of 13 500 per arm under individual randomisation), and assuming an intracluster correlation coefficient of 0.0002. Vertical solid line represents cluster size in the study (3000). Full details of calculations are provided in online supplementary appendix 1.

If it is the case that smaller effects were not important, not all observations would have made a material contribution to the prespecified primary outcome analysis. Smaller cluster sizes might then have been achievable, for example, by composing half region-days instead of full region-days and running the trial for 10 half days instead of 10 full days (and so retaining 50 clusters). However, logistics may have meant that it was not practically possible to change the leaflet type midway through the day, or this may have added cost implications. In this situation, an acknowledgement and awareness of the negligible contribution of a substantial proportion of the data could have allowed important secondary outcomes (or safety outcomes, in other trials where applicable) to be included as fully powered analyses.

In this trial, citizens were research participants: the study intervention is a leaflet that seeks to increase their participation in cancer screening. Researchers should consider carefully the implications of excessive cluster sizes on the study research participants. However, because the intervention likely poses negligible risks or burdens to participants and may even be beneficial for low socioeconomic groups, and because larger cluster sizes do not increase the costs or time required for the trial, the excessive cluster sizes are likely to be inconsequential. Nonetheless, the benefits to society may have been increased if the trial had made more efficient use of the data (eg, prespecified important secondary analyses).
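As a rough check on the scale of the excess in case study 1, the standard design effect can be inverted to give the smallest cluster size that matches the power of an individually randomised trial. This is a minimal sketch assuming equal cluster sizes and large sample theory; the function name `min_cluster_size` is illustrative, and the result is not taken from the trial protocol or from online supplementary appendix 1.

```python
from math import ceil

def min_cluster_size(n_individual, clusters_per_arm, icc):
    """Smallest equal cluster size giving the same power as an
    individually randomised trial with n_individual per arm.

    Solves k*m / (1 + (m - 1)*icc) >= n_individual for m.
    Returns None when no cluster size can achieve this, which
    happens when clusters_per_arm <= n_individual * icc.
    """
    denominator = clusters_per_arm - n_individual * icc
    if denominator <= 0:
        return None
    return ceil(n_individual * (1 - icc) / denominator)

# ASCEND figures quoted in the case study: 13 500 per arm under
# individual randomisation, 25 clusters per arm, ICC of 0.0002.
print(min_cluster_size(13500, 25, 0.0002))

# Figure 1 scenario: 338 per arm, 12 clusters per arm, ICC of 0.01.
print(min_cluster_size(338, 12, 0.01))
```

Under these assumptions, a cluster size of roughly 600 would have met the stated power requirement in ASCEND, compared with the average of 3265 actually included. This ignores the further inflation that the protocol applied for generalisability and internal validity.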