The last 10 years have seen an extraordinary surge of interest in ‘stepped wedge’ designs for evaluating interventions to improve health and social care. Reviews of published trials and registered protocols have shown an exponential increase in the number of trials citing a stepped wedge approach.

But published examples of stepped wedge evaluations in quality improvement illustrate some of the practical challenges. On the one hand, limited research resources may force investigators to stagger implementation at different sites

The vast majority of stepped wedge trials are cluster randomised, and when people refer to stepped wedge designs this is usually what they have in mind. A cluster randomised trial is a trial in which all the participants at the same site or ‘cluster’ are allocated to the same intervention.

Exactly what it means for the timescale to be ‘extended’ will depend on the trial. Stepped wedge trials come in many and varied forms.

The same study also took a series of cross-sectional samples from the larger cohort of patients (not necessarily the same patients each time) to assess quality of life and satisfaction.

A more common approach is to recruit eligible participants as they present at clusters in a continuous stream.

Research designs are shaped as much by practical constraints as by abstract schemes, and it is always a good idea to start with the constraints and work towards a design, rather than start with a design and try to fit it to constraints. These constraints will be unique to each research context, and

Are there limits on the time available to complete the evaluation, on the number of clusters, or on the number of participants (or the rate at which you can recruit participants) at each cluster? These constraints put limits on the overall scale of the evaluation, or force trade-offs between different design characteristics.

How will participants and their data be sampled in your study: as a series of cross-sectional surveys, as a continuous stream of incident cases, as a cohort followed over time, or some other way? Does the timescale divide into cycles, seasons or milestones that influence how you will sample participants and data?

Is there a limit on how many clusters can implement the intervention at the same time in the evaluation? If this is constrained by research resources (eg, if there are only enough trained research staff to implement the intervention one cluster at a time) then implementation

If implementation is to be staggered, is there a minimum ‘step length’? If the same team delivers the intervention in different clusters at different steps, then bear in mind it may take some time to get the intervention fully operational at a site, and the team will also need time to relocate from one cluster to the next.
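The interplay between the last two constraints can be checked with a few lines of arithmetic. The following is a minimal sketch, not a prescribed method. It assumes one baseline period in which every cluster is in the control condition, followed by one period per step, with all periods the same length. The function name `rollout_duration` and its parameters are hypothetical.

```python
import math


def rollout_duration(n_clusters, max_per_step, step_length_months):
    """Return (number of steps, total months) for a staggered rollout.

    Assumptions (illustrative only): one all-control baseline period,
    then one period per step, every period lasting step_length_months.
    """
    # Each step can move at most max_per_step clusters to the intervention.
    n_steps = math.ceil(n_clusters / max_per_step)
    # Baseline period plus one period following each step.
    n_periods = n_steps + 1
    return n_steps, n_periods * step_length_months


# Example: 10 clusters, at most 2 crossing at a time, 3-month steps.
steps, months = rollout_duration(n_clusters=10, max_per_step=2, step_length_months=3)
print(steps, months)  # 5 18
```

If the total exceeds the time available for the evaluation, something has to give: a shorter step length, more clusters crossing at each step, or fewer clusters.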

Stepped wedge trials are suited to situations where, while it might be easy enough to introduce the experimental intervention to a cluster, it is much harder (practically or politically) to take it away again. These are interventions that change practice or are difficult to unlearn, or that policy has decreed will be rolled out anyway. This restriction is sometimes referred to as one-way crossover. (There are certainly interventions that can be crossed both ways, from control to intervention and back again, but in this case a design with

Stepped wedge designs also implicitly require that all of the clusters that will participate in the trial are ready to start (to be randomised and commence data collection) at the same calendar date—in other words that there is no long, drawn-out period of recruitment of sites. Studies where site recruitment will be a drawn-out process must follow an alternative strategy where each cluster is randomised as and when it is recruited, either to the control or to the intervention—just as you would randomise individuals in the simplest design for an individually randomised trial.
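Because all clusters are available at the outset, the whole allocation can be generated in one go. The sketch below shows one way this might be done, assuming clusters are spread as evenly as possible over the steps and that the trial has one more period than it has steps. The function name `stepped_wedge_schedule` and the 0/1 coding (0 for control, 1 for intervention) are our own illustrative choices.

```python
import random


def stepped_wedge_schedule(cluster_ids, n_steps, seed=None):
    """Randomly allocate clusters to crossover steps (one-way crossover).

    Returns {cluster_id: [0 or 1 for each period]}. There are n_steps + 1
    periods; every cluster is in the control condition (0) in the first
    period and in the intervention condition (1) in the last.
    """
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)  # the random order decides who crosses first
    schedule = {}
    for position, cluster in enumerate(shuffled):
        # Spread clusters as evenly as possible over steps 1..n_steps.
        step = position * n_steps // len(shuffled) + 1
        schedule[cluster] = [1 if period >= step else 0
                             for period in range(n_steps + 1)]
    return schedule


# Example: 10 clusters crossing over in 5 steps (2 clusters per step).
for cluster, row in sorted(stepped_wedge_schedule(range(10), 5, seed=1).items()):
    print(cluster, row)
```

Each row switches from 0 to 1 exactly once and never switches back, which is the one-way crossover restriction described above.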

Remember, also, that one defining feature of a stepped wedge trial is that it runs over an extended time period. One of the most important questions to ask is whether this is necessary at all. In research on health services and quality improvement, marshalling good evidence

The motivation for conducting a stepped wedge trial that is most commonly cited is also the most questionable: that a stepped wedge design is necessary when you want everyone to have the opportunity to access the intervention. This is often portrayed as an incentive for sites to participate, or as an ethical obligation, or as a justification based on a concern that sites might seek the intervention for themselves outside of the trial protocol. We will square up to the logic of this argument in the next section.

A much more pertinent question to ask than ‘should I give every site the intervention?’ is ‘how long can I reasonably ask any site to wait for it?’ This will help you understand how much time you have to conduct a truly randomised evaluation. If you believe, incidentally, that you have an ethical obligation to give everyone the intervention immediately, and if you can, then a stepped wedge trial is

So, what if we have an intervention that can only be crossed in one direction, and a number of clusters that are ready to be randomised at the same time to a trial conducted over an extended period? How do we arrive at a stepped wedge as our design choice rather than any alternative?

Suppose we want to design a trial in a maternity unit setting, recruiting women with suspected pre-eclampsia, and randomised by maternity unit. Suppose we have identified 10 maternity units willing to take part, and we are not hopeful of finding any more. For this example, we will divide the timetable for the study into whole months for convenience and assume that in each unit four women are recruited every month. Here we explore the statistical power—the likelihood of finding evidence for an important effect—of different designs. More details on the assumptions behind our power calculations are given in
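For readers who want to experiment with this kind of comparison, here is a rough sketch of a power calculation for an arbitrary cluster-by-period design. It uses the closed-form variance of Hussey and Hughes (2007), which assumes cross-sectional sampling, a random intercept for each cluster and a fixed effect for each period. This is not necessarily the model behind the power figures quoted in this article. The intracluster correlation (0.05) and the effect size (0.5 standard deviations) are placeholder values of our own choosing, and the function names are hypothetical.

```python
from statistics import NormalDist


def treatment_variance(design, n_per_period, icc, total_var=1.0):
    """Variance of the estimated treatment effect (Hussey and Hughes 2007).

    design: one row per cluster, each row a list of 0/1 per period.
    Assumes cross-sectional samples of n_per_period participants, a random
    cluster intercept and fixed period effects.
    """
    n_clusters, n_periods = len(design), len(design[0])
    tau2 = icc * total_var                          # between-cluster variance
    sigma2 = (1 - icc) * total_var / n_per_period   # variance of a cluster-period mean
    u = sum(sum(row) for row in design)
    w = sum(sum(row[j] for row in design) ** 2 for j in range(n_periods))
    v = sum(sum(row) ** 2 for row in design)
    numerator = n_clusters * sigma2 * (sigma2 + n_periods * tau2)
    denominator = ((n_clusters * u - w) * sigma2
                   + (u ** 2 + n_clusters * n_periods * u
                      - n_periods * w - n_clusters * v) * tau2)
    return numerator / denominator


def power(design, n_per_period, icc, effect, alpha=0.05):
    """Approximate two-sided power (normal approximation, one tail ignored)."""
    se = treatment_variance(design, n_per_period, icc) ** 0.5
    z = NormalDist()
    return z.cdf(abs(effect) / se - z.inv_cdf(1 - alpha / 2))


# Parallel design over a single month: 5 units intervention, 5 control.
parallel = [[1]] * 5 + [[0]] * 5
# Stepped wedge over 11 months: one unit crosses over each month.
wedge = [[1 if month > unit else 0 for month in range(11)] for unit in range(10)]

for name, design in [("parallel", parallel), ("stepped wedge", wedge)]:
    # ICC and effect size below are placeholders, not values from this article.
    print(name, round(power(design, n_per_period=4, icc=0.05, effect=0.5), 3))
```

Under these placeholder values the stepped wedge gives a much smaller variance than the single-month parallel design, but it also recruits 11 times as many women, so the comparison says as much about the length of the trial as about the design itself.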

Sample size calculations for trials usually determine the number of participants needed to achieve given statistical power,

Designs for cluster randomised trials allowing crossover (in one direction) from a control to an intervention condition, either during or after the end of the trial, showing the statistical power of each design in a particular scenario (see

First, a sense-check: do we really need to extend the timescale of our trial? What if we recruited women over a single month, with half the maternity units allocated to the intervention condition and half to the control (20 women in each condition)? This design is shown schematically in

Now, a perceived advantage of the stepped wedge design is that all the sites end up receiving the intervention. But sites still have to wait: for the design in

If we go further, and abandon the idea that all clusters must begin in the control condition and end in the intervention condition, we arrive at the design in

But what about that excess power? Could we get away with collecting

So far, we have deliberately focused more on the design and conduct of stepped wedge trials than their analysis, but the two are connected and the latter generates just as much discussion. Combining quantitative information from between-site and within-site comparisons is relatively easy, although the methods that are commonly used (mixed-effects regression and generalised estimating equations) rely heavily on statistical modelling.

One of the most important things when analysing a stepped wedge trial is to allow for the possibility of secular changes in outcomes over time. This matters because time is confounded with treatment in a stepped wedge design: the later periods always contain more clusters in the intervention condition than the earlier ones. Yet we know from the work of others that this and other aspects of the analysis of stepped wedge trials are often handled inadequately in practice.
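A toy calculation makes the danger concrete. The sketch below generates noise-free outcomes for a small stepped wedge in which outcomes improve steadily over time but the intervention has no effect at all. It then compares a naive estimate (all intervention observations versus all control observations) with one that only compares clusters within the same period. The within-period comparison is a deliberate simplification used for illustration, not the mixed model or generalised estimating equation analysis that would be used in practice. All names and numbers are hypothetical.

```python
from statistics import mean


def naive_and_adjusted(design, period_means, effect):
    """Return (naive, period-adjusted) effect estimates on noise-free data.

    Outcome for cluster i in period j = period_means[j] + effect * design[i][j].
    """
    n_periods = len(period_means)
    outcomes = [[period_means[j] + effect * row[j] for j in range(n_periods)]
                for row in design]
    cells = [(row[j], outcomes[i][j])
             for i, row in enumerate(design) for j in range(n_periods)]
    # Naive: pool all cluster-periods, ignoring when they were observed.
    naive = (mean(y for x, y in cells if x == 1)
             - mean(y for x, y in cells if x == 0))
    # Adjusted: contrast conditions within each period, then average over
    # the periods that contain both conditions.
    diffs = []
    for j in range(n_periods):
        treated = [outcomes[i][j] for i, row in enumerate(design) if row[j] == 1]
        control = [outcomes[i][j] for i, row in enumerate(design) if row[j] == 0]
        if treated and control:
            diffs.append(mean(treated) - mean(control))
    return naive, mean(diffs)


design = [[0, 1, 1, 1, 1],
          [0, 0, 1, 1, 1],
          [0, 0, 0, 1, 1],
          [0, 0, 0, 0, 1]]
# Outcomes drift upwards by 1 unit per period; the true effect is zero.
naive, adjusted = naive_and_adjusted(design, [0.0, 1.0, 2.0, 3.0, 4.0], effect=0.0)
print(naive, adjusted)  # 2.0 0.0
```

The naive comparison attributes the whole secular trend to the intervention, while the comparison made within periods correctly finds no effect.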

Stepped wedge designs provide a formal framework for evaluating interventions implemented at multiple sites. In this article we have focused on randomised evaluations, although non-randomised studies of interventions implemented at different times in different sites will share many of the features of stepped wedge trials.

Staggering the introduction of the intervention at different sites can offer statistical efficiency as well as practical benefits. But while efficiency and practicality may drive the choice of a stepped wedge design,

