
Are quality improvement collaboratives effective? A systematic review
Susan Wells (1), Orly Tamir (2), Jonathon Gray (3, 4), Dhevaksha Naidoo (5), Mark Bekhit (6), Don Goldmann (7)
1. Epidemiology and Biostatistics, School of Population Health, University of Auckland, Auckland, New Zealand
2. Center for Research and Policy in Diabetes, The Gertner Institute for Epidemiology and Health Policy, Sheba Medical Center, Tel-Hashomer, Israel
3. Victoria University, Wellington, New Zealand
4. South West Academic Health Science Network, Exeter, UK
5. RMIT University, Melbourne, Victoria, Australia
6. School of Medicine, University of Auckland, Auckland, New Zealand
7. Institute for Healthcare Improvement, Cambridge, Massachusetts, USA

Correspondence to Associate Professor Susan Wells, Epidemiology and Biostatistics, School of Population Health, University of Auckland, Auckland 1071, New Zealand; s.wells{at}auckland.ac.nz

## Abstract

Background Quality improvement collaboratives (QIC) have proliferated internationally, but there is little empirical evidence for their effectiveness.

Method We searched Medline, Embase, CINAHL, PsycINFO and the Cochrane Library databases from January 1995 to December 2014. Studies were included if they met the criteria for a QIC intervention and the Cochrane Effective Practice and Organisation of Care (EPOC) minimum study design characteristics for inclusion in a review. We assessed study bias using the EPOC checklist and the quality of the reported intervention using a subset of SQUIRE 1.0 standards.

Results Of the 220 studies meeting QIC criteria, 64 met EPOC study design standards for inclusion. There were 10 cluster randomised controlled trials, 24 controlled before-after studies and 30 interrupted time series studies. QICs encompassed a broad range of clinical settings, topics and populations ranging from neonates to the elderly. Few reports fully described QIC implementation and methods, intensity of activities, degree of site engagement and important contextual factors. By care setting, an improvement was reported for one or more of the study’s primary effect measures in 83% of the studies (32/39 (82%) hospital based, 17/20 (85%) ambulatory care, 3/4 nursing home and a sole ambulance QIC). Eight studies described persistence of the intervention effect 6 months to 2 years after the end of the collaborative. Collaboratives reporting success generally addressed relatively straightforward aspects of care, had a strong evidence base and noted a clear evidence-practice gap in an accepted clinical pathway or guideline.

Conclusions QICs have been adopted widely as an approach to shared learning and improvement in healthcare. Overall, the QICs included in this review reported significant improvements in targeted clinical processes and patient outcomes. These reports are encouraging, but must be interpreted cautiously since fewer than a third met established quality and reporting criteria, and publication bias is likely.

• quality improvement
• collaborative, breakthrough groups
• implementation science


## Introduction

A quality improvement collaborative (QIC) is an organised, multifaceted approach that includes teams from multiple healthcare sites coming together to learn, apply and share improvement methods, ideas and data on service performance for a given healthcare topic. The teams are supported by a faculty of experts who identify best practices and facilitate implementation of strategies to improve care. Between meetings, teams are usually provided with coaching and tasked to apply quality improvement methods, undertake rapid cycle tests of change (such as Plan-Do-Study-Act cycles) in their own work environment, and share data, innovations and lessons learnt from their implementation efforts. The theory of change is that teams learn faster and are more effective in implementing and spreading improvement ideas and assessing their own progress when collaborating and benchmarking with other teams.1–3 Effective teams, in turn, are able to improve processes of care and, if these improvements are sustained, improve patient outcomes and reduce healthcare costs.4 5 QICs have been advocated as a potentially powerful way to scale up and spread innovations and create long-term learning networks.6

The QIC methodology originated in North America in the late 1980s (New England Cardiovascular Disease Study Group 1986 and Vermont Oxford Network 1988) with the approach becoming more formalised as the Breakthrough Series by the Institute for Healthcare Improvement (IHI) in 1995. There have been various modifications of the collaborative methodology. For example, the Keystone collaboratives included a strong safety culture focus.7 8 Activities have evolved over the last 20 years with increasing use of virtual (internet-enabled) communications and meetings and electronic data capture. Even with these advances, however, QICs remain resource intensive, generally requiring sustained efforts for at least 12–18 months, funding for infrastructure and coaching, and person-time away from clinical care and administrative responsibilities. Despite these issues, QICs have increased in number and popularity. They have been initiated in diverse healthcare systems ranging from developed economies in North America, Europe and Australasia,9 to low- and middle-income countries, such as those in Latin America and Africa.10 11

QICs have proliferated in the absence of strong evidence for effectiveness, cost-effectiveness or long-term change.9 12 Much of the literature to date has been based on case studies, single-site evaluations and qualitative studies of experiences and determinants of success. While these provide important insights into how and in what ways QICs might work, little is known about whether or not they are effective. A systematic review of QICs in 2008 identified nine controlled studies of which seven (including one randomised controlled trial (RCT)) reported a positive effect on some of the outcome measures.12 The authors concluded that the evidence underlying QICs was positive but limited and the effects could not be predicted with great certainty.12

We aimed to re-evaluate the empirical evidence for the impact of QICs over the last 20 years.

## Methods

We followed the PRISMA guidelines13 14 for reporting this review and were guided by the Cochrane Systematic Review Handbook15 and the Cochrane Effective Practice and Organisation of Care (EPOC) Review Group checklist.16 Our primary research question was structured according to the PICOS approach (participants, intervention, comparison, outcomes and study design) and was as follows: For multidisciplinary healthcare teams (and the patients they care for), does involvement in a quality improvement collaborative, compared with usual care, lead to improvements in healthcare delivery and outcomes and, if successful, are these improvements sustained 6 months or more postintervention?

A secondary objective was to assess the quality of reporting of the QIC intervention using ‘Planning the intervention,’ a subset of SQUIRE v.1 standards for reporting excellence.17 We did not publish a review protocol prior to commencement.

### Inclusion criteria

QIC studies were included for review if they were written in English, were conducted in a healthcare setting and reported healthcare processes or patient outcomes as their primary effect measures. The criteria for the ‘intervention’ were critical to this review. Drawing on Schouten and colleagues’ analysis of the theoretical literature on organised QICs,9 12 we included studies that reported a multifaceted intervention with five core elements:

• a specific healthcare topic;

• a group of experts (clinical and quality improvement) to bring together the scientific evidence, practical contextual knowledge and quality improvement methods, usually within a ‘change package’ or toolkit;

• multiple teams from multiple sites that elect to participate;

• a model or framework for improvement that includes measurable aims, collection of data on planning and performance, and implementation and evaluation of small tests of change;

• a set of structured activities that promote a collaborative process to learn and share ideas, innovations and experiences (eg, face-to-face or virtual meetings; visits to other sites; visits by experts or facilitators; web-based activities to report changes, results and comparisons with other teams; and coaching and feedback by improvement experts).
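The five core elements act as a conjunctive screen: a study counts as a QIC only if every element is present. A minimal sketch of that rule follows; the `CandidateStudy` record and its field names are hypothetical, and reading "multiple sites" as two or more sites is our assumption rather than a threshold stated in the review.

```python
from dataclasses import dataclass


@dataclass
class CandidateStudy:
    """Hypothetical screening record; field names are illustrative only."""
    has_specific_topic: bool
    has_expert_faculty: bool         # clinical and QI experts, eg, a change package or toolkit
    n_sites: int                     # sites whose teams elected to participate
    has_improvement_model: bool      # measurable aims, data collection, small tests of change
    has_structured_activities: bool  # meetings, site visits, coaching, web-based sharing


def meets_qic_definition(s: CandidateStudy) -> bool:
    """True only if all five core elements are present."""
    return (
        s.has_specific_topic
        and s.has_expert_faculty
        and s.n_sites >= 2  # assumption: "multiple sites" read as at least two
        and s.has_improvement_model
        and s.has_structured_activities
    )


single_site = CandidateStudy(True, True, 1, True, True)
print(meets_qic_definition(single_site))  # False: only one site
```

Because the rule is a conjunction, a report that omits any one element (for example, no description of between-meeting activities) fails the screen regardless of how well the others are described.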

In addition to meeting the above criteria, studies were only included if they met the Cochrane EPOC Review Group minimum study design characteristics for inclusion in a systematic review of an intervention.16 These criteria are applied to RCTs, non-randomised controlled clinical trials, controlled before-after (CBA) studies and interrupted time series (ITS) studies with or without control sites. The minimum inclusion criteria for a CBA study were contemporaneous data collection, appropriate choice of control site and a minimum number of two intervention and two control sites. For an ITS study that investigated a change in trend attributable to an intervention, there were two minimum inclusion criteria: a clearly defined point in time when the intervention occurred, and at least three data points recorded before and after the intervention. While this number of data points is not generally considered sufficient for standard improvement science methods such as statistical process control (SPC), the EPOC checklist stipulates this as a minimum.
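The EPOC minimum criteria for the two non-randomised designs can likewise be written as simple checks. The function names below are hypothetical; the thresholds (at least two intervention and two control sites for a CBA study, at least three data points on each side of a clearly defined intervention point for an ITS study) are those stated above.

```python
def cba_meets_epoc(contemporaneous: bool, appropriate_control: bool,
                   n_intervention_sites: int, n_control_sites: int) -> bool:
    """Minimum criteria for a controlled before-after (CBA) study."""
    return (contemporaneous
            and appropriate_control
            and n_intervention_sites >= 2
            and n_control_sites >= 2)


def its_meets_epoc(intervention_point_defined: bool,
                   n_points_before: int, n_points_after: int) -> bool:
    """Minimum criteria for an interrupted time series (ITS) study."""
    return (intervention_point_defined
            and n_points_before >= 3
            and n_points_after >= 3)


print(cba_meets_epoc(True, True, 3, 1))  # False: only one control site
print(its_meets_epoc(True, 2, 12))       # False: too few baseline points
```

An ITS study with a long follow-up but only two baseline observations still fails, which matches the most common reason for exclusion reported in the Results.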

### Search strategy and data sources

We searched Medline, Embase, CINAHL, PsycINFO and the Cochrane Library databases from January 1995 to the end of 2014 using search terms as appropriate for each of the databases. We followed the search strategy (box) used to conduct the original systematic review published in 2008 by Schouten et al.12 In addition, we manually searched the reference lists of all previous reviews and relevant studies. If only conference abstracts were found, where possible, authors were contacted for full publications.

Box

### Search terms for systematic review12

Set 1 (non-MeSH)

(quality and improvement and collaborative) or ((series or project) and breakthrough)

Organisational-Innovation, Models-Organisational, Cooperative-Behavior

And Program-Evaluation, Total-Quality-Management, Quality-Assurance-Health-Care

And Outcome-and-Process-Assessment-Health-Care

And Health-Services-Research, Regional-Medical-Programs, Organi?ation* near (collabor* or participa*).

The steps in set 2 combined with: Statistics, Statistics-and-numerical-data

### Data extraction and risk of bias assessment

Each potentially eligible study was independently screened by SW and one other author (JG, OT, DG, MB, DN) according to the study inclusion criteria and EPOC minimum characteristics for inclusion. Data from all included studies were then extracted independently by SW, OT and DN onto a standardised spreadsheet. Authors (SW, OT and DN) used the Cochrane Systematic Review Handbook and the EPOC Review Group checklist to assess the quality of the studies and risk of bias according to study design. Using ‘planning the intervention,’ a subset of the core elements of SQUIRE v.1 standards,17 we assessed if authors described the QIC intervention. Two authors (DN and SW) extracted how the intervention was implemented (what was done and by whom), how well the intervention was implemented (dose or intensity of exposure) and any reported measures of site engagement in the activities (ie, the degree to which this dose was received and acted upon). In addition, we searched for the main factors that contributed to the choice of the intervention—why authors thought a QIC might work in their own context. Specifically, we looked for a rationale for the QIC based on internal factors (such as barriers, organisation dynamics, sociocultural and other factors) or external environmental factors.17

Disagreements between authors were resolved by a consensus process in which all reviewers discussed the issue.

### Data synthesis

Detailed summary tables of the included studies were developed according to setting of care, study design, country of origin, principal patient group, number of sites, whether the QIC type was specifically referenced (eg, Breakthrough Series, Vermont Oxford Network), clinical topic, primary effect measures and reported outcomes.

We were unable to use meta-analysis to pool effect sizes due to the substantial heterogeneity between settings and target populations, study design, nature and complexity of the reported QIC intervention, significant differences in the topic of interest and effect measures (processes and outcomes) and the unit of measurement for some outcomes (such as composite bundles of care vs single primary effect measures).

Measures of association are those reported by the individual studies. A QIC was deemed ‘effective’ if a statistically significant difference was reported in one or more of the study’s primary effect measures. We also applied a more conservative approach, noting where a study reported statistically significant improvements in at least 50% of the primary effect measures.
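The two effectiveness definitions differ only in the threshold applied to the count of significant primary effect measures. A minimal sketch, assuming each study's results can be reduced to one significant/non-significant flag per primary effect measure (the function and its input format are hypothetical):

```python
def classify_effectiveness(significant_improvement: list[bool]) -> dict[str, bool]:
    """significant_improvement[i] is True if primary effect measure i showed a
    statistically significant improvement, as reported by the study itself."""
    if not significant_improvement:
        raise ValueError("a study must report at least one primary effect measure")
    n_sig = sum(significant_improvement)
    n_total = len(significant_improvement)
    return {
        "effective": n_sig >= 1,                          # primary definition
        "effective_conservative": n_sig >= 0.5 * n_total,  # at least 50% of measures
    }


print(classify_effectiveness([True, False, False]))
# {'effective': True, 'effective_conservative': False}
```

A study with one significant result out of three primary measures counts as effective under the primary definition but not under the conservative one; with one out of two, it meets both.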

## Results

We identified 4609 records published from January 1995 until December 2014 through database searching (figure 1).

Figure 1

Systematic review study flow diagram. CBA, controlled before-after study; EPOC, Cochrane Effective Practice and Organisation of Care; ITS, interrupted time series study; RCT, randomised controlled trial.

After screening the abstracts, 3028 records were excluded as they were not QICs or were duplicates. An additional 54 articles were identified from searching reference lists and the grey literature. From the 1095 full-text articles screened, 848 were excluded as they either did not have data on effectiveness or did not meet all the QIC inclusion criteria. This left 220 studies (247 articles), from which a further 156 studies were excluded as they did not meet one or more of the EPOC minimum quality criteria for inclusion in a review (see online supplementary file 1). The majority were uncontrolled before-after studies or ITS studies that did not provide at least three data points before and after the intervention. A group consensus process was required for 15/220 (7%) studies to resolve disagreements. Figure 2 plots the number of QIC studies published over time according to EPOC minimum criteria. While publication of studies has increased, the ratio of ‘excluded to included’ studies in the last 4 years was approximately 2:1.
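The counts reported in this paragraph can be checked against one another. The arithmetic below uses only numbers given here; it does not attempt to reconcile the earlier screening stages, which depend on details shown only in figure 1.

```python
# Counts reported in the Results text
full_text_screened = 1095
full_text_excluded = 848
articles_remaining = full_text_screened - full_text_excluded  # 247 articles

qic_studies = 220       # the 247 articles describe 220 distinct studies
epoc_excluded = 156
included = qic_studies - epoc_excluded                        # 64 studies
consensus_needed = 15

print(articles_remaining, included)                                    # 247 64
print(f"excluded on EPOC criteria: {epoc_excluded / qic_studies:.0%}")  # 71%
print(f"included: {included / qic_studies:.0%}")                       # 29%
print(f"needed consensus: {consensus_needed / qic_studies:.0%}")        # 7%
```

The 29% inclusion rate is the basis for the statement that fewer than a third of QIC studies met established quality criteria.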

### Supplementary file 1

Figure 2

Study inclusion according to Cochrane Effective Practice and Organisation of Care (EPOC) minimum quality criteria 1996–2014. *Included studies: RCT cluster randomised controlled trial, CBA controlled before-after study, ITS interrupted time series study. **Excluded studies were uncontrolled before-after studies or study designs (CBA, ITS) that did not meet EPOC minimum quality criteria.

### Summary of included studies

There were 64 studies included in the final review: 10 cluster RCT, 24 CBA and 30 ITS studies of which 6 ITS studies also included a contemporaneous control group. The majority of the studies have been conducted in an acute hospital setting (61%), a third in ambulatory care/general practice (31%), with the remainder in nursing homes and in English regional ambulance services. A full description of the studies and findings is available in online supplementary file 2.

### Supplementary file 2

Hospital-based QICs (39 studies) targeted a wide variety of inpatient populations, including adult medical and surgical admissions, obstetric patients, intensive care patients (adult, paediatric and neonatal) and organ donors. Seven of the 10 cluster RCTs included in this review were hospital based. The median number of participating sites (QIC sites and control sites) was 32 (range 3 sites to 1071 intensive care units (ICU) in 793 hospitals). The most frequent topic for improvement was hospital-associated infections (13 studies), with 8 studies focusing on central line-associated bloodstream infections (CLABSI).

Ambulatory care QICs (20 studies) were generally conducted on a smaller scale than hospital-based QICs (median 24 sites; range 1–69 sites), with just over half seeking to improve the management of chronic conditions, such as diabetes, asthma, congestive heart failure and depression.

Nursing home QICs (four studies) focused on pain management, falls prevention and pressure ulcers. Siriwardena et al 18 reported the only ambulance service QIC, with 12 sites participating in a QIC on prehospital care for acute myocardial infarction and stroke.

Tables 1–3 summarise the reported results by setting. For the primary effect measures of interest, a third of the studies investigated both clinical processes and patient outcomes, a third investigated clinical processes only and a third investigated patient outcomes only.

Table 1

Summary of findings of QIC studies conducted in a hospital setting (39 studies)

Table 2

Summary of findings of QIC studies conducted in an ambulatory care or general practice setting (20 studies)

Table 3

Summary of findings of QIC studies conducted in nursing home (four studies) and prehospital paramedical care (one study)

By care setting, an improvement was found for one or more of the study’s primary effect measures in 83% of the studies (32/39 (82%) hospital based, 17/20 (85%) ambulatory care, 3/4 nursing home and the sole ambulance QIC).

When a more conservative definition of effectiveness was applied (statistically significant differences in at least 50% of the primary effect measures studied), positive results were reported in 73% of the studies (31/39 (79%) hospital-based QICs, 13/20 (65%) ambulatory care, 2/4 nursing home and the sole ambulance study).

By study design, 6/10 (60%) RCT, 20/24 (83%) CBA and 27/30 (90%) ITS studies reported statistically significant improvements in at least one primary effect measure.
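Pooling the per-group counts reproduces the headline percentages. The sketch below uses the hospital count given in the Abstract and Discussion (32/39 hospital-based studies improved); with that figure, the totals by setting and by study design agree at 53/64.

```python
# (studies with a significant improvement, studies in group), as reported in the text
by_setting = {"hospital": (32, 39), "ambulatory care": (17, 20),
              "nursing home": (3, 4), "ambulance": (1, 1)}
by_design = {"RCT": (6, 10), "CBA": (20, 24), "ITS": (27, 30)}
# Conservative definition: significant improvement in at least 50% of primary measures
conservative_by_setting = {"hospital": (31, 39), "ambulatory care": (13, 20),
                           "nursing home": (2, 4), "ambulance": (1, 1)}


def pooled(groups: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum numerators and denominators across groups."""
    improved = sum(k for k, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return improved, total


for label, groups in [("by setting", by_setting), ("by design", by_design),
                      ("conservative, by setting", conservative_by_setting)]:
    k, n = pooled(groups)
    print(f"{label}: {k}/{n} = {k / n:.0%}")
# by setting: 53/64 = 83%
# by design: 53/64 = 83%
# conservative, by setting: 47/64 = 73%
```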

Where reported, absolute differences ranged from modest to very impressive, with a median of 12% improvement (range 4%–61%). Some marked improvements were seen for provider compliance with composite ‘bundles’ or sets of quality indicators. For example, an ITS study focusing on obstetric labour found that compliance with all components of the active management of the third stage of labour rose from 0% to 95%, while the proportion of vaginal deliveries complicated by postpartum haemorrhage fell from 83% to 5%.19 Patient outcomes such as CLABSI rates were reported in seven ITS studies, with a median CLABSI reduction of 43% (range 25%–74%) achieved using central line insertion and maintenance bundles.
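Absolute differences (in percentage points) and relative reductions (as a share of the baseline) can give very different numbers for the same change, so it helps to keep them apart when comparing studies. A minimal sketch using the postpartum haemorrhage figures above; the helper names are hypothetical.

```python
def absolute_difference(before_pct: float, after_pct: float) -> float:
    """Reduction in percentage points."""
    return before_pct - after_pct


def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Reduction as a percentage of the baseline value."""
    return 100 * (before_pct - after_pct) / before_pct


# Postpartum haemorrhage example from the text: 83% -> 5% of vaginal deliveries
print(absolute_difference(83, 5))           # 78 (percentage points)
print(round(relative_reduction(83, 5), 1))  # 94.0 (% of baseline)
```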

### Sustainability and cost-effectiveness

Eight studies (one RCT,20 21 four CBA22–25 and three ITS studies26–28) specifically investigated sustainability of the QIC result, defined as continued data collection 6 months or more after the end of the intervention study period. All found sustained improvements. Four studies reported on cost-effectiveness.19 21 29 30 All had positive findings (see online supplementary file 2). For example, in a collaborative focusing on neonatal hospital-associated infections, the median change in treatment cost per infant was a decrease of $10 932 in the intervention group and an increase of $12 249 in the control group (p<0.0001), and this difference was sustained 1 year postintervention.21

The authors’ judgement of risk of bias by study design and assessment of the reported QIC intervention using a subset of SQUIRE v.1 standards are detailed in online Supplementary files and summarised below.

#### RCT risk of bias

The 10 RCTs were set in Mexico,31 the Netherlands,30 32 Canada,33 UK34 35 and USA.20 36–39 The QIC intervention was compared with usual care control sites in six trials.32 34–36 38 39 One trial assigned one group of sites to reducing bronchopulmonary dysplasia and the other to reducing hospital-associated infections, so that each group served as the comparator for the other.33 Three trials included partial interventions in control sites: two provided limited feedback reports to controls,20 37 and one implemented a clinical information system for diabetes at all participating sites.31 The trial periods ranged from 12 to 24 months (median 20 months) with 80% or more follow-up in all but two studies (75% and 79% follow-up).34 38

While all studies reported randomisation, five did not describe a method for random sequence generation,31 33 36 37 39 eight did not describe allocation concealment methods31 33–39 and only three trials reported the intracluster correlation coefficients used in their analyses.20 34 36 Given the nature of QICs, blinding of participant sites/teams was not possible. However, five studies either had blinded outcome assessment or the outcome was assessed as being sufficiently objective (eg, an automated laboratory test).31 32 34 37 39 Where outcome measures were obtained from chart extraction or collected by individuals, no study reported inter-rater reliability audits.
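Intracluster correlation matters because patients within the same site tend to resemble one another, so a cluster trial carries less information than its patient count suggests. A common approximation for equal-sized clusters is the design effect 1 + (m - 1) × ICC, where m is the cluster size; an analysis that ignores it overstates precision. The sketch below uses hypothetical trial numbers, not values from any included study.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_individuals: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_individuals / design_effect(mean_cluster_size, icc)


# Hypothetical trial: 20 sites with 50 patients each, ICC = 0.05
de = design_effect(50, 0.05)                   # about 3.45
n_eff = effective_sample_size(1000, 50, 0.05)  # about 290 of the 1000 patients
print(round(de, 2), round(n_eff))
```

Even a small ICC inflates variance substantially when clusters are large, which is why reporting the ICC used is part of assessing a cluster RCT.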

#### CBA studies risk of bias

There were 24 CBA studies: 18 from the USA,3 24 40–57 three from the Netherlands,25 58 59 two from Sweden60 61 and one from the UK.62 63 Study periods ranged from 7 to 60 months (median 27 months). Follow-up of patients and/or providers was 80%–100% in all but six studies (two studies with 71% follow-up in intervention sites41 49 and four studies42–45 where attrition was not specified). The major risks of bias were in the comparability of control sites, baseline differences in performance and the lack of blinded or reliable assessment of study outcome measures.

#### ITS studies risk of bias

There were 30 ITS studies, six (20%) of which had contemporaneous control sites.64–69 The majority (24/30) were located in the USA,26 28 64 66 68 70–88 three in the UK18 65 67 and three in Africa.19 69 89 The median study period for ITS studies was 42 months (range 11 months to 10 years), far longer than in CBA studies (median 27 months) and RCTs (median 20 months); 11/30 (37%) studies collected data for 4 or more years.26 27 64 65 69 71 79 81 83 87 89 For 25/30 studies, there was over 80% site follow-up. Of the remaining five studies, three had between 60% and 77% follow-up28 66 75 and two did not report site attrition.26 82 Only 10/30 (33%) were analysed according to EPOC criteria using autoregressive integrated moving average (ARIMA) or time series regression models, with an additional 8 studies employing SPC analyses. (Note: EPOC criteria do not specifically provide guidance on the appropriateness of SPC to control for temporal trends.) The major areas of risk of bias were the lack of control sites (or information about control sites) and lack of blinded or reliable assessment of study outcome measures.
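One common form of time series regression for an ITS study is segmented regression, which separates the pre-existing trend from a change in level and a change in slope at the intervention point. The sketch below is illustrative only: the data are simulated, the function name is hypothetical, and it uses ordinary least squares without modelling autocorrelation, which an ARIMA model or autocorrelation-adjusted standard errors would address.

```python
import numpy as np


def segmented_regression(y, intervention_index):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t by ordinary least squares.

    Returns (baseline level, baseline slope, level change, slope change).
    Residual autocorrelation is NOT modelled here.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= intervention_index).astype(float)  # 1 from the intervention onwards
    time_since = (t - intervention_index) * post     # 0 before; 0, 1, 2, ... after
    X = np.column_stack([np.ones_like(t), t, post, time_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(float(c) for c in coef)


# Hypothetical monthly compliance (%): 12 months before and 12 months after a QIC starts
t = np.arange(24)
expected_trend = 40 + 0.5 * t + 10 * (t >= 12) + 1.5 * (t - 12) * (t >= 12)
rng = np.random.default_rng(seed=0)
observed = expected_trend + rng.normal(0.0, 1.0, size=t.size)

level0, slope0, level_change, slope_change = segmented_regression(observed, 12)
print(f"level change {level_change:.1f}, slope change {slope_change:.2f} per month")
```

Because the baseline slope is estimated explicitly, improvement that was already under way before the QIC is not attributed to it. With only the EPOC minimum of three points on each side, these estimates would be highly unstable.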

#### Variability of the QIC implementation

Meetings where multiple sites came together to learn and share experiences were a common feature of all the QICs. However, the number of meetings varied from one to nine (median three meetings). Where reported (23/64 studies), meetings lasted from 1–2 hours to 3 consecutive days. The topic was generally well described, with 60% of studies noting the development of a change package, bundles of care or an evidence-based toolkit. The make-up of the expert faculty and how these change packages were developed were less reliably reported. Two-thirds of the studies provided brief information on the educational curriculum at combined site meetings. Health professional composition of site teams was documented in about half (35/64) of the studies. Between-meeting activities such as coaching, site visits, conference calls or use of electronic tools (eg, email, listserv or webinars) to facilitate a learning network were described in 90% of the studies. The frequency of these structured activities varied greatly. For example, conference calls between learning sessions ranged from biweekly to monthly or quarterly. One-third of the included studies reported site visits by quality improvement (QI) facilitators, by a case management advisor or by peers visiting each other. The intensity of QI coaching and facilitation in person-hours was reported in only two studies: 1–2 hours biweekly89 and 10 hours/week.55 Over half of the studies (39/64) described organisational or other support such as funding or sponsorship19 24 33 57 62 63 73 84 88 90; infrastructure, technical, logistic or information systems support48 66 76 79 80; executive leadership commitment to allow staff time to participate84; and the availability of project management resources.59 77

In terms of sufficient reporting of the QIC intervention and its component parts, only three studies described all aspects (number, length and curricula of meetings, topic, expert faculty and development of toolkit, site team composition and frequency of activities between meetings).41 55 89

#### Site engagement measures

Team participation at learning sessions was reported in 12/64 studies,32 36 37 43 44 46 49 56 57 59 64 66 and even fewer reported participation in other structured activities (eg, teleconferences).24 37 43 44 49 66 89 On-site quality improvement activities or some measure of engagement in local service improvement was infrequently described and variable. One study found that of the 12 intervention primary care practices, 7 were fully engaged, 2 failed to participate in the intervention and 3 never fully engaged in developing collaborative processes.39 Other examples of local site engagement measures were the submission of improvement activity logs,37 change activities made46 52 54 61 and quality improvement plans submitted.35

#### Rationale for QIC intervention

Two-thirds of the included studies (42/64) provided some sort of rationale for choosing a QIC intervention, although most were non-specific (eg, a continuous QI approach to close evidence-practice gaps,36 an approach to accelerate learning and adoption of best practices84 or a method to allow healthcare professionals to compare their knowledge, theories and practices79). The most common justification cited for choosing a collaborative was that this method had been successful elsewhere.18 57 73 78 80

## Discussion

A collaborative is a complex and time-consuming intervention for clinicians, teams and sites and represents major financial, organisational and political investment. The QIC studies in this review were conducted in diverse healthcare settings in multiple countries, including low- and middle-income countries. Therefore, despite methodological and reporting caveats, it was encouraging to find that the QICs we reviewed were largely effective in improving some of the processes and outcomes they addressed.

We found a statistically significant improvement in at least one primary effect measure in 83% of the studies (32/39 (82%) hospital based, 17/20 (85%) ambulatory care, 3/4 nursing home and the sole ambulance QIC). By study design, evidence for effectiveness was found for 6/10 (60%) RCT, 20/24 (83%) CBA and 27/30 (90%) ITS studies. When a more conservative definition of effectiveness was applied (statistically significant differences in at least 50% of the primary effect measures studied), positive results were reported in 73% of the studies.

While the evidence is still limited, eight studies reported sustainability of the intervention effect 6 months to 2 years after the end of the QIC, and four studies reported that the intervention met their definition of cost-effectiveness.

Although the number of QIC studies has increased over time, the quality of studies has not improved when measured against standard epidemiological criteria for methodological rigour. Of the 220 studies identified, 156 (71%) did not meet EPOC inclusion criteria for a variety of reasons. Some of these excluded studies included large numbers of teams and sites and described substantial improvements that were far greater than what would be reasonably predicted by chance. Many were uncontrolled before-after studies, a design that provides little protection against bias and confounding and does not account for external factors that might influence secular trends (the counterfactual possibility that improvement would have occurred even without participation in the QIC).9 12 91 Some QICs were excluded because they lacked enough repeated observations before starting the intervention. While this probably reflects real-life challenges of data collection, it severely limits the ability of these studies to attribute change to the interventions promoted by the QIC.

The authors of most of the papers that met inclusion criteria stated that they were using the Breakthrough Series collaborative model developed by IHI, or a variant of this approach. The SQUIRE guidelines have been revised and simplified recently in version 2.0.92 Although these recommendations were not available at the time many of the papers were published, they synthesise concepts that have long been considered important by improvement and implementation scientists. This review revealed major opportunities for improvement in reporting of the QIC intervention. Deficiencies were noted on three levels: the rationale or theory for choosing a QIC approach, the structures and processes for initiating and running the collaborative itself, and the adoption and sustained engagement of the practices targeted by the collaborative. Few papers provided sufficient detail to determine whether this model was followed with fidelity, what occurred at the learning sessions, how coaching was deployed between learning sessions, the expertise of the coaches, the composition of the teams or their level of institutional support. As Wilson et al noted, evaluating the impact of QICs is problematic as what is inside the box varies considerably.93 The publication of QIC protocols would be very useful to guide future implementers.

However, all reports included the core elements of a QIC: having a clear aim, utilising QI methods and providing a set of structured activities that engaged multiple teams in iterative shared learning. It may be that the components are not as important as how they affect teams and institutions. It may be that the value of a QIC is more in the ensuing culture change where all staff can participate in change efforts.3 According to Kilo,94 a QIC provides a new cooperative space driven by an explicit aim statement that aligns with values and drives intrinsic motivation to do better for the patients in our care.94

QICs are vulnerable to a variety of biases. Attrition bias has been a major issue in QICs in the past. In 2002, Øvretveit et al estimated that up to 30% of organisations would drop out of a collaborative before the end of the initiative.5 Our findings are more encouraging: over three quarters of the included studies had 80% or greater follow-up of sites. Other risks of bias exist. Some of the RCTs lacked information on randomisation and allocation concealment, factors linked to an overestimate of effects.95 The lack of comparators in many of the included ITS studies left them vulnerable to selection bias, as motivated, high-performing sites may volunteer preferentially for a QIC (an ‘enthusiast’ effect). Both RCTs and CBAs tended to provide sparse baseline data on control sites, complicating efforts to ensure the comparability of intervention and control groups. Furthermore, the very nature of QICs precludes postassignment blinding. Most of the QICs depended on the collection of self-reported clinical data without independent audit or blinded assessors, although a few were able to collect well-defined objective outcome data through automated systems.

We followed a thorough approach to conducting this systematic review, using standardised methods and criteria for including studies. Building on the work by Schouten et al,12 we extended their previous systematic review, which included papers published from 1995 to 2006. All data were abstracted in duplicate and consensus was reached at every step. However, we did not search all electronic publication databases and included only English language studies. Furthermore, we did not systematically search the grey literature or follow up with authors who have published grey literature reviews. Hence, we will have missed many QICs published as organisational reports, which might have altered our overall findings. Examples include large-scale collaboratives conducted in Scotland,96 Canada97 and multiple countries through the US University Research Company (URC).98 We also did not evaluate how many of the collaboratives included a formative evaluation. This is especially important because the implementation plan should adapt to local context and interorganisational learning as the work progresses.

A further limitation is the likelihood of publication bias: QICs with negative findings are less likely to be published than those with positive or statistically significant results. However, judging from our search and exclusions, studies that might have been missed are twice as likely to be uncontrolled before-after studies or ITS studies without sufficient data points.

It is interesting to note that some interventions, including all initiatives addressing CLABSI, had uniformly high levels of improvement. There is general agreement99 100 that successful implementation of improvement interventions is related to the strength of the supporting evidence base, the relative simplicity and practicality of the intervention components, and contextual factors that support or hinder implementation.101–103 CLABSI prevention ‘bundles’ have a strong evidence base developed over a number of years, are relatively simple to implement (eg, point-of-care checklists, limited number of straightforward bundle components) and are generally deployed in microsystems (such as ICUs) where implementation of bedside protocols is well established. Moreover, CLABSI was an early and widespread focus for QI, and there has been an expanding dialogue about strategies to accelerate adoption and overcome potential barriers to change.104

It is possible that less time- and resource-intensive methods (including virtual and web-based communication platforms) would be effective, but evidence is very limited.105 Innovations and interventions that have a weaker evidence base, have less field experience, target more heterogeneous microsystems and contexts, and are more complex may be less amenable to learning collaborative approaches such as the Breakthrough Series (BTS) model, let alone ‘lighter touch’ virtual approaches.

Rapid advances in automated data collection, data registries and technology are likely to revolutionise the nature of QICs. Process and outcome data increasingly will be available from the electronic medical record or data repositories, reducing the burden of chart review and the bias inherent in self-reporting. Some organisations have developed sophisticated and comprehensive clinical registries and are using them to leverage improvement efforts, including benchmarking and feedback loops.6 57 106 The direct involvement of patients in learning and improvement is an especially promising and attractive aspect of virtual shared learning systems.107

## Conclusion

QICs have been adopted widely as an approach to shared learning and improvement in healthcare. The authors of the publications included in this review reported that their QICs were largely effective in reaching their stated aims, with 83% claiming improvement in some of the clinical processes and outcomes investigated. However, enthusiasm for these encouraging findings must be tempered by reflection on the limitations in design and reporting of many of these QICs, as well as likely publication bias. Improvement and implementation scientists need to help the field address significant, persistent gaps in QIC design, quality of reporting, sustainability and cost-effectiveness.

## Acknowledgments

The authors would like to acknowledge the initial systematic review by Loes Schouten and colleagues that formed the starting point for this update.


## Footnotes

• Handling editor Kaveh G Shojania

• Contributors All authors have made substantial contributions to the conception or design of the work; the acquisition, analysis, or interpretation of data for the work; drafting the work or revising it critically for important intellectual content; and final approval of the version to be published. All authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

• Competing interests SW reports grants from The Stevenson Foundation, during the conduct of the study; grants from Health Research Council of New Zealand, grants from Roche Diagnostics Ltd, grants from National Heart Foundation of New Zealand, outside the submitted work; JG reports grants from Stevenson Foundation, during the conduct of the study. OT, MB, DN and DG report no competing interests.

• Provenance and peer review Not commissioned; externally peer reviewed.