Creating effective quality-improvement collaboratives: a multiple case study
  1. Mathilde M H Strating,
  2. Anna P Nieboer,
  3. Teun Zuiderent-Jerak,
  4. Roland A Bal
  1. Institute of Health Policy and Management, Erasmus University Rotterdam, Rotterdam, The Netherlands
  1. Correspondence to Mathilde M H Strating, Institute of Health Policy and Management, Erasmus University Rotterdam, PO Box 1738, 3000 DR Rotterdam, The Netherlands; strating{at}


Objective To explore whether differences between collaboratives with respect to type of topic, type of targets and measures (systems) are also reflected in the degree of effectiveness.

Study setting 182 teams from long-term healthcare organisations developed improvement initiatives in seven quality-improvement collaboratives (QICs) focusing on patient safety and autonomy.

Study design Multiple case before–after study.

Data collection 75 team leaders completed a written questionnaire at the end of each QIC on achievability and degree of challenge of targets and measurability of progress. Main outcome indicators were collaborative-specific measures (such as prevalence of pressure ulcers).

Principal findings The degree of effectiveness and percentage of teams realising targets varied between collaboratives. Collaboratives also varied widely in perceived measurability (F=6.798, p<0.001) and with respect to formulating achievable targets (F=6.566, p<0.001). The Problem Behaviour collaborative scored significantly lower than all other collaboratives on both dimensions. The collaborative on Autonomy and Control scored significantly lower on measurability than the other collaboratives. Topics for which there are best practices and evidence of effective interventions do not necessarily score higher on effectiveness, measurability, or achievable and challenging targets.

Conclusions The effectiveness of a QIC is associated with the efforts of programme managers to create conditions that provide insight into which changes in processes of care and in client outcomes have been made. Measurability is not an inherent property of the improvement topic. Rather, creating measurability and formulating challenging and achievable targets is one of the crucial tasks for programme managers of QICs.

  • Quality improvement collaborative
  • effectiveness
  • patient safety
  • collaborative
  • healthcare quality improvement

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license.



There is a strong demand for quality improvement, which is often pursued by setting up quality-improvement collaboratives (QICs). Some studies on QICs suggest positive effects in participating organisations, but results are mixed.1–4 Moreover, most studies evaluating QICs investigate their effectiveness rather than describing how the collaborative was designed by programme managers and how conditions for establishing effectiveness were constructed, which seems crucial for interpreting the effectiveness results. QICs differ greatly in topic, improvement methods3 and functioning in different countries and healthcare sectors, which has been suggested to influence their effectiveness.5 6 Øvretveit et al3 7 suggest that ‘success’ depends on the ways in which programme managers of QICs deal with challenges interwoven with the topic under study.

A first challenge for collaborative organisers is selecting the topic or quality problem.3 5 It has been suggested that the availability of best practices and evidence of effective interventions for selected topics is essential for creating effective collaboratives.3 Evidence for this suggestion is, however, lacking.

A second crucial challenge is related to teams' targets, which should be both challenging and achievable.3 Collaborative organisers may consider results from best practices to define what is achievable. The degree to which they establish the right balance between challenge and perceived achievability may determine the effectiveness of the improvement efforts. Moreover, for teams to be successful in the end, they should define their targets early and measure progress regularly.3 This leads us to the third challenge: measurability.

Improvements in processes of care are to be achieved by Plan–Do–Study–Act cycles: stepwise changes to care practices guided by measured results for which appropriate measures and usable data-collection tools should be available. Although the philosophy of Breakthrough is that teams set local goals and report on local indicators, agreeing on a common set of measures which all teams will register enables teams to learn from each other and keeps teams focused on the collaborative target. As effectiveness is based both on the interventions carried out and on the way the improvements are measured,8 it is crucial to explore to what extent collaborative organisers succeed in creating measurability.

To date, we have found no empirical studies that compare different collaboratives to investigate to what extent programme managers meet these challenges in creating effectiveness and how this is associated with variations in success of collaboratives. Our study involves an evaluation of a quality-improvement programme for long-term care in the Netherlands, called Care for Better, and deals with seven QICs focusing on patient safety and client autonomy. The aim of this multiple case study is to explore whether differences between collaboratives with respect to type of topic, type of targets and measures (systems) are also reflected in the degree of effectiveness.


Setting and design

This study included quality-improvement teams from nursing and residential care homes, home care, and care for mentally and physically disabled clients that participated in a QIC.9 Programme management was in the hands of the long-term care knowledge institute Vilans. The Breakthrough method,10 including use of Plan–Do–Study–Act cycles and small-scale testing, was used.

Data collection

Outcome data on targeted measures

Table 1 gives an overview of the target, targeted measures, type of measurement and level of data collection of each collaborative as formulated by programme management (more information in appendix 1). In total, 182 teams participated in one of the collaboratives with one or more pilot wards or locations in the same collaborative (n=233 pilot locations). Complete baseline and end-measurement data were available for 139 locations.

Table 1

Overview of Care for Better collaboratives

Level of evidence was assessed by two researchers independently based on collaboratives' plans of action. The highest level (+) was assigned if evidence-based guidelines, measurement instruments and national measurements were available. A moderate level of evidence (±) was assigned if improvement efforts and good examples were described. The lowest level (−) was assigned if hardly any improvement efforts or good examples were described in the literature.

Survey data

Seventy-five (41%) team leaders returned an end-measurement questionnaire, which was sent as part of our evaluation study. Almost 70% were nurses, about 10% were quality workers, 10% were occupational therapists or physiotherapists, and about 10% were executive managers of the pilot location. Unfortunately, we had both outcome and survey data for only 52 teams, and therefore present the results of these two sources separately.

The team leaders rated several statements on their collaborative on three dimensions: measurability, achievability of the targets and whether or not the targets were challenging (seven-point scale ranging from ‘strongly disagree’ to ‘strongly agree’). Factor analyses confirmed these three dimensions (data not shown, but available on request). Measurability was assessed by: (1) measuring indicators helps to monitor progress, (2) there were clear agreements on measuring central indicators, (3) programme management offered a standardised set of indicators to monitor progress and compare results, and (4) timely and accurate progress information was available at all times.11 The reliability was 0.86. Achievability was assessed by: (1) collaborative targets are achievable, (2) programme management made clear how to achieve collaborative targets, (3) programme management offered good practices and evidence on achievable results, and (4) programme management gave specific instructions on how to improve interventions.11 The reliability was 0.77. The extent to which the targets were conveyed as challenging by programme management was measured with the item: ‘Programme management set high expectations with regard to performance and improvement possibilities.’11
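The text reports scale reliabilities (0.86 and 0.77) without naming the coefficient. As an illustration only, the sketch below assumes Cronbach's alpha, a common choice for multi-item scales; the function name and the ratings are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumption: the reliability coefficient in the text is Cronbach's alpha;
    the article itself does not say so.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the summed scale score per respondent.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical seven-point ratings on four measurability items (not study data).
example_ratings = [
    [6, 6, 5, 6],
    [4, 5, 4, 4],
    [7, 6, 7, 7],
    [3, 3, 4, 3],
    [5, 5, 5, 6],
]
alpha = cronbach_alpha(example_ratings)
print(f"alpha = {alpha:.2f}")
```

Values close to 1 indicate that the items move together across respondents, which is the sense in which a four-item scale can be summarised by one score.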


Mean changes between measurements were examined using paired-samples t tests. To assess teams' effectiveness in terms of achieving collaborative targets, teams were assigned to one of three groups on the basis of changes in a particular measure: improvement (target achieved), deterioration and stable. Teams that started at baseline with a prevalence of 0% were not included in the analyses, since no relative change score can be calculated.
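The two analysis steps described above can be sketched as follows. This is a minimal illustration, assuming a measure where lower is better (such as prevalence) and assuming that "stable" covers any team that neither reached the target nor got worse; the article does not give the exact cut-offs for the stable group. Function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, end):
    """Paired-samples t statistic for baseline minus end measurement."""
    diffs = [b - e for b, e in zip(baseline, end)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))


def classify_team(baseline, end, target_reduction):
    """Assign a team to improvement, deterioration or stable.

    target_reduction is the collaborative target as a fraction (0.5 = 50%).
    Returns None for a baseline of 0, because no relative change exists.
    Assumption: 'stable' means target not reached and no worsening.
    """
    if baseline == 0:
        return None  # excluded from the analysis, as in the text
    relative_change = (end - baseline) / baseline
    if relative_change <= -target_reduction:
        return "improvement"
    if relative_change > 0:
        return "deterioration"
    return "stable"


# Hypothetical prevalence percentages for four teams (not study data).
baseline_scores = [18.0, 18.0, 10.0, 0.0]
end_scores = [8.0, 12.0, 14.0, 5.0]
groups = [classify_team(b, e, 0.5) for b, e in zip(baseline_scores, end_scores)]
print(groups)
```

The relative change is computed per team before grouping, which is why a team starting at 0% has to be dropped rather than classified.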


Measurability, achievable and challenging targets

Significant differences in perceived measurability and achievability of targets between the various collaboratives were found (F=6.798, p<0.001 and F=6.566, p<0.001, respectively; see table 2). No significant differences between the collaboratives were found with regard to challenging targets. Team leaders in the Problem Behaviour collaborative and the collaboratives on Autonomy and Control scored significantly lower on measurability and challenging targets. Also, with respect to achievability, the Problem Behaviour collaborative scored significantly lower.
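The F values above compare mean scores across collaboratives. The sketch below shows how a one-way analysis of variance F statistic is computed from group scores; it assumes a plain one-way design with one group per collaborative, which the article implies but does not spell out. The function name and the scores are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical measurability scores for three collaboratives (not study data).
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

A large F means the collaboratives differ more from each other than team leaders differ within a collaborative; the p value then follows from the F distribution with the two degrees of freedom.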

Table 2

Descriptive statistics for measurability, achievability and challenging targets

Collaborative characteristics and effects on targeted measures

The following section describes whether these differences between collaboratives are also reflected in the degree of effectiveness of each collaborative.

Pressure ulcers

Although the level of evidence on preventing pressure ulcers is high, and team leaders perceive a high measurability, the collaborative target is not perceived to be achievable and challenging. Accordingly, overall effectiveness is moderate. On average, prevalence rates decreased from 18% to 10% (table 3). Only six of the 16 teams improved prevalence by more than 50% and achieved the collaborative target (table 3 last three columns).

Table 3

Descriptive results for different collaboratives

Eating and drinking

For this collaborative, the level of evidence is high. Measurability and achievability are perceived as positive, which is in accordance with the positive results. Overall prevalence of malnutrition significantly decreased from 31% to 24%. Eighteen teams (47.4%) were able to decrease prevalence of malnutrition by 40% or more, thus achieving the collaborative target.

Prevention sexual abuse

Although only a few good examples of how to prevent incidents of sexual abuse are available, team leaders perceived high measurability and achievability of targets. Baseline mean scores for attitude and competence of professionals were moderate: 7.82 and 6.30, respectively, on a 1–10 point scale. These scores improved at end-measurement by 9% and 41%, respectively. The baseline mean score on perceived steering of management was very low (2.64 on a 0–10 point scale). On average, participating teams were able to realise a 320% improvement on this measure. The percentage of teams achieving the target ranged from 40% to 65%.

Medication safety

This collaborative distinguishes itself positively with respect to achievability of targets. Team leaders rated the achievability of targets with a score of 5.30 (SD 0.44), which is in line with the positive results. On average, teams were able to decrease the number of medication errors by 68%, which is considerably higher than the 30% aimed for. Twenty-six teams (96.3%) were able to realise the target. One team wanted to address the problem of under-reporting, and the team's local target was to increase the number of errors, the opposite of the generic collaborative target.12

Problem behaviour

For this collaborative, good examples of effective interventions were not available. Implementing suitable interventions for individual clients was largely a matter of trial and error. Team leaders perceived a low achievability of targets, thought that results were hard to measure and, because targets were not defined, did not find them challenging. This may explain why only nine of the 14 pilot teams collected complete data. On average, teams monitored 2.71 (SD 1.27) clients intensively. Significant changes between baseline and end-measurement were found (t=2.58 and p=0.016). Teams also collected prevalence data for the whole pilot location. On average, the number of incidents decreased significantly. Since the improvement target was undefined, all (or no) teams achieved the target.

Fall prevention

There was considerable evidence on interventions to prevent fall incidents. Team leaders rated measurability, achievability and challengeability of targets only moderately. At baseline, the average prevalence of incidents was 23%, which decreased significantly to 8%, corresponding to a relative improvement of almost 60%. Furthermore, 19 teams were able to realise the improvement target of a 30% decrease.

Autonomy and control

Although there is some evidence on how to improve quality of life and autonomy of clients, and several measurement instruments are available, the Autonomy and Control collaboratives had some difficulty in formulating specific, measurable, achievable, relevant and time-based (SMART) targets and setting up the measurement. Consequently, team leaders scored significantly lower on measurability than in the other collaboratives, which is also reflected in the low percentage of teams with complete data. Scores on achievability and challengeability of targets were lower as well. Client data on five of the six care organisations for physically disabled clients showed no significant changes in client-centred care. For the other projects on Autonomy and Control, no comparable measures across teams and measurements were used, and thus the effectiveness of these projects cannot be assessed.


Based on a comparison of a wide range of QICs, our study is one of the first that empirically takes into consideration the influence of characteristics of the collaborative to understand the (construction of) effectiveness of different QICs.

One of the often mentioned criteria for being called a collaborative is having a topic with a large gap between current and best practice,3 4 13 and it can be debated whether all Care for Better collaboratives meet this criterion. For some of the topics, it was not clear how large the gap was between current and best practice, since best practices were unavailable. For example, within the Problem Behaviour collaborative, no good examples were available. The first round of this collaborative was used as a stepping stone for later rounds and functioned as a ‘learning laboratory,’14 in which instruments and tools were (co-) designed and tested by the health professionals themselves and resulted in a toolkit for other teams.

As was shown in the study by Benn et al6 and Nembhard,15 quality-improvement methodology, programme faculty support and monitoring are important for success. Our results are in line with this; programme management of several collaboratives experienced difficulties, especially with regard to formulating clear and measurable targets, since they mainly had to rely on literature based on other care sectors, making it hard to translate to long-term care.

The present study shows that the overall results of the Care for Better programme are mostly positive; nevertheless, there is considerable variation between teams and collaboratives. Our results showed, not surprisingly and in line with observations by Øvretveit et al3 and Wilson et al,5 that a high level of evidence for a certain topic does not guarantee effectiveness in improving quality. The converse finding is more revealing: little evidence base may still lead to improvements in processes of care on the condition that programme management reflects upon how achievable and challenging targets should be formulated and which instruments and indicators should be used to demonstrate whether targets are realised.


The lower percentages of complete data for some collaboratives seemed to be indicative of the difficulty programme managers had in selecting indicators and questionnaires, and setting up measurement systems. Outcome data on these collaboratives should be interpreted with caution, although no significant differences in baseline scores between teams with complete data and incomplete data were found.

Some of the outcome indicators were based on self-reported data. The situatedness of improvement efforts calls for simple measurements that are relevant for a specific team to monitor progress, which is an essential part of the improvement method. From an evaluation perspective, however, where comparison between teams is needed, the (external) validity and reliability of these data are uncertain. This friction is inherent to the evaluation of complex improvement programmes.16

Another limitation is the overall moderate response on the evaluation survey. There may be two explanations: (1) the collaboratives on Pressure Ulcers, Eating & Drinking and Prevention of Sexual Abuse were not informed beforehand about the evaluation; and (2) several team leaders left their job halfway through the collaborative.

As a final limitation, this study did not include control sites, which makes it difficult to rule out possible secular trends. However, randomised controlled trials are hardly an option given the changing and complex features of QICs. Evaluation studies should therefore combine effectiveness measures with detailed descriptions of the programme and its context to understand under what conditions participating teams are able to realise changes in quality of care.


This large multiple-case before–after study suggests that the effectiveness of a QIC is associated with the efforts of programme managers to create conditions that provide insight into which changes in processes of care and in client outcomes have been made. We suggest that in preparing and organising a QIC, programme managers should more carefully consider the type of quality problem or topic addressed and invest in formulating achievable and challenging targets. In the presence as well as the absence of evidence, a crucial task for programme management is to create measurability, as this proved not to be an inherent quality of improvement topics.


The authors thank the participating improvement teams and respondents of the questionnaire.

Appendix 1

Collaborative measurements

In the Pressure Ulcers collaborative, measurements were organised by the National Expertise Centre for Nursing and Caring at baseline, mid-term and the end of the collaborative. The following data were collected three times a week for four successive weeks: (1) the number of useful preventive interventions; (2) the number of non-useful preventive interventions; (3) incidence of pressure ulcers degree 2 or higher (pressure ulcers degree 2 and higher are internationally seen as the most reliable indicator; within the Pressure Ulcers collaborative, special attention was also paid to recognising degree 1, since this was an important point for preventing new incidents); and (4) prevalence of pressure ulcers degree 2 or higher.

In the Eating and Drinking collaborative, data from the annual National Prevalence Measurement of Care Problems of April 2006 and April 2007 were used. The design of this measurement involves a cross-sectional, multicentre point prevalence measurement. Measures were: (1) 10 structure measures, such as availability of a protocol, signalling system and client evaluation in multidisciplinary teams, (2) the number of clients screened for (risk of) malnutrition and (3) prevalence of malnutrition.

Within the collaborative on Prevention of Sexual Abuse, programme management used a self-developed measurement system. Each month, team leaders rated the attitude and competence of the health professionals working at their pilot location (on average five professionals). The attitude scale consisted of three items to be rated on a 0–10 point scale. The competence scale consisted of five items to be rated on a 0–2 point scale. Scale scores were summed for each professional and then aggregated to the team level, so that the potential range of the attitude and competence scale runs from 0 to 10. Finally, team leaders rated management steering on a 0–10-point scale.
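The competence scoring described above can be sketched as follows: five items rated 0 to 2 sum to a score between 0 and 10 per professional. The sketch assumes that aggregation to team level means taking the mean across professionals, which the text does not state. The attitude scale is left out because the text does not specify how three 0–10 items are combined into a 0–10 score. Function names and ratings are hypothetical.

```python
from statistics import mean


def competence_score(item_ratings):
    """Sum five 0-2 item ratings into a 0-10 score for one professional."""
    if len(item_ratings) != 5:
        raise ValueError("the competence scale has five items")
    if any(not 0 <= rating <= 2 for rating in item_ratings):
        raise ValueError("each item is rated on a 0-2 scale")
    return sum(item_ratings)


def team_score(professional_scores):
    """Aggregate professionals' scores to team level.

    Assumption: aggregation is the mean; the article only says 'aggregated'.
    """
    return mean(professional_scores)


# Hypothetical monthly ratings for three professionals (not study data).
monthly_ratings = [
    [2, 1, 1, 2, 0],
    [2, 2, 1, 2, 1],
    [1, 2, 1, 2, 1],
]
scores = [competence_score(r) for r in monthly_ratings]
print(scores, team_score(scores))
```

Keeping the per-professional sum separate from the team-level aggregate makes it easy to swap in a different aggregation if the original protocol used one.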

Within the Medication Safety collaborative, programme management also developed their own measurement protocol for recording medication errors, called the ‘Post-it measurement.’ Teams were asked to place a Post-it sticker on a large sheet of paper each time a medication error was made, and these were entered daily in an Excel spreadsheet. Each Post-it described the type of error, that is, prescribing error, delivery error or intake error. Four measurements, each over 4 weeks, were executed, of which the first and fourth served as baseline and end-measurements.

To assist teams participating in the Problem Behaviour collaborative, programme management asked teams to place Post-it stickers on a large sheet of paper each time a client of the pilot location showed inappropriate behaviour. Teams were asked to score only two or three problematic behaviours, such as verbal or physical aggression, screaming or claiming behaviour. In addition, teams were asked to focus their interventions on two or three clients and score the number of incidents of problematic behaviour for these clients.

Within the Fall Prevention collaborative, teams were asked to place a Post-it sticker on a large sheet of paper each time a fall incident occurred, and these were entered daily in an Excel spreadsheet. Each Post-it listed the time the incident occurred, the degree of physical harm and information on what caused the incident. Baseline and end-measurement were used to assess effectiveness.

The Autonomy and Control collaborative comprised four smaller projects for each care sector separately. Within the two collaboratives for nursing homes and residential care homes (n=28), some teams used a quality-of-life questionnaire and some an observation instrument (both part of a Dutch ‘Vision on my own Life’ assessment). Since the content of these two tools is different, no comparison across teams can be made. Care organisations (n=13) for mentally disabled clients were asked to use a quality-of-life questionnaire developed by Vilans project managers. However, only a small number of organisations actually performed the baseline measurement, and programme management decided to withdraw the end-measurement. Instead of collecting quantitative data on quality of life of clients, they asked teams to make a portfolio with examples of interventions, illustrations of achieved improvements and stories of clients' experiences.

Within the project for care organisations for physically disabled clients, teams were asked to use the Client Centered Care Questionnaire17 to assess perceived client-centredness of nursing care. Instead of collecting quantitative data on quality of life of clients, care organisations for mentally disabled clients were asked to make a portfolio with examples of interventions, illustrations of improvements they were able to realise and stories of how clients perceived improvements in quality of life and quality of care.



  • Funding The Care for Better programme and the evaluation study are funded by The Netherlands Organisation for Health Research and Development (ZonMw grant number 5942). The researchers of the evaluation study are independent of this funding organisation.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.