
Meta-analysis of the central line bundle for preventing catheter-related infections: a case study in appraising the evidence in quality improvement
Perla J Marang-van de Mheen, Leti van Bodegom-Vos
Leiden University Medical Centre, Medical Decision Making, J10-S, Leiden, The Netherlands
Correspondence to Dr Perla J Marang-van de Mheen, Leiden University Medical Centre, Medical Decision Making, J10-S, Leiden 2300 RC, The Netherlands, p.j.marang@lumc.nl

Abstract

Background The central line bundle to reduce central line-associated bloodstream infections (CLABSI) is widely regarded as one of the most evidence-based quality improvement (QI) interventions. Yet, two high-quality trials reached different conclusions about its effectiveness.

Objective To assess the overall evidence on the effectiveness of the central line bundle and also to illustrate issues related to appraising the effectiveness of QI interventions.

Methods We searched the English-language literature (MEDLINE to Sept 2014) for prospective evaluations of the central line bundle (hand hygiene, chlorhexidine skin antisepsis, maximum sterile barrier precautions, optimal catheter site selection, daily review of line necessity) on CLABSI. Mantel–Haenszel risk ratios were calculated using a random effects model. Risk of bias was assessed on five domains: comparability of subjects, definition of intervention, assessment of outcome, statistical analysis and co-interventions/heterogeneity. Strength of the evidence was assessed following the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach, a widely recommended framework for assessing the robustness of treatment effects and the likelihood that future studies will change the estimate.

Results Across 59 included studies (55 with data that could be pooled), the central line bundle reduced CLABSI by 56% (relative risk 0.44, 95% CI 0.39 to 0.50). Studies that assessed bundle compliance at the individual patient level reported slightly higher reductions than other studies. Considerable heterogeneity was present in most subgroups. Most studies had unclear or high risk of bias, with only six (10%) studies exhibiting low risk of bias on at least four domains without any high risk. In this subset of higher-quality studies, the reduction was 52% (95% CI 32% to 66%) without heterogeneity. Applying the GRADE framework, the overall strength of the evidence was low, but moderate for the six high-quality studies. A moderate rating is typically interpreted as meaning that further research is likely to have an important impact on our confidence in the effect estimate and may change the estimate.

Conclusions That the central line bundle could receive only a moderate evidence rating may suggest that the GRADE framework, developed mostly for traditional clinical therapies, requires modification for QI interventions. GRADE does not distinguish prospective trials (eg, controlled before-after studies and interrupted time series) from lower-level observational studies. On the other hand, that the two highest quality studies reached different conclusions makes it difficult to conclude that future research would not change the effect estimate, especially given evidence of secular trends and the variability of co-interventions to ensure bundle compliance, which created heterogeneity across studies.

  • Evaluation methodology
  • Healthcare quality improvement
  • Implementation science
  • Checklists


Introduction

Many quality improvement (QI) initiatives have been undertaken over the years, for different patient populations and targeting different outcomes. Well-known examples include the implementation of surgical checklists1 and the sepsis and central line bundles.2,3 These initiatives aim to reduce the occurrence of adverse events and improve patient outcomes, thereby making healthcare safer and improving patients’ health. However, questions arise when studies report different results on the effectiveness of an intervention. In these circumstances, a meta-analysis is often performed in clinical medicine to obtain an overall estimate of effectiveness and to assess the quality of the evidence.

Within the general context of appraising the evidence base for QI,4 the question is whether a meta-analysis can and should also be performed for QI interventions. As argued previously, QI interventions should undergo the same rigorous testing and rest on a similarly strong evidence base as medical interventions.4,5 This suggests that a meta-analysis should be performed when studies report different results, and that it may contribute to the evidence base for QI interventions in the same way as for any other intervention.6,7 The published user guide ‘How to use an article about quality improvement’8 shows which factors affect the quality of an individual study. In addition, the Standards for Quality Improvement Reporting Excellence guidelines9 and the Strengthening the Reporting of Observational Studies in Epidemiology statement10 both give guidance on how to report QI studies, which enables data extraction from different studies when performing a meta-analysis. However, to define the quality of a study in a systematic review and meta-analysis, we need to define the critical elements that determine study quality (and minimise risk of bias),11 applicable to the range of study designs used in QI. The strength of the total body of evidence can then be graded, that is, whether future studies are likely to change the conclusions or not, to inform decisions on implementation.12

The central line bundle is one of the best-known and most successful QI interventions, and many studies have reported on its effectiveness since the Michigan Keystone project,3 but with different study designs and results. A meta-analysis by Blot et al13 showed that bundled or checklist interventions in before-after studies have a stronger effect on reducing central line-associated bloodstream infections (CLABSI) than non-bundled interventions. However, more recently published studies with stronger designs were not included in this meta-analysis. For instance, the Matching Michigan study in the UK, using a non-randomised stepped-wedge design, recently showed that the reduction of CLABSI is likely to be part of a wider secular trend.14 On the other hand, a recent cluster randomised controlled trial (RCT) showed the central line bundle to be effective, with support for a causal relationship when the intervention was also applied to the control group.15 Furthermore, Blot et al only included studies among adults performed in the intensive care unit (ICU) setting, whereas we might expect a stronger impact if the bundle is implemented in an entire organisation. In addition, they did not assess the risk of bias in individual studies or the strength of the total evidence, which may inform us whether future studies are likely to change the estimated effectiveness. This calls for another meta-analysis to review the evidence, but also raises the question whether to include all studies or only the high-quality studies.

In the present study, we perform a meta-analysis on studies reporting the effectiveness of the central line bundle to reduce CLABSI and assess the strength of the evidence. The central line bundle is used for illustrative purposes to highlight the issues and questions one faces when trying to appraise the evidence regarding effectiveness of QI interventions.

Materials and methods

Systematic literature search

To gather all studies reporting on the effectiveness of the central line bundle in reducing CLABSI, a trained librarian carried out a systematic literature search in PubMed (September 2014) (see online supplementary appendix). Two reviewers independently screened the retrieved references on title and abstract. An article was selected if the study fulfilled the following criteria:

  • prospective study;

  • central line bundle or checklist including the Institute for Healthcare Improvement elements (ie, hand hygiene, chlorhexidine skin antisepsis, maximum sterile barrier precautions, optimal catheter site selection, daily review of line necessity);16

  • reporting on effectiveness to reduce CLABSI;

  • language restriction: English.

Differences between reviewers were resolved by discussion until consensus was reached. If different articles described the same study or patient population, the article with the longest duration of follow-up was selected. The following data were extracted from each study:

  • first author, year of publication, country;

  • study design;

  • patient population (adults, paediatric/neonatal or both) and setting (ICU, non-ICU or both);

  • definition of intervention: checklist or bundle, which elements, compliance assessment;

  • definition of the central line infection outcome, who collected the data, and whether data collection was independent of the intervention;

  • year of intervention, duration of follow-up;

  • numbers of CLABSI and catheter days in the baseline/control group and in the intervention group (or sufficient data to calculate these numbers).

A study was considered an interrupted time series if it had at least three time points before and three time points after the intervention. The numbers of catheter days and CLABSI were calculated from the charts for the periods as analysed. For controlled studies, the total numbers of CLABSI and catheter days before and after implementing the intervention were added, to take into account possible differences in CLABSI rates at baseline and to reflect the impact of the intervention on the average CLABSI rate in both groups. A study was considered to have a stepped-wedge design if each group contributed to both the control and the intervention group in the analysis.
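The design rules above can be expressed as a small classification function. This is a minimal sketch only: the `StudyDesignInfo` record and its field names are hypothetical, and the order in which the rules are checked is an assumption, since the text does not say how a study meeting more than one definition was labelled.

```python
from dataclasses import dataclass


@dataclass
class StudyDesignInfo:
    """Hypothetical summary of design features extracted from one paper."""
    randomised: bool               # groups allocated at random
    controlled: bool               # has a concurrent control group
    points_before: int             # measurement time points before the intervention
    points_after: int              # measurement time points after the intervention
    each_group_in_both_arms: bool  # every group contributes control and intervention data


def classify_design(s: StudyDesignInfo) -> str:
    """Label a study design using the rules in the text (rule order is an assumption)."""
    if s.randomised:
        return "randomised controlled trial"
    if s.each_group_in_both_arms:
        return "stepped wedge"
    if s.controlled:
        return "controlled before-after"
    # Interrupted time series: at least three time points before and after.
    if s.points_before >= 3 and s.points_after >= 3:
        return "interrupted time series"
    return "uncontrolled before-after"


print(classify_design(StudyDesignInfo(False, False, 6, 12, False)))  # interrupted time series
```

Checking randomisation first means a randomised stepped-wedge study would be labelled an RCT in this sketch; a different precedence would be equally defensible.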

For the study of Bion et al,14 data over time were available in an appendix that specified when each cluster started. We assumed that the post-intervention numbers of CLABSI and catheter days contributed by a cluster would remain the same, on average, when subsequent clusters started, given that the authors reported each successive cluster joining at an entry level close to the post-intervention level of the previous cluster.

Rating the quality of individual studies and grading the evidence

As is common in meta-analyses, the quality of included studies was assessed based on the extent to which the design and conduct of a study have been shown to protect against different types of bias.17 For QI specifically, previous work has shown which domains need to be evaluated for risk of bias in order to grade the quality of individual studies.18,19 As most QI initiatives are evaluated in observational studies, the following critical domains can be used to rate the quality of individual studies18:

  1. Comparability of subjects: randomisation and allocation of patients in comparison groups, similarity of groups at baseline (to avoid selection bias). Low risk if randomised or if statistical testing shows no difference in patient population.

  2. Intervention: clear definition, who assessed compliance with intervention and was compliance reported (to avoid performance bias). Low risk if compliance was assessed and/or reported for every patient.

  3. Outcome measures: clear definition, standardised measurement independent from intervention assessment, who collected outcome data and were they trained (to avoid detection bias). Low risk if the outcome was assessed by a person other than the one who assessed whether the intervention was applied, following a clear definition and standardised measurement.

  4. Statistical analysis: missing data, loss-to-follow-up, appropriate (reporting of) statistical analysis (to avoid attrition and reporting bias). Low risk if all patients were included in the analyses, and high risk if no statistical testing was applied to assess differences.

  5. Heterogeneity or co-interventions: check for homogeneity of the data and co-interventions, as heterogeneity is often encountered in QI studies. Low risk if the only co-interventions were education/training or other elements that are part of the bundle, such as a line supply cart. High risk if significant heterogeneity in effect was reported, or if (financial) incentives or a change in catheter type were present, as these are likely to bias the effect estimate.

Each domain was scored as low, high or unclear risk of bias, independently by two reviewers. Differences between the reviewers were resolved by discussion until consensus was reached. The same domains can be used if both RCTs and observational studies are included, so that risk of bias is assessed in the same way for all studies.

Next, the strength of the total evidence was assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach,12 where grades of evidence are defined as:

  • High quality: further research is very unlikely to change our confidence in the effect estimate.

  • Moderate quality: further research is likely to have an important impact on our confidence in the effect estimate and may change the estimate.

  • Low quality: further research is very likely to have an important impact on our confidence in the effect estimate and is likely to change the estimate.

  • Very low quality: we are very uncertain about the estimate.

The GRADE approach and its underlying principles have been adopted by many organisations, including the WHO, the Cochrane Collaboration, BMJ Clinical Evidence and the National Institute for Health and Care Excellence. A clear set of rules for judging the quality of the evidence reduces the subjectivity that would otherwise be present in such an assessment. An initial grade is given based on the study design of most included studies: high-quality evidence if only randomised trials are included and low-quality evidence for observational studies. This initial grade can subsequently be upgraded or downgraded. In the GRADE approach, downgrading is possible on five items:

  • Risk of bias of included studies: based on the assessment of two independent reviewers of each individual study, with differences resolved by discussion. If most studies had unclear or high risk of bias, we downgraded by one level.

  • Inconsistency or unexplained heterogeneity in study results: assessed by looking at differences in the effect estimates and at the heterogeneity within subgroups. If there was heterogeneity without a plausible explanation that affected the interpretation of the effect (ie, whether the intervention results in benefit or harm), we downgraded by one level.

  • Indirectness of the evidence: downgrading by one level in case of indirect comparison of intervention or population.

  • Imprecision of study results: downgrading by one level in case of wide CIs surrounding the effect estimates, due to few studies or few events.

  • Publication bias: given the problems in detecting publication bias by using funnel plots either by visual inspection or by statistical tests,20 no downgrading was performed on this item, thereby assuming no publication bias.

Upgrading is possible on three items:

  • The magnitude of the effect: upgrading by one level if the overall effect estimate is larger than two or smaller than 0.5, by two levels if effect estimate is larger than five or smaller than 0.2.

  • Plausible confounding would change the effect: upgrading by one level if all plausible confounding would reduce the magnitude of an observed effect, or would suggest a spurious effect when the results show no effect, provided there is no downgrading on any of the items. An example of plausible confounding would be that only sicker patients receive the intervention.

  • A possible dose–response gradient: apply only if no downgrading on any of the items.
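The up- and downgrading rules listed above can be sketched as a simple level counter. This is an illustration under stated assumptions, not an official GRADE tool: the function name `grade_evidence` and its parameters are hypothetical, the four levels are treated as integers, and the grade is not allowed to fall below "very low" or rise above "high".

```python
LEVELS = ["very low", "low", "moderate", "high"]


def grade_evidence(all_randomised: bool,
                   downgrades: int,
                   effect_estimate: float,
                   confounding_upgrade: bool = False,
                   dose_response_upgrade: bool = False) -> str:
    """Return a GRADE level following the rules described in the text (sketch)."""
    # Initial grade: high if only randomised trials, low for observational studies.
    level = 3 if all_randomised else 1
    # Each downgrade (risk of bias, inconsistency, indirectness, imprecision) costs one level.
    level = max(0, level - downgrades)
    # Magnitude of effect: +2 beyond 5 or below 0.2, +1 beyond 2 or below 0.5.
    if effect_estimate > 5 or effect_estimate < 0.2:
        level += 2
    elif effect_estimate > 2 or effect_estimate < 0.5:
        level += 1
    # Confounding and dose-response upgrades apply only without any downgrading.
    if downgrades == 0:
        level += int(confounding_upgrade) + int(dose_response_upgrade)
    return LEVELS[min(level, len(LEVELS) - 1)]


print(grade_evidence(all_randomised=False, downgrades=1, effect_estimate=0.44))  # low
print(grade_evidence(all_randomised=False, downgrades=0, effect_estimate=0.48))  # moderate
```

The two calls mirror the inputs reported in the Results: all studies (observational, one downgrade for risk of bias, RR 0.44) and the high-quality subset (no downgrading, RR 0.48).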

Analysis

All extracted data were entered and analysed in Review Manager 5.3. Data were pooled using a random effects model, and Mantel–Haenszel risk ratios were calculated. Effect estimates for individual studies may differ from those reported in the original papers, as the latter were often adjusted for other factors. Heterogeneity was assessed using the I² statistic, which can be interpreted as the percentage of the total variability in a set of effect sizes that is due to between-study variability rather than chance. We considered the results to be heterogeneous when I² was 50% or higher, which represents substantial or considerable heterogeneity.11
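The pooling and heterogeneity statistics can be illustrated with a short sketch. Note the assumptions: it uses inverse-variance weights with the DerSimonian–Laird estimate of between-study variance and treats catheter days as the denominator of each risk. Review Manager's Mantel–Haenszel random-effects option differs in computational details, so results will not match RevMan exactly. The function name and the tuple layout of each study are hypothetical.

```python
import math


def pool_random_effects(studies):
    """Pool risk ratios with a DerSimonian-Laird random-effects model (sketch).

    Each study is (events_int, days_int, events_ctl, days_ctl).
    Returns (pooled RR, lower 95% limit, upper 95% limit, I-squared in %).
    """
    log_rr, var = [], []
    for a, n1, c, n2 in studies:
        if a == 0 or c == 0:
            raise ValueError("zero-event arms need a continuity correction")
        log_rr.append(math.log((a / n1) / (c / n2)))
        # Large-sample variance of the log risk ratio.
        var.append(1 / a - 1 / n1 + 1 / c - 1 / n2)

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q.
    w = [1 / v for v in var]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = len(studies) - 1

    # DerSimonian-Laird between-study variance tau^2, truncated at zero.
    c_dl = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c_dl) if c_dl > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in var]
    mu = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    # I^2: share of the variability in Q beyond what chance (df) would explain.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(mu), math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se), i2
```

Under the 50% threshold used here, a study set for which this returns I² ≥ 50 would be treated as heterogeneous and prompt the subgroup analyses.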

If heterogeneity was present, subgroup analyses were conducted. The following subgroup analyses were defined a priori: by patient population (adult vs paediatric/neonatal vs all patients), by study design (as weaker designs are known to overestimate the effectiveness of interventions) and by setting (ICU vs non-ICU vs all). In addition, we conducted subgroup analyses by study quality based on the assessed risk of bias domains, to take into account that there may be good-quality studies with weaker study designs and vice versa, and because we hypothesised that studies with low risk of bias might yield more homogeneous effect estimates. Analyses were conducted for each domain separately and across multiple domains as an indicator for high-quality studies (at least four domains with low risk and no domain with high risk vs all other studies).
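The composite criterion for a high-quality study is a simple rule over the five domain ratings (labelled A–E, as in figure 1). A minimal sketch; the function and constant names are hypothetical.

```python
from collections import Counter
from typing import Mapping

VALID_RATINGS = {"low", "unclear", "high"}


def is_high_quality(ratings: Mapping[str, str]) -> bool:
    """At least four domains at low risk of bias and no domain at high risk."""
    if set(ratings.values()) - VALID_RATINGS:
        raise ValueError("ratings must be 'low', 'unclear' or 'high'")
    counts = Counter(ratings.values())  # missing keys count as 0
    return counts["low"] >= 4 and counts["high"] == 0


example = {"A": "low", "B": "low", "C": "unclear", "D": "low", "E": "low"}
print(is_high_quality(example))  # True
```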

Finally, to gain more insight into the causes of heterogeneity between studies, we compared the two studies with the strongest study designs on factors associated with bundle compliance (elements included, co-interventions to ensure compliance) and with the outcome measure (definition, inclusions/exclusions, baseline rates and effectiveness).

Results

The search strategy resulted in 491 references. Based on title and abstract, 63 articles were selected.3,14,15,21–80 Cross-checking references yielded an additional five studies,81–85 so that 68 full-text papers were examined. Based on the full text, 59 studies were selected (see online supplementary appendix table). Reasons for exclusion were that another article reported more complete data and/or a longer follow-up (five studies3,40,64,69,77) and that the study did not fulfil the selection criteria (four studies44,66,74,75).

Risk of bias

The risk of bias for individual studies is shown in figure 1. For each of the five risk of bias domains (A–E), low risk of bias is indicated by a green circle, unclear risk by a yellow circle and high risk by a red circle. Ten studies (17%) had high risk of bias on comparability of subjects as they reported significant differences between intervention and control patients. Regarding definition of and compliance with the intervention, 14 studies (24%) had high risk of bias, mostly because compliance was not monitored or reported for every patient. Regarding definition and measurement of the outcome, many studies (44%) had unclear risk of bias, mostly because it was not clear whether outcome measures were assessed and collected independently from the intervention. Most studies (75%) had low risk of bias regarding missing data and statistical analysis, but this assessment was rather lenient, as hardly any study explicitly stated how it ensured that all eligible patients were included and all checklist forms were collected. High risk was assessed in 13 studies (22%) in which no statistical testing was done. With respect to co-interventions and heterogeneity, most studies had unclear (37%) or high (39%) risk of bias. Aggregating over all domains, only six (10%) studies had low risk of bias on at least four domains without any high risk. This percentage was lower among uncontrolled before-after studies (3 of 49, 6%) than among other designs (3 of 10, 30%).

Figure 1

Effectiveness of the central line bundle to reduce central line-associated blood stream infections, all studies including risk of bias assessment.

Effect of intervention

For four included studies, the numbers of CLABSI and catheter days were not reported and could not be calculated. These were two uncontrolled before-after studies,32,34 one controlled before-after study79 and one cluster RCT.15 This leaves 55 studies for which the data were pooled in a meta-analysis. These studies represented 9 615 904 catheter days, of which 5 125 171 were during an intervention and 4 490 733 in the baseline or control period. During the intervention, 8653 CLABSI were observed (1.69 per 1000 catheter days) versus 10 477 in the control group (2.33 per 1000 catheter days). On average, the central line bundle reduced CLABSI incidence by 56% (figure 1; relative risk (RR) 0.44, 95% CI 0.39 to 0.50). Considerable heterogeneity was present in the effect estimate (I²=90%).
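The incidence rates quoted above follow directly from the reported totals, as the short check below shows. It also computes the crude ratio of the two aggregated rates, which comes out at about 0.72, much closer to 1 than the pooled RR of 0.44. A plausible reading, offered as an interpretation rather than a finding of this study, is that simply adding events and catheter days across studies ignores differences in baseline rates and study size, whereas the random effects model combines within-study comparisons.

```python
def rate_per_1000(events: int, catheter_days: int) -> float:
    """Incidence rate of CLABSI per 1000 catheter days."""
    return 1000 * events / catheter_days


# Totals reported in the text for the 55 pooled studies.
INTERVENTION_CLABSI, INTERVENTION_DAYS = 8653, 5_125_171
CONTROL_CLABSI, CONTROL_DAYS = 10_477, 4_490_733

rate_int = rate_per_1000(INTERVENTION_CLABSI, INTERVENTION_DAYS)
rate_ctl = rate_per_1000(CONTROL_CLABSI, CONTROL_DAYS)
crude_ratio = rate_int / rate_ctl  # ratio of aggregated rates, not a meta-analytic estimate

print(f"{rate_int:.2f} vs {rate_ctl:.2f} per 1000 catheter days; crude ratio {crude_ratio:.2f}")
# prints "1.69 vs 2.33 per 1000 catheter days; crude ratio 0.72"
```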

Subgroup analyses revealed that the effect varied significantly by patient population and setting (table 1), with the strongest effect in studies including all patients (63% reduction) and in those performed in the ICU (58% reduction). The effect also varied by study design (table 1), with larger reductions in before-after studies than in the other (stronger) designs. Unfortunately, data from the only cluster RCT, which had low risk of bias on four domains (figure 1), could not be used, as only rates were reported, so it was not known how many catheter days or CLABSI it should contribute to the overall effect. However, this RCT showed a 70% reduction in the intervention group against 21% in the control group,15 which suggests that a 49% reduction can be attributed to the intervention, similar to the 48% reduction in controlled studies in our meta-analysis. Considerable heterogeneity was still present in most subgroups, except in studies performed in the non-ICU setting or in entire hospitals.
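The subtraction of the control-group reduction can be compared with the ratio scale used for the pooled risk ratios. This is a minimal arithmetic sketch, assuming that the 70% and 21% figures are relative reductions from each arm's own baseline rate; the helper name `reduction_attributable` is hypothetical. On that assumption, subtracting gives 49 percentage points, whereas dividing the two post/pre rate ratios (0.30/0.79 ≈ 0.38) corresponds to a relative reduction of about 62%.

```python
def reduction_attributable(int_reduction: float, ctl_reduction: float) -> tuple:
    """Compare two ways of netting out a control-group reduction (sketch).

    Both arguments are fractional reductions from each arm's own baseline rate.
    """
    # Difference in percentage points, as used in the text.
    difference = int_reduction - ctl_reduction
    # Ratio of the post/pre rate ratios in the two arms (ratio of ratios).
    ratio_of_ratios = (1 - int_reduction) / (1 - ctl_reduction)
    return difference, 1 - ratio_of_ratios


diff, rel = reduction_attributable(0.70, 0.21)
print(f"difference: {diff:.0%}, ratio-of-ratios reduction: {rel:.0%}")
# prints "difference: 49%, ratio-of-ratios reduction: 62%"
```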

Table 1

Effectiveness of the central line bundle to reduce central line-associated blood stream infections, in various subgroups

Table 2 shows the differences in effectiveness for each risk of bias domain. Only studies with low risk of bias in the assessment of the intervention, meaning that bundle compliance was assessed and/or reported for every patient, appear to report a larger reduction (61%) in CLABSI. However, considerable heterogeneity was still present in this subgroup. The subgroup of studies with low risk of bias due to co-interventions or reported heterogeneity showed homogeneous results (48% reduction). Furthermore, looking across risk of bias domains as an indication of high-quality studies, a homogeneous estimate was also found for studies with low risk of bias on at least four domains and no high risk in any domain, indicating a 52% reduction (table 1).

Table 2

Effectiveness of the central line bundle to reduce central line-associated blood stream infections, by risk of bias domains

A comparison of the two studies with the strongest study designs, the (non-randomised) stepped-wedge study by Bion et al14 and the RCT by Marsteller et al,15 gives more insight into the reasons for the considerable heterogeneity as well as for the difference in reported effectiveness (table 3). With respect to the intervention, only the Marsteller trial assessed bundle compliance for each patient by an observer. This is likely to explain part of the difference in effectiveness, given our finding of larger reductions in CLABSI in studies with low risk of bias on this domain. Furthermore, different co-interventions and sources of heterogeneity were reported, of which a difference in blood culture sampling rates may be a specific cause of unexplained heterogeneity (reported by Bion et al but not in any of the other studies). A strong point of the Marsteller trial is its randomisation, but it is surprising to find a large difference in mean CLABSI rates between the arms at baseline, amounting to more than half the reduction found in the intervention group in the initial period. Furthermore, a reduction in CLABSI was also observed in the control group (21%), consistent with the general secular trend reported by Bion et al. The Bion study distinguished between ICU-acquired and pre-ICU infections, whereas early infections were excluded in the Marsteller trial. Bion et al reported a similar decline in both pre-ICU and ICU-acquired infections, but these were not reported by Marsteller et al. So it is not known whether a similar trend in these early infections, consistent with a general secular trend, would also be found in the Marsteller trial data.

Table 3

Comparison of studies with strongest study design

Grading the strength of the evidence

As only non-randomised studies were included in the meta-analysis, the initial grade given by the GRADE approach is low-quality evidence. This initial grade was downgraded one level as the majority of the studies had unclear or high risk of bias on multiple domains. No downgrading was performed for inconsistency, even though subgroup analyses revealed significant differences and considerable heterogeneity, because these were differences between larger and smaller benefit and not between benefit and harm.12 Neither indirectness of the evidence nor imprecision of the effect estimate applied here. We upgraded one level based on the magnitude of the effect, given the overall effect estimate of 0.44 and that the effect estimates in most subgroups were also below 0.5. As we downgraded the evidence for risk of bias, upgrading for plausible confounding or a dose–response gradient was no longer applicable. This resulted in a final assessment of low-quality evidence, meaning that further research is very likely to have an important impact on our confidence in the effect estimate and is likely to change the estimate. Had we included only high-quality studies (low risk of bias on at least four domains and no high risk), the resulting grade would have been moderate-quality evidence. For this selection of studies, no downgrading would be performed given the low risk of bias, and upgrading by one level would follow from the magnitude of the effect (overall estimate of 0.48). As not all studies in this selection reported compliance with the intervention for all patients, a possible dose–response gradient could not be studied, so no further upgrading was performed.

Discussion

The present study has shown that the central line bundle reduces CLABSI by 56% on average. Subgroup analyses showed that this effect varied by patient population and setting, with the largest reductions among studies including all patients (63%) and performed in the ICU setting (58%). Studies in which bundle compliance was assessed for every patient appear to report a larger reduction (61%) than other studies. Considerable heterogeneity was present in most subgroups. Looking across risk of bias domains as an indicator of high-quality studies, only six (10%) studies had low risk of bias on at least four domains and no high risk; these reported a 52% reduction with no heterogeneity. When all studies are included, the evidence is of low quality owing to risk of bias in many studies. When only the high-quality studies are included, the evidence is still only of moderate quality, meaning that further research is likely to have an important impact on our confidence in the effect estimate and may change the estimate.

Limitations of this meta-analysis include that bias may have occurred due to confounding. In the original papers, effect estimates were in some cases adjusted for confounding. This may result in different study-specific estimates but could also influence our overall effect estimate. By conducting subgroup analyses, we aimed to limit this effect, but as heterogeneity was still present within most subgroups, it is likely that some confounding remains. Furthermore, the strength of the evidence as assessed by the GRADE framework was determined largely by the initial grade of low-quality evidence given to observational studies, which will often be the case for QI interventions. The GRADE approach does not distinguish between non-randomised designs: it does not assign higher quality to stronger non-randomised designs, such as stepped-wedge studies and interrupted time series, than to uncontrolled before-after studies. This raises the question whether the GRADE framework, developed mostly for traditional clinical therapies, requires modification for QI interventions. It seems that to obtain a high-quality rating for non-randomised studies, any downgrading must be avoided and a very large effect size is needed (to upgrade by two levels, or by one level combined with plausible confounding being present). So one could argue that most QI interventions are likely to receive low ratings, even when multiple studies of fairly strong design support an effect, or moderate ratings if very large effect sizes are found. On the other hand, for the central line bundle, the two studies with the strongest designs reached different conclusions, which is consistent with some uncertainty in the effect estimate. Furthermore, particularly given the evidence of secular trends and the variability of co-interventions to ensure bundle compliance as reasons for heterogeneity across studies, it seems reasonable that the resulting assessment is that further research is likely to affect our confidence in the effect estimate and may change the estimate. The results of this meta-analysis may help shape future studies, both by encouraging lower risk of bias on all domains and by identifying new questions, for example, on the importance of specific co-interventions for the resulting effect estimate.

Various implications for clinical practice as well as future research can be identified. This meta-analysis shows that larger reductions in CLABSI were reported when compliance with the intervention was checked for every patient. The remaining heterogeneity may be explained by varying compliance rates both within and between the studies (from 0% to 99%, or not reported specifically), with more to gain if pre-intervention compliance is low. Future studies are thus recommended to report the exact compliance before and after the intervention, rather than only monitoring the outcome, to enable assessment of a dose–response gradient, that is, whether a stronger improvement in compliance is associated with a stronger reduction in CLABSI. At the same time, baseline CLABSI rates also varied strongly, even within studies with low risk of bias on this domain, from 0.1/1000 catheter days80 to 22.7/1000 catheter days,35 and stronger reductions may be found in studies where baseline rates were high. It seems likely that high baseline infection rates and low pre-intervention bundle compliance are associated, but these can only be disentangled if future studies report both consistently. The same is likely to apply to other QI interventions.
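The dose–response assessment recommended above could take the form of a meta-regression of each study's log risk ratio on its change in bundle compliance. The following is a minimal fixed-effect sketch under stated assumptions: the function name and inputs are hypothetical, weights are inverse within-study variances, and a random-effects version would add an estimate of between-study variance (tau²) to each variance before weighting. A negative slope would indicate larger reductions in CLABSI with larger improvements in compliance.

```python
import math


def meta_regression_slope(x, y, variances):
    """Weighted least-squares slope of y on x with inverse-variance weights (sketch).

    x: change in bundle compliance per study (e.g. percentage points)
    y: log risk ratio per study
    variances: within-study variance of each log risk ratio
    Returns (slope, standard error of the slope).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("compliance change must vary across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    # With known inverse-variance weights, Var(slope) = 1 / sxx.
    return slope, math.sqrt(1 / sxx)
```

This only becomes possible if studies report compliance both before and after the intervention, which is the reporting practice recommended above.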

Another source of remaining heterogeneity is that the reported CLABSI rate is affected by the extent to which blood cultures are taken, which may vary greatly between studies. Similarly, the extent to which early infections (which are unlikely to be related to insertion of the central line) are included is not consistently reported and may vary between studies. This relates to the question whether the control group may be contaminated, owing to some centres already adopting the intervention as part of a secular trend, as suggested by Bion et al.14 Similar trends can be seen in other studies; for example, Berenholtz et al79 showed a 28% reduction in the control group and Marsteller et al15 a 21% reduction. Part of the reduction estimated in our meta-analysis is thus likely to be due to a secular trend. Furthermore, treatment in the control group may change over time, with (elements of) the bundle being applied as usual care, so that an RCT with a control group in which the bundle is not applied may not be possible. This is also common in other areas of medicine and not specific to QI interventions. It is a reminder to report compliance or treatment in the control group specifically, because otherwise this may result in confounding and lower reported effectiveness. All of these sources of heterogeneity may affect the estimated effectiveness of the central line bundle, consistent with the meaning of low-quality evidence, that is, that future studies are very likely to influence our confidence in the effect estimate and are likely to change the estimate.

Context is usually put forward as being particularly relevant for the effectiveness of QI interventions.6,86–88 Many of the studies included in this meta-analysis were conducted in multiple centres or even multiple countries, yet still report the central line bundle to be effective. Context therefore does not appear to determine whether the bundle is effective, but it may be important for the magnitude of the effect. Variation in co-interventions between contexts may be partly responsible for the heterogeneity in effect, even though their specific influence is usually not assessed.87 This is particularly relevant for QI interventions, as we cannot assume that co-interventions occur equally in the intervention and control groups. For example, we considered (mandatory) education, training sessions and a line supply cart as an integral part of bundle implementation, and thus as low risk of bias. However, factors such as (financial) incentives for senior leaders, mandatory exams that professionals had to pass, or the parallel introduction of, for example, antibiotic-impregnated catheters were considered to induce risk of bias, consistent with the previously noted domain of external factors determining the effectiveness of QI interventions.6,88 Such co-interventions may have considerable impact on the effectiveness of the central line bundle, and their impact may even vary between centres, thereby creating heterogeneity, particularly in multicentre studies. Other factors, such as the sense of urgency, leadership or safety culture, are far more difficult to grasp but are possibly even more important for the success of the intervention.88

The question is what the active ingredient of this intervention is, or whether it works particularly when combined with co-interventions.89,90 If the co-interventions are crucial for the intervention to have its effect, this will determine its generalisability to other hospitals and contexts.87 Studies have shown that the approach initiated by Pronovost could be generalised, but often with lower effectiveness.15,24,31,51 However, the Matching Michigan study was not able to achieve this, likely because of its more top-down character, rather than being a bottom-up initiative of professionals who experienced a sense of urgency to improve quality.91 Given the results of the current meta-analysis, not assessing bundle compliance for every patient may also have been an important factor. As others have argued, the checklist or bundle itself may not be context-free and may therefore account for only part of the effect of the intervention.91,92 The work by Dixon-Woods shows how mixed methods can help to disentangle these crucial qualitative factors in the context.91 Taken together, such work will contribute to updating the theory on how the intervention works and achieves its success when implemented in various contexts.4,87,90 Only then will we truly be able to assess what the actual intervention is and how it can be successfully generalised to other situations with similar effectiveness.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors PMvdM conceived the study. PMvdM and LvBV reviewed all references, extracted data and rated studies on risk of bias; PMvdM carried out the analyses and wrote the first draft. Both authors have read and approved the final version of the manuscript.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.