Background The term continuous quality improvement (CQI) is often used to refer to a method for improving care, but no consensus statement exists on the definition of CQI. Evidence reviews are critical for advancing science, and depend on reliable definitions for article selection.
Methods As a preliminary step towards improving CQI evidence reviews, this study aimed to use expert panel methods to identify key CQI definitional features and develop and test a screening instrument for reliably identifying articles with the key features. We used a previously published method to identify 106 articles meeting the general definition of a quality improvement intervention (QII) from 9427 electronically identified articles from PubMed. Two raters then applied a six-item CQI screen to the 106 articles.
Results Per cent agreement ranged from 55.7% to 75.5% for the six items, and reviewer-adjusted intra-class correlation ranged from 0.43 to 0.62. ‘Feedback of systematically collected data’ was the most common feature (64%), followed by being at least ‘somewhat’ adapted to local conditions (61%), feedback at meetings involving participant leaders (46%), using an iterative development process (40%), being at least ‘somewhat’ data driven (34%), and using a recognised change method (28%). All six features were present in 14.2% of QII articles.
Conclusions We conclude that CQI features can be extracted from QII articles with reasonable reliability, but only a small proportion of QII articles include all features. Further consensus development is needed to support meaningful use of the term CQI for scientific communication.
Continuous quality improvement (CQI) represents a set of methods for improving healthcare1–4 that originated from industrial process improvement approaches.5 6 One evidence review describes CQI as ‘a philosophy of continual improvement of the processes associated with providing a good or service that meets or exceeds customer expectations’.7 Although a useful starting point, this definition has not emerged from formal consensus processes, has not been tested for reliability, and may therefore be difficult to operationalise in evidence syntheses. Greater consensus on key features of CQI that could be reliably operationalised would improve the reporting, cataloguing, and systematic review of CQI interventions.
We acknowledge that meanings fluctuate over time.8 9 The term CQI has a complex heritage from use in both industry and healthcare, and seeking to create a normative definition may perturb this evolution.9 Science, however, depends upon clear word usage for communication, and efforts to understand scientific meaning have often promoted scientific development in both clinical10–12 and methodological13–19 domains. In the work presented here, we aimed to understand the current usage of the term CQI as a step towards improving scientific communication in the quality improvement field.
This work is part of the ‘Advancing the Science of Continuous Quality Improvement’ (ASCQI) Program funded by the Robert Wood Johnson Foundation, a US-based healthcare-oriented philanthropic organisation. One ASCQI aim was to ‘develop methods, tools and standards for the design, conduct and reporting of CQI research and evaluations, including standardised typologies, definitions and measures of key concepts and consensus statements’.20 Towards that aim, this study developed a screen for CQI features, tested it for reliability, and applied it to electronically identified quality improvement intervention (QII) articles to assess which key CQI features are most commonly present in today's quality improvement literature.
We first elicited a broad range of existing definitions for CQI, distilled them into candidate key features, and engaged an expert panel to rate and refine the features. We then used a previously published QII definition21 as the basis for a QII screening form and applied it to articles from a broad electronic search. Finally, we operationalised the highest-scoring consensus-based CQI features as an assessment form and applied it to the QII article set.
Identification of potential key features of CQI
To identify key features of CQI, we conducted a simplified, sequential group consensus process, similar to a repeated focus group with feedback. We organised a 12-member expert panel, intentionally encompassing a diverse range of methodological perspectives representing both quality improvement and research expertise. Individual experts included process consultants, researchers and institutional decision makers from both the USA and the UK; several additionally serve as editors of clinical or quality improvement journals (see ‘Acknowledgements’ for complete list). To begin generating potentially definitional features of CQI, Robert Wood Johnson Foundation staff reviewed grant applications to the ASCQI Program and abstracted 48 phrases used by applicants to define ‘CQI’. Two authors (LR, SH) independently reviewed these phrases to ascertain common themes, reconceptualised them as a list of unique, potentially definitional features, and then met to discuss and reach agreement on the list.
The expert panel then completed an online survey of the features, reviewed survey results, and discussed the results on two conference calls. The survey asked, for each feature: ‘Is this feature necessary (definitional) for CQI?’ (5=definitely; 4=probably; 3=no difference; 2=probably not; 1=definitely not). The survey and discussion process enabled the addition of features to the original set from other sources, as suggested by the panel or research team.20–29 Table 1 lists 12 features (A–L) finalised when the process had ceased generating additional potential features. Panelists rated the 12 features again at a final in-person meeting, resulting in six features (A, C, D, E, G, K) rated as ‘definitely’ or ‘probably’ necessary (definitional) for CQI (median value ≥4.0). The final column in table 1 shows which items on the final CQI features assessment form reflected each ‘definitely’ or ‘probably’ definitional feature.
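The median-based selection rule described above can be sketched as follows. This is only an illustrative sketch: the ratings are invented, the panel size is shortened, and the function name `definitional_features` is hypothetical; only the 1 to 5 scale and the median ≥4.0 threshold come from the text.

```python
from statistics import median

# Hypothetical ratings: feature label -> one score (1-5) per panelist.
# These numbers are invented for illustration; they are not the panel's data.
ratings = {
    "A": [5, 4, 5, 4, 4, 5],
    "B": [3, 2, 4, 3, 3, 2],
    "C": [4, 4, 5, 3, 4, 4],
}

def definitional_features(ratings, threshold=4.0):
    """Return features whose median panel rating is at least `threshold`."""
    return sorted(
        feature
        for feature, scores in ratings.items()
        if median(scores) >= threshold
    )

selected = definitional_features(ratings)
print(selected)
```

With these invented scores, features A and C reach a median of at least 4 and feature B does not. Using the median rather than the mean keeps a single outlying panelist from moving a feature across the threshold.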
Criteria for identifying QII studies
We focused on QII studies that, as described previously,21 addressed effectiveness, impacts, or success; qualitative, quantitative and mixed-methods studies were all considered eligible. Our QII screening form identified articles that: (QII-1) reported on an intervention implemented in or by a healthcare delivery organisation or organisational unit; (QII-2) reported qualitative or quantitative data on intervention effectiveness, impacts, or success; (QII-3) reported on patient (or care giver) health outcomes; and (QII-4) aimed to change how delivery of care was routinely structured within a specific organisation or organisational unit. All four QII criteria had to be present according to two independent reviewers for an article to be included. We thus excluded studies that only reported cost or provider knowledge/attitude measures.21 The fourth criterion, QII-4, conceptually overlaps with potential CQI feature E (‘The intervention aims to change how care is organised, structured, or designed’) as identified by our CQI expert panel. Because we selected articles for our QII sample based on this criterion, 100% of studied articles had this feature.
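The inclusion rule can be illustrated with a short sketch. It assumes each reviewer's judgement is recorded as a yes/no value per criterion, and it routes split decisions to a third party, as in our resolution process; the function names and the "resolve" label are hypothetical.

```python
CRITERIA = ("QII-1", "QII-2", "QII-3", "QII-4")

def passes_screen(review):
    """True only if the reviewer marked every one of the four criteria as met."""
    return all(review.get(criterion, False) for criterion in CRITERIA)

def screen_decision(review_a, review_b):
    """Combine two independent reviews into a single screening decision.

    Returns "include" when both reviewers find all four criteria present,
    "exclude" when neither does, and "resolve" when they disagree
    (an assumed label for referral to a third reviewer).
    """
    a_passes = passes_screen(review_a)
    b_passes = passes_screen(review_b)
    if a_passes and b_passes:
        return "include"
    if not a_passes and not b_passes:
        return "exclude"
    return "resolve"

# Invented example reviews of one article.
reviewer_1 = {"QII-1": True, "QII-2": True, "QII-3": True, "QII-4": True}
reviewer_2 = {"QII-1": True, "QII-2": True, "QII-3": False, "QII-4": True}

print(screen_decision(reviewer_1, reviewer_1))
print(screen_decision(reviewer_1, reviewer_2))
```

In this example the second call returns "resolve" because the reviewers differ on QII-3 (patient health outcomes).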
Criteria for assessing CQI features
With feature E already part of the QII screening form, we then incorporated the five remaining ‘definitely’ or ‘probably’ definitional CQI features into a six-item assessment form (table 2), and refined the form and its guidelines through pilot testing. These five features were CQI-1 (‘The intervention involves an iterative development and testing process such as PDSA (Plan-Do-Study-Act)’), CQI-2 (‘The intervention involves feedback of data to intervention designers and/or implementers’), CQI-3 (‘The intervention uses systematic data-guided activities to achieve improvement’), CQI-4 (‘The intervention identifies one or more specific methods (eg, change strategies) aimed at producing improvement’) and CQI-6 (‘The intervention is designed/implemented with local conditions in mind’). The concept of being ‘data driven’ was a consistent theme of the expert panel discussions, manifested through items CQI-1, CQI-2 and CQI-3, which all reflect data use but do not use the term ‘data driven’. We added CQI-5 as a potentially more direct assessment.
We used a three-point scale with explicit criteria for all items during pilot testing. However, CQI-5 (data driven) and CQI-6 (designed for local conditions) were not reliable in this form. We therefore used a five-point implicit (reviewer judgement-oriented) review scale for these two items. Pilot testing showed better reliability for the three-point and five-point scales than simple yes/no responses.
To further enhance our understanding of the QII and CQI literature, we collected additional information on reviewed articles. We assessed setting, evaluation target (ie, change package, change method, or both), evaluation design (ie, presence/absence of a comparison group), researcher involvement in authorship, results (ie, whether the intervention demonstrated positive effects), and journal type (to explore potential differences in reporting across publication venues). ‘Change package’ describes the set of specific changes for improving care (reminders, tools, or other care model or prevention elements) implemented in a QII, while ‘change method’ describes the approach used to introduce and implement the change package (eg, CQI, Lean, Six-Sigma, Reengineering). For assessing journal type, we characterised journals as clinical (general, nursing, or specialty) or quality improvement/health services research.
QII sample identification and screening
To reflect usual methods for evidence review, we began with electronically searched articles. We developed search strategies for the MEDLINE (Ovid) and PubMed databases based on free text words, medical subject headings, QI intervention components, CQI methods, and combinations of the strategies (Hempel et al, submitted). Searches included a broad range of terms (‘quality’ AND ‘improv*’ AND ‘intervention*’) indicating quality improvement in general, as well as the following CQI-related terms: ‘Plan-Do-Study-Act’, ‘Plan-Do-Check-Act’, ‘Define-Measure-Analyse-Improve-Control’, ‘Define-Measure-Analyse-Design-Verify’, ‘iterative cycle’, Deming, Taguchi, Kansei, Kaizen, ‘six-sigma’, ‘total quality management’, ‘quality function deployment’, ‘House of quality’, ‘quality circle’, ‘quality circles’, ‘Toyota production system’, ‘lean manufacturing’ and ‘business process reengineering’. The search resulted in 9427 articles.
To identify candidate QII articles from this set, two authors (LR, PS) used previously described definitions22 to identify 201 potentially relevant titles and abstracts reporting empirical data on a QII from among 1600 randomly selected articles. We then screened the remainder of the 9427 articles using an experimental machine learning algorithm that utilised the manual title/abstract review as a learning set. We added 49 machine-screened articles that screened in at a maximal confidence level. Finally, we added 22 articles recommended by expert panel members as QII exemplars, resulting in a total of 272 candidates (201 + 49 + 22).
We identified QII articles from among these 272 using the QII screening form with the explicit criteria discussed above (QII-1 through QII-4).21 Two reviewers (MD plus SH or SO) reviewed the full text of each candidate article to apply the screen, and consulted LR for resolution when there was disagreement.
Assessment of CQI features
Two reviewers (YL, RF) pilot tested the initial CQI features assessment on a subset of 45 included QIIs. Two reviewers (YL, SO) applied the final CQI features assessment to the remaining 106 included QIIs.
In calculating consensus results, we adjusted for reviewer effect. Some reviewers consistently rate items lower on a scale (ie, the mean, or midpoint, around which their ratings vary is lower) and some reviewers rate consistently higher.30 Reviewer effect adjustment normalises raters to a common mean. We computed the inter-rater reliabilities of the QII and CQI features assessment using κ statistics (for dichotomous assessments) and intra-class correlations (for scales).
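A minimal sketch of these three calculations follows, using invented ratings. The text does not state which intra-class correlation form was used, so the one-way, single-rater form for two raters shown here is an assumption, as are all function names.

```python
from statistics import mean

def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same items.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

def adjust_for_reviewer(scores_by_reviewer):
    """Shift each reviewer's scores so that all reviewers share the grand mean."""
    grand = mean(s for scores in scores_by_reviewer.values() for s in scores)
    return {
        reviewer: [s - mean(scores) + grand for s in scores]
        for reviewer, scores in scores_by_reviewer.items()
    }

def icc_oneway(r1, r2):
    """One-way, single-rater ICC for two raters (an assumed ICC form)."""
    n, k = len(r1), 2
    subject_means = [(x + y) / 2 for x, y in zip(r1, r2)]
    grand = mean(subject_means)
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 + (y - m) ** 2 for x, y, m in zip(r1, r2, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented data: reviewer "r2" rates every article one point higher than "r1".
raw = {"r1": [1, 2, 3], "r2": [2, 3, 4]}
adjusted = adjust_for_reviewer(raw)

kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
icc_raw = icc_oneway(raw["r1"], raw["r2"])
icc_adjusted = icc_oneway(adjusted["r1"], adjusted["r2"])
print(kappa, icc_raw, icc_adjusted)
```

In this toy example the two reviewers rank the articles identically but differ by a constant offset, so removing the reviewer effect raises the intra-class correlation; real ratings would rarely align this cleanly.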
We counted a CQI feature ‘present’ in an article if both reviewers rated that feature as ≥2 (on a three-point scale) or ≥3 (on a five-point scale). We weighted items equally and did not prespecify a cut-off for qualifying a study as ‘CQI.’ However, to explore potential cut-off points, we created a composite rating by averaging across all CQI features for each article. We applied cut-offs by using the average composite rating across both reviewers, as well as by requiring both reviewers' composite ratings to independently surpass the cut-off. For composite ratings, we analysed results both with and without items CQI-5 and CQI-6 to account for the use of a five-point scale.
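The presence rule and the two ways of applying a composite cut-off can be sketched as below. The scores are invented, the function names are hypothetical, and treating a composite equal to the cut-off as qualifying is an assumption.

```python
from statistics import mean

THREE_POINT_ITEMS = ("CQI-1", "CQI-2", "CQI-3", "CQI-4")
FIVE_POINT_ITEMS = ("CQI-5", "CQI-6")

def feature_present(item, score_a, score_b):
    """A feature is present only if both reviewers reach the item's threshold."""
    threshold = 2 if item in THREE_POINT_ITEMS else 3
    return score_a >= threshold and score_b >= threshold

def composite(scores, items):
    """Equal-weight average of one reviewer's scores across the chosen items."""
    return mean(scores[item] for item in items)

def meets_cutoff(scores_a, scores_b, cutoff, items, require_both=False):
    """Apply a cut-off to the averaged composite, or to each reviewer separately."""
    comp_a = composite(scores_a, items)
    comp_b = composite(scores_b, items)
    if require_both:
        return comp_a >= cutoff and comp_b >= cutoff
    return (comp_a + comp_b) / 2 >= cutoff

# Invented scores for one article from two reviewers.
rev_a = {"CQI-1": 3, "CQI-2": 2, "CQI-3": 2, "CQI-4": 1, "CQI-5": 4, "CQI-6": 3}
rev_b = {"CQI-1": 2, "CQI-2": 2, "CQI-3": 1, "CQI-4": 1, "CQI-5": 3, "CQI-6": 2}

present = [
    item
    for item in THREE_POINT_ITEMS + FIVE_POINT_ITEMS
    if feature_present(item, rev_a[item], rev_b[item])
]
print(present)
print(meets_cutoff(rev_a, rev_b, 1.75, THREE_POINT_ITEMS))
print(meets_cutoff(rev_a, rev_b, 1.75, THREE_POINT_ITEMS, require_both=True))
```

Here the composite is computed over the three-point items only, mirroring the analysis without CQI-5 and CQI-6. The article clears a cut-off of 1.75 on the averaged composite but not when each reviewer's composite must clear it independently, which illustrates why the second rule is the stricter of the two.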
QII screen results
QII screening resulted in 151 included QII articles. Inter-rater per cent agreement for application of the explicit screening form (prior to resolution of disagreements) was 85.7% (κ=0.71). After setting aside the 45 articles used to pilot test the CQI features assessment, the final inclusion set comprised 106 QIIs. Table 3 shows that most reported QIIs were hospital or outpatient based (56% and 33%, respectively). Most studies (77%) reported no comparison group and 83% reported improvements following interventions. About half of the articles involved an author who had a PhD or master's degree; 10% indicated an academic professorial type position. Articles appeared predominantly in clinical journals (64%).
CQI features assessment
Table 4 shows inter-rater reliability (intra-class correlation) and per cent agreement between reviewers for CQI features. Per cent agreement ranged from 55.7% to 75.5% for the six items, and reviewer-adjusted intra-class correlations ranged from 0.43 to 0.62 (in the ‘fair to good’ reliability range).
Among features, feedback of systematically collected data was the most common (64%), followed by being at least ‘somewhat’ adapted to local conditions (61%), feedback at meetings involving participant leaders (46%), using an iterative development process (40%), being at least ‘somewhat’ data driven (34%), and using a recognised change method (28%). Articles in quality improvement or health services research journals reported all CQI features more often than clinical journals, significantly more for two features, feedback of systematically collected data and being at least ‘somewhat’ data driven.
Table 5 shows that 14% of articles included all six CQI features at a score of two or more (see table 2 for scoring). Table 6 shows another approach to assessing cut-offs for considering an article to represent CQI methods. This approach uses a composite rating of ‘CQI-ness’ based on the average score across all features for each article. Based on achieving a composite score of two or more, 44% of QII articles showed some level of CQI-ness, and could be so identified with a κ for reliability of 0.49 (fair reliability). Depending on the cut-off value used, the proportion of interventions in our QII sample qualifying as CQI interventions ranged from 1% to 44%.
This project used expert consensus methods to develop and apply potential CQI definitional features to a comprehensive sample of QII literature. We found reasonable inter-rater reliability for applying consensus-based features to electronically identified candidate QII articles. This indicates that systematic sample identification of CQI intervention articles is feasible. We found considerable variation in the reporting of individual features.
We aimed to assess the feasibility of creating a consensus-based definition of CQI for evidence review. We found that while experts could agree on a core set of important features, and these features could be reliably applied to literature, few articles contained a consistent core set. Alternatively, we tested a composite measure of ‘CQI-ness’ that reflected the quantity of CQI features reported. We found that this approach was feasible and may be useful for review purposes. This approach has important limitations, however, in that specific features may be of varying relevance depending on the purpose of the review.
As an illustration of the diversity of articles with CQI features, only one article was maximally rated by both reviewers on all features. Nowhere in that article does the phrase ‘CQI’ or even ‘quality improvement’ appear, which shows the disconnect between reporting of CQI features and use of the term ‘CQI’ itself.
During review, we noted that QII articles were inconsistently organised, with important methodological information about the intervention scattered throughout the sections of the articles. For iterative processes and data feedback in particular (CQI-1, CQI-2, and CQI-3), reviewers often had to extract data from tables (eg, monthly infection rates) rather than the main text. Development of a standard order for reporting CQI methods and results might make CQI articles easier to write and review.
Two items, data-drivenness (CQI-5) and degree of adaptation to local conditions (CQI-6), required implicit reviewer judgement due to our inability to develop reliable explicit criteria for assessing them. Some articles, for example, implied data-drivenness by alluding to quantitative audit/feedback mechanisms employed during implementation, but did not display any data. Multisite trials of standardised change packages, as another example, might imply methods for local involvement, but describe local adaptations only vaguely.
An earlier CQI evidence review7 also identified the issue of variable language use and reporting. Efforts to standardise reporting for randomised controlled trials13–15 and QIIs31 have proven useful. Our results support similar efforts for CQI interventions.
This study has limitations. The lack of relevant medical subject heading terms for either QII or CQI, in addition to inherent variation in CQI language use, may have reduced search sensitivity. To address this limitation, we used an inclusive electronic search strategy (Hempel et al, submitted) and additional expert referral of articles. This in turn resulted in a large candidate article set that required substantial screening. The number of electronically generated articles, however, is within the range of major evidence reviews.32–34 We further expect that future studies will most likely apply our methods to smaller sets addressing CQI subtopics, such as CQI for diabetes. The expert panel portion of this study is limited by involvement of a small though diverse group of key stakeholders. The purpose of the study, however, was to clarify and describe variations in reporting of key CQI features rather than to propose a final definition.
Currently, given the low agreement on the meaning of the term ‘CQI’, readers can have very little confidence that reviews of CQI interventions will include coherent samples of the literature. Without explicit identification of specific CQI features, reviews will yield uninterpretable results. Continued work assessing CQI features in relevant literature will result in more efficient, effective learning about this important quality improvement approach. Meanwhile, the more explicit CQI authors can be in describing the key features of their CQI interventions,31 the more interpretable and useful the results of their work will be.
The authors would like to thank the members of the CQI stakeholder panel: David Atkins, MD, MPH, Department of Veterans Affairs; Frank Davidoff, MD, Institute for Healthcare Improvement; Martin Eccles, MD, Newcastle University (UK), Co-Editor-in-Chief, Implementation Science; Robert Lloyd, PhD, Institute for Healthcare Improvement; Vin McLoughlin, PhD, The Health Foundation (UK); Shirley Moore, RN, PhD, Case Western Reserve University; Drummond Rennie, MD, University of California, San Francisco, Deputy Editor, Journal of the American Medical Association; Susanne Salem-Schatz, ScD, HealthCare Quality Initiatives; David P. Stevens, MD, Dartmouth University, Editor Emeritus, BMJ Quality & Safety; Edward H. Wagner, MD, MPH, Group Health Cooperative; Brian Mittman, PhD, Department of Veterans Affairs, Greater Los Angeles Healthcare System, Co-Editor-in-Chief, Implementation Science; and Greg Ogrinc, MD, Dartmouth University. They would also like to thank the following individuals, who helped support and guide this project: Breanne Johnsen, Aneesa Motala, Siddartha Dalal, Kanaka Shetty, and Roberta Shanman, RAND; and Mary Haines, Sax Institute (Australia).
Funding Robert Wood Johnson Foundation (RWJF, PO Box 2316, Route 1 and College Road East, Princeton, NJ 08543, USA) grants were given to LR (Grant ID 65113: Advancing the science of continuous quality improvement: A framework for identifying, classifying and evaluating continuous quality improvement studies and Grant ID 67890: Providing a framework for the identification, classification, and evaluation of quality improvement initiatives). RWJF's assistance in collecting data was limited to allowing the authors to glean definitions of continuous quality improvement from grant applications to the Advancing the Science of Continuous Quality Improvement program. RWJF played no role in the study design, in the analysis and interpretation of data, in the writing of the report, or in the decision to submit the paper for publication. Additional funding was provided by the Agency for Healthcare Research and Quality, the Veterans Health Administration, the British Healthcare Foundation and RAND Health.
Competing interests None to declare.
Ethics approval This study was conducted with the approval of the RAND Corporation Human Subjects Protection Committee.
Provenance and peer review Not commissioned; externally peer reviewed.