Objective Valid, reliable critical appraisal tools advance the impact of quality improvement (QI) interventions by helping stakeholders identify higher quality studies. QI approaches are diverse and differ from clinical interventions. Widely used critical appraisal instruments do not take unique QI features into account, and existing QI tools (eg, Standards for QI Reporting Excellence) are intended as publication guidance rather than for critical appraisal. This study developed and psychometrically tested a critical appraisal instrument, the QI Minimum Quality Criteria Set (QI-MQCS), for assessing QI-specific features of QI publications.
Methods Approaches to developing the tool and ensuring validity included a literature review, expert panel input gathered in person and through online surveys, and application to empirical examples. We investigated psychometric properties in a set of diverse QI publications (N=54) by analysing reliability measures and item endorsement rates, and explored sources of disagreement between reviewers.
Results The QI-MQCS includes 16 content domains to evaluate QI intervention publications: Organisational Motivation, Intervention Rationale, Intervention Description, Organisational Characteristics, Implementation, Study Design, Comparator Description, Data Sources, Timing, Adherence/Fidelity, Health Outcomes, Organisational Readiness, Penetration/Reach, Sustainability, Spread and Limitations. Median inter-rater agreement for QI-MQCS items was κ 0.57 (83% agreement). Item statistics indicated sufficient ability to differentiate between publications (median quality criteria met 67%). Internal consistency measures indicated coherence without excessive conceptual overlap (absolute mean interitem correlation=0.19). The critical appraisal instrument is accompanied by a user manual detailing What to consider, Where to look and How to rate.
Conclusions We developed a ready-to-use, valid and reliable critical appraisal instrument applicable to healthcare QI intervention publications, but recognise scope for continuing refinement.
- Quality improvement
- Evaluation methodology
- Evidence-based medicine
- Healthcare quality improvement
- Quality improvement methodologies
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Quality improvement (QI) interventions account for substantial investments by organisations aiming to improve healthcare quality, and a large volume of literature documents these efforts.1 QI research necessarily reflects work with organisational context and local environments. QI interventions tend to be complex, multicomponent, often uniquely tailored to settings, and may evolve over time.1,2 Intervention details, context and information on the QI process are critical to evaluate the success of QI interventions.
To address the unique requirements of QI research, the Standards for QI Reporting Excellence (SQUIRE) group has developed detailed guidance for reporting evaluations of QI interventions.3 The reporting guideline helps authors describe QI interventions so that they can be identified as such in electronic databases. It aims to ensure readers can understand and appraise the intervention and its evaluation by identifying for authors the details they need to report. However, tools are also needed to guide the critical appraisal of published QI studies. Critical appraisal assesses the quality of publications, informs decisions about applicability of results, and aims to identify high-quality published studies. While reporting guidelines can be aspirational and comprehensive because they are designed for future publications, critical appraisal tools must be applicable to the wide range of completed studies and concentrate on key assessment domains if they are to be useful in practice.
Researchers have frequently questioned the methodological quality of QI studies.4 However, tools widely used for the critical appraisal of clinical interventions, such as the Cochrane Risk of Bias tool,5 may not encompass the domains most relevant to QI research. The lack of a QI-specific focus can limit the ability of researchers, practitioners and policy makers to identify—and learn from—higher quality QI studies.
We have developed the QI Minimum Quality Criteria Set (QI-MQCS) to appraise the quality of QI-specific aspects of QI publications. The QI-MQCS is intended as a resource for reviewers, assisting in synthesising the vast available evidence on QI interventions, and providing a framework for critical appraisal in this complex research area. This article describes the development and evaluation of the QI-MQCS.
Our international workgroup of QI and systematic review experts (hereafter 'the workgroup') followed a structured process to develop and evaluate the QI-MQCS. We used a broad and inclusive definition of QI interventions to ensure the QI-MQCS applies to a variety of efforts to change/improve the clinical structure, process and/or outcomes of care by means of an organisational or structural change.
The QI-MQCS reflects core domains developed through literature review, inputs from QI experts and stakeholders, and item development through iterative application to empirical studies. Formal reliability testing and reviewer guidance were used to enable consistent and replicable scoring. We designed the QI-MQCS items to be modest in number to ensure scoring feasibility, have strong face validity with QI stakeholders, meet psychometric standards to enable reliable assessment, avoid repeating internal validity items from study design-specific appraisal tools, and be applicable to a wide range of QI publications.
The following describes the development of the domains (the content the QI-MQCS aims to cover), the operationalisation as QI-MQCS items (the concrete appraisal questions and scoring criteria), the available tools and resources (the QI-MQCS form and manual) and the psychometric evaluation of the QI-MQCS.
To ensure that the QI-MQCS represents the breadth of relevant domains,6 we first reviewed a wide range of existing tools. We assessed widely endorsed general5,7,8 and specific critical appraisal tools;9 reporting guidelines for QI3,10,11 and behaviour change interventions;12 study design-specific guidelines;13,14 relevant frameworks such as Reach Effectiveness Adoption Implementation Maintenance;15 and the Medical Research Council (MRC) Guidance for Complex Interventions.16 Relevant resources were identified through a PubMed literature search for critical appraisal and QI; screening equator-network.org; critical appraisal resources provided by the Center for Reviews and Dissemination, the Evidence-based Practice Center programme of the Agency for Healthcare Research and Quality, the Oxford Centre for Evidence Based Medicine, the National Institute for Health and Care Excellence, and the Cochrane Effective Practice and Organisation of Care (EPOC) Review Group; and existing systematic reviews of critical appraisal tools and evidence level hierarchies.17–19 In addition, workgroup members assessed all 57 SQUIRE items for their relevance to a critical appraisal instrument. They rated 22 items as important or very important, for example 'Describes the intervention and its component parts in sufficient detail that others could reproduce it' and 'Identifies the study design chosen for measuring impact of the intervention on primary and secondary outcomes', but rated many other aspects of the reporting guideline as less important (eg, 'Title states the specific aim of the intervention', 'Discussion relates results to other evidence').
A consensus panel of international technical experts and key stakeholders in QI interventions, informed by the literature review and SQUIRE survey results, established the QI-MQCS domains.20 We elicited the input of this technical expert panel (TEP) through online surveys and in-person meetings.21 The project aim was to establish a feasible instrument that covers core QI domains rather than compiling an exhaustive list of potentially relevant or intervention-specific elements. An overarching conclusion of the content discussions was that the QI-MQCS should address domains that complement, rather than replace, instruments addressing the internal validity of study designs.5
The workgroup operationalised the domains as a critical appraisal instrument. Items were iteratively developed to capture the content of the domains and to enable reliable scoring of published articles. We included a domain description (eg, ‘Rationale linking the intervention to its expected effects’), guide (eg, ‘Consider citations of theories, logic models, or existing empirical evidence that link the intervention to its expected effects’) and minimum standard for each quality criterion (‘Names or describes a rationale linking at least one central intervention component to intended effects’). This process involved translating conceptual constructs (eg, ‘Penetration/Reach’) and phrases open to interpretation (eg, ‘Intervention and its component parts described in sufficient detail that others could reproduce it’) into practical scoring rules (‘Describes the proportion of all eligible units that actually participated’; ‘Describes at least one specific change in detail including the personnel executing the intervention’). We sought to avoid conceptual overlap, so that scoring of one domain would not influence other domain scores. We refined the criteria by applying them to empirical examples of the literature.
Throughout the process, we held discussions with key informants and drew upon examples of empirical literature to define the domains and standards for published QI evaluations. We sought input from QI researchers, QI practitioners and systematic reviewers experienced in QI literature syntheses. We applied all suggested critical appraisal domains, reviewer guidance and scoring criteria to empirical examples of existing QI publications to establish the QI-MQCS.
Tools and resources
We designed a form that translates the established domains into critical appraisal items with a dichotomous answer mode and scoring criteria to help reviewers decide whether a minimum quality standard is met. In addition, we adopted the Appraisal of Guidelines for Research and Evaluation structure22 to provide detailed guidance for QI-MQCS users. The Description defines the domain, What to Consider lists aspects relevant to the domain, Where to Look directs users to where the information is typically found in publications and How to Rate guides item scoring. The guidance provides illustrative article excerpts relevant to each domain.
To test the psychometric properties of the QI-MQCS, we used a validated QI search strategy to identify an empirical sample of diverse published QI and continuous QI intervention studies indexed in PubMed.23 The strategy combined search terms for QI and continuous QI, QI intervention components, and EPOC-eligible interventions. We screened the search output to identify publications evaluating the effects of QI interventions. We applied our working definition of QI24,25 by using four broad criteria to select relevant studies: a healthcare delivery organisation context; reporting data on the effectiveness, impacts or success of an intervention; reporting patient, caregiver or provider behaviour, or process of care outcomes; and interventions aiming to change how delivery of care is routinely structured. The interventions in the 54 studies included in the QI-MQCS evaluation data set focused on restructuring of departments and teams, checklists or audit and feedback to increase preventive services and performance indicators, shared medical appointments, pain management programmes, fall management and restraint prevention programmes, staff training and education restructuring, hospital care and diagnostic procedure redesigns, clinical guidelines, medication management models, incentive programmes to increase patient access, computerised registers, discharge planning, antenatal care restructuring, and telehealth.
Two reviewers agreed on the main intervention and outcome for each publication prior to quality appraisal; if publications referred to additional publications on the same study, we obtained them as well. Studies were reviewed by two independent reviewers in batches of nine and then reconciled, mirroring a systematic review process that uses independent reviewers and reviewer reconciliation to reduce reviewer errors and bias. In cases where we had to revise items to incorporate additional guidance, we discarded previous ratings. Psychometric results shown below reflect the final version of the QI-MQCS.
We analysed the answer frequency for each item (item endorsement rate: number of publications meeting the criterion in the sample) based on ratings reconciled across two reviewers.26 We measured two aspects of reliability: rater agreement and internal consistency.27 Agreement was measured through Cohen's κ and the per cent agreement of two independent reviewers before reconciliation. We assessed internal consistency and conceptual overlap across the QI-MQCS domains through interitem correlations across the 16 assessed items, across all assessed publications, and based on reconciled reviewer ratings (correlating each item score with all other item scores to quantify the empirical associations between individual items). Finally we identified sources of disagreement for each of the assessed publications.
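The rater agreement statistics used here follow standard definitions. As an illustrative sketch (not the authors' analysis code, and using hypothetical ratings), per cent agreement and Cohen's κ for a single dichotomous QI-MQCS item could be computed as:

```python
# Illustrative only: per cent agreement and Cohen's kappa for two reviewers'
# dichotomous (1 = criterion met, 0 = not met) ratings of one QI-MQCS item
# across a sample of publications. The ratings below are hypothetical.

def percent_agreement(r1, r2):
    """Proportion of publications on which the two reviewers agree."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Observed agreement corrected for agreement expected by chance."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    # Chance agreement derived from each reviewer's marginal endorsement rate.
    p1, p2 = sum(r1) / n, sum(r2) / n
    p_e = p1 * p2 + (1 - p1) * (1 - p2)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)

reviewer_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
reviewer_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]

agreement = percent_agreement(reviewer_a, reviewer_b)  # 0.8
kappa = cohens_kappa(reviewer_a, reviewer_b)           # ~0.52
```

Because κ corrects for chance agreement driven by the marginal endorsement rates, an item endorsed for nearly every publication can show a modest κ despite high per cent agreement, which is one reason both statistics are reported.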
The QI-MQCS addresses the following domains: Organisational Motivation, Intervention Rationale, Intervention Description, Organisational Characteristics, Implementation, Study Design, Comparator, Data Source, Timing, Adherence/Fidelity, Health Outcomes, Organisational Readiness, Penetration/Reach, Sustainability, Spread and Limitations. Table 1 describes each domain and table 2 shows the TEP's ratings of the importance of each domain (face validity).
Organisational Motivation assesses whether the motivational context of the organisation in which the intervention was introduced was described; for example to convey whether a given quality problem—such as shortcomings in quality of care indicators—was being addressed. Intervention Rationale assesses whether a rationale was given that suggests why the intervention may produce improvements in the outcome (empirical evidence, theories or logic models).
Intervention Description requires a detailed description of the change in the structure or organisation of healthcare, including the personnel involved. QI interventions are diverse: they may address changes in care processes (eg, use of care managers) or strategies aiming to change provider behaviour (eg, electronic reminders), and the content of the change (eg, avoiding catheter-related blood stream infections) and the means to achieve it (eg, audit and feedback) are often intertwined. We restricted the definition to permanent structural or organisational changes, not temporary activities aiming to develop or introduce the change. This domain received the highest rating in the assessment of domain importance shown in table 2.
Organisational Characteristics assesses whether key demographics of the setting are described to provide information that enables readers to assess the generalisability to their organisation.
Implementation addresses temporary activities used to introduce the permanent change, for example, staff education to introduce a new care protocol. The QI-MQCS focuses here on the introduction of the intervention into clinical practice, not its development.
Study Design assesses whether the publication identifies the evaluation design used to determine whether the intervention was successful. Acknowledging that different questions require different study designs, the quality emphasis is on outlining the evaluation approach, not on specific designs or features (eg, randomisation).
Comparator assesses the control condition to which the intervention is compared, for example, routine care before the intervention was introduced. We added this item, most prominently described in the Workgroup for Intervention Development and Evaluation Research (WIDER) criteria,12 in response to TEP discussions and empirical evidence.28 Given that healthcare contexts are continually evolving, it is important to know whether the comparison group comprised current ‘state-of-the-art’ or poor quality care. Data Source considers how data were obtained for the evaluation and whether the primary outcome was defined; conveying what exactly was measured should avoid a ‘false implicit understanding’ of terms and definitions24 and is independent from the study design selected for the evaluation.
Timing addresses the clarity of the timeline in relation to the evaluation of the intervention, for example, when a complex change was fully implemented and when it was evaluated, in order to determine the follow-up period. Adherence/Fidelity addresses compliance with the intervention. QI interventions can be introduced with enthusiasm, but whether personnel actually adhere to them (eg, a new assessment tool) in busy routine clinical practice is another matter. Readers need to be able to judge whether any intervention failure was attributable to the intervention itself, to suboptimal translation into clinical practice, or to a combination of both. Any information on adherence (including the lack thereof) is acknowledged in assessing this domain.
Health Outcomes considers whether patient health outcomes are part of the evaluation. Although an intervention may result in changes in healthcare processes (eg, tests ordered), these may not necessarily improve patient outcomes. The QI-MQCS acknowledges studies that assess this crucial patient-centred question. Organisational Readiness refers to the QI culture and resources present in the organisation, information that helps to assess the transferability of results. The TEP did not express strong unanimous support for including this item (table 2).
Penetration/Reach assesses what proportion of eligible units participated. This domain requires a denominator; stating the number of participating sites without also reporting how many sites were initially approached or were eligible is not sufficient. Sustainability addresses whether information on the sustainability of the intervention is available, including positive evidence (eg, an extended intervention period) or acknowledgement that the intervention may be maintained only with additional resources.
Spread addresses the ability of the intervention to be spread to or replicated in other settings. The minimum quality standard is met if the potential for spread, unsuccessful attempts at spread, or positive evidence of spread (eg, large-scale rollouts) is presented. Limitations refers to the disclosed limitations of the evaluation of the intervention.
Online supplementary appendix 1 shows the QI-MQCS, a ready-to-use form for critical appraisal. Online supplementary appendix 2, a user manual developed for the QI-MQCS, provides detailed information on each domain and scoring criteria, including What to consider, Where to look and How to rate.
The item endorsement rates (criterion met) ranged between 44% and 93% (table 2), with a median rate of 67%, indicating that the QI-MQCS items were able to differentiate between high and low quality studies in an empirical sample of QI publications. Two items were endorsed in more than 90% of assessed QI publications (Intervention Description and Implementation).
The median inter-rater agreement between two independent reviewers across all items was κ=0.52 (82% agreement; table 3). Coefficients ranged from κ=0.09 (Adherence/Fidelity) to κ=0.82 (Sustainability), with corresponding per cent agreement values of 56% and 74%. Agreement for 81% of items was fair to good; the items Timing, Adherence/Fidelity and Spread were below κ=0.40. Sources of disagreement between reviewers are documented in table 4 and encompassed omissions (ie, a reviewer overlooked reported information), the interpretation of the reported information (eg, associated with disagreements in Adherence/Fidelity) and the interpretation of criteria (ie, what was sufficient to meet the criterion).
The mean interitem correlation across all QI-MQCS items in the empirical sample of QI publications was 0.08 (mean absolute interitem correlation 0.19), and all individual interitem correlations were below 0.67. These results indicate conceptual independence between criteria (discriminant validity); items showed some coherence but not identity of the assessed domains. The highest correlations (0.61 to 0.66) were found between Intervention Description and Data Source, between Implementation and Organisational Readiness, and between Data Source and both Penetration/Reach and Limitations.
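The interitem correlation summaries reported above follow the usual definition: each item's reconciled scores are correlated with every other item's scores across publications, and the pairwise Pearson correlations are averaged. A minimal sketch, using hypothetical ratings rather than the study data:

```python
# Illustrative only: pairwise interitem correlations from a matrix of
# reconciled dichotomous ratings (rows = publications, columns = items).

def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def interitem_correlations(ratings):
    """Correlate every item with every other item (upper triangle only)."""
    items = list(zip(*ratings))  # transpose to one score vector per item
    return [pearson(items[i], items[j])
            for i in range(len(items))
            for j in range(i + 1, len(items))]

# Hypothetical ratings of six publications on three items:
ratings = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
corrs = interitem_correlations(ratings)
mean_corr = sum(corrs) / len(corrs)              # offsetting signs can cancel
mean_abs_corr = sum(abs(c) for c in corrs) / len(corrs)
```

Reporting the mean absolute correlation alongside the plain mean (0.19 v 0.08 in the study) guards against positive and negative correlations cancelling each other out.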
The QI-MQCS is a critical appraisal instrument that assesses 16 expert-endorsed QI domains applicable to a wide range of QI studies. Its scoring guidance facilitates use by different raters with known psychometric properties. A structured critical appraisal instrument development process ensured feasibility, validity and reliability.
The QI-MQCS development included a comprehensive literature search to ensure content validity and iterative operationalisation of the domains, applied to existing published QI literature, to ensure construct validity. The empirical test of the QI-MQCS shows sufficient ability to discriminate between studies, indicating that the QI-MQCS avoids items representing unattainable standards while including items that discriminate quality across an empirical sample of publications. Furthermore, the QI-MQCS does not show excessive conceptual overlap across domains, and no item is redundant with content already captured by other items. Agreement between two independent reviewers was fair to good in a diverse sample of a complex research field.
Despite the careful, iterative development of items and scoring criteria, some domains, such as Adherence/Fidelity and Spread, showed limited rater agreement. Future work is warranted to test reliability in a narrower set of interventions, for example, those included in typical systematic reviews, or to develop the criteria further in order to achieve better consensus. However, the QI-MQCS compared favourably with some other commonly used tools, such as the Cochrane Risk of Bias tool.29 Moreover, few published quality assessment tools have been tested for their psychometric properties.17 Reviewer disagreements may be easier to anticipate and to avoid in a more restricted sample, for example, one that is limited to a set of selected QI interventions.
QI stakeholders agree on the pressing need for better research and better literature synthesis methods. The QI-MQCS was developed to support evidence synthesis by providing a critical appraisal tool to identify high quality QI studies, for example, in the context of a systematic review. It is designed to be applicable to a wide range of QI studies. Developing critical appraisal criteria for QI publications is challenging due to the diversity of QI interventions, interdisciplinary language and study designs. Consequently, the QI-MQCS assesses, for example, whether the rationale specified for the intervention links to the study's main outcome, without dictating which type of rationale (eg, which evidence-based intervention or theory) may be superior, given that this determination may depend on the specific interventions in this particular field of research. To ensure wide applicability, we purposefully applied the QI-MQCS to a diverse set of QI publications in the psychometric evaluation and did not limit the sample to specific clinical conditions, QI interventions, outcomes or study designs.
The QI-MQCS targets the informational value of the QI study, giving credit to publications that assess and provide information on crucial variables. Thus, for example, a publication that reports limited adherence to an intervention or describes that the spread of the intervention was unsuccessful receives credit for reporting on adherence and spread. Reviewers may want to highlight positive expressions of the domain, for example, evidence of adherence indicating that the intervention took place as outlined. In this case, the QI-MQCS can be used as a framework for a more refined assessment. The specific standards will depend on the individual field of application.
We designed the QI-MQCS to determine the minimum quality threshold of core QI domains. QI experts selected and prioritised domains in order to establish a feasible critical appraisal instrument. Furthermore, we developed detailed scoring criteria in an iterative process to ensure reliability. The assessment must rely on the information presented in the publication, and reliable scoring requires clear guidance that cannot be based on guessing or inside knowledge of individual reviewers. Nevertheless, reporting shortcomings may not necessarily indicate the absence of the process in the conduct of the study (eg, the publication's word limits may have precluded a full description of the methods) and the psychometric evaluation distinguished only whether the domain criteria were met or not. Using the QI-MQCS and its assessment domains as a framework may allow reviewers to further differentiate study quality by creating response options for partially met criteria; by differentiating unmet criteria into ‘unclear’ and ‘low quality,’ or by defining criteria for exceptionally high quality studies. Further differentiation and moving away from the dichotomy of minimum criteria met or not may also provide a resolution for some of the described disagreements between reviewers. Additional or alternative criteria, for example criteria capturing other aspects of QI interventions3 building on the QI-MQCS may be important in specific research contexts.
The QI-MQCS was explicitly designed to complement, not to replace, critical appraisal instruments focusing on the internal validity of study designs. Other tools that may be helpful to reviewers are the EPOC group criteria for randomised controlled trials, controlled trials and controlled before-after studies;30 the quality criteria for programme evaluations;31 and a published critical appraisal instrument for Plan-Do-Study-Act QI.9 Fan et al4 provide a hierarchy of methodological strength for evaluating a body of evidence for QI interventions.
The QI-MQCS facilitates access to the vast available literature on QI interventions by identifying high quality studies, for example in the context of a systematic review aiming to synthesise the available evidence for specific interventions or outcomes, and provides a framework for critical appraisal in this complex research area. It is accompanied by a ready-to-use standardised quality assessment form and a detailed user manual. However, we have deliberately titled the tool V.1.0, expecting that its use will lead to further refinement and improvements.
We developed a ready-to-use, valid and reliable critical appraisal instrument applicable to a wide range of healthcare QI intervention evaluation publications, but recognise scope for continuing refinement.
The authors thank the members of the international expert panel that guided the selection of the domains: David Atkins, Department of Veterans Affairs; Frank Davidoff, Institute for Healthcare Improvement; Martin Eccles, Newcastle University Institute of Health and Society; Robert Lloyd, Institute for Healthcare Improvement; Vin McLoughlin, The Health Foundation; Shirley Moore, Case Western Reserve University; Drummond Rennie, University of California, San Francisco; Susanne Salem-Schatz, Independent Consultant; David P Stevens, Dartmouth Institute; Edward H Wagner, Group Health Center for Health Studies; Brian Mittman, Department of Veterans Affairs; and Greg Ogrinc, Dartmouth Institute. The authors thank Sean O'Neill, Denise Dougherty, Judy Sangl and Larry Kleinman for valuable contributions to the workgroup, Roberta Shanman for the literature searches, Jeremy Miles and Marika Suttorp for assistance with the statistical analysis, Breanne Johnsen for project assistance and Sydne Newberry for editorial assistance.
Contributors SH is the guarantor of the work and had full access to the data. LVR and PGS obtained funding; SH, JL, MSD, MA, RF and Y-WL contributed to the data acquisition; JL and SH to the data analysis; LVR, SH, PGS, JL, Y-WL and RF to the interpretation of the data. SH drafted the manuscript, all authors provided critical revisions and approved the final version of the manuscript.
Funding The project was funded by a grant from the Robert Wood Johnson (RWJ) Foundation (ID 65113), the Veterans Affairs (VA) Greater Los Angeles Healthcare System, the RAND Corporation, and the Agency for Healthcare Research and Quality (AHRQ, 1R13HS018139-01). The funding agencies had no role in the study design; the collection, analysis and interpretation of the data; the writing of the manuscript; or the decision to submit it for publication. The findings and conclusions are those of the authors; the content of the manuscript should not be construed as the official position of the RWJ Foundation, the Department of Veterans Affairs, the RAND Corporation, or AHRQ.
Competing interests None declared.
Ethics approval The study was reviewed by the Human Subject Protection Committee (HSPC) of the RAND Corporation and determined to be exempt (ID 2009-0071).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The raw data can be obtained from the authors.