Expert consensus on the desirable characteristics of review criteria for improvement of health care quality
- H M Hearnshaw, senior lecturer in primary care¹
- R M Harker, senior research officer²
- F M Cheater, professor of public health nursing³
- R H Baker, professor of quality in health care⁴
- G M Grimshaw, senior research fellow⁵
- ¹ Centre for Primary Health Care Studies, University of Warwick, Coventry CV4 7AL, UK
- ² National Children's Bureau, London, UK
- ³ School of Healthcare Studies, University of Leeds, Leeds, UK
- ⁴ Clinical Governance Research and Development Unit, Department of General Practice and Primary Health Care, University of Leicester, Leicester, UK
- ⁵ Centre for Health Services Studies, University of Warwick, Coventry, UK
- Correspondence to: Dr H Hearnshaw
- Accepted 21 June 2001
Objectives—To identify the desirable characteristics of review criteria for quality improvement and to determine how they should be selected.
Background—Review criteria are the elements against which quality of care is assessed in quality improvement. Use of inappropriate criteria may impair the effectiveness of quality improvement activities and resources may be wasted in activities that fail to facilitate improved care.
Methods—A two round modified Delphi process was used to generate consensus amongst an international panel of 38 experts. A list of 40 characteristics of review criteria, identified from literature searches, was distributed to the experts who were asked to rate the importance and feasibility of each characteristic. Comments and suggestions for characteristics not included in the list were also invited.
Results—The Delphi process refined a comprehensive literature based list of 40 desirable characteristics of review criteria into a more precise list of 26 items. The expert consensus view is that review criteria should be developed through a well documented process involving consideration of valid research evidence, possibly combined with expert opinion, prioritisation according to health outcomes and strength of evidence, and pilot testing. Review criteria should also be accompanied by full clear information on how they might be used and how data might be collected and interpreted.
Conclusion—The desirable characteristics for review criteria have been identified and will be of use in the development, evaluation, and selection of review criteria, thus improving the cost effectiveness of quality improvement activities in healthcare settings.
Review criteria are the elements against which quality of health care is assessed in quality improvement.
An expert consensus view was generated of 26 desirable characteristics of review criteria.
Review criteria should be developed through a well documented process involving consideration of valid research evidence, possibly combined with expert opinion, prioritisation according to health outcomes and strength of evidence, and pilot testing.
Review criteria should be accompanied by full and clear information on how they might be used and how data might be collected and interpreted.
Information on the characteristics of review criteria will enable rational selection of review criteria for quality improvement activities.
What is already known on the subject
Review criteria are systematically developed statements that can be used to assess the appropriateness of specific health care decisions, services, and outcomes. There is no generally accepted method of defining appropriate review criteria. If quality of care is assessed against inappropriate criteria, resulting improvements in performance against these criteria may not effect any improvement in care, and resources may be wasted in ineffective quality improvement activities.
What this paper adds
An expert consensus on the desirable characteristics of review criteria has been generated and the criteria of quality for quality review criteria have been identified. Review criteria can now be effectively identified and presented so that data can be collected on justifiable, appropriate, and valid aspects of care.
The issue of defining and assessing the quality of health care is central to improving clinical practice. In recent years quality improvement methods such as clinical audit and clinical utilisation review have been actively promoted by many healthcare providers and policy makers.1–3
The first stage of quality improvement for a given topic is to establish the review criteria to use. Review criteria are “systematically developed statements that can be used to assess the appropriateness of specific health care decisions, services and outcomes”.4 If appropriate review criteria are used, improvements in performance measured against these criteria should result in improved care. In contrast, if quality of care is assessed against inappropriate or irrelevant criteria, then resulting improvements in performance against these criteria may not effect any improvement in care, and resources may be wasted in ineffective quality improvement activities.5
There is no generally accepted method of defining appropriate review criteria. A few authors have proposed desirable characteristics of review criteria,4–6 but there is no clear indication of how appropriate criteria might be developed. Review criteria have often been generated from guidelines.7–10 Alternatively, rather than translating guidelines directly into review criteria, it has been argued that criteria should be based directly on high quality research evidence and prioritised according to strength of evidence and impact on health outcomes.11,12
Unfortunately, high quality research evidence is not readily available for all clinical topics,13 and expert opinion is often relied upon to develop criteria.14,15 Although some authors recommend that criteria should not be developed at all if research evidence is lacking,11 an alternative approach is to synthesise the two methods, drawing on expert opinion when there is no research evidence.16,17
Consensus methods are increasingly used to develop clinical guidelines18,19 and can provide a way of determining how review criteria should be selected and defining their desirable characteristics. The Delphi technique20 is a consensus method that gathers expert opinion through an iterative questionnaire process. The researchers communicate in writing with a panel of experts comprising between 10 and 50 members. Experts are anonymous to the extent that other panel members do not know their identity at the time of data collection.
It is recommended that the panel should include both "advocates" and "referees".21 The expertise of advocates stems from direct involvement in the area under study, as with clinicians or quality managers. Referees have less direct involvement; their expertise derives from study of the topic, as with academic researchers. In this paper we therefore refer to advocates as practitioner experts and referees as academic experts.
Modifications to a “pure” Delphi process are common.22–24 The preparatory stage of formulating issues can be supplanted by reference to existing research25 and subsequent rounds can be used to develop, rather than directly reiterate, the concerns of previous rounds.20 This study aimed to determine both how review criteria should be selected and their desirable characteristics, using a modified Delphi technique. This information will inform both those who develop review criteria and those who select review criteria to make quality improvement in health care more effective.
A two round modified Delphi process was used to generate consensus amongst an international panel of experts. A decision was made to restrict the Delphi process to two rounds before inviting experts to participate, since the initial questionnaire was based upon a careful review of available literature. It was considered that two rounds would be enough to reach adequate consensus and would minimise the workload for participants.
THE EXPERT PANEL
We identified an international group of experts in quality improvement in health care from a variety of professional disciplines. Three sources of information were used to identify experts: (1) publication record, (2) membership of quality improvement groups in relevant organisations such as the Royal Colleges in the UK, and (3) recommendations from researchers in the field. Forty nine experts were contacted, mostly by email, and asked to contribute to the study. The expert group was categorised by the researchers into 26 academic experts (“referees”) and 23 practitioner experts (“advocates”). Individuals for whom contact details were rapidly obtained were contacted first. Having received a sufficient number of positive responses from these experts, we ceased to seek contact details for further individuals. Although we acknowledge that our list of experts is not exhaustive and other individuals could have contributed to the study, the expert group provided a wide range of views and adequate representativeness.20
Medline and Embase databases were searched from 1990 to March 1999 using the topic headings “clinical audit”, “medical audit”, “clinical utilization”, “quality assurance”, and “guidelines” and text words “review criteria”, “appropriateness criteria”, “clinical indicators”, and “performance indicators”. The abstract of each citation was reviewed and all studies concerned with the development of review criteria were retrieved. In addition, publications of expert panel members on the development of clinical guidelines were reviewed. The literature review was used to compile a list of identified desirable characteristics of review criteria from which the questionnaire for round 1 of the Delphi process was constructed. The questionnaire contained 40 items in three sections:
- Section 1: the process of developing review criteria (18 items).
- Section 2: attributes of review criteria (11 items).
- Section 3: the usability of review criteria (11 items).
The experts were asked to rate importance and feasibility for each item using 7 point scales, anchored by “not at all important” and “very important”, and “not at all feasible” and “very feasible”. Free comments on each item and suggestions about items overlooked in the questionnaire were also invited. Questionnaires were distributed by email to 31 experts and by post to seven experts who did not have access to email. Experts were asked to complete the questionnaire within 2 weeks. Non-responders were sent reminders after 2 weeks and, where necessary, after a further 10 days. Round 1 was concluded 5 weeks after the distribution of the questionnaire.
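The round 1 timetable, which was reused for round 2, can be written down as a small schedule. The following is an illustrative Python sketch, not software used in the study: the function names `round_schedule` and `reminders_due` and the start date in the usage note are hypothetical, and the only figures taken from the text are the 2 week, further 10 day, and 5 week intervals.

```python
from __future__ import annotations

from datetime import date, timedelta

# Intervals taken from the round 1 description.
FIRST_REMINDER_AFTER = timedelta(weeks=2)   # first reminder to non-responders
SECOND_REMINDER_GAP = timedelta(days=10)    # further reminder "where necessary"
ROUND_LENGTH = timedelta(weeks=5)           # round closed 5 weeks after sending


def round_schedule(sent: date) -> dict[str, date]:
    """Key dates of one Delphi round, given the date the questionnaire was sent."""
    first = sent + FIRST_REMINDER_AFTER
    return {
        "sent": sent,
        "first_reminder": first,
        "second_reminder": first + SECOND_REMINDER_GAP,
        "close": sent + ROUND_LENGTH,
    }


def reminders_due(sent: date, today: date, responded: bool) -> list[str]:
    """Reminders that should have gone by `today` to someone who has still not responded."""
    if responded:
        # Simplification: a panel member who has responded needs no reminders.
        return []
    schedule = round_schedule(sent)
    # No reminders are issued after the round has closed.
    cutoff = min(today, schedule["close"])
    return [key for key in ("first_reminder", "second_reminder") if schedule[key] <= cutoff]
```

For an arbitrary start date of 1 January 2024, `round_schedule(date(2024, 1, 1))` places the first reminder on 15 January, the second on 25 January, and the close of the round on 5 February.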
Round 1 responses were aggregated and used to identify which items to retain for round 2. The project team generated a definition of disagreement based on the RAND/UCLA appropriateness method16 and used it to exclude items from round 2 if their ratings were polarised to the extreme points of the scale, that is, if three or more experts gave a high rating of 6 or 7 while, in addition, three or more gave a low rating of 1 or 2. Cumulative percentage scores were then used to determine which of the remaining items met both inclusion criteria: at least 80% of the expert panel giving an importance rating of 5 or more, and at least 80% giving a feasibility rating of 4 or more. Items excluded at this stage therefore lacked consensus on a high rating for importance, feasibility, or both. Nevertheless, experts later had the opportunity to request that exclusions be revoked. Comments from experts were carefully considered in project team discussions. Some comments led to refinement of item wording for round 2 of the Delphi process; others led to the inclusion of additional items where experts identified significant omissions. Aggregating the round 1 results and developing the round 2 questionnaire took 2 weeks, so the round 2 questionnaire was ready 7 weeks after the round 1 questionnaire was sent out.
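The round 1 screening rule combines a polarisation test with two consensus thresholds. The Python sketch below encodes that rule for a single item, given one list of ratings (1 to 7) per scale. Two points are assumptions rather than details stated in the text: the polarisation test is applied to both the importance and the feasibility scale, and the function names and result labels are hypothetical. Exact fractions are used so that an item sitting on the 80% boundary is counted as meeting it.

```python
from __future__ import annotations

from collections.abc import Sequence
from fractions import Fraction

HIGH = (6, 7)  # top two points of the 7 point scale
LOW = (1, 2)   # bottom two points of the 7 point scale


def is_polarised(ratings: Sequence[int], min_each_side: int = 3) -> bool:
    """True if >= min_each_side experts rated 6 or 7 AND >= min_each_side rated 1 or 2."""
    high = sum(r in HIGH for r in ratings)
    low = sum(r in LOW for r in ratings)
    return high >= min_each_side and low >= min_each_side


def share_at_least(ratings: Sequence[int], threshold: int) -> Fraction:
    """Exact proportion of experts giving a rating of `threshold` or more."""
    return Fraction(sum(r >= threshold for r in ratings), len(ratings))


def classify_item(importance: Sequence[int], feasibility: Sequence[int],
                  consensus: Fraction = Fraction(4, 5)) -> str:
    """Apply the disagreement rule first, then the two 80% inclusion criteria."""
    # Assumption: polarisation on either scale excludes the item.
    if is_polarised(importance) or is_polarised(feasibility):
        return "excluded: polarised"
    if share_at_least(importance, 5) < consensus:
        return "excluded: importance"
    if share_at_least(feasibility, 4) < consensus:
        return "excluded: feasibility"
    return "included"
```

Checking polarisation first mirrors the order described above, where polarised items were removed before the cumulative percentages were examined.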
The round 2 questionnaire informed the experts of the method used to identify items to be included or excluded for round 2. Experts were asked to re-rate each item for round 2 and to provide additional comments if they wished. The questionnaire reminded experts of their own round 1 rating for each item and presented the expert group's mean rating for that item. If the wording of items had been altered, ratings for the original item were provided and the initial wording was shown below the altered item. Some new items were added to section 1 in response to expert comment. These were clearly labelled as being new. All excluded items were shown separately at the end of each section. Experts could alter their ratings for these items and comment on their exclusion, if they wished.
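Each round 2 questionnaire was personalised, showing the expert's own round 1 rating beside the group mean. A minimal Python sketch of that step follows, assuming the round 1 ratings for one scale are held in a nested dictionary keyed first by item and then by expert; the identifiers are hypothetical, and the function would be run once for importance and once for feasibility.

```python
from __future__ import annotations

from statistics import mean


def round2_feedback(round1: dict[str, dict[str, int]],
                    expert: str) -> dict[str, tuple[int | None, float]]:
    """Pair this expert's own round 1 rating of each item with the panel mean."""
    feedback = {}
    for item, ratings in round1.items():
        own = ratings.get(expert)  # None if the expert did not rate this item
        feedback[item] = (own, float(mean(list(ratings.values()))))
    return feedback
```

For example, with `round1 = {"item_a": {"e1": 7, "e2": 5, "e3": 6}, "item_b": {"e1": 4, "e2": 2}}`, expert `"e3"` would see `(6, 6.0)` for `item_a` and `(None, 3.0)` for `item_b`.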
The same processes for distribution, reminding, and analysis were used in round 2 as in round 1. Items retained after round 2 identified the desirable characteristics of review criteria and the method of selecting them.
Members of the panel of experts were sent details of the outcomes of the Delphi process.
Thirty eight of the 49 experts invited to take part agreed to do so. The number of experts responding to each round of the Delphi process is shown in table 1, which also gives the number of practitioner and academic experts included in each round. There were no significant differences in the proportion of practitioners and academics responding to the initial participation request (χ² = 0.3, p>0.05), nor did experts' status as a practitioner or academic relate to their likelihood of completing round 1 (χ² = 1.5, df = 1, p>0.05) or round 2 (χ² = 0.5, df = 1, p>0.05). Participating experts are listed in appendix 1.
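The χ² statistics above compare practitioners with academics in a 2 × 2 table (expert status by, for example, responded or not). The paper does not state whether a continuity correction was applied, so the Python sketch below computes the uncorrected Pearson statistic and compares it with the 5% critical value for 1 degree of freedom (about 3.841). All three reported values (0.3, 1.5, and 0.5) fall below that value, which is consistent with p>0.05. The function names are hypothetical and the counts in the usage note are arbitrary, not the data in table 1.

```python
CHI2_CRITICAL_DF1_5PCT = 3.841  # upper 5% point of the chi-squared distribution, 1 df


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-squared statistic for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if status and response were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def significant_at_5pct(chi2: float) -> bool:
    """True if a statistic with 1 df exceeds the 5% critical value."""
    return chi2 > CHI2_CRITICAL_DF1_5PCT
```

With the arbitrary table `pearson_chi2_2x2(10, 20, 20, 10)` the statistic is about 6.67, which would be significant; a perfectly balanced table such as `(10, 10, 10, 10)` gives 0.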
From a starting point of 40 items in round 1, 26 items qualified for inclusion after two rounds of the Delphi process; that is, at least 80% of the experts gave importance ratings of 5 or more and feasibility ratings of 4 or more for these 26 items, and there was no polarisation of expert ratings. Table 2 shows the number of items excluded and included in each round.
In the final list of desirable characteristics, 13 items retained the original round 1 wording, 12 items were reworded for round 2, and one item was introduced in round 2. Table 3 shows the final list and the mean importance and feasibility ratings for each characteristic.
The round 2 questionnaire also allowed experts to reconsider the 11 items excluded at round 1. The round 2 responses confirmed that all these items should be excluded. In addition, a further five items were excluded after round 2 (including one of the items newly introduced at round 2). All the 16 excluded items are shown in table 4 with the reasons for exclusion.
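The item counts reported across the two rounds can be reconciled with simple bookkeeping. This Python sketch uses only figures given in the text and assumes that the 12 reworded items were drawn from the original 40. Under that assumption the counts imply that two new items were introduced at round 2, one retained and one excluded; the function name `item_flow` is hypothetical.

```python
from __future__ import annotations

# Counts reported in the text.
SECTION_SIZES = {"process": 18, "attributes": 11, "usability": 11}
EXCLUDED_ROUND1 = 11
EXCLUDED_AFTER_ROUND2 = 5   # one of these was new at round 2
NEW_EXCLUDED = 1
FINAL_ORIGINAL_WORDING = 13
FINAL_REWORDED = 12         # assumption: reworded versions of round 1 items
FINAL_NEW = 1


def item_flow() -> dict[str, int]:
    """Reconcile the reported item counts across the two Delphi rounds."""
    round1_items = sum(SECTION_SIZES.values())
    final_items = FINAL_ORIGINAL_WORDING + FINAL_REWORDED + FINAL_NEW
    carried_forward = round1_items - EXCLUDED_ROUND1
    # Items in the round 2 pool either reached the final list or were
    # excluded after round 2, so the pool size is the sum of the two.
    round2_pool = final_items + EXCLUDED_AFTER_ROUND2
    new_in_round2 = round2_pool - carried_forward
    # Cross-checks: every original item is accounted for exactly once,
    # and the new items split into one retained and one excluded.
    original_accounted = (FINAL_ORIGINAL_WORDING + FINAL_REWORDED
                          + (EXCLUDED_AFTER_ROUND2 - NEW_EXCLUDED) + EXCLUDED_ROUND1)
    assert original_accounted == round1_items
    assert new_in_round2 == FINAL_NEW + NEW_EXCLUDED
    return {
        "round1_items": round1_items,
        "carried_forward": carried_forward,
        "new_in_round2": new_in_round2,
        "final_items": final_items,
        "total_excluded": EXCLUDED_ROUND1 + EXCLUDED_AFTER_ROUND2,
    }
```

The totals match the reported figures: 40 items at round 1, 26 in the final list, and 16 excluded in all.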
The final lists of included and excluded items (tables 3 and 4) were sent to all expert participants for comment. No expert dissented from the lists.
It has been possible to define the desirable characteristics of review criteria by use of a modified Delphi process. This method has refined and validated the list of characteristics initially based on literature alone. The use of expert judgement has identified which of the literature based characteristics of review criteria are highly important and feasible. The final list of desirable characteristics can inform developers and users of review criteria and lead to more appropriate, less wasteful, use of review criteria in quality improvement activities in health care.
Our original list of important aspects of review criteria consisted mainly of items mentioned in publications by expert panel members. Although the Delphi process confirmed the importance of most of these items, it also excluded some. For instance, the process retained items such as "Criteria are based on a systematic review of research evidence" and "Expert opinion is included in the process of developing review criteria", but excluded items requiring the search strategy used, or the names of the experts involved, to be specified. We believe this demonstrates the effectiveness of the Delphi process in excluding unimportant or extreme views to arrive at a more central, widely accepted definition.
The literature based initial list of characteristics included four items specifically related to patient issues. The items "Criteria include aspects of care that are relevant to patients" and "The collection of information for criteria based review is acceptable to those patients whose care is being reviewed" were retained in the final list. However, the item "The views of patients are included in the process of developing review criteria" was excluded as having low feasibility, even though it was rated as important: eliciting patients' views on criteria could be very costly in time and resources. The item "Criteria are prioritised according to their importance to patients" was rated as low in both feasibility and importance. The prioritisation of criteria was generally rated as of low importance, perhaps indicating that all criteria meeting the other characteristics should be used and that prioritisation added nothing of importance. Thus, it was not patients' priorities specifically that were being excluded, but prioritisation itself.
The external validity of the collective opinion produced by a Delphi method depends on the composition of the expert panel. This study aimed to include both referees (academic researchers) and advocates (quality improvement practitioners) to balance insight from theoretical understanding with practical experience. However, the eventual composition of our expert panel was marginally biased towards an academic research perspective, as slightly more "referees" than "advocates" agreed to participate. This may have caused practical aspects of review criteria to be underrepresented in the final definition, and could explain the exclusion of items reflecting resource allocation and patient considerations. Given the growing political importance of resource allocation and patient considerations, these exclusions could be deemed inappropriate.
The desirable characteristics of review criteria identified here emerged from the expert consensus process used in this study. Although the problems associated with the nature of consensus and the structure of our expert panel should be acknowledged, the definition created here represents a considerable advance in our understanding of what appropriate review criteria are and how they might be developed. The previously published literature alone did not translate directly into a definitive list of the desirable characteristics of review criteria, and it was inconsistent about the relative importance that individual researchers assigned to different aspects of review criteria. Drawing on an international panel of experts from a variety of professional disciplines gives this list wide relevance: it does not depend on the views of professionals from one specific context or working within one particular healthcare system.
The modified Delphi process used here was very efficient in terms of time and resources spent in consultation with the experts involved. The process allowed experts from geographically disparate areas to be included at relatively low cost. Distributing questionnaires by email further minimised the resources needed and the time taken to complete each round of the Delphi process. The method proved a fast and low cost way of gathering the opinions of an international panel.
The knowledge gained from this study should be of relevance to all those involved in the development of review criteria for quality improvement. We have provided information on how review criteria can be effectively identified and presented so that data can be collected on justifiable, appropriate, and valid aspects of care. This information can also guide the assessment of review criteria by those who need to decide which criteria are appropriate for their particular quality assurance or quality improvement project. In short, the criteria of quality for quality review criteria have been identified. Future studies should be directed at establishing the value of this set of meta-criteria as a developmental tool to aid the selection of review criteria for quality improvement activities. The set is currently being used to develop an instrument to appraise the quality of review criteria. This instrument could raise the standard of quality reviews and thus improve the quality of health care.
In summary, the expert consensus view is that review criteria should be developed through a well documented process involving consideration of valid research evidence, possibly combined with expert opinion, prioritisation according to health outcomes and strength of evidence, and pilot testing. Review criteria should also be accompanied by full and clear information on how they might be used and how data might be collected and interpreted.
Appendix 1: List of expert participants
Andrew Booth, University of Sheffield, UK
John Cape, British Psychological Society, UK
Alison Cooper, Fosse Healthcare NHS Trust, UK
Gregor Coster, Auckland University, New Zealand
Susan Dovey, University of Otago, New Zealand
Jeremy Grimshaw, Aberdeen University, UK
Gordon Guyatt, McMaster University, Canada
Gill Harvey, Royal College of Nursing, UK
Nick Hicks, Oxford Public Health, UK
Jaki Hunt, Kettering General Hospital, UK
Lyn Juby, Clinical Governance Research and Development Unit, UK
Kamlesh Khunti, Clinical Governance Research and Development Unit, UK
Beat Kuenzi, SGAM Research Group, Switzerland
Mayur Lakhani, Clinical Governance Research and Development Unit, UK
Philip Leech, National Health Service Executive, UK
Katherine Lohr, Research Triangle Institute, USA
Adrian Manhire, Royal College of Radiologists, UK
Karen Mills, Warwickshire Multi-disciplinary Audit Advisory Group, UK
Andrew Moore, Bandolier, UK
Mary Ann O'Brien, McMaster University, Canada
Frede Olesen, Aarhus University, Denmark
Barnaby Reeves, Royal College of Surgeons, UK
James Rimmer, Avon Primary Care Audit Group, UK
Tom Robinson, Leicester General Hospital, UK
Martin Roland, National Primary Care Research & Development Centre, UK
Charles Shaw, CASPE Research, UK
Paul Shekelle, RAND, USA
Chris Silagy, Flinders University, Australia
Tim van Zwanenberg, University of Newcastle, UK
Kieran Walshe, Birmingham University, UK
Geoff Woodward, Royal College of Optometrists, UK
The researchers are indebted to all the members of the expert panel who contributed their time and expertise so willingly. Acknowledgement is also given to Dr Mayur Lakhani, University of Leicester, UK, for his helpful comments on an earlier version of this paper, and to the anonymous referees of an earlier version.
This research was funded by the UK National Health Service Research and Development Health Technology Assessment programme.
The views and opinions expressed here are those of the authors and do not necessarily reflect those of the NHS Executive.