Introduction Delphi procedures are frequently used to develop performance indicators, but little is known about the validity of this method. We aimed to examine the consistency of indicator selection across different procedures and across different panels.
Methods Analysis of three indicator set development procedures: the EPA Cardio project, which used international GP panels; the UniRap project, a Dutch GP indicator project; and the Vitale Vaten project, which used a national multidisciplinary health professional panel and a stakeholder panel.
Results With respect to clinical indicators, consistency between procedures varied according to the origin of the indicators. In Vitale Vaten, the multidisciplinary panel of health professionals validated 63% of the international EPA Cardio indicators again, but only 13% of the UniRap GP set.
Regarding organisational indicators, 27 indicators were rated in both EPA Cardio and Vitale Vaten. In the Vitale Vaten project 17 of these (63%) were validated, including eight of the nine indicators validated in EPA Cardio.
Consistency between panels was moderate; the health professional panel, being the most critical, had a decisive role.
Conclusion The consistency of selected performance indicators varied across procedures and panels. Further research is needed to identify underlying determinants of this variation.
- Quality indicators
- Delphi technique
- cardiovascular diseases
- qualitative research
- healthcare quality
- quality of care
Indicators for assessing quality and outcomes of healthcare delivery have been developed in many healthcare systems and countries. Indicators provide healthcare professionals with formative feedback to enhance learning and improvement of clinical practice.1 They can also be used to create transparency on quality of care. Indicators should be valid and reliable, feasible, and effective with respect to their aims.2 This implies that indicators should be developed and evaluated systematically. Some methods for indicator development, such as the Delphi procedure, have been adopted across the world.3
Nevertheless, many questions remain concerning indicator quality and appropriate development methods. This paper addresses the consistency of panel evaluations of indicators, using data from three indicator development projects in cardiovascular risk management (CVRM).
The importance of the quality of care for patients with cardiovascular diseases (CVD) and those at high risk remains undisputed: the incidence of CVD is high worldwide with high morbidity, mortality, and costs. Many countries have launched large programmes to improve CVRM.4 In The Netherlands, a multidisciplinary clinical guideline for CVRM was published in 2006.5 In 2009 a platform of stakeholder organisations (Platform Vitale Vaten) issued a multidisciplinary guideline with recommendations on the organisation, delivery and process of care (‘care standard’),6 largely based on Wagner's chronic care model.7 Simultaneously, a set of multidisciplinary quality indicators covering clinical and organisational aspects was developed, using a Delphi procedure with two panels: healthcare professionals and other stakeholders.
Consistency and confirmability are criteria that add to the reliability of indicator development procedures.8 Though widely used in medical science, the Delphi procedure itself is still being studied and compared with other methods of indicator development.9–12 Previously published Delphi procedures with different panels show variable effects and amounts of agreement between panels, challenging consistency.13–16 It has been shown that healthcare providers in particular can have a decisive role in multidisciplinary procedures, depending on the influence that a single panel has in a multipanel procedure. In a procedure with different panels, an indicator is usually validated only when it is validated by all panels. This methodology leads to a core set of generally supported indicators but may be too rigid.
We aimed to examine the consistency of indicator selection across different procedures. We assessed the results of consecutive validation ratings and compared the results of two Delphi procedures with different panels. Furthermore, we examined consistency between a health professional panel and a stakeholder panel.
This paper is based on an analysis of the Vitale Vaten project, partly in relation to EPA Cardio and UniRap, earlier projects in which performance indicators for CVRM were selected. The relationship between the projects and the comparisons made in this study is illustrated in the online supplementary figure. Table 1 presents some project features.
EPA Cardio project
The EPA Cardio project is described elsewhere.17 In summary, GP panels from nine European countries (total n=101) rated 202 indicators for CVRM in primary care in two Delphi rounds. This resulted in a set of 35 clinical and nine organisational indicators, including primary prevention and risk management in patients with established CVD or diabetes.
UniRap project
A Dutch College of General Practitioners (DCGP) working group developed indicators for the Uniform Reporting Project (UniRap), as part of a project to create one national indicator set for GP care.18 The UniRap indicator set consisted of 17 indicators on established CVD and six concept indicators on primary prevention of CVD. The goal of UniRap was to provide indicators that met the criteria of content validity based on the CVRM guideline and were feasible: GPs should be able to deliver the data from their medical record system for internal and external use. A set of indicators was discussed in an expert group and, to ensure feasibility, with representatives of GP information technology user organisations. Consequently, these indicators left little room for nuance. Various stakeholder organisations provided comments on draft sets, leading to revisions and a final set that was approved by the Dutch professional GP organisations.
Vitale Vaten project
In the Vitale Vaten project, indicators for CVRM were selected in a two-round Delphi procedure involving multidisciplinary panels of health professionals and stakeholders from The Netherlands. Table 2 presents the composition of the panels. As this project has not been published elsewhere, it is described here in more detail.
All participants were related to the Platform Vitale Vaten, an initiative of the Dutch Heart Association and many national stakeholders to improve CVRM. A major activity of the Platform was the development of a so-called ‘care standard,’ recommendations on the organisation of CVRM, complementary to the multidisciplinary clinical guideline, key elements of which were included in the care standard. Simultaneously, performance indicators were developed. The board of the Platform recruited and motivated participants for the indicator selection procedure. All correspondence was via email.
The initial list of 58 clinical indicators presented in this project comprised two previously developed sets: the 35 EPA Cardio clinical indicators, and the 23 UniRap indicators. Clinical indicators were presented in the first Delphi round only.
The 74 organisational indicators presented in the first round of the Vitale Vaten project came from three sources. All 27 organisational indicators from the original EPA Cardio list were included. Furthermore, 38 indicators were formulated on the basis of the draft version of the care standard and nine on the basis of the DCGP's vision on care. Finally, five additional indicators were formulated on the basis of comments in the first Delphi round, giving 79 organisational indicators in total.
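The indicator counts above can be reconciled with the totals used in the Results by simple addition. A minimal Python tally, using only the counts stated in this section; the dictionary keys are descriptive labels of our own, not names used in the projects:

```python
# Clinical indicators: two previously developed sets (counts from the text)
clinical_sources = {"EPA Cardio": 35, "UniRap": 23}

# Organisational indicators presented in the first Delphi round
organisational_round1 = {"EPA Cardio list": 27, "care standard draft": 38, "DCGP vision": 9}
added_after_round1 = 5  # formulated from comments in the first round

clinical_total = sum(clinical_sources.values())
organisational_round1_total = sum(organisational_round1.values())
organisational_total = organisational_round1_total + added_after_round1

print(clinical_total, organisational_round1_total, organisational_total)  # 58 74 79
```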
Participants assessed the necessity of the clinical indicators, defined as ‘necessary to deliver and record in the patient's medical record.’ In the first round, organisational indicators were assessed regarding clarity (‘expressed in clear, precise and unambiguous language’) and necessity. In this round, panellists could also add comments and propose additional indicators. The panellists received feedback on the first-round necessity results and, in the second round, scored the organisational indicators regarding necessity and feasibility (‘availability of data on a consistent, comparable and reliable basis’). Necessity, clarity and feasibility were defined exactly as in EPA Cardio. All scores were on a scale from 1 to 9.
Table 3 shows the response rates of the panels. Because two of the five responding stakeholders did not score the clinical indicators, this selection was based on the results of the health professional panel only. Clinical indicators were validated when they had a median necessity score of 7, 8 or 9 with agreement, meaning that at least 80% of respondents scored 7 or higher.
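A minimal Python sketch of the necessity rule for clinical indicators, assuming ratings are integers on the 1 to 9 scale and reading ‘80%’ as ‘at least 80%’. The function name and the example ratings are hypothetical and only illustrate the rule; they are not study data.

```python
from statistics import median


def clinical_indicator_valid(scores: list[int]) -> bool:
    """Necessity rule for clinical indicators: median of 7, 8 or 9 with agreement."""
    if not scores or any(not 1 <= s <= 9 for s in scores):
        raise ValueError("expected a non-empty list of ratings on the 1-9 scale")
    n_high = sum(1 for s in scores if s >= 7)
    high_median = median(scores) >= 7
    # Agreement: at least 80% of ratings are 7 or higher.
    # 5 * n_high >= 4 * n is the integer form of n_high / n >= 0.8.
    agreement = 5 * n_high >= 4 * len(scores)
    return high_median and agreement


print(clinical_indicator_valid([7, 8, 9, 8, 7, 6, 9, 8, 7, 8]))  # True
print(clinical_indicator_valid([9, 9, 9, 8, 5, 5, 4, 9, 9, 9]))  # False
```

The second example has a median of 9 but only 70% of ratings at 7 or higher, so it fails on agreement rather than on the median.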
Organisational indicators were validated if the second-round necessity score in both panels met the criteria described for clinical indicators. Furthermore, both panels had to rate the indicators feasible without disagreement. Indicators were rated feasible when the median score was 7, 8 or 9; there was considered to be no disagreement when fewer than one-third of the scores were 1, 2 or 3. There was no preset maximum number of indicators.
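The organisational rule combines the same necessity criterion with a feasibility criterion, and both must hold in every panel. A sketch in Python, under the same assumptions as for clinical indicators (integer ratings on the 1 to 9 scale, ‘80%’ read as ‘at least 80%’); the function names, panel labels and ratings are hypothetical:

```python
from statistics import median


def necessity_ok(scores: list[int]) -> bool:
    # Same necessity rule as for clinical indicators: median >= 7 and
    # at least 80% of ratings 7 or higher (integer arithmetic avoids rounding issues).
    n_high = sum(1 for s in scores if s >= 7)
    return median(scores) >= 7 and 5 * n_high >= 4 * len(scores)


def feasible_without_disagreement(scores: list[int]) -> bool:
    # Feasible: median >= 7; no disagreement: fewer than one-third of ratings are 1-3.
    n_low = sum(1 for s in scores if s <= 3)
    return median(scores) >= 7 and 3 * n_low < len(scores)


def organisational_indicator_valid(panels: dict[str, tuple[list[int], list[int]]]) -> bool:
    """panels maps a panel name to its second-round (necessity, feasibility) ratings."""
    return all(
        necessity_ok(necessity) and feasible_without_disagreement(feasibility)
        for necessity, feasibility in panels.values()
    )


example_pass = {
    "health professionals": ([8, 8, 7, 9, 7], [7, 8, 3, 9, 7]),
    "stakeholders": ([9, 8, 8, 7, 8], [8, 8, 7, 7, 9]),
}
example_fail = {
    "health professionals": ([8, 8, 7, 9, 7], [7, 8, 2, 3, 9]),
    "stakeholders": ([9, 8, 8, 7, 8], [8, 8, 7, 7, 9]),
}
print(organisational_indicator_valid(example_pass))  # True
print(organisational_indicator_valid(example_fail))  # False
```

In the second example, two of the five feasibility ratings from the health professionals are 3 or lower, which is not fewer than one-third, so the indicator is rejected even though the stakeholders rated it positively.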
To examine the consistency of indicator selection procedures, we assessed the results on the clinical indicators from the Vitale Vaten project. As previously developed sets were the starting point, the percentage of each set that was validated again reflected agreement and was taken as a measure of consistency.
In addition, we compared the validity rating results of the list of 27 organisational indicators rated in both EPA Cardio and Vitale Vaten.
To examine consistency of indicator selection across panels, we compared the results from the health professional and stakeholder panels scoring the organisational indicators in Vitale Vaten.
Consistency across indicator selection procedures
Table 4 focuses on the Vitale Vaten set of clinical indicators. The panel selected 25 indicators, including three out of 23 from the UniRap indicators (13%) and 22 out of 35 from the EPA Cardio set (63%). Table 5 presents the clinical indicators selected in Vitale Vaten.
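The consistency measure used here is the percentage of each previously developed set that was validated again. A short Python check of the figures above; the function name is ours, and the counts are those reported in this paragraph:

```python
def consistency_pct(presented: int, revalidated: int) -> int:
    """Share (%) of a previously developed set that a new panel validated again."""
    if not 0 <= revalidated <= presented:
        raise ValueError("revalidated must lie between 0 and presented")
    return round(100 * revalidated / presented)


# Counts reported for the Vitale Vaten clinical indicators: (presented, re-validated)
sources = {"EPA Cardio": (35, 22), "UniRap": (23, 3)}
for name, (presented, revalidated) in sources.items():
    print(f"{name}: {revalidated}/{presented} = {consistency_pct(presented, revalidated)}%")
# EPA Cardio: 22/35 = 63%
# UniRap: 3/23 = 13%

total_selected = sum(revalidated for _, revalidated in sources.values())  # 25
```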
Table 6 focuses on the Vitale Vaten set of organisational indicators. In total, 46 of the 79 indicators were selected. Of the 27 indicators also presented in EPA Cardio, the Vitale Vaten panels validated 17 (63%). In EPA Cardio, nine indicators from this list were selected; eight indicators were validated in both procedures. Table 7 shows these indicators with the results from EPA Cardio and Vitale Vaten. All organisational indicators validated in Vitale Vaten, ordered by origin, are presented online.
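The three counts for the 27 shared organisational indicators (nine validated in EPA Cardio, 17 in Vitale Vaten, eight in both) fully determine a 2 x 2 agreement table. A Python sketch that derives it; the function and cell labels are hypothetical, and the ‘neither’ cell is derived arithmetically rather than taken from the tables:

```python
def cross_tab(total: int, n_first: int, n_second: int, n_both: int) -> dict[str, int]:
    """2 x 2 table for one indicator list rated in two procedures, from marginal counts."""
    cells = {
        "both": n_both,
        "first only": n_first - n_both,
        "second only": n_second - n_both,
        "neither": total - (n_first + n_second - n_both),  # inclusion-exclusion
    }
    if any(count < 0 for count in cells.values()):
        raise ValueError("counts are mutually inconsistent")
    return cells


# First procedure = EPA Cardio, second = Vitale Vaten (27 shared organisational indicators)
epa_validated, vv_validated, both_validated = 9, 17, 8
table = cross_tab(27, epa_validated, vv_validated, both_validated)
retained_pct = round(100 * both_validated / epa_validated)
print(table)         # {'both': 8, 'first only': 1, 'second only': 9, 'neither': 9}
print(retained_pct)  # 89
```

Read this way, eight of the nine EPA Cardio indicators (89%) were retained, while Vitale Vaten additionally validated nine indicators that EPA Cardio had rejected.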
Consistency across health professionals and stakeholders
Table 6 also shows the results of the two Vitale Vaten panels separately. The stakeholder panel selected 59 out of the 79 indicators (75%), while the health professional panel selected 46 indicators (58%), all included in the stakeholder set.
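Because the health professional set is nested within the stakeholder set, the rule that an indicator must be validated by all panels returns exactly the health professional selection, which is how the most critical panel becomes decisive. The Python sketch below contrasts this rule with a more liberal ‘at least one panel’ rule. Indicator IDs are hypothetical placeholders; only the counts (79, 59, 46) and the nesting come from the text.

```python
# Hypothetical indicator IDs 0..78; only the counts and the nesting come from the text.
all_indicators = set(range(79))
stakeholder_set = set(range(59))
health_prof_set = set(range(46))

# Every health-professional selection is also a stakeholder selection.
assert health_prof_set <= stakeholder_set

validated_by_all = stakeholder_set & health_prof_set  # validated by every panel
validated_by_any = stakeholder_set | health_prof_set  # validated by at least one panel


def pct(part: set[int], whole: set[int]) -> int:
    return round(100 * len(part) / len(whole))


print(len(validated_by_all), pct(validated_by_all, all_indicators))  # 46 58
print(len(validated_by_any), pct(validated_by_any, all_indicators))  # 59 75
```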
This study showed, first, that clinical and organisational indicators for CVRM selected by international panels of general practitioners (EPA Cardio) were reasonably well validated by a multidisciplinary panel of health professionals in one country (Vitale Vaten). Conversely, indicators previously selected by a national GP working group (UniRap) were mostly not validated (three of 23).
Second, the study showed high consistency between health professionals and stakeholders (Vitale Vaten) regarding organisational indicators; health professionals were the most critical in their selection.
Several studies have compared different consensus procedures and found consistency;19–22 other studies focused on quantifying the results of Delphi rounds, for instance by assessing the result of each round.23–25 Research on Delphi procedures with different panels shows variable results depending on the validation criteria. Hardy et al accepted indicators when they were validated by at least one panel.13 In EPA Cardio, nine national GP panels validated 30–61% of the indicators.17 In a procedure with 11 different stakeholder panels rating indicators on primary mental healthcare services, agreement was very high within panels but low between panels.14 Indicators had to be validated by all panels, effectively giving physicians the most influence. When healthcare managers and family physicians in the UK rated indicators of primary care quality, the managers gave significantly higher ratings.16 In summary, the inclusion criteria in a multipanel Delphi procedure determine the final set, often giving a decisive role to the most critical panel.
We can only speculate on the factors underlying the variable consistency of Delphi procedures regarding clinical indicators. Low consistency might have been expected for the EPA Cardio indicators, given their international, GP-only perspective. On the other hand, EPA Cardio was rigorous, because indicators were excluded unless they were rated highly necessary in all countries. The reasonable consistency between EPA Cardio and Vitale Vaten may also reflect the fact that CVRM is mostly a primary care activity, so adding health professionals from other backgrounds may not have changed the ratings much compared with a GP-only panel.
UniRap indicators were not very successful in Vitale Vaten, showing little consistency between procedures. Possible explanations are that they were specific to general practice, or that their development was strongly driven by the possibility of recording the indicators in GP information systems with computerised extraction. Additionally, EPA Cardio indicators were formulated with nuances, whereas the UniRap indicators were formulated in terms of measurability in electronic patient records. Furthermore, new insights from the prevailing guideline, reflected in the UniRap clinical indicators, may not have been well known or accepted yet. Finally, UniRap indicators focus more on (proxy) outcomes, a choice that is always debatable.
Regarding the organisational indicators rated in both EPA Cardio and Vitale Vaten, consistency between procedures was high: eight of the 17 indicators validated in Vitale Vaten were in the EPA Cardio set of nine. As expected, the two-panel procedure was more liberal; it is clearly more difficult to reach agreement across nine country panels. Surprisingly, the one indicator from the EPA Cardio set not validated in Vitale Vaten concerned offering influenza vaccination to high-risk patients, a long-established practice. The health professional panel did not agree on its necessity.
The high consistency between health professionals and stakeholders in their selection of organisational indicators may reflect the active involvement of all participants in the Platform Vitale Vaten. Consistency varied noticeably with the origin of the indicators. Many indicators validated by stakeholders, but not by health professionals, were derived directly from the care standard text. These indicators concerned innovations such as an individual treatment dossier and an explicitly appointed central caregiver. The stakeholders validated all of these indicators, whereas the health professionals validated only five of the seven indicators about the dossier. This gave the health professionals a decisive role, in agreement with previous studies.14,16 The results would probably have differed if the central question had concerned concordance between indicators and recommendations rather than necessity.
There are several possible explanations for the results concerning innovative issues. We assume that the presence of supportive evidence may be an important motivation for the necessity rating. In contrast to the enormous amount of clinical evidence in CVRM, evidence on organisational aspects is sparse, and new concepts inevitably lack evidence, as developments and opinions precede research evidence. This may be a reason for panellists to reject these indicators. On the other hand, similar concepts used in diabetes care are evaluated positively.26,27 Our results show that healthcare professionals are not yet willing to accept indicators on important elements of the chronic care model as the basis for patient empowerment, supporting self-management and clarifying central caregivers' tasks in CVRM. Competition between indicators cannot explain the results: all indicators were rated separately, without a maximum. Overall, the results suggest that a Delphi procedure with health professionals is less suitable for selecting indicators concerning innovative organisational concepts. Other methods, such as expert meetings, may be more appropriate. Conversely, as the Vitale Vaten project shows, stakeholders seem less suitable to rate clinical indicators.
Strengths and weaknesses
Clinical indicators presented in Vitale Vaten were the result of earlier selection procedures (EPA Cardio and UniRap), and a list of organisational indicators was presented with identical wording and identical rating definitions in Vitale Vaten and EPA Cardio. These are the strengths of this study. Nevertheless, the study should be regarded as exploratory, and further research into the factors underlying the consistency of Delphi procedures is needed. A limitation was that the response rates were not optimal, particularly among the stakeholders.
The stakeholders were selected because they represented different organisations. However, some panellists were also healthcare professionals. This ‘cross over’ may have increased consistency between panels.
The consistency of Delphi procedures for selecting indicators was mixed. Several factors related to the procedures and the composition of the panels could influence this. Regarding CVRM, a large number of international indicators (EPA Cardio) were validated again. This finding supports the view that rigorous international indicator selection procedures are valuable. A limitation, however, seems to be that such procedures tend to exclude indicators reflecting innovations in healthcare delivery. Another challenge is to involve stakeholders meaningfully, as we found that they did not assess clinical indicators and that their assessments of organisational indicators largely reflected those of health professionals.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.