Formal consensus: the development of a national clinical guideline
- J Rycroft-Malone, research & development fellow
- Accepted 8 August 2001
Background—There is currently a political enthusiasm for the development and use of clinical guidelines despite, paradoxically, there being relatively few healthcare issues that have a sound research evidence base. Decisions have to be made even where the evidence base is undetermined, and limiting recommendations to areas where evidence exists may reduce the scope of guidelines and thus limit their value to practitioners. Guideline developers therefore have to rely on various sources of evidence and adapt their methods accordingly. This paper outlines a method for guideline development which incorporates a consensus process devised to tackle the challenges of a variable research evidence base for the development of a national clinical guideline on risk assessment and prevention of pressure ulcers.
Method—To inform the recommendations of the guideline a formal consensus process based on a nominal group technique was used to incorporate three strands of evidence: research, clinical expertise, and patient experience.
Results—The recommendations for this guideline were derived directly from the statements agreed in the formal consensus process and from key evidence-based findings from the systematic reviews. The existing format of the statements that participants had rated allowed a straightforward revision to “active” recommendations, thus reducing further the risk of subjectivity entering into the process.
Conclusions—The method outlined proved to be a practical and systematic way of integrating a number of different evidence sources. The resultant guideline is a mixture of research based and consensus based recommendations. Given the lack of available guidance on how to mix research with expert opinion and patient experiences, the method used for the development of this guideline has been outlined so that other guideline developers may use, adapt, and test it further.
While there is a political enthusiasm in the UK for guidelines and their development at a national and local level is increasing, paradoxically there are relatively few issues that have a sound research evidence base.
Although guideline developers are frequently required to incorporate more than one strand of evidence in their guidelines, information and guidance on the way to achieve this is lacking.
Formal rather than informal consensus methods for guideline development are recommended.
Guideline developers need to be explicit about their methods for integrating expert opinion with research evidence and patient experiences.
The formal consensus method used in this paper provides a template method for other guideline developers to use and test.
The quest to deliver care based on the best possible scientific evidence, coupled with a political enthusiasm for driving up quality, has resulted in guidelines being viewed as an important clinical tool. As such, their development at a national and local level is increasing. A key defining attribute of clinical guidelines is that their recommendations are based on evidence. As Trickey et al1 point out, over the last decade or so it has been favourable to link recommendations to evidence derived from systematic reviews of critically appraised research literature. Paradoxically, however, there are still relatively few healthcare issues that have a sound research base2 upon which to develop recommendations.
As part of the ongoing national guideline development work, the Royal College of Nursing (RCN) Institute's Quality Improvement Programme was commissioned by the Department of Health to develop a guideline on risk assessment and prevention of pressure ulcers. Pressure ulcers represent a major burden of sickness and reduce quality of life for patients and their carers, requiring prolonged contact with the healthcare system and causing pain, discomfort, and inconvenience.3 The financial costs to the NHS are also substantial.4 This human suffering and the financial costs of pressure ulcers, the variation in practice in the UK, and a growing body of knowledge about care options highlighted the need for recommendations for practice. However, the research evidence base for this clinical topic is variable, both in quality and quantity, largely due to the complicated aetiology of pressure ulcer development and the diverse patient population at risk.
Randomised controlled trials have been championed as the preferred evidence source for clinical guidelines because their design should reduce the likelihood of intervening variables affecting study results, thereby providing clearer information about the effect of an intervention.5 It follows therefore that a systematic review of randomised controlled trials is even more powerful and less likely to be misleading.6 However, not all questions about treatment or care are amenable to the randomised controlled trial design.7 In the case of evaluating “risk”, for example, it could be argued that a prospective cohort study design may provide the best source of research evidence.2 It is therefore important to state that evidence is not scientifically valid by virtue of the fact that it is derived from a randomised controlled trial but that it comes from the most appropriate source for the question being posed. In addition, not all aspects of treatment or care will have been the subject of research. In cases where randomised controlled trials are not appropriate or have not been conducted and/or subject areas have not been researched, guideline developers have to rely on other sources of evidence. Accordingly, professionals can provide “expert” opinion and patients can also take part in developing clinical guidelines to provide “expert patient opinions” on care options. The question is: how to do this in the most appropriate way?
Definitions of evidence-based practice imply a broader definition of “evidence” than that produced by randomised controlled trials alone.8 Care must not only be the best identified from research, but must also be decided upon in response to the identified needs of a particular individual. The challenge for practitioners and patients is how to achieve a balance between the various types of evidence in the context of other influences on decision making. It is in making this judgement that expertise adds a third strand to the evidence base from which decisions are made. If we are to achieve clinically effective care we therefore need access to three strands of evidence:
knowledge from research findings;
knowledge from clinical experience (expertise); and
patient specific information that includes the preferences and the acceptability of an intervention to individuals.
These sources of “evidence” can also be used to inform the development of clinical guidelines.
The challenge with the guideline on pressure ulcers was to develop a method that incorporated the different available strands of evidence but was also systematic and rigorous and eliminated as much bias as possible from the process. As Shekelle and Schriger9 point out: “There is little guidance on the optimum way to mix expert opinion with scientific literature” (page 1). There is even less guidance on how to incorporate the experience of patients or users in the guideline development process. The remainder of this paper describes the method that evolved during the development of this national clinical guideline, which was guided by:
a systematic review including the use of formal consensus in guideline development14;
dialogue with key informants; and
criteria used to appraise the robustness of national guidelines.15
SOURCES OF EVIDENCE
The evidence available for this guideline came from a number of different sources: the Agency for Health Care Policy and Research evidence linked guideline “Pressure ulcers in adults: prediction and prevention”16; an update of sections of their literature base17; the Effective Health Care Bulletin “The prevention and treatment of pressure sores”18; a systematic review of the effectiveness of pressure redistributing devices19; and a systematic review of the effectiveness of risk assessment tools.20 A formal consensus development process was used to integrate the different evidence sources and, where there was a weak research base, to agree recommendations based on current best practice.
A formal rather than informal consensus method is preferred in guideline development. Guidelines developed using informal consensus methods formulate recommendations without drawing on research evidence.21 By definition this method tends to be based on poorly defined criteria and lacks the adoption of explicit consensus, resulting in guidelines which tend to be subjective and ill defined in nature. Formal consensus development methods provide a structure to the group decision making process by, for example, adopting rating methods to represent extent of agreement about predefined issues or questions.
A modified nominal group technique (NGT) approach was used similar to the RAND consensus panel method used in the USA.22 This approach was preferred to the Delphi and consensus development conference methods because (a) it allows participants to discuss issues face to face and (b) the structured process is believed to facilitate contributions from all group members and to limit dominance by eminent or eloquent individuals.1,14 The NGT has previously been used successfully in health care generally23,24 and to develop clinical guidelines.1,25,26
An interdisciplinary consensus development panel was convened comprising 10 people. This figure is based on limited research which suggests that large numbers can cause coordination problems and a smaller group could result in diminished reliability.27 The participants reflected the full range of people to whom the guideline will apply, for example, individuals with expertise in tissue viability, service users, physicians, academics, and physiotherapists. Thus, the group's composition was heterogeneous, allowing areas of uncertainty to be fully explored28 and supporting better group decision making performance.14 To ensure heterogeneity, participants were purposively sampled. They were asked to participate on the basis of their status (as UK recognised authorities in pressure ulcer management within their speciality), their knowledge of the research base, and their intended commitment to the process. Acquaintance with the guideline developer or the RCN was not used as a criterion for selection.
A number of statements (n=202) were formulated by the guideline developer around the issues included in the scope of the guideline (see fig 1 for examples). These statements were developed from the literature on risk assessment and prevention of pressure ulcers (systematic reviews and other evidence-based resources outlined above), current debates about issues in practice, and from issues raised in other national evidence linked guidelines.
Each statement was rated 1–9 on a Likert scale where 1 represented least agreement and 9 represented most agreement. For ease of use the statements were collated into a booklet.
Before the nominal group met face to face, participants were asked to consider and rate the statements taking into account (a) the research evidence, (b) their opinion/clinical expertise, and (c) the realities of healthcare delivery in the UK. The first rating round was conducted by mail.
Research has shown that, without a synthesis of pertinent information, participants in the consensus process are more likely to rely entirely on their own particular experiences.29 Also, when such information is used during deliberation, the consensus process tends to be more straightforward30 and to reflect the research evidence.14 In the light of this, the consensus group received relevant research and the two systematic reviews when they received the statement booklet. This approach was also designed to help to focus the group's attention on the task and encourage them to view it as a research based exercise rather than purely an opinion based one. Group members were encouraged to bring the research evidence to the meeting, including any notes they may have made on it.28
On receipt of the completed statement booklets the frequency of response to each statement was calculated. For each statement the pattern of responses for the group was presented alongside each member's response to that statement. This allowed participants to see the spread of agreement and how their response related to this during the nominal group meeting (see fig 2 for examples).
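The feedback described above can be illustrated with a short Python sketch. It is not the tooling used for the guideline: the function name `feedback_row`, the member labels, and the ratings are hypothetical and serve only to show how the group's pattern of responses to one statement can be set alongside an individual member's own rating.

```python
from collections import Counter

SCALE = range(1, 10)  # the 1-9 agreement scale described in the paper


def feedback_row(ratings_by_member, member):
    """Return (frequency of each score 1..9, this member's own rating).

    ratings_by_member maps a panel member label to the rating that
    member gave one statement in the postal round (hypothetical data).
    """
    counts = Counter(ratings_by_member.values())
    frequencies = [counts.get(score, 0) for score in SCALE]
    return frequencies, ratings_by_member[member]


def format_row(frequencies, own_rating):
    """Render one statement's feedback as a single plain-text line."""
    cells = " ".join(f"{score}:{n}" for score, n in zip(SCALE, frequencies))
    return f"{cells} | your rating: {own_rating}"


# Hypothetical postal-round ratings for a single statement.
example = {"A": 7, "B": 9, "C": 9, "D": 3}
frequencies, own = feedback_row(example, "D")
print(format_row(frequencies, own))
```

A line of this kind for each statement lets a participant see at a glance whether their own rating sits with the majority or apart from it before the face to face discussion.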
Nominal group meeting
The nominal group meeting was held in November 1999. Research suggests that a facilitative chairperson is one of the most important ingredients for successful meetings.31,32 This meeting was facilitated by an experienced NGT facilitator, which helped to ensure that the process ran smoothly and that good quality decisions were made.14
The statements which the participants had already rated (via the postal round) were discussed in turn, focusing primarily on those that were the source of the most disagreement. When all members had been given the opportunity to respond there was a discussion to clarify, defend, or dispute the issues. There was then an opportunity for each participant to privately re-rate the statements.14 It was this rating (as opposed to the postal rating) that was used to determine the final level of agreement about each statement.
To provide context to the rated statements the meeting discussion was audio recorded and later transcribed verbatim. This enabled the guideline developer to clarify issues and draw upon the text when writing up the recommendations and their rationales at a later date.
Feedback from participants after the meeting, collected by a short questionnaire, indicated that all members of the group felt they had been free to express their opinions during the meeting, that adequate time had been spent discussing relevant issues, and that they had had the opportunity to discuss all the issues they wished to.
Analysis of statement responses
There is no agreement as to the best method of mathematical aggregation of responses to the statements.14 However, there are some suggested principles to which we adhered. Medians with interquartile range were used as measures of central tendency and dispersion and were calculated for each statement from the ratings of the second round. The recommendations were drafted on the basis of the panel's level of agreement about issues. If the median score of a statement was 7–9, this was considered to be agreement or “consensus” and it was developed into a practice recommendation (or became part of one). Conversely, if it did not reach this level of consensus it was rejected.
Data were managed and analysed throughout this process using SPSS for Windows.
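The aggregation rule described in this section can be expressed as a minimal Python sketch. It is an illustration under stated assumptions rather than a reproduction of the SPSS analysis: the function name `summarise`, the example ratings, and the quartile convention (`method="inclusive"`) are assumptions, and a different quartile convention may give a slightly different interquartile range for a panel of 10.

```python
from statistics import median, quantiles

CONSENSUS_MIN = 7  # a median of 7-9 was treated as agreement in the paper


def summarise(ratings):
    """Return (median, interquartile range, accepted) for one statement.

    ratings holds the second-round scores (1-9) given by the panel.
    The quartile method is an assumption made for this sketch.
    """
    if any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must lie on the 1-9 scale")
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    med = median(ratings)
    return med, q3 - q1, med >= CONSENSUS_MIN


# Hypothetical ratings from a 10-member panel for two statements.
agreed = [9, 9, 9, 8, 9, 7, 9, 9, 8, 9]
not_agreed = [5, 6, 7, 4, 6, 5, 8, 6, 5, 6]

for label, ratings in (("statement 1", agreed), ("statement 2", not_agreed)):
    med, iqr, accepted = summarise(ratings)
    outcome = "consensus" if accepted else "rejected"
    print(f"{label}: median {med}, IQR {iqr}, {outcome}")
```

Reporting the interquartile range next to the median matters because two statements with the same median can differ widely in how tightly the panel's ratings cluster around it.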
Forty one of the 202 statements were rejected because they did not meet the criterion for consensus. A direct pre- and post-rating statistical comparison was not possible because some of the statements changed in their ordering and wording during the meeting in the light of clarifying comments of the participants. However, anecdotally and, to a degree, numerically, the meeting produced greater agreement than the postal round as fewer statements met the criterion (that is, a median score of 7–9) in the postal round than in the nominal group meeting.
The recommendations for the guideline were drafted on the basis of the panel's level of agreement about the statements and the tape transcription. As McIntosh33 points out, there is a difference between the passivity of “evidence” statements and recommendations (where “evidence” refers to research evidence, clinical expertise, and patient preferences) which should be “statements of action”.
The recommendations for this guideline were derived directly from the statements agreed in the formal consensus process and from key evidence-based findings from the systematic reviews. The existing format of the statements that participants had rated allowed a straightforward revision to “active” recommendations, thus reducing further the risk of subjectivity entering into the process. As the key findings of the systematic reviews were limited in number and were unambiguous, they were also easily transformed to recommendations.
Rationales were developed to accompany each recommendation. The main purpose of the rationale is to give an abridged summary of the evidence supporting the guideline recommendations. In the case of the consensus based recommendations, the transcript of the nominal group discussions informed the rationales. In the case of the research evidence, systematic review findings were drawn upon.
The recommendations were graded on their evidence base as follows (adapted from Waddell et al12):
I: Generally consistent finding in a majority of multiple acceptable studies.
II: Either based on a single acceptable study or a weak or inconsistent finding in multiple acceptable studies.
III: Limited scientific evidence which does not meet all the criteria of acceptable studies or absence of directly applicable studies of good quality. This includes expert opinion.
In addition, the recommendations derived from the consensus process have figures next to them—for example, median 9 (IQR 1.25). These show the results of that process. It was felt important to include these scores in order to give guideline users a clear idea of the extent of agreement within the panel (see box 1 for an example of recommendation, rationale, and grading from the guideline).
Pressure redistributing mattresses/overlays should be used on the operating table of individuals assessed to be at high risk of pressure ulcer development.
Three randomised controlled trials have evaluated different methods of pressure relief on the operating table (cited in Cullum et al19).34–36 Their results suggest that a reduction in postoperative pressure ulcers can be achieved using an alternative support surface to a standard operating table.
The three randomised controlled trials evaluated different methods of pressure relief, but it is currently unclear which type is the most effective.19 Nixon et al34 found dry viscoelastic polymer pads (Action Products Inc) to be more effective than a standard table, while Aronovitch35 and Dunlop36 reported in favour of the Micropulse system (an alternating pressure overlay) in comparison with gel pads during surgery and a standard mattress postoperatively.
Some laboratory research has suggested that the “standard” operating table mattress may be difficult to define and that any pressure redistributing properties are dependent on the construction of each product.37 Individuals who may be at a high risk are those undergoing vascular surgery (median score 8, interquartile range (IQR) 2.25), orthopaedic surgery (median 9, IQR 3.25), surgery classed as major (median 8.5, IQR 1.5), and those with one or more risk factors (median 7.5, IQR 3.25).
Strength of evidence: I
This recommendation is supported by the findings of a systematic review19 including three randomised controlled trials that evaluated support surfaces for pressure ulcer prevention on the operating table.
The draft guideline was then disseminated to a wider audience—including a multidisciplinary advisory group and patient/carer representatives—to elicit comment and endorsement. From this process of dissemination, minor comments were incorporated into a re-draft of the guideline.
The guideline includes:
Quick reference guide and summary of recommendations
A philosophy of care which makes suggestions about the environment within which the recommendations in the guideline should be implemented
Evidence-linked recommendations for:
identifying individuals at risk;
use of risk assessment scales;
recognising risk factors;
pressure redistributing devices;
use of aids;
education and training.
The guideline also contains a section on the essentials of care which covers nutritional status, continence management, and hygiene. Although an association between these aspects of care and pressure ulcers is documented, it is not fully understood from the current research evidence base or, indeed, in consensus opinion. In recognition that they are key to raising standards of care, this section outlines some principles for practitioners to consider in relation to risk assessment and prevention of pressure ulcers. A further section considers quality improvement activities such as monitoring, discharge planning, and audit.
The recommendations in the guideline on pressure ulcer risk assessment and prevention were formed on the basis of a number of different evidence sources. Given that, in a seemingly “objective” guideline development process, the opportunities for subjectivity to interfere are many and varied,33 the possibility of elements of subjectivity creeping into the development of this guideline is recognised. However, the modified NGT used here was successful in that it made the best use of both research evidence and the collective experience of participants (including user representation) via a systematic process. The NGT resulted in a pool of statements that participants “agreed” with about the issues the guideline was to cover (based on the definition of agreement outlined above).
In terms of process, the modified NGT used here was fairly straightforward. The meeting day was demanding upon participants because they had a large number of issues to debate; key to managing this was the experience of the facilitator and the collective cooperation of the panel. It is difficult to evaluate the potential impact of psychosocial influences within group processes such as conformity, compliance, obedience, and persuasion as a study of dynamics was not included in the original design of the consensus process. The process was designed, however, to minimise the possible effects of these elements, for example by using private rating rounds and appropriate and skilled management of group dynamics. In addition, the process contained a measure of consensus for each recommendation so that a record of consensus is provided across the guideline development process for all to judge and evaluate. Secondary analysis of the audiotaped nominal group discussion may answer some group dynamic questions and add to the body of knowledge about group processes in guideline development, as yet a relatively undeveloped field of inquiry.
There are a number of cautionary points to be made about using consensus as a guideline development method. As Grimshaw and Russell5 point out: “Clinical guidelines are valid if, when followed, they lead to the health gains and costs predicted for them” (page 245). The “validity” of the guideline resulting from a process such as the one outlined above could be questioned. Grimshaw and Russell5 argue that validity is dependent upon how the evidence was identified and synthesised, how many guideline users and key disciplines were included in the guideline group, and how the guideline was developed. Based on these criteria, guidelines with greater scientific validity will be those that:
are developed from systematic reviews (or, even better, meta-analysis);
include few guideline users but most key disciplines; and
ensure the explicit linkage of recommendations with evidence.
Guidelines developed with a mixture of evidence derived from systematic reviews and from consensus opinion not entirely based on research evidence would, by the same criteria, be viewed as less scientifically valid.21
However, limiting recommendations to where evidence exists “would reduce the scope of the guidelines and limit their value to clinicians and policy makers who need to make decisions in the presence of imperfect knowledge”11 (page 47). Furthermore, as Trickey et al1 suggest, limiting the development of guidelines to those areas where there is a sufficient research base would reduce the possibility of improving the quality of care for those other areas which, by their nature, do not lend themselves to randomised controlled trials. Perhaps then, as Murphy et al14 propose, the best we can do is try to identify a method that will produce more, on average, “good” judgements than other methods. This raises the questions: what method ensures arrival at “good” judgements and what is a good judgement?
Murphy et al14 suggest a number of ways of assessing validity in consensus development processes which include predictive validity, concurrent validity, and internal logic. For example, if validity was concurrently assessed it would mean that decisions made within a group that directly conflict with the research-based evidence without good reason would be invalid. However, there is no absolute means by which to judge whether a decision was valid at the time it was made, and thus no means to assess the validity of the method by which it was made.14 Prospectively, the recommendations may be able to be assessed directly by, for example, an audit of guideline implementation.
The three issues listed above are all met, to a certain extent, with the method used in the development of this guideline. Some, but not all, of the recommendations were based on systematic review evidence (where available). Key disciplines were involved in the consensus and guideline development process, and recommendations are explicitly linked to the form of evidence from which they were derived. So, for example, where a recommendation is based solely on expert opinion, this is stated and the extent of panel agreement clearly identified.
In addition to the possible weaknesses inherent in this guideline development method there are some potential strengths. Firstly, it provided an opportunity for collaborative working between a number of different disciplines at all stages of the guideline development process. Equally important, this method also provides a forum for patient and/or carer participation. This may be particularly useful in situations where there is very little research about patients' and carers' experiences of certain conditions and/or care. Finally, use of this interdisciplinary collaborative approach may enhance the credibility of the guideline in the eyes of the end users: that is, if the participants involved in the development process are recognised professionals in the particular field of practice, it may have a positive influence on the uptake of the resultant guideline.
This paper has outlined a formal consensus method implemented for the development of a national clinical guideline. This method has yet to be tested further in national guideline development processes and by other guideline development groups. However, in this case it proved to be a practical and systematic way of integrating a number of different strands of evidence. The resultant guideline is a combination of research-based and consensus-based recommendations. In reality, most guidelines (both nationally and locally) will contain such a mixture; however, few guideline developers are explicit about their methods for eliciting expert opinion.
The political enthusiasm for clinical guidelines continues to gather momentum, as witnessed by the ambitious guideline programme of the National Institute for Clinical Excellence. This proliferation in guideline development is taking place despite a lack of a sound research evidence base for many clinical topics. In the light of this, guideline developers need to strive to develop and refine their methods for consensus building. It is also important that they are explicit about the methods used and open them up for scrutiny. Only then will the science and art of guideline development be advanced and, hopefully, improvements in the quality of patient care realised.
The author thanks Dr Nick Black, Professor of Health Services Research, London School of Hygiene and Tropical Medicine for advice and guidance in the development of the consensus method used to develop the guideline and for facilitating the nominal group, and Dr Gill Harvey for comments on an earlier draft of this paper.