Objective To compare three methods of guideline development, to see whether using alternative evidence-based methods resulted in variation of recommendations for treating actinic keratosis.
Methods Method 1 followed a standard multiple session evidence-based approach with a working group. In method 2 recommendations were formulated by a working group during a 2-day conference. Method 3 used one epidemiologist to summarise the evidence and one dermatologist to make clinical recommendations afterwards. Graded recommendations and levels of evidence were compared per therapy across three draft guidelines. The primary outcome was the extent of accordance or discordance. Secondary outcomes were total costs and time period necessary to make a draft guideline.
Results Therapeutic recommendations and levels of evidence differed on some occasions. However, intraclass correlations between levels of evidence were significant (method 1 vs 2: p=0.003; method 1 vs 3: p<0.001). Regarding variation in recommendations, method 1 and method 2 correlated significantly at 0.755 (p=0.001). Method 1 versus 3 and method 2 versus 3 also showed significant, but lower, correlation coefficients (0.493 (p=0.026) and 0.673 (p=0.007), respectively). Method 3 was the cheapest and quickest (€24 770 and 4 months) and method 1 was the most expensive and slowest (€48 100 and 14 months).
Conclusions The value of a guideline using alternative evidence-based methods seems to at least equal that of a guideline composed in multiple sessions, that is, for topics with a monodisciplinary character and a relatively small number of conducted trials. In addition, the presented alternatives were more time- and cost-efficient.
- Quality assurance
- practice guidelines as topic
- clinical practice guidelines
- evidence-based medicine
Actinic keratosis is the most prevalent pre-malignant skin disease in people with skin types I–III and presents itself as solitary or multiple, variable erythematous and irregularly keratinised lesions in sun-exposed areas of the skin.1 2 Consequently, chronic exposure to ultraviolet radiation is an important aetiological factor.3 There are multiple treatment options, varying in (cosmetic) effectiveness and side effects.2 A Dutch clinical practice guideline (CPG) is available to guide dermatologists in treating actinic keratosis.4 Over the last few decades, the method of developing guidelines has changed from an expert consensus-based approach to an evidence-based approach.5 These days a search for the best available evidence in literature is obligatory. Although there are several methods used in daily practice, it is more or less standard to develop guidelines with a working group following a transparent evidence-based approach.6 However, the traditional evidence-based approach in multiple sessions has several problems, such as the time and money required to develop a CPG. The alternative methods presented here have elements that have the potential to be quicker (fewer sessions), thus making it easier to keep abreast of developments as they occur, and to perhaps better sustain group motivation (a problem often encountered by guideline developers). Nevertheless, the question remains whether the quality is affected or not. Hence, we conducted a prospective study, comparing three methods of guideline development, to test whether using alternative evidence-based methods results in variation of guideline recommendations and to analyse the cost-effectiveness.
The prospective three-arm comparative trial investigated the following guideline development methods.
A working group followed an evidence-based approach with multiple sessions (traditionally 6–12) and was blinded regarding the existence of other working groups. Clinical participants extracted the data in evidence tables and wrote texts. The working group of method 1 held six meetings in which they discussed their individual input and formulated conclusions and recommendations in a draft guideline.
A second working group was composed similarly to that in method 1 to avoid confounding (table 1). This group, although aware of being part of an experiment, formulated recommendations during a 2-day conference, without knowing the topic in advance. Outside consultation was prohibited and any additional information used was documented. In 2 days a draft guideline with recommendations for the treatment of actinic keratosis was made, based on self-made evidence tables distilled from the articles received. Evidence tables from the evidence report in method 3 were available and were used in cases of insurmountable time pressure.
Method 3 used an epidemiologist who systematically summarised the evidence from the available literature in an evidence report with evidence tables. A single independent dermatologist (not part of any of the other methods) subsequently formulated recommendations for clinical practice, without knowledge of the results of the other groups. However, the dermatologist was aware of being part of a trial.
Each method received the same set of articles, based on the results of a systematic literature search, and yielded a draft guideline ready for comparison. While composing their guidelines, each working group benefited from full methodological support (eg, retrieving and printing requested articles, note taking during meetings, answering epidemiological questions), without content interference, from experienced guideline developers with knowledge of research methodology.
Relevant studies were searched for in the Medline, EMBASE (Excerpta Medica Database) and CENTRAL (Cochrane Central Register of Controlled Trials) databases (from July 1986 until July 2005).7 This search was updated to find additional studies. Titles and abstracts were screened for potentially relevant clinical trials and existing guidelines investigating treatment for actinic keratosis. Full text versions were assessed using predefined eligibility criteria. References from potentially relevant studies were checked to find additional evidence. The search and selection resulted in a set of articles used in all three methods (table 2). To provide uniform background information for the three investigated methods, a set of 43 articles regarding epidemiology, risk factors and aetiology was selected from a search performed in the same way as for a systematic review.69 Additional literature brought in by working group members was allowed, but it was obligatory to document study details.
Each recommendation reported as such in the three different draft guidelines was compared per therapy. The primary outcome was the extent of accordance or discordance of each recommendation and underlying evidence across the three guidelines. The underlying scientific evidence for each recommendation was analysed by checking all the references, mentioning additional articles used and mentioning the level of evidence as formulated by the participants of the different methods (box).
Classification of literature according to the level of evidence (levels according to those set by the Dutch Institute for Healthcare Improvement, CBO)
Level of evidence
1=One systematic review (A1) or at least two studies of level A2 which were carried out independently
2=At least two independent studies of level B
3=One study of level A2 or B or studies of level C
4=Expert opinion, for example, the members of the working group.
For articles concerning intervention (prevention or therapy)
A1=Systematic reviews involving at least two studies at A2 level, of which the results of separate studies are consistent
A2=Randomised comparative clinical studies of good quality (randomised, double-blind controlled trials) of sufficient size and consistency
B=Randomised clinical trials of mediocre quality of insufficient size or other comparative studies (non-randomised, comparative cohort study, case–control study)
D=Expert opinion, for example, the members of the working group
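The classification in the box can be read as a small decision rule. The following is a minimal sketch of that reading, not the procedure the working groups actually applied: the function name `level_of_evidence` is hypothetical, the studies passed in are assumed to be independent of each other, and a mix of one A2 study and one B study is assumed to fall under level 3, as a literal reading of the box suggests.

```python
def level_of_evidence(study_grades):
    """Map the quality grades of the studies behind a conclusion
    (e.g. ["A2", "B"]) to a level of evidence from 1 to 4.

    Assumption: all listed studies are independent of each other.
    """
    grades = [g.upper() for g in study_grades]
    # Level 1: one systematic review (A1) or at least two A2 studies
    if "A1" in grades or grades.count("A2") >= 2:
        return 1
    # Level 2: at least two studies of level B
    if grades.count("B") >= 2:
        return 2
    # Level 3: one study of level A2 or B, or studies of level C
    if any(g in ("A2", "B", "C") for g in grades):
        return 3
    # Level 4: expert opinion only (or no studies at all)
    return 4


if __name__ == "__main__":
    print(level_of_evidence(["A2", "A2"]))  # 1
    print(level_of_evidence(["B", "B"]))    # 2
    print(level_of_evidence(["A2"]))        # 3
    print(level_of_evidence(["D"]))         # 4
```

Even with an explicit rule like this, the grade given to each individual study remains a judgement, which is one place where the three methods could diverge.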
A panel of three investigators (RB, JvE, CB) graded the recommendations. A grading system with three categories of perceived obligation to follow a recommendation was used: either negative or positive (table 3).70 71 Discrepancies were discussed until an agreement was reached. Factors contributing to possible differences were described. Secondary reported outcomes were total costs (eg, retrieval of articles, working group expenses, salary, catering and so on) and time period necessary to make a draft guideline (including preparatory work like searching databases and retrieving full text articles).
Statistical analysis was performed using SPSS statistical software, version 17.0. Statistical significance was defined as a p value less than 0.05. Recommendations and levels of evidence were regarded as ordinal variables varying in value from 1.0 (highly negative recommendation) to 6.0 (highly positive recommendation) and from 1.0 (level of evidence 1) to 4.0 (level of evidence 4), respectively. Agreement between the methods, for the recommendations and for the levels of evidence, was tested using the intraclass correlation coefficient with an absolute agreement definition.
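To illustrate what an absolute agreement definition implies, the sketch below computes a two-way, single-measures intraclass correlation coefficient from the usual ANOVA mean squares. It is an illustration only: the analysis itself was run in SPSS, the text does not state whether single or average measures were reported, and the function name and the example scores are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Two-way, single-measures ICC with absolute agreement.

    ratings: list of rows, one row per therapy, one column per method,
    e.g. recommendation grades on the 1.0 to 6.0 scale.
    """
    n = len(ratings)       # number of therapies (rows)
    k = len(ratings[0])    # number of methods (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for rows, columns and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Systematic differences between methods (ms_cols) lower the coefficient
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical grades for three therapies from two methods
    identical = [[1, 1], [3, 3], [5, 5]]
    shifted = [[1, 2], [3, 4], [5, 6]]  # method 2 always one grade higher
    print(icc_absolute_agreement(identical))  # 1.0
    print(icc_absolute_agreement(shifted))    # 8/9, about 0.889
```

The second example shows why the definition matters: two methods that rank therapies identically but differ by a constant grade would be perfectly consistent, yet their absolute agreement is below 1.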
The three guideline development methods gave the results depicted in table 2 per treatment. The level of evidence behind the recommendation differed for several treatments: dermabrasion, (shave) excision, cryotherapy, photodynamic therapy (PDT), laser, 5-FU (5-fluorouracil), diclofenac, retinoids and colchicine. However, the correlations between the assigned levels of evidence remained highly significant (p=0.003 and p<0.001) (table 4).
The members of method 1 used five additional references, method 2 used three and method 3 used one additional reference (table 2). Regarding the grading of recommendations, differences across the three methods were present in the recommendations for (shave) excision, cryotherapy, PDT, laser, chemical peeling, diclofenac, imiquimod, retinoids, colchicine and masoprocol (table 2). The correlation coefficient between method 1 and method 2 was highly significant at 0.755 (p=0.001). There was a significant, but lower, correlation coefficient between method 1 and 3 and between method 2 and 3 (0.493 (p=0.026) and 0.673 (p=0.007), respectively) (table 4). With regard to the total costs and time period necessary to prepare a draft guideline, method 3 was the cheapest and quickest (€24 770 and 4 months) and method 1 was the most expensive and slowest (€48 100 and 14 months) (table 5).
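The size of the difference between the cheapest and the most expensive method can be expressed as a relative saving. The short calculation below uses only the figures reported above for methods 1 and 3; the helper name `relative_saving` is hypothetical, and the figures for method 2 (table 5) are not reproduced here.

```python
def relative_saving(baseline, alternative):
    """Fraction of the baseline saved by using the alternative."""
    return (baseline - alternative) / baseline


if __name__ == "__main__":
    cost_saving = relative_saving(48100, 24770)  # euros, method 1 vs method 3
    time_saving = relative_saving(14, 4)         # months, method 1 vs method 3
    print(f"Cost saving: {cost_saving:.1%}")  # Cost saving: 48.5%
    print(f"Time saving: {time_saving:.1%}")  # Time saving: 71.4%
```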
A comparison of the three guideline development methods showed a highly significant correlation between the recommendations formed by the group that followed a traditional evidence-based approach with multiple sessions and the group that followed a 2-day conference method, despite the differences in formulation. The third method, consisting of a systematic evidence report and recommendations made by a single dermatologist, was also significantly correlated with the other methods, although the significance and correlation coefficient were lower (table 4). When considering the development process, there are various benefits and disadvantages in the characteristics of the three methods. The benefit of the standard multiple meetings method (method 1) was that there was time to adapt and reconsider texts and members could consult colleagues and experts between guideline development sessions, probably enhancing support for implementation of the final guideline.72 73 However, more sessions, time and costs were required (table 5) and progress was often dependent on the effort of single group members. The conference method (method 2) was found to be highly satisfactory by its members, partly because direct discussion was possible, and was time- and cost-efficient. Time, however, was also a crucial factor since a quality guideline needed to be prepared in just 2 days. Furthermore, the quality of preparatory work was essential. Still, preparatory work was perhaps more controllable due to the uniform and transparent way of searching, selecting and extracting data. A disadvantage could be the large amount of preparatory work needed, so correct scheduling is necessary. Method 3 is quick and relatively inexpensive (table 5), but potentially more susceptible to interpretative bias as it depends on the expertise of a single dermatologist. In addition, the field acceptance of method 3 is highly debatable.74 Moreover, an epidemiologist often has no clinical experience regarding the topic.
On the other hand, because the comparison showed several differences in the assigned level of evidence, a uniform review of the evidence by a single epidemiologist could reduce this variation. Additional literature could explain a part of the variation in the level of evidence for dermabrasion, excision, cryotherapy, PDT and laser therapy.
In four out of eight cases, the additional literature was distilled from existing guidelines.75–77 The other literature, not identified by the systematic literature search, was brought in by members of the working groups, a benefit of a group method over a single dermatologist. Apart from differences in assessing the underlying literature, variation in recommendations is probably caused by different considerations for clinical practice beyond the evidence (online supplementary table 1).
Although the working group of method 1 was blinded about the existence of other methods, the working group of method 2 knew that they were part of a comparison as soon as the topic was revealed at the beginning of the 2-day meeting. Also, the members participating in method 3 knew that they were part of a study. This could have resulted in the so-called Hawthorne effect—by being part of a study one feels special and puts in more effort to make something special of it. Positive results could therefore be overvalued.78
Moreover, the individual members of the working groups are a likely source of bias. By matching several characteristics (table 1) we tried to enhance comparability, as randomisation was not feasible. Still, individual opinions and expertise vary and individual group members can play an important role in setting subtle textual differences in a guideline, which emphasises the importance of a rigorous way to develop consensus and address considerations beyond the evidence.79
Actinic keratosis is a well-described skin disease with a relatively small number of published studies (some 100 therapeutic trials). Thus, one could say that actinic keratosis is most suitable for guideline development in one meeting, because a draft guideline can be written in a few days. Perhaps for a more intensively trialled, multidisciplinary or controversial guideline topic, a combination of methods 1, 2 and 3 will be the most suitable. In particular, the combination of an externally generated, methodologically conscientious evidence report with subsequent analysis and translation into clinical recommendations in a meeting of clinicians would be interesting for further research.
Another limitation is the way recommendations are compared and formulated. There are various systems for grading the strength of a recommendation.80 We graded terms in a recommendation according to the perceived level of obligation to follow that so-called deontic term.70 Even if perceived obligation to follow a recommendation is a practical measure, recommendations often differ in a more subtle way. For instance, a more specific recommendation with information about dosage is more likely to be followed than a vague, broader recommendation.73 Major steps in setting a high quality standard for formulating recommendations in clinical practice guidelines have been made with the introduction of the Appraisal of Guidelines Research & Evaluation (AGREE) instrument,81 the formation of (inter)national platforms (such as the Guideline International Network)82 and the promotion of transparent guideline development and implementation methods (Grading of Recommendations Assessment, Development and Evaluation (GRADE), Guideline Implementability Appraisal (GLIA)).83–85 However, there is no such thing as a gold standard and much remains to be elucidated. Adoption of a single guideline system in the future could make comparison across guidelines easier. Also, adaptation of already existing guidelines (collaboration for ADAPTation of existing guidelines (ADAPTE)), another way to potentially reduce costs and effort, would be simpler.86 Matching deontic terms to grades of recommendation strength can help to further standardise the use of such expressions by guideline developers.70
The therapeutic recommendations and assigned levels of evidence for the treatment of actinic keratosis differed on several occasions when comparing three different guideline development methods. Despite the differences, the methods correlated significantly with each other. Consequently, the value of a guideline using a 2-day conference development method, a method in which the preparatory work is carried out by an epidemiologist and finished by a single dermatologist, or a combination of these two seems to at least equal that of a guideline composed in multiple sessions, that is, for topics with a monodisciplinary character and a relatively small number of conducted trials. In addition, the presented alternative methods were more time- and cost-efficient. The equality in guideline quality remains questionable for a more intensively trialled or controversial guideline topic. Further research and experience on this matter are warranted.
We thank all the participating dermatologists and pathologists for making this study possible.
Funding Independent funding by Dutch Society of Dermatology and Venereology. The sponsors had no role in the design and conduct of the study; in the collection, analysis and interpretation of data; or in the preparation, review or approval of the manuscript.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.