How “should” we write guideline recommendations? Interpretation of deontic terminology in clinical practice guidelines: survey of the health services community
  1. E A Lomotan1,
  2. G Michel1,
  3. Z Lin2,
  4. R N Shiffman1
  1. 1Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, Connecticut, USA
  2. 2Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut, USA
Objective To describe the level of obligation conveyed by deontic terms (words such as “should”, “may”, “must” and “is indicated”) commonly found in clinical practice guidelines.

Design Cross-sectional electronic survey.

Setting A clinical scenario was developed by the researchers, and recommendations containing 12 deontic terms and phrases were presented to the participants.

Participants All 1332 registrants of the 2008 annual conference of the US Agency for Healthcare Research and Quality.

Main outcome measures Participants indicated the level of obligation they believed guideline authors intended by using a slider mechanism ranging from “No obligation” (leftmost position recorded as 0) to “Full obligation” (rightmost position recorded as 100).

Results 445/1332 registrants (36%) submitted the on-line survey; 254/445 (57%) reported that they have experience in developing clinical practice guidelines; 133/445 (30%) indicated that they provide healthcare. “Must” conveyed the highest level of obligation (median=100) and least amount of variability (interquartile range=5). “May” (median=37) and “may consider” (median=33) conveyed the lowest levels of obligation. All other terms conveyed intermediate levels of obligation characterised by wide and overlapping interquartile ranges.

Conclusions Members of the health services community believe guideline authors intend variable levels of obligation when using different deontic terms within practice recommendations. Ranking of a subset of terms by intended level of obligation is possible. Matching deontic terminology to the intended recommendation strength can help standardise the use of deontic terminology by guideline developers.

Clinical practice guidelines are “systematically developed statements to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances”.1 However, problems with guideline clarity and specificity impede the incorporation of guidelines into medical practice.2 3 Non-adherence is common.4 5 Several groups have developed tools that enable guideline developers to identify potential gaps in guideline quality and implementability.6–8 High-quality guidelines make explicit the connections between evidence quality, recommendation strength and the language used to make recommendations.

Language such as “should consider” and “is recommended” appears frequently in practice guidelines and is related to deontic logic, the branch of logic that concerns notions of obligation and permission.9 Clinicians' perceptions of obligation to undertake recommended actions, as influenced by deontic terminology, are unknown. An understanding of how readers interpret deontic terminology would allow guideline authors to strengthen the connection between guideline language and expected adherence to guideline recommendations.

Increasingly, expectations of adherence to recommendations are communicated by systems for rating quality of evidence and grading recommendation strength.10 Yet, differences among the grading systems reveal important disagreements.11 The GRADE system, for example, divides recommendations into two categories called “strong” and “weak”, whereas the US Preventive Services Task Force applies five letter grades (A through D, plus an I statement) to indicate “suggestions for practice”.12 13 The American Academy of Paediatrics uses four categories (“strong recommendation”, “recommendation”, “option” and “no recommendation”).10 In England and Wales, the National Institute for Health and Clinical Excellence no longer assigns letter grades to its recommendations but recognises three levels of “certainty”.14 Many organisations that develop guidelines do not use any grading system for characterising recommendation strength.

Much attention has focused on transforming the knowledge contained in practice guidelines into computable formats.15 16 A major challenge is how to translate guideline recommendations into decision support tools. Without knowing how guideline authors use deontic terminology, investigators cannot reliably implement recommendations that remain faithful to the developer's intent and purpose.

Through the use of an electronic survey, we investigated the understanding of deontic terminology by members of the health services community. Our goal was to describe the level of obligation and variation in interpretation among deontic terms. To our knowledge, the study presented here is the first attempt to examine the impact of deontic terminology on the perceived level of obligation to undertake actions recommended within clinical practice guidelines.


Survey instrument

We compiled a list of deontic terms commonly found in guidelines by considering the English forms of the French deontic terms identified by Georg et al17 and the most frequent deontic terms appearing in the Yale Guideline Recommendation Corpus, which we identified using the Simple Concordance Program (V.4.09 for Macintosh, Alan Reed). We also searched the websites of all organisations listing more than five guidelines on the National Guideline Clearinghouse website for any information relating to how they apply deontic terminology.

We constructed a web-based electronic survey that presented each of the 12 deontic terms within simplified recommendation statements. To isolate the effect of the deontic term from that of any other contextual feature, we instructed participants to assume that a particular drug “A” was effective for a clinical condition “X” and presented them with recommendation statements that varied only in use of the deontic term or phrase (see figure 1).

Figure 1

Screenshot of the first three survey questions. Readers moved the slider to the left or right according to the level of obligation they believed guideline authors intended. The default position was recorded as 50 (shown above).

Participants used a slider mechanism to record the level of obligation they believed the guideline authors intended. The scale ranged from “No obligation” (leftmost position recorded as 0) to “Full obligation” (rightmost position recorded as 100). We gave explicit instructions that the term “No obligation” did not mean that prescribing drug A was prohibited; rather, selecting the leftmost position would indicate that the decision to prescribe drug A was completely “optional”. We also explained our understanding that clinical decisions rely heavily on individual patient circumstances and that answers about intended level of obligation may not necessarily reflect intended adherence.

We collected information about participant age, whether they provide direct healthcare to clients and whether they had ever participated in the development of clinical practice guidelines, clinical decision support systems or performance measures. The initial version of the survey presented each of the recommendation statements in random order to avoid the potential introduction of order bias.19–21 Early informal piloting, however, revealed that readers could not consistently record their answers without referring to their responses from previous questions. Based on this feedback, we instituted a revised version of the survey that presented questions in non-random order and allowed participants to adjust their answers until final submission of the survey. The final version of the survey can be found at

Study population

We invited all 1332 registrants for the 2008 annual conference of the US Agency for Healthcare Research and Quality (AHRQ) to participate. We emailed each registrant an introductory message and a direct link to our survey website. Following a formal pilot period using 50 randomly selected registrants, the main body of the study ran for 6 weeks from mid-October 2008 to the beginning of December 2008. We sent two reminder emails during that time. The comments we received during the pilot period did not suggest any major difficulty in understanding the purpose of the study, the survey instructions or use of the slider mechanism.

Because the survey remained unaltered following the pilot study and pilot participants were selected from the complete list of conference registrants, we combined all responses into a single analysis. We calculated a response rate based on the Response Rate (RR2) definition from the American Association for Public Opinion Research.22
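As a rough illustration of the RR2 definition, the sketch below counts complete plus partial responses over all eligible and unknown-eligibility cases. The function name, parameter names and the disposition breakdown are assumptions for illustration only; the text reports only the number of submitted surveys and the number of registrants.

```python
def response_rate_rr2(complete, partial, refusal, non_contact, other,
                      unknown_household=0, unknown_other=0):
    """AAPOR-style RR2: (complete + partial) divided by all eligible
    cases plus cases of unknown eligibility."""
    responders = complete + partial
    denominator = (responders + refusal + non_contact + other
                   + unknown_household + unknown_other)
    if denominator == 0:
        raise ValueError("no cases supplied")
    return responders / denominator


# Assumption: every submitted survey is counted as complete and every
# other registrant as a non-contact; the text gives no finer breakdown.
submitted = 445
registrants = 1332
rate = response_rate_rr2(complete=submitted, partial=0, refusal=0,
                         non_contact=registrants - submitted, other=0)
print(f"RR2 under these assumptions: {rate:.1%}")
```

Under these simplifying assumptions the ratio is about 33%. The reported 36% would correspond to a denominator smaller than 1332, a breakdown the text does not provide.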

Data management and analysis

We used descriptive statistics to examine the variable response for each deontic term. Analysis was conducted using a standard statistical package (SPSS V.16 for Macintosh).
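The descriptive summary reported for each term (median and interquartile range) can be sketched with Python's standard library in place of SPSS. The slider values below are invented for illustration, and the `inclusive` quartile method is an assumption; SPSS may interpolate quartiles differently, so results on real data could differ slightly.

```python
from statistics import median, quantiles


def summarise(responses):
    """Return the median, quartiles and interquartile range of 0-100 slider values."""
    q1, _, q3 = quantiles(responses, n=4, method="inclusive")
    return {"median": median(responses), "q1": q1, "q3": q3, "iqr": q3 - q1}


# Hypothetical responses, not the survey data.
responses_by_term = {
    "must": [90, 95, 100, 100, 100],
    "may": [10, 25, 37, 50, 60],
}

summaries = {term: summarise(values) for term, values in responses_by_term.items()}
for term, stats in summaries.items():
    print(term, stats)
```

Reporting the median with the interquartile range, rather than a mean with a standard deviation, suits slider data that pile up at the ends of a bounded 0 to 100 scale.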


Upon registration for the AHRQ annual conference, attendees were given the option to describe themselves by choosing one or more of several professional categories (Howard Holland, AHRQ, personal communication, 2008). Results are summarised in table 1.

Table 1

Characteristics of conference registrants

Of the 1332 registrants, 445 submitted our on-line survey (response rate of 36%). More than half (57%, 254/445) reported that they have experience in developing clinical practice guidelines, and 30% (133/445) indicated they provide healthcare (table 1). We did not collect data from clinicians about their level of guideline usage. Respondents between ages 50 and 59 years represented the largest age group (38%, 171/445).

Most organisations listing more than five guidelines on the National Guideline Clearinghouse website did not provide any insight into their use of deontic terminology. Those that provided guidance seemed to disagree on what and how terms should be used (eg, whether the terms should be linked to specific levels of evidence quality or recommendation strength). A partial list of organisations that commented either within published guideline manuals or directly on their websites on their use of deontic terminology is shown in table 2.

Table 2

Deontic expressions recommended for use by a sample of guideline-developing organisations

Figure 2 displays the median and interquartile range for each of the 12 deontic terms examined in our survey, arranged in descending order of perceived level of obligation. “Must” conveyed the highest level of obligation (median=100) and least amount of variability (interquartile range=5). “May” (median=37) and “may consider” (median=33) conveyed the lowest levels of obligation. All other terms we examined conveyed intermediate levels of obligation characterised by wide and overlapping interquartile ranges (see figure 2).

Figure 2

Level of obligation conveyed by deontic terms commonly appearing in clinical practice guidelines. Bars represent simplified box plots displaying interquartile ranges and medians. Perceived level of obligation was recorded by a slider mechanism that ranged from 0 (“No obligation”) to 100 (“Full obligation”).


Our goal was to describe the level of obligation imposed by deontic terms commonly found in clinical practice guidelines. We found that the interpretation of deontic terms by the health services community varies and that ranking of deontic terms by level of obligation is possible. Using an internet-based survey, we showed that “must” conveys the highest level of obligation, while “may” and “may consider” convey lower levels of obligation. “Should” and all other deontic terms we examined convey intermediate levels of obligation.

Variable interpretation of expressions used in medicine has been well documented, most notably with regard to physician interpretation of probabilities.28–30 Kong et al demonstrated that medical professionals could agree on the ranking of common probability expressions, but there was wide variation in interpretation of each expression.31 Similarly, our survey demonstrates a ranking of a subset of terms but with considerable variability in interpretation of individual deontic expressions.

Figure 2 suggests that members of the health services community recognise at least three levels of obligation. “Must” conveys the highest level of obligation, while “may” and “may consider” convey lower levels. Every other term we examined conveys an intermediate level. The addition of “consider” appears to decrease the level of obligation associated with “must” and “should” but does not change the impact of “may”, which already conveys the lowest level of obligation.

While the use of “consider” in a recommendation softens the obligation imposed, it also makes measuring performance and auditing adherence more difficult. Increasingly, performance measures are based on practice guidelines.32 Guidelines that use “consider” pose significant challenges to quality improvement teams because it is often impossible to determine whether a recommended activity was “considered”.

A guideline reader's formula to determine the level of obligation and intended adherence likely includes a plethora of linguistic and non-linguistic variables. In addition to the deontic term encountered, a reader is likely to assess such factors as the stated recommendation strength (if present), the type of action recommended (eg, to prescribe a medicine vs to order an invasive procedure or test), the severity of the patient condition under consideration and the organisation responsible for the guideline's development. To our knowledge, none of these other variables have been studied as potential influences on clinicians' perception of obligation to undertake recommended actions.

Suggestions for guideline developers

If deontic terminologies were used to strengthen a connection between recommendation language and expected adherence to recommendations, these data suggest that three separate levels of recommendation strength should be available to guideline developers. As long as terms conveying distinct levels of obligation were chosen (ie, non-overlapping interquartile ranges), guideline developers could take advantage of a natural ranking of deontic terms.
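One way to pick terms with non-overlapping interquartile ranges is sketched below: terms are ranked by median, and a term is kept only if its range does not overlap any term already kept. The greedy selection rule, the names `TermSummary`, `overlaps` and `distinct_levels`, and all quartile values are assumptions for illustration; only the medians for “must”, “may” and “may consider” and the interquartile range of 5 for “must” come from the survey results.

```python
from typing import NamedTuple


class TermSummary(NamedTuple):
    term: str
    q1: float
    median: float
    q3: float


def overlaps(a, b):
    """True if the interquartile ranges share any point (touching counts)."""
    return a.q1 <= b.q3 and b.q1 <= a.q3


def distinct_levels(summaries):
    """Rank terms by median (highest first) and keep those whose
    interquartile range overlaps none of the terms already kept."""
    chosen = []
    for summary in sorted(summaries, key=lambda s: s.median, reverse=True):
        if not any(overlaps(summary, kept) for kept in chosen):
            chosen.append(summary)
    return chosen


# Quartile bounds are hypothetical except where noted in the lead-in.
example = [
    TermSummary("may consider", 15, 33, 50),
    TermSummary("should", 70, 80, 90),
    TermSummary("must", 95, 100, 100),
    TermSummary("is recommended", 65, 78, 88),
    TermSummary("may", 20, 37, 50),
]

levels = [s.term for s in distinct_levels(example)]
print(levels)
```

With these illustrative values the selection yields three terms, one per level. A different tie-breaking rule, such as preferring the most frequently used term within a band, would be equally defensible.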

“Must” clearly defines the highest level of obligation; however, we anticipate only rare usage of the term. Based on our examination of the Yale Guideline Recommendation Corpus, “must” appears in only 19 recommendations.18 The use of “must” or “must not” may be limited to situations where there is a clear legal standard or where quality evidence indicates the potential for imminent patient harm if a course of action is not followed. “May” is an appropriate choice for the lowest level of obligation. We suggest avoiding any expression using “consider” for reasons mentioned earlier. The impact of “not” remains a topic for future study.

“Should” is the most common deontic verb found in the Yale Guideline Recommendation Corpus (appearing 709 times) and is an appropriate choice to convey an intermediate level of obligation. Alternatively, the intermediate level could be stratified into “should” and “is appropriate”. Overlapping ranges of obligation may be acceptable as long as guideline developers make explicit the connection between deontic terms chosen and their intended level of obligation. One strategy would be to link deontic terms to grades of recommendation strength. In this approach, the number of deontic terms used would depend on the particular grading system applied by the guideline developers.


Our response rate was low but consistent with response rates of internet-based surveys reported elsewhere.33–35 Generalisability to a wider population of clinicians and consumers of practice guidelines may be limited. However, our sample included key target audiences, including clinicians, developers of practice guidelines, developers of performance measures and developers of decision support systems.

We wrote simplified recommendation statements within a deliberately vague clinical scenario in our best effort to isolate the effect of deontic terminology from other contextual features. Use of actual, published recommendations or an examination of other deontic terms may produce different responses. We also did not take into account word preferences (eg, the use of “shall” vs “should”) or cultural norms (eg, American vs British) that may affect how people interpret and use deontic terminology. We permitted each participant to use his or her own understanding of the concept of obligation and did not measure intra-rater variability.


A focus on deontic terminology is a small but important step towards producing guidelines with more predictable influences on clinical care.3 “Must”, “should”, and “may” are well suited to represent three discrete levels of obligation recognised by the health services community. A standardised approach to the use of deontic terminology and the application of deontic terminology to systems for grading recommendation strength should be part of a larger set of standards for guideline development and presentation.


The authors wish to thank the US Agency for Healthcare Research and Quality and the National Library of Medicine for their support.



  • Funding This study was supported by grant R01-LM07199, which is cofunded by the National Library of Medicine and the Agency for Healthcare Research and Quality, contract AHRQ-07-10045, and by grant T15-LM07065 from the National Library of Medicine.

  • Competing interests None.

  • Ethics approval The Yale Human Investigations Committee approved our study under exemption status. The survey instrument did not collect any personally identifiable information.

  • Provenance and peer review Not commissioned; externally peer reviewed.

