
A structured judgement method to enhance mortality case note review: development and evaluation
  1. Allen Hutchinson1,
  2. Joanne E Coster1,
  3. Katy L Cooper1,
  4. Michael Pearson2,
  5. Aileen McIntosh1,
  6. Peter A Bath3
  1. 1Section of Public Health, School of Health and Related Research (ScHARR), University of Sheffield, Sheffield, UK
  2. 2Department of Clinical Evaluation, University of Liverpool, Liverpool, UK
  3. 3Information School, University of Sheffield, Sheffield, UK
  1. Correspondence to Professor Allen Hutchinson, Section of Public Health, School of Health and Related Research (ScHARR), University of Sheffield, Regent Court, 30 Regent St., Sheffield S1 4DA, UK; allen.hutchinson{at}


Background Case note review remains a prime means of retrospectively assessing quality of care. This study examines a new implicit judgement method, combining structured reviewer comments with quality of care scores, to assess care of people who die in hospital.

Methods Using 1566 case notes from 20 English hospitals, 40 physicians each reviewed 30–40 case notes, writing structured judgement-based comments on care provided within three phases of care, and on care overall, and scoring quality of care from 1 (unsatisfactory) to 6 (very best care). Quality of care comments on 119 people who died (7.6% of the cohort) were analysed independently by two researchers to investigate how well reviewers provided structured short judgement notes on quality of care, together with appropriate care scores. Consistency between explanatory textual data and related scores was explored, using overall care score to group cases.

Results Physician reviewers made informative, clinical judgement-based comments across all phases of care and usually provided a coherent quality of care score relating to each phase. The majority of comments (83%) were explicit judgements. About a fifth of patients were considered to have received less than satisfactory care, often experiencing a series of adverse events.

Conclusions A combination of implicit judgement, explicit explanatory comment and related quality of care scores can be used effectively to review the spectrum of care provided for people who die in hospital. The method can be used to quickly evaluate deaths so that lessons can be learned about both poor and high quality care.

  • Chart review methodologies
  • Quality improvement methodologies
  • Healthcare quality improvement
  • Patient safety
  • Qualitative research



Hospital death rates are a matter of public concern in the UK and have been the subject of both country-wide data analysis and local intensive reviews, one of which has resulted in a major public debate.1 ,2 Concerns about hospital deaths in well-developed health systems, especially when linked to the occurrence of adverse events,3 have also been expressed internationally. This has resulted in a number of rigorous epidemiological studies of adverse event frequency, for example in Australia, Canada and Sweden.4–6 More recently, there have been large studies of hospital deaths, together with associated events, which have examined whether some hospital deaths might have been preventable.7 ,8 On a day-to-day level, however, there remains a need for rigorous methods to enable clinical teams to retrospectively assess quality of care in a timely manner and, thus, to identify when deaths were inevitable or whether they might have been prevented with better care. This could assist, for example, in the discussions on care that currently take place in hospital ‘morbidity and mortality’ meetings.

Internationally, case note review remains a prime means of retrospectively assessing quality of care,3–8 despite the known methodological and practical challenges of this review method.9–11 Two principal review methods are used: explicit criterion-based methods and implicit (sometimes called holistic) methods which are based on clinical judgement.

Criterion-based methods, usually using frameworks of pre-determined criteria to identify elements of care which are either met or not met, are useful for large-scale audits of care or for screening case notes using criterion-based trigger tools.9

Implicit review methods are based on clinical judgement, and are probably more effective for identifying and recording the detail and nuance of care (both unsatisfactory and good).12 Thus, implicit review methods are probably more appropriate for detailed exploration of the care for people who die in hospital. However, unstructured implicit review formats have been criticised for low inter-rater reliability (high variability) and for potential reviewer bias,9–11 ,13 whereas structured implicit review limits the variability and creates specific frameworks so that reviewers are able to make, justify and organise statements on care.14

Early models of structured implicit review were in effect a fusion of the two approaches: reviewers had to make implicit judgements about quality of care in order to check a set of explicit review criteria (eg, a criterion such as ‘no appropriate nursing interventions carried out’).14 A framework such as this was used by Pearson et al15 to monitor nursing care quality. More recently, Hogan et al8 used this approach in a study of the frequency of adverse events and preventable deaths in English hospitals, where a judgement-based structured explicit 1–5 scale was used by reviewers to rate quality of care from very poor to excellent. In a study of adverse event frequency and preventability in 8400 patient records in the Netherlands, Zegers and colleagues used two 6-point scales which reviewers employed to record their judgement as to whether injury was caused by healthcare management or the disease process and to assess the degree of preventability.7,16

However, this form of judgement-based structured implicit review only provides a scale-based quantitative result and there is no way to determine how or why the reviewer judgement was made. Thus the method is useful for large scale monitoring or epidemiological studies of adverse events, but has rather less value for more detailed review at the ward or hospital level of why an event occurred.

To increase the value of structured implicit review in the context of reviewing the whole spectrum of care quality, rather than focussing only on adverse event rates, we designed and tested a structured care review method, drawing on the initial work of Kahn and colleagues.14 This required reviewers to make implicit clinical judgements and to write explicit comments to support judgement-based quality of care scores.9 In the developmental stage of the study, multi-professional groups of reviewers independently reviewed the same records, first using a quantitative and then a qualitative review process. For each case, the review process was undertaken for three phases of care (admission, initial management and later management), followed by an overall judgement of the care provided for the patient. For each phase of care, and for care overall, reviewers, both physicians and nurses, were asked to rate quality of care on a 1 (unsatisfactory) to 6 (excellent) scale. This was similar to a four-stage phase of care approach, together with overall care quality, subsequently used by Hogan et al8 to provide a framework on which to rate quality of care.

There was moderate inter-rater reliability of these judgement-based scores when two or three physicians, working separately, used structured implicit review on the same set of case notes (intraclass correlation coefficient (ICC) 0.52). Physician reviewers tended to make more explicit written judgements on the quality of care provided than did nurse reviewers, who more often made commentaries about the process/pathway of care.13

Subsequently, we asked 40 physician reviewers to undertake this enhanced form of structured implicit review to examine the quality of care provided for 1566 people with either chronic obstructive pulmonary disease (COPD) or heart failure as their main diagnosis. There was no oversampling of deaths and each set of case notes was reviewed only once. There were two reviewers (one for COPD cases and one for heart failure cases) for each of 20 randomly selected large hospitals in England and each reviewer judged between 30 and 40 consecutively selected sets of case notes and associated clinical records in their own hospital. Reviewers were either senior respiratory or cardiology physicians in training. Our initial quantitative analysis, reported elsewhere, examined the range of phase of care scores and overall care scores for each of the 20 hospitals and the relationship of the care scores to broader quality of care markers.9

Here we report a new qualitative and quantitative analysis of the commentaries written by the reviewers to support their judgement scores of care provided for the 119 cases who died in hospital (7.6% of the cohort of 1566 cases). The purpose of the analysis was to explore whether physician reviewers can consistently provide short, structured, judgement-based comments on quality of care that they can also justify with an appropriate care score. The consistency between the explanatory textual data and the related scores is explored with a view to considering whether this structured method, combining implicit judgements supported by explanatory comments, together with quality of care scores, can be used for routine mortality case note review.

Methods


Hospital and reviewer selection

Acute care hospitals in England were first grouped into quartiles using mortality data. Equal numbers of hospitals from the top and bottom quartiles were then randomly selected (20 in total). Each randomly selected hospital had to provide two reviewers, who were all volunteers and specialists in training. Each was initially approached by specialists in their own hospitals and initial research team contact with the specialists was made through the Royal College of Physicians.

Reviewer training

All reviewers received training in the review methods and in data recording prior to data collection. A full-day training session comprised a description of the methods, discussion about the need to be as explicit as possible about the judgement commentaries and a session reviewing a set of case notes in pairs with tutors. Finally, all of the reviewers judged the care from the same set of anonymised case notes and then commented on their findings in a managed small group discussion, which again emphasised the need to be explicit in their judgements. Data were collected via an electronic form which enabled direct entry by reviewers of both comments and scores for all relevant care phases and care overall. This enabled reviewers to structure their commentaries. The data collection programme was also demonstrated during the training day.

Finally, reviewers were provided with a set of national clinical practice guidelines relevant to their clinical specialty. Regular contact was maintained between the study team and the reviewers, who could ask for advice during the review period using a telephone helpline.

Data collection

Each set of case notes was reviewed by a single physician reviewer. Quality of care was assessed in three phases—admission, initial management and later management, and also for care overall. For each phase of care and for care overall, reviewers wrote short textual comments on the quality of care provided and were encouraged to be explicit in their comments on care. They also gave the care a score from 1 to 6 for each phase and for overall care, based on the criteria in table 1.

Table 1

Care score criteria
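
The review record described under Data collection can be sketched as a small data structure. This is an illustrative sketch only, not the study's electronic form: the names `Judgement`, `CaseReview` and `PHASES` are hypothetical, and the example comments are invented. The only details taken from the text are the three phases plus care overall, a short written comment for each, and an integer score from 1 (unsatisfactory) to 6 (very best care). Allowing a phase to be missing is an assumption, made because some patients died early in the admission.

```python
from dataclasses import dataclass, field
from typing import Dict

# Phase names follow the text; the identifiers themselves are hypothetical.
PHASES = ("admission", "initial management", "later management")
MIN_SCORE, MAX_SCORE = 1, 6  # 1 = unsatisfactory, 6 = very best care


@dataclass
class Judgement:
    """A short written comment plus its quality of care score."""
    comment: str
    score: int

    def __post_init__(self) -> None:
        # Reject anything outside the 1-6 scale described in the text.
        if not isinstance(self.score, int) or not MIN_SCORE <= self.score <= MAX_SCORE:
            raise ValueError(f"score must be an integer from {MIN_SCORE} to {MAX_SCORE}")


@dataclass
class CaseReview:
    """One reviewer's structured judgement of one set of case notes."""
    overall: Judgement
    # Assumption: a phase may be absent (for example, if the patient died early).
    phases: Dict[str, Judgement] = field(default_factory=dict)

    def __post_init__(self) -> None:
        unknown = set(self.phases) - set(PHASES)
        if unknown:
            raise ValueError(f"unknown phase(s): {sorted(unknown)}")


# Invented example for illustration; not a case from the study.
review = CaseReview(
    overall=Judgement("Good care overall.", 5),
    phases={
        "admission": Judgement("Prompt assessment, clear plan.", 5),
        "initial management": Judgement("Appropriate treatment started.", 5),
    },
)
```

Keeping the comment and the score together in one record reflects the central idea of the method: every score is stored alongside the written judgement that explains it.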

Analysis methods

Of the 1566 cases reviewed, 119 had died during their hospital admission. To explore the type and content of written comments by the reviewers on each of the 119 cases, a textual analysis framework, developed during the study prior to this analysis and previously reported,9 was applied to all of the phase and overall care comments. Two authors (AH, JEC) reviewed and categorised the comments independently and any differences in categorisation were resolved through discussion.

Comments were categorised into three groups (see box 1). All comments in categories B (implicit judgement comments) and C (explicit judgement comments) were subsequently classified by the two study analysts as indicating good quality of care (positive comments) or as indicating poor quality of care (negative comments). These two categories of comment for each case were then grouped by their related overall quality of care scores, which were then used to classify each case into one of six groups, from unsatisfactory care (score 1) to very best care (score 6). Examples of the detailed textual analysis are presented in the results in tables 2 and 3.

Box 1

Reviewer comment categories

Category A

Little or no comment about care and/or little or no judgement, including, for example, a description of what was in the case note or a description of what happened to the patient (not the care they received).

Note: Category A did not contribute to the analysis presented here, since this analysis was concerned with judgements rather than descriptive reports.

Category B

Limited comment about quality of care and/or implied judgement. This category included an implied judgement and/or a description of the care delivered (not just a description of a patient pathway) and/or a description of an omission of care.

Category C

Comments about care with explicit judgements and views. This category included explicit judgements of care delivered, questioning or queries about the care delivered, explanations or justification of care delivered, alternative options or justification of care that should have been delivered or concerns about care.

Table 2

Reviewer commentary on care judged unsatisfactory overall

Table 3

Reviewer commentary on care judged short of best practice

The distribution of overall quality of care scores for the 119 people who died was compared with the distribution of scores for the 1447 patients who survived, using the χ2 test. The associations between comment category and comment type, and their relationship to one another, were explored across overall care scores using the χ2 test. The χ2 tests were undertaken using Microsoft Excel and p values were calculated using GraphPad software (

Results


The overall quality of care scores for the patients who died are compared in table 4 with the scores for all patients who survived. The proportions of cases in which care fell short of good practice are relatively similar across the two groups, although there is a higher proportion of ‘satisfactory’ cases and a somewhat lower proportion of ‘good’ cases among people who died than in the survivor group. There were no statistically significant differences between the two groups (χ2=9.800; df=5; p=0.0811).

Table 4

Quality of care overall: score comparisons between people who died and those who survived
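
The p value reported for this comparison can be checked from the statistic and its degrees of freedom. The sketch below is not the authors' workflow (the text states that Microsoft Excel and GraphPad were used); it is a plain-Python stand-in. The function names `chi2_statistic` and `chi2_sf` are hypothetical, and the small 2×2 table is invented purely to show the calculation, since the counts in table 4 are not reproduced here.

```python
import math


def chi2_statistic(table):
    """Pearson chi-square statistic and df for a rows x columns table of counts.

    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j - 1/2) / (1*3*...*(2j-1))
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Invented 2 x 2 counts, for illustration only (not study data).
stat, df = chi2_statistic([[10, 20], [20, 10]])

# Statistic and df as reported in the text for died versus survived.
p_reported = chi2_sf(9.800, 5)
print(f"{p_reported:.4f}")
```

With the reported χ2=9.800 and df=5, `chi2_sf` returns a value of about 0.081, in line with the p value quoted in the text.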

Relationship of positive and negative comments to overall care scores

Table 5 summarises the relationship between the overall care score for each case and the types of comment (whether positive or negative judgements) provided by the reviewers for each of the phases and for overall care. There was a significant association between the total number of positive and negative comments and the overall scores (χ2=205.50; df=5; p<0.0001).

Table 5

Numbers of positive and negative comments per overall score

In the care score range from unsatisfactory (1) to falling short of best practice (3), negative comments outnumber positive comments. When the care is rated from satisfactory (4) to very best care (6), the positive comments increasingly outweigh the negative. Generally, the positive to negative ratio of comments for each phase remains stable across each overall group score band. So where the overall score is 3 or less, across each of the phases there are more negative comments than there are positive comments, and the reverse is true for the summary of the higher scores, indicating that the reviewer judgements are generally consistent with the overall score that was given. The ratio of positive to negative comments ranges from 0.28 for cases with an overall care score of 1 to 21.17 for cases with an overall care score of 6.
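
The positive to negative ratio used here is a simple tally per overall score group. A minimal sketch follows, assuming each classified comment is stored as an (overall score, polarity) pair; the function name `comment_ratios` and the counts are hypothetical and are not the figures from table 5.

```python
from collections import Counter


def comment_ratios(comments):
    """Map each overall care score to its ratio of positive to negative comments.

    `comments` is an iterable of (overall_score, polarity) pairs, where polarity
    is "positive" or "negative". The ratio is None when a group has no negative
    comments, because the ratio is then undefined.
    """
    positive, negative = Counter(), Counter()
    for overall_score, polarity in comments:
        if polarity == "positive":
            positive[overall_score] += 1
        elif polarity == "negative":
            negative[overall_score] += 1
        else:
            raise ValueError(f"unexpected polarity: {polarity!r}")
    ratios = {}
    for score in sorted(set(positive) | set(negative)):
        ratios[score] = positive[score] / negative[score] if negative[score] else None
    return ratios


# Invented counts for illustration only (not study data).
example = (
    [(1, "negative")] * 4 + [(1, "positive")] * 1
    + [(5, "positive")] * 6 + [(5, "negative")] * 2
)
ratios = comment_ratios(example)
```

A ratio below 1 means negative comments outnumber positive ones in that score group, and a ratio above 1 means the reverse.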

There are fewer comments in total in the later phases of care because some patients died early in the course of the admission. There is also some indication in the textual commentaries that a number of reviewers felt most of what needed to be said had already been said in the earlier phase of care comments for a particular case, and so did not need to be repeated.

In general, the phase of care comments were more detailed than the overall care comments. Occasionally, however, reviewers gave an unexpectedly high score related to a qualitative judgement that suggested a lower quality of care had occurred (see, for example, the case in table 3).

Categorisation of comments: implicit and explicit judgements about care quality

Table 6 summarises the numbers of comments grouped by category (category B: implicit judgements of care; category C: explicit judgements of care) and comment type (positive or negative) for each overall care score.

Table 6

Comments by type and category and overall score

Results in table 7 show that, overall, there were more than four times as many explicit comments (judgements) as there were implicit comments. For the lower overall care scores (1–3), there tended to be a rather higher proportion of implicit (B) judgements than for the higher care scores, although the implicit judgements were always in the minority. This trend is confirmed by a statistically significant association between the total number of implicit/explicit judgements of care and the overall care score (χ2=48.37; df=5; p<0.0001). Thus, the pattern of more explicit comments than implicit comments was seen for all quality of care scores, from 1 (poor care) to 6 (best care), indicating that reviewers were on the whole prepared to make explicit judgements where care was poor as well as where care was good.

Table 7

Comparison between implicit/explicit and positive/negative comments

These results suggest that the reviewers were on the whole prepared to make the type of judgements and explicit comments asked of them during training and which would be valuable in a quality of care review.

Content and nature of comments

Study of the individual comments showed that a number of B category comments contained concise technical summaries in addition to implicit judgements on the quality of care. Many of the C category comments were incisive clinical observations with a strong view of the quality of care, especially when the reviewer considered that the care was poor. Comments across the range of overall scores often included consideration of the broader, non-technical processes of care (eg, communication with relatives), as well as technical aspects of care.

Of the 21 case reviews with low overall scores (scores of 1, 2 or 3), 15 were accompanied by an explicit clinically relevant judgement that justified the low score. Some related to cases where care was generally poor throughout the inpatient episode, while others related to cases where a specific aspect of care was of concern. In two of the cases, incorrect diagnosis was the main problem, while in 12 cases there was concern about suboptimal management. In these 12 cases there were usually multiple smaller events that were additive; a single main adverse event occurred in only one of them. Two of the 15 cases were considered to have such poor record keeping as to be a threat to the care of the patient.

Tables 2 and 3 provide examples demonstrating the range, type and category of comments made by reviewers in two cases. All of the comments are as written by the reviewers and the scores given for each phase of care are included. Reviewers were able both to comment on the technical aspects of care and to take a holistic view of the overall care plan.

Table 2 is also used to demonstrate how the categorisation of the comments was applied in the analysis. For example:

  • Although the reviewer explicitly grades the documentation as poor in the admission, this is only a description of the documentation without any explanation and therefore is categorised as a B level comment. In the initial management phase, however, there is a judgement (very poor documentation) together with an explanation, which rates a C category.

  • When the reviewer implies in the initial management phase that it was poor practice not to take an arterial blood gas sample (‘No ABGs and patient was tachypnoeic and hypoxic’), there is no explicit statement that this was unsatisfactory (and it is thus a B category comment).

  • A judgement on the therapy (‘pitiful dose of frusemide (furosemide) (20 mg IV)’) is a C category comment.

  • When commenting on the technical aspects of care, the reviewer could also be explicit about how the care should have been managed overall, in the context of the patient's illness. This is an explicit, category C, judgement.

The case in table 2 also illustrates a pattern where there is a group or ‘constellation’ of events which of themselves may not cause severe harm but which, taken together, can lead to harm to the patient. This pattern was also found in the main study among some of the patients who survived.17

Although there are usually more negative comments than there are positive comments when overall care scores are low, as shown in table 5, the case in table 3 shows examples of how positive and negative comments can be juxtaposed in each phase. In retrospect, this case also raises the question of whether the overall score of 3 was the most appropriate—it might be argued from the level of the comment that the case could have been given a lower overall care score of 2 (see, for example, the comments on later management).

Comments on good care tended to be more global than those for unsatisfactory care but may also be quite explicit. Cases which demonstrate this and also how a single adverse event may change the reviewer's overall consideration of the case are included as additional material (see online supplementary tables S7 and S8).

Some of the reviewers in this study were more ‘explanatory’ than others, so that, in some cases, the number of comments may reflect individual style rather than the strength of the comment. For example, comments such as ‘good care’ or ‘unclear treatment’ are short explicit judgements without further detail, while other reviewers are more extensively explicit.

Of the 63 case reviews (54% of the total number of mortality reviews) that scored most highly (5 or 6), 52 were accompanied by a short explicit comment in the overall care section indicating that all key aspects of care had been good or excellent (eg, ‘well looked after’) and in 16 of the 63 reviews there were comments about the inevitable outcome of the case despite the good care received.

Discussion


In this study we have shown that physician reviewers are able to use structured review to make implicit quality and safety judgements, write explicit short care commentaries and give coherent matching quality of care scores. Quantitative scores and qualitative comments corresponded well, indicating that physician reviewers can appropriately score the quality of care on a rating scale.

These physician reviewers could identify and explain both technical and non-technical aspects of care, and could rank these aspects of care using a set of ‘benchmark’ scores, ranging from very good care to very unsatisfactory care. For people with complex illnesses, the outcome is not always survival. However, structured explicit judgements can show how high quality care was provided, even if the patient has not survived. For example, there were a number of instances where explicit comments were made about the quality of non-technical care such as the way information was provided to patients and their relatives. Conversely, when poor care occurs, the method can identify the points at which care fails to meet expected standards, and when the situation can be, or is, rescued. It is interesting to note that in table 4 the proportions of those who died and had less than satisfactory care (about 20% of the cases) were similar to those who survived and had poor care.

During the training session, reviewers were encouraged to be as direct as possible in their commentaries, and in the results overall (tables 6 and 7) there were many more explicit comments than there were implicit comments. Nevertheless, when poor care was being described, while explicit comments predominated, there was a noteworthy proportion of implicit, B level, comments. Sometimes these B level comments were about documentation (which was not in the C category) or concerned missed tests which the reviewer listed and did not specifically make a judgement upon (eg, ‘No ABGs’; see table 2). It may be that in this case the reviewer felt that the result said it all and that an explicit comment was superfluous. On the other hand, it could also be that some reviewers might have felt uncomfortable about making direct comments about very poor care.

With the hindsight of these results, and when undertaking reviews such as this in health service settings, training should include discussion of an initial sample of commentaries and scores with each reviewer to assist in maximising the number of explicit comments. Of course, training might identify some reviewers who do not feel able to make explicit comments and so would not be suitable for this type of review.

The phase of care structure also contributes to an understanding of how care may vary, and at what point. Interestingly, a phase of care approach has also been used by Shannon and colleagues in a review of cardiac surgical care,18 albeit in a rather more structured system with distinct changes in physical settings. In the context of assessing whether death was a preventable outcome, Hogan et al8 used a four-phase model to identify adverse incidents: initial assessment, treatment plan, ongoing monitoring and preparation for discharge. Under the conditions of a service review, a three-phase model might be easier to manage, but either a three or four-phase approach would be appropriate.

Qualitative comments from the reviewers were useful in that they could succinctly identify what was done badly in poor cases. Such short explicit judgements could support a wider, more detailed service review to assess what could be improved in a particular setting or condition. Furthermore, since this structured review method assesses both process and outcome of care, this mixed type of review, using qualitative comments with scores, might be a useful addition to review measures which only assess outcomes or are criterion based. This mixed qualitative and criterion-based method is published in detail elsewhere.9

In this study, assessments of the quality and safety of the care provided showed that, for over 80% of the patients who died, care was rated at least satisfactory and, for approximately half of the cases, care was judged to be of high quality. The processes of care described enable a qualitative judgement to be associated with an objective score that is explicable to, and understandable by, a wide range of people and would also be understood by the public. However, having graded a case as poor or not, there is the added advantage that the structured comments also provide the reasoning behind the judgement in a format to which clinical teams and individuals should be able to respond in a review process.


In this study, the 40 reviewers were all volunteers who undertook the work in their own hospitals. Although there might be concerns about the impartiality of using internal review teams, results have shown that reviewers can make incisive short notes (commentaries) about quality of care, and can critically review care provided in their own hospitals.

Internal review teams have also been used in other settings. Sharek et al19 commented on the strong performance of hospital-based internal review teams, albeit when using more structured, criterion-based trigger tools to identify adverse events.

Although it could be argued that two reviewers per case might enhance the quality and depth of a case note review, there is some evidence to suggest that this use of a more intensive resource does not necessarily improve the review process. While we were able to show in our development study that there was reasonable coherence of quantitative care scores and criterion-based scores between physician reviewers,9 ,13 other work by Hofer and colleagues found that multiple reviewing of the same set of case notes did not enhance the results.20

Finally, it is important to recognise that there are limits to the extent to which the quantitative analysis of the reviews can be used. For example, averaging phase scores across each case, to determine whether phase score averages are similar to the overall care score, is not appropriate. An example of this can be found in online supplementary box S6 where care was judged excellent until moments before the patient died. The value of this current study is that the context and the basis for any quantitative score can be found in the phase of care comments associated with each score.
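
The point about averaging can be made with a short worked example. The scores below are hypothetical, chosen only to mirror the pattern described (excellent care until a late failure); they are not the scores from the supplementary case.

```python
from statistics import mean

# Hypothetical phase scores on the 1-6 scale: excellent care until a late failure.
phase_scores = {"admission": 6, "initial management": 6, "later management": 1}

# Hypothetical overall judgement by the reviewer, who weighs the late failure heavily.
overall_score = 2

phase_mean = mean(list(phase_scores.values()))  # (6 + 6 + 1) / 3
gap = phase_mean - overall_score

print(f"mean of phase scores: {phase_mean:.2f}; overall score: {overall_score}")
```

Here the mean of the phase scores is a little over 4, which would read as satisfactory care, while the reviewer's overall judgement is much lower. The written comments for each phase, not the average, explain the difference.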


This method is a refinement of both global implicit judgement and structured implicit judgement applied to a set of case notes, because it is able to provide information on aspects of each phase of care, enabling more detailed, yet still brief, comments to show explicitly how care may vary or be consistent with expected standards. For example, this method could be used to identify whether care has led to a preventable death, or to identify good quality of care even though the overall outcome is failure to survive. Thus, although the study did not explicitly seek to judge a death as preventable, as did Hogan et al,8 review training could straightforwardly include an explicit judgement commentary about whether a death was preventable or was not preventable (which some of the study reviewers actually provided).

Results also show how explicit written judgements and quality of care scoring can be used together and thus may offer a range of case note review methods for use under differing circumstances, together with opportunities for providing training and assessment of ‘reviewer quality’.

Structured judgement review provides the framework for a quality of care review that can be used by clinical leaders and quality managers to identify potential priority areas for evaluation. For example, scoring allows screening of the overall quality of care for a case, or can identify issues in a particular phase of care, say at admission or initial management. Explicit comments allow exploration of particular aspects of care, for instance where good treatment plans might be inadequately implemented. For these purposes it is not necessary to analyse whether comments are implicit or explicit. The data collection framework is straightforward, has been previously published and is easily available.9

Who should act as the reviewers? Because of the complexity of illness often presented in hospital settings, studies of adverse events have used experienced generalists with some specialist support.8 This structured implicit review method could be used in a similar way either with in-hospital teams or by visiting teams from other hospitals. We do not know whether the review results would be better when undertaken by experienced specialists rather than by the reviewers in our study. However, our results have shown that this form of review can be undertaken by specialists at a senior level in a training programme—so increasing the pool of trained senior reviewers in a hospital—and thus the method offers the opportunity for early review of the care of people who die in hospital so that, where necessary, timely quality improvement lessons can be learnt.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Contributors AH: lead on the conception and design of the study, lead on the analysis of the qualitative mortality review data and principal author of all drafts of the paper; JEC, MP, AM, PAB: study conception; JEC, MP, AM, PAB, KLC: study design; JEC, KLC: data collection and analysis of mortality review data; MP: interpretation of mortality review data; AM: lead on the qualitative analysis framework; PAB: qualitative analysis framework and statistical analysis for the quantitative analysis; JEC, KLC, MP, AM, PAB: contributed to all drafts of the paper. All authors have given approval for this version of the paper to be published. AH acts as guarantor.

  • Funding This project was funded by the National Institute for Health Research Health Technology Assessment (NIHR HTA) Programme (project number RM03/JH08/AH) and was published in full in Health Technology Assessment 2010;14(10):1–170. The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the HTA programme, NIHR, NHS or the Department of Health.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.