
Development and testing of an assessment instrument for the formative peer review of significant event analyses
J McKay1, D J Murphy2, P Bowie3, M-L Schmuck3, M Lough2, K W Eva3

1 Department of Postgraduate Medicine, University of Glasgow, Glasgow, UK
2 NHS Education for Scotland, Glasgow, UK
3 McMaster University, Hamilton, Ontario, Canada

Correspondence to: Dr J McKay, NHS Education for Scotland, Postgraduate General Practice Education, 2 Central Quay, 89 Hydepark Street, Glasgow G3 8BW, UK; john.mckay{at}


Aim: To establish the content validity and specific aspects of reliability for an assessment instrument designed to provide formative feedback to general practitioners (GPs) on the quality of their written analysis of a significant event.

Methods: Content validity was quantified by application of a content validity index. Reliability testing involved a nested design, with 5 cells, each containing 4 assessors, rating 20 unique significant event analysis (SEA) reports (10 each from experienced GPs and GPs in training) using the assessment instrument. The variance attributable to each identified variable in the study was established by analysis of variance. Generalisability theory was then used to investigate the instrument’s ability to discriminate among SEA reports.

Results: Content validity was demonstrated with at least 8 of 10 experts endorsing all 10 items of the assessment instrument. The overall G coefficient for the instrument was moderate to good (G>0.70), indicating that the instrument can provide consistent information on the standard achieved by the SEA report. There was moderate inter-rater reliability (G>0.60) when four raters were used to judge the quality of the SEA.

Conclusions: This study provides the first steps towards validating an instrument that can provide educational feedback to GPs on their analysis of significant events. The key area identified to improve instrument reliability is variation among peer assessors in their assessment of SEA reports. Further validity and reliability testing should be carried out to provide GPs, their appraisers and contractual bodies with a validated feedback instrument on this aspect of the general practice quality agenda.

  • CVI, content validity index
  • GP, general practitioner
  • SEA, significant event analysis


Significant event analysis (SEA) is a method of reflective learning that is strongly promoted as a mechanism for improving patient safety and reducing healthcare risk in the UK.1 It typically involves an attempt to review in depth an event identified as “significant” by any member of the healthcare team. Given the complexity and uncertainty in general medical practice, SEA may offer both an understanding of where care processes can fail patients and the means to implement systemic change in relatively non-bureaucratic organisations.2 The National Patient Safety Agency, a special health authority created to co-ordinate learning from patient safety incidents in the NHS, has recently recommended that primary care teams should analyse significant events as part of their safety culture (box 1).

Box 1: Significant event analysis and the link with patient safety

  • Significant event analysis (SEA) is a retrospective, qualitative clinical audit technique based on a synthesis of traditional case discussion and the principles underlying the critical incident technique.3

  • A significant event is defined as “any event thought by anyone in the team to be significant in the care of patients or the conduct of the practice”. This normally involves suboptimal practice, but could also be an example of excellent care.1

  • A typical event analysis involves a non-threatening, structured investigation (normally team-based) to establish why an event happened, to learn from it and to introduce change where necessary.

  • SEA has been recommended by the National Patient Safety Agency for the analysis of patient safety incidents in primary care, which have resulted in a “near miss” or low to moderate patient harm.4

  • SEA facilitates identification of reportable safety incidents to local health organisations or national reporting systems to enable learning and sharing among healthcare teams.4

  • SEA is arguably more acceptable and feasible as an investigation technique in general practice than more established methods such as root cause analysis, which require more extensive training, time commitment and expense.

Evidence of the ability of general practitioners (GPs) and others to verifiably undertake SEA effectively is limited.5–8 This is highly important because superficial or informal discussion of an event is unlikely to lead to understanding, learning and the implementation of necessary change.3,9

One method of informing on the quality of SEA is through external peer review. Peer review can be described as the critical evaluation of a specific aspect of a practitioner’s performance by professional colleagues, preferably achieved through use of a reliable and structured instrument.10,11 However, few peer assessment instruments have been evaluated sufficiently with regard to validity and reliability to justify their widespread use.12

In the west of Scotland region, a voluntary educational model for the external peer review of SEA reports has been available to all GPs as part of their continuing professional development since 1998.5–8,13 This involves a submitted written report being sent to two trained GP assessors, chosen from a group of 20, who independently review it using a structured assessment instrument and provide educational feedback.13

Given the perceived importance of the SEA technique to the patient safety agenda,4,14 the development of a valid and reliable assessment instrument with which to facilitate the educational peer review of SEA would be highly desirable. In this way, a professional judgement could be made on the quality of the event analysis in question, and formative feedback provided for consideration. Raising the standard of event analyses undertaken by GPs and their teams creates a clear potential to further enhance learning and the quality of patient care.

This study was undertaken to establish the content validity of a new peer assessment instrument, elucidate aspects of its reliability and investigate possible subsample differences, which would be relevant for generalising to a wider population of GPs.


Methods

Content validity

The developmental stage to assimilate the proposed items for the instrument was carried out independently by three of the authors (JM, PB, DJM). This work was informed by previous focus group interviews with the west of Scotland Audit Development Group.15 These discussions used Marinker’s six essential steps in formulating an enquiry into a significant event (REPOSE) to identify a set of items and domains that could be applied to a selection of events considered “significant” by the group.16 Agreement was reached on four criteria considered “essential” for assessment of a significant event analysis.15 Together with previous research,1,9 these criteria were developed to generate relevant domains and items. These were discussed by the three authors until consensus was achieved on the items to be included in a content validity exercise.

The proposed instrument consisted of 10 items, each rated on a 7-point adjectival scale with anchor points ranging from absent to excellent (see supplementary appendix, available online). This was sent to 10 GP experts, identified as being well informed in SEA because they were experienced peer assessors or had published on SEA in peer-reviewed journals.

The relevance and appropriateness of each item was then assessed by asking the experts to rate each item and the instrument as a whole using a 4-point scale to create a content validity index (CVI). In all, 8 out of 10 experts were required to endorse each item by assigning a rating of at least 3 out of 4, to establish content validity beyond the 0.05 level of significance.17 This was determined to provide sufficient evidence for inclusion of each item as part of the final instrument. Experts were also asked to identify any missing items that they deemed important for inclusion when considering the quality of a SEA report.
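The endorsement rule described above can be illustrated with a short sketch. It assumes that each expert's rating is stored as an integer from 1 to 4; the item names and ratings are hypothetical and are not study data, and the function names are introduced here for illustration only.

```python
def item_cvi(ratings, endorse_at=3):
    """Proportion of experts rating the item at or above `endorse_at`."""
    endorsing = sum(1 for r in ratings if r >= endorse_at)
    return endorsing / len(ratings)


def is_endorsed(ratings, endorse_at=3, min_experts=8):
    """True when at least `min_experts` experts rate the item >= `endorse_at`."""
    return sum(1 for r in ratings if r >= endorse_at) >= min_experts


# Hypothetical 4-point ratings from 10 experts (illustrative only, not study data).
example_ratings = {
    "item_a": [4, 4, 3, 3, 4, 2, 3, 4, 3, 4],  # 9 of 10 experts rate >= 3
    "item_b": [3, 2, 2, 4, 3, 3, 1, 4, 3, 2],  # 6 of 10 experts rate >= 3
}

cvi_by_item = {name: item_cvi(r) for name, r in example_ratings.items()}
retained_items = [name for name, r in example_ratings.items() if is_endorsed(r)]
```

In this illustration only the first hypothetical item would meet the 8-of-10 requirement and be retained.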

Reliability testing

Participants and assessment exercise

The proposed instrument was introduced on a training day to the west of Scotland Audit Development Group from which all the peer assessors are drawn (box 2). The role of the assessors and any clarification points around using the instrument were discussed. Further issues raised by assessors were to be emailed to the authors as they arose, or discussed at three-monthly follow-up meetings.

Box 2: Characteristics of west of Scotland Audit Development Group

  • 20 principals in general practice with a minimum of 8 years’ experience, trained in peer review.

  • All have a minimum of 5 years’ experience as peer reviewers of criterion audit and significant event analysis reports for continuing professional development and summative assessment.

  • 18 (90%) are members or fellows of the Royal College of General Practitioners.

  • 2 (10%) are GP appraisers.

  • 10 (50%) are GP registrar trainers.

  • A further 3 (15%) have other general practice educational roles (eg, associate adviser, undergraduate tutor).

All 20 assessors took part in a reliability marking exercise. A nested design consisting of five cells, each with four raters, was used. Members of each cell marked 20 separate SEA reports, unique to that cell, using the proposed new assessment instrument. The exercise was repeated after 1 month, with the raters in each cell marking the same unique 20 SEA reports. The 20 SEA reports for each cell consisted of 10 submitted by GP principals (experienced doctors) and 10 from GP registrars (doctors-in-training).
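The layout of the marking exercise can be enumerated as a rough check on its size. The sketch below assumes complete data (every rater scoring all 10 items for every report in their cell on both occasions), which the text does not state explicitly; the identifiers are arbitrary indices chosen for illustration.

```python
from itertools import product

N_CELLS = 5            # cells in the nested design
RATERS_PER_CELL = 4    # raters nested within each cell
REPORTS_PER_CELL = 20  # SEA reports unique to each cell
N_OCCASIONS = 2        # initial marking and repeat after 1 month
N_ITEMS = 10           # items on the assessment instrument


def build_design():
    """Return one (cell, rater, report, occasion, item) tuple per expected rating."""
    rows = []
    for cell in range(N_CELLS):
        # Raters and reports are numbered so that each belongs to exactly one cell.
        raters = [cell * RATERS_PER_CELL + i for i in range(RATERS_PER_CELL)]
        reports = [cell * REPORTS_PER_CELL + j for j in range(REPORTS_PER_CELL)]
        for rater, report, occasion, item in product(
            raters, reports, range(N_OCCASIONS), range(N_ITEMS)
        ):
            rows.append((cell, rater, report, occasion, item))
    return rows


design = build_design()
n_ratings = len(design)                        # 5 * 4 * 20 * 2 * 10
n_reports = len({row[2] for row in design})    # distinct reports across cells
n_raters = len({row[1] for row in design})     # distinct raters across cells
```

Under the complete-data assumption this gives 100 distinct reports, 20 raters and 8000 individual item ratings.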

Data analysis

A repeated-measures analysis of variance was undertaken using BMDP software, and analysed to establish the variance attributable to each study variable (SEA reports, n = 100, 20 per cell; raters, n = 20, 4 per cell; time, n = 2; items, n = 10). Generalisability theory (G theory), a statistical technique for determining the extent to which ratings consistently discriminate between subjects of measurement (ie, determines the reliability of observations), was used to investigate the instrument’s ability to differentiate the quality of SEA reports.18 The internal consistency (a measure of item homogeneity), intra-rater reliability (agreement within rater across occasions) and inter-rater reliability (agreement among raters) were all calculated. These statistics range from 0 to 1, with 1 indicating perfect reliability.
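As an illustration of how a G coefficient is derived from variance components, the following sketch treats a much simpler case than the study design: a single cell in which every rater scores every report once, with raters as the only facet. It is not the authors' BMDP analysis, which also included items, time and nesting. The score matrix is hypothetical.

```python
def variance_components(scores):
    """Estimate report and residual variance from a reports x raters matrix.

    scores[p][r] is the score given to report p by rater r (fully crossed).
    Returns (var_report, var_residual) from two-way ANOVA mean squares.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    report_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

    ss_report = n_r * sum((m - grand) ** 2 for m in report_means)
    ss_rater = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_report - ss_rater

    ms_report = ss_report / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_r - 1))

    # Negative estimates are truncated at zero, a common convention.
    var_report = max((ms_report - ms_residual) / n_r, 0.0)
    return var_report, ms_residual


def g_coefficient(var_report, var_residual, n_raters):
    """Relative G coefficient for the mean score over `n_raters` raters."""
    return var_report / (var_report + var_residual / n_raters)


# Hypothetical scores: 4 reports (rows) rated by 3 raters (columns). Not study data.
example_scores = [
    [5, 4, 5],
    [3, 3, 4],
    [6, 5, 6],
    [2, 3, 2],
]
var_report, var_residual = variance_components(example_scores)
g_one_rater = g_coefficient(var_report, var_residual, 1)
g_four_raters = g_coefficient(var_report, var_residual, 4)
```

The coefficient rises as more raters are averaged because the residual variance is divided by the number of raters, whereas the variance attributable to true differences among reports is unchanged.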

To avoid the potential of artificially inflating the heterogeneity of the sample (and hence the reliability), we report separate analyses on the SEA reports provided by the GP principals and GP registrars.


Results

Content validity

At least 8 out of 10 experts endorsed all 10 items listed in the supplementary appendix (available online) and the overall instrument, indicating a statistically significant proportion of agreement regarding the content validity of the assessment instrument (p<0.05). No additional items were identified for inclusion.


The G coefficients obtained for the overall test reliability, internal consistency and inter/intra-rater reliability values for the instrument when used to assess SEA reports are shown in table 1 for GP principals and in table 2 for GP registrars.

Table 1

 Calculated reliability coefficients for general practitioner principals’ significant event analysis reports marked using the peer review instrument (expressed with 95% CI)

Table 2

 Calculated reliability coefficients for general practitioner registrars’ significant event analysis reports marked using the peer review instrument (expressed with 95% CI)

The internal consistency of the instrument was high when averaged over all items for both GP principals (G = 0.94) and GP registrars (G = 0.89). This indicates that the items included in the instrument are correlated with one another to a sufficient extent. Item reliability of a single item is low, however, indicating that no one item should be deemed a reliable indicator of SEA quality.

The high intra-rater coefficients for SEA reports undertaken by GP principals (0.78) and GP registrars (0.71) suggest that individual assessors’ opinions regarding the quality of each SEA report are reasonably stable over time.

The moderate G coefficients for inter-rater reliability, assessed using the average of scores provided by all four raters, for both GP principals (0.64) and GP registrars (0.6), indicate that there may be room for future calibration of assessors to ensure that consistent feedback is provided. Decision study analyses suggest that 10 raters are required for the average score to achieve an inter-rater reliability of G>0.8.
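A decision study projection of this kind can be approximated with the Spearman-Brown relationship, which assumes that rater-related error is the only source of error being averaged out. The sketch below is a simplification and not the authors' analysis; it starts from the rounded four-rater coefficient reported for GP principals (0.64) and uses exact fractions to avoid floating-point ties at the 0.8 threshold.

```python
from fractions import Fraction


def single_rater_reliability(g_avg, n_raters):
    """Back-calculate single-rater reliability from the n-rater average."""
    return g_avg / (n_raters - (n_raters - 1) * g_avg)


def projected_reliability(g_single, n_raters):
    """Spearman-Brown projection for the mean of `n_raters` raters."""
    return n_raters * g_single / (1 + (n_raters - 1) * g_single)


def raters_needed(g_avg, n_raters, target):
    """Smallest number of raters whose projected reliability exceeds `target`."""
    g_single = single_rater_reliability(g_avg, n_raters)
    k = 1
    while projected_reliability(g_single, k) <= target:
        k += 1
    return k


# Rounded inter-rater coefficient for GP principals with four raters (from the text).
needed_principals = raters_needed(Fraction(64, 100), 4, Fraction(8, 10))  # -> 10
```

Under this simplification, nine raters would give a projected coefficient of exactly 0.8 and ten raters would exceed it. Because the published coefficients are rounded and the decision study drew on the full set of variance components, the projection should be read only as an approximation.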

The correlation between the global rating scale and the sum of the nine specific items was strong (r = 0.87 and 0.90 for GP principals and GP registrars, respectively). A comparison of the mean scores between GP principals’ and GP registrars’ SEA reports is shown in table 3 and demonstrates no difference between the two groups.
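The correlation between the global rating and the summed specific items is a Pearson product-moment coefficient, which can be sketched as follows. The scores are hypothetical and are not study data; the variable names are introduced for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data for five reports (illustrative only):
# sum of the nine specific item scores, and the global rating on the 7-point scale.
specific_item_sums = [30, 41, 52, 38, 47]
global_ratings = [3, 5, 6, 4, 5]

r_example = pearson_r(specific_item_sums, global_ratings)
```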

Table 3

 Comparison of the mean scores between general practitioner principals’ and general practitioner registrars’ significant event analysis reports


This study demonstrates that the content validity and reliability of the assessment instrument are adequate, providing the first steps towards validating an instrument for providing educational feedback to GPs on the quality of their written SEA reports. The findings highlight specific areas that could improve instrument reliability, with the key area being variation among peer assessors in their assessment of SEA reports. Consistent with previous research,8 no difference was found in the quality ratings assigned to SEA reports completed by GP principals or GP registrars.

Limitations of the study

Validity testing

This instrument has been developed by GPs and so is doctor-centred, despite the frequent team involvement in significant events and their analyses.1 Our “expert” raters were simply well-informed individuals, as the number of individuals with sufficient knowledge and experience to be deemed truly expert is limited (and, it must be acknowledged, poorly defined).19,20 The CVI exercise was adequate, but a different approach such as the Delphi technique may have added more depth to the process.


The significant events chosen for peer review were self-selected. The finding that most SEA reports were rated as having a global score of ⩾4 may indicate a bias towards submission of reports with which the submitting doctor feels comfortable.13 The impact of this limitation, however, should have worked against the observation of sufficient reliability.

In addition, it should be noted that the raters were individuals with extensive experience with SEA who had considerable opportunity to discuss how to interpret the rating task. Further study is required to determine whether or not similar findings would be achieved with less experienced raters. In addition, although the instrument is designed to provide written as well as numerical feedback, we analysed only numerical data. For a formative instrument, written feedback may be at least as important to the submitting doctor. This aspect of the instrument therefore requires its own separate evaluation.

SEA reports

Finally, we recognise that the SEA report content is merely a proxy indicator for what actually happened or was decided in practice. Personal and recall bias in addition to problems of understanding, interpretation and judgement may influence what is reported. An individual’s ability to articulate the event analysis in writing may also be a factor.


There is no universally agreed method for the analysis of significant events. Our instrument mirrors previously suggested approaches,1,4,15 but is unique in providing written feedback by peers. A strength of this instrument is that it is for use in the workplace, and has been tested using events taking place as a result of actual experience. Systems to improve patient safety have been difficult to implement in primary care. Using an instrument that is based on educational theory and research methods—as opposed to simply applying one based on intuition—provides an element of scientific rigour when applied in this patient safety context. This should add to the potential attractiveness and relevance of the instrument and, therefore, to its impact.

The study demonstrated content validity, but further work is required to confirm the overall instrument validity. The high G coefficients observed indicate that the domains and items are inter-related, and the CVI indicates that our judges considered the questions to be relevant, providing the first steps towards enhancing the assessment of significant event analyses.

Context specificity was not considered, so the instrument cannot currently be claimed to be useful for assessing a GP’s proficiency in applying the SEA technique. The purpose of this instrument is to facilitate educational feedback on the merits and drawbacks of individual SEA reports. There is increasing recognition that professional self-regulation should not rely on unguided self-assessments for the improvement of practice.21,22 It is hoped that GPs would find feedback provided by external assessors using this form helpful in highlighting particular issues that could further improve their analysis, thus enhancing the quality or standard of future event analyses and, in turn, the safety of the GPs’ patients.

The largest source of instrument error when providing feedback is the variation among peer assessors. This is a common difficulty for assessment instruments.23,24 The moderately large G coefficients for intra-rater reliability imply a reasonable degree of instrument stability when used by individual peer reviewers to assess reports at different points in time. The lower inter-rater reliability is more likely, therefore, to be related to calibration issues among the assessors than to the robustness of the instrument. Further training of assessors or the continued use of multiple assessors when evaluating each SEA is necessary. This is particularly important if the instrument is to be used by other professional colleagues in different clinical settings.

An ideal educational tool would be “supportive and individualised, yet uniformly applied”.25 This is especially relevant, given the role of SEA in patient safety. A successful formative instrument should, therefore, give information via interpretable numerical scores and written comments, and should be used in conjunction with facilitated feedback.26 Our model fits with both concepts because it promotes self-directed (and team-directed) reflective learning and provides written peer feedback.

SEA is part of GP appraisal in NHS Scotland27 and of the GMS contract in the UK,28 and has been proposed as a component of revalidation.29 However, uniform guidance on how it should be applied and monitored is lacking. Participation in our SEA model may demonstrate to patients, appraisers and healthcare organisations the willingness of GPs to submit aspects of their own work for external review as part of an educational process.14 This would confirm that the GP is verifiably reflecting on how patient care can be improved as part of the clinical governance agenda.

Future work

The study findings justify further development of the instrument, particularly to widen validity testing, calibrate assessors and investigate the educational impact on patient safety.


We thank Dr J Stead, Exeter, Professor M Pringle, Nottingham, Professor G Elwyn, Swansea, Professor C Bradley, Cork, and members of the west of Scotland Audit Development Group for their input into the development of the content of the peer review instrument. We also thank the west of Scotland Audit Development Group for their work on the reliability testing of the instrument.




  • Funding: NHS Education for Scotland.

  • Competing interests: None.
