Analysis And Comment Health policy

Have targets improved performance in the English NHS?

BMJ 2006; 332 doi: https://doi.org/10.1136/bmj.332.7538.419 (Published 16 February 2006) Cite this as: BMJ 2006;332:419
Gwyn Bevan, professor of management science (R.G.Bevan@lse.ac.uk),1 Christopher Hood, Gladstone professor of government2

1 Department of Operational Research, London School of Economics and Political Science, London WC2A 2AE
2 All Souls College, University of Oxford, Oxford OX1 4AL

Correspondence to: G Bevan
  • Accepted 17 November 2005

The star rating system for NHS trusts seems to have improved performance, but we still don't know how genuine the improvements are or the costs to other services

Annual performance ratings have been published for NHS trusts in England since 2001, and the fifth and final set was published in July 2005.1-6 This process of naming and shaming gave each trust a rating from zero to three stars. Trusts that failed against a small number of key targets were at risk of being zero rated and their chief executives at risk of losing their job; trusts that performed well achieved three stars and were eligible for benefits from “earned autonomy.”7 Although the government has abandoned the star ratings, targets are likely to remain. We consider reported improvements in performance against key targets, problems of the system, and what ought to happen in the future.

Reported improvements in performance

We compared data on performance in England before and after the star rating system for three key targets. When data were available we also compared English data with those of other UK countries that did not adopt the star system.

Accident and emergency departments

The key target for accident and emergency departments was the percentage of patients to be seen within four hours. From March 2003, the target was 90%,3 5 and from January 2005 this increased to 98%.6 The National Audit Office reported that in England, in 2002, 23% of patients spent over four hours in accident and emergency, but in the three months from April to June 2004 only 5.3% stayed that longw1; this increased patient satisfaction and was achieved despite increasing use of emergency services.

Ambulance category A calls

England has had a target for category A calls (life threatening emergencies) since 1996, before star ratings were applied to ambulance trusts. The target was that at least 75% of calls be met within 8 minutesw2; this became a key target from the end of 2002, when star ratings applied to ambulance trusts.2 3 5 6 About 30 trusts have provided ambulance services. Comparable data are available for 17 trusts for the two years before, and the four years during which, star ratings applied.w3 For the year ending in March 2000, only one trust had response rates above 75% and two trusts had rates lower than 40%. Reported performance improved greatly after ambulance trusts were star rated. For the year ending in March 2005, 14 trusts exceeded the target and the worst performer achieved 71%.


The Welsh Ambulance Service NHS Trust also had the target of responding to 75% of category A calls within 8 minutes by the end of 2001.w4 Response rates, however, remained at about 50% between 2001 and 2004.w5

First elective hospital admission

The key target for first elective hospital admission was the maximum wait: this was 18 months by the end of March 2001,1 15 months by 2002,2 12 months by 2003,3 and 9 months by 2004.5 6 The numbers of patients waiting more than 12 and 9 months in England at the end of March 1998 were reported to be 67 000 and 185 000, but by the end of March 2005, only 24 were reported to be waiting more than 12 months and 41 more than 9 months.w6

Table 1 gives the percentages of patients waiting for more than six and 12 months at the end of March from 1999 to 2005 for England, Wales, and Northern Ireland. From 2001 to 2003, reported performance improved in England but deteriorated in Wales and Northern Ireland. After that, however, reported performance improved in all countries, dramatically in Wales and Northern Ireland. This suggests that the policy of naming and shaming in England put pressure on the NHS in the other countries.

Table 1

Percentages of patients on NHS hospital waiting lists waiting longer than six or 12 months, 1999-2005

Problems with targets

Star ratings have been criticised for their similarities to the target regime of the former Soviet Union, although NHS managers were threatened with loss of their jobs rather than their life or liberty.7 8 The Soviet target regime seemed to produce substantial improvements in the 1930s but was recognised to have serious problems from the 1950s and collapsed in the 1990s.9 In May 2005, during the British general election campaign, the prime minister was apparently non-plussed by a complaint made during a televised question session that pressure to meet the key target that 100% of patients be offered an appointment to see a general practitioner within two working days6 had meant that many general practices refused to book any appointments more than two days in advance.w7 A survey of patients found that 30% reported that their general practice did not allow them to make a doctor's appointment three or more working days in advance.w8 Many saw the perverse outcome of a key target that was intended to improve access to general practitioners as a reason for abandoning the system of targets and star ratings.

Regulation by targets assumes that priorities can be targeted, the part that is measured can stand for the whole, and what is omitted does not matter. But most indicators of healthcare performance are “tin openers rather than dials… they do not give answers but prompt investigation and inquiry, and by themselves provide an incomplete and inaccurate picture.”10 Hence, typically for defined priorities there will be a few good measures (“dials,” such as waiting times); a larger group of imperfect measures (“tin openers,” such as mortality), the use of which is liable to generate false positive and false negative results; and an even larger group for which no usable data are available (which applies to the clinical quality of much of health care10). This last group was the cause of the neglect of quality in the Soviet regime, which was widely claimed to be an endemic problem from Stalin to Gorbachev.9

The use of targets results in gaming,7-9 11-13 which means that when reported performance meets the targets, neither government nor the public can distinguish between the following four outcomes:

  • All is well; performance has been exactly as desired in all domains (whether measured or not)

  • The organisation's performance has been as desired where performance was measured but at the expense of unacceptably poor performance in the domains where performance was not measured

  • Although reported performance against targets seems to be fine, actions have been at variance with the substantive goals behind those targets (hitting the target and missing the point)

  • Targets have not been met, but this has been concealed by ambiguity in the way data are reported or outright fabrication.

Table 2 presents evidence that these problems have occurred in the three key targets discussed above. Although we have no evidence of poor performance in other domains in response to the target for inpatient waiting times, this type of gaming was reported for the target for new outpatient waiting times. Ophthalmology services in Bristol met that target by cancelling and delaying follow-up outpatient appointments (which had no target) and, as a consequence, at least 25 patients were estimated to have lost their vision over two years.13 The Audit Commission's last report based on spot checks of the quality of data in 55 trusts concluded that the scale of reporting errors identified did not undermine the reliability of overall trends reported nationally.14 But questions remain over the extent to which improvements in targeted performance in the English NHS were undermined by other types of gaming and whether similar problems underlie the big reductions in long waiting times reported in Wales and Northern Ireland in 2004 and 2005.

Table 2

Evidence of gaming in response to three types of targets

What next?

Nobody would want to return to the NHS performance before the introduction of targets, with over 20% of patients spending more than four hours in accident and emergency and patients waiting more than 18 months for elective admission. And attempts to improve performance without the star system in Wales were criticised by the auditor general for Wales for having “provided neither strong incentives nor sanctions to improve waiting time performance” and were widely perceived to have rewarded organisations that failed to deliver on waiting times.17 So how can we maximise the social benefits and minimise the costs of a regime of targets with sanctions?

We suggest two remedies. One, for which we have argued earlier,18 is to introduce more uncertainty in the way that performance will be assessed and thus make some kinds of managerial gaming more difficult. A second is to remedy the continuing lack of coherent systematic auditing of performance data of the healthcare system in England. Despite the heavy regulatory burden from auditors and assessors of various kinds, if anything the audit hole is getting bigger. Current proposals for assessing performance seem to favour reliance on statistical data to assess the robustness of performance data19 rather than regular visits by the Commission for Health Improvement, which uncovered gaming practices.15 16 In addition, responsibility for auditing the quality of data in the English NHS has been transferred from the Audit Commission to the Healthcare Commission, which has no presence on the ground in NHS provider units.14
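One way to picture the first remedy is as a random draw: if the indicators that count towards an assessment are chosen only after the reporting period has ended, managers cannot know in advance which ones to concentrate on. The Python sketch below is illustrative only. The indicator names, the number drawn, and the function `draw_assessed_indicators` are hypothetical and are not taken from the star rating system or from the earlier proposal cited above.

```python
import random

# Hypothetical pool of indicators; the names are illustrative only.
INDICATOR_POOL = [
    "a_and_e_four_hour_wait",
    "ambulance_category_a_response",
    "elective_admission_wait",
    "outpatient_first_appointment_wait",
    "outpatient_follow_up_wait",
    "gp_appointment_access",
]


def draw_assessed_indicators(pool, k, seed=None):
    """Return k indicators drawn at random, without replacement, from pool.

    Passing a seed makes the draw reproducible, so that an auditor could
    later verify which indicators were selected for a given round.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(pool, k))


# Example: draw three indicators for one assessment round.
selected = draw_assessed_indicators(INDICATOR_POOL, 3, seed=2005)
print(selected)
```

Because any indicator in the pool may be drawn, neglecting an unmeasured area becomes riskier for a manager. A draw of this kind does nothing, however, about outright misreporting of data, which is the concern of the second remedy.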

We need an independent body that approximates to the Office of Performance Data advocated by Robert Behn.20 Such a body would investigate the genuineness of reported improvements in healthcare performance and whether improvements are achieved at the cost of what cannot be easily measured. Although these changes would not wholly eliminate the gaming problems associated with any regime of targets and terror, they could reduce them. The current combination of performance measures that are highly predictable to managers and an audit system that is poorly equipped to detect gaming systematically risks losing credibility and invites even more awkward questions in the next general election campaign.

Summary points

The star rating system for English NHS trusts has improved reported performance on key targets

The effect on services excluded from star ratings is unclear

In some cases data have been manipulated to achieve targets

Systems need to be put in place to minimise gaming to meet targets and ensure targets are not causing unwanted effects elsewhere

Footnotes

  • References w1-w12 and sources of data are on bmj.com

    We thank those who helped us identify statistics for Wales and Northern Ireland comparable with those for England and who explained that no such statistics are available for Scotland. We also thank Olly Bevan for assembling the statistical material.

  • Contributors and sources The evidence and ideas for this paper come from GB's involvement in the development of NHS star ratings in England and CH's extensive research into regulation by governments in various sectors. The article is based on numerous presentations by both authors. GB did the analysis of the impact and evidence of gaming and wrote the first draft. The concepts underlying the paper were developed jointly. CH contributed to revisions of the paper.

  • Competing interests GB was director of the office for information on healthcare performance at the Commission for Health Improvement until September 2003.
