The value of short and simple measures to assess outcomes for patients of total hip replacement surgery
  1. Ray Fitzpatrick, professor of public health and primary care,
  2. Richard Morris, senior lecturer in medical statistics,
  3. Shakoor Hajat, research statistician,
  4. Barnaby Reeves, director,
  5. David W Murray, consultant orthopaedic surgeon,
  6. David Hannen, research assistant,
  7. Marianne Rigge, director,
  8. Olwen Williams, consultant in clinical effectiveness,
  9. Paul Gregg, professor of orthopaedic surgery
  1. Institute of Health Sciences, University of Oxford, Headington, Oxford OX3 7LF, UK
  2. Department of Primary Care & Population Sciences, Royal Free and University College Medical School, London NW3 2PF, UK
  3. Clinical Effectiveness Unit, Royal College of Surgeons, London WC2A 3PN, UK
  4. Nuffield Orthopaedic Hospital, Oxford OX3 7LD, UK
  5. College of Health, London E2 9PL, UK
  6. Public Health Research Unit, Addenbrooke's Hospital, Cambridge CB2 2SP, UK
  7. Trauma & Orthopaedic Surgery, University of Newcastle, Newcastle upon Tyne NE2 4HH, UK
  1. Professor R Fitzpatrick raymond.fitzpatrick{at}


Objectives—To evaluate the performance of a patient assessed outcome measure, the Oxford Hip Score, in a national study of primary hip replacement surgery.

Design—A survey of patients' health status before undergoing primary hip replacement surgery and three months and one year after surgery.

Setting—143 hospitals in three NHS English regions.

Patients—7151 patients admitted for primary total hip replacement surgery over a period of 13 months from September 1996.

Main measures—For patients, Oxford Hip Score and satisfaction with hip replacement and, for surgeons, American Society of Anesthesiologists' (ASA) classification of physical status.

Results—The response rates to the postal questionnaire at three and 12 months follow up were 85.2% and 80.7%, respectively. Including all three administrations of the questionnaire, all except two items of the Oxford Hip Score were completed by 97% or more respondents and only one item at one administration appeared marginally to reduce the reliability of the score. The effect size for the change in score from baseline to three months was 2.50 and from baseline to 12 months was 3.05. Patients rated by surgeons as being healthy preoperatively by the ASA classification were somewhat more likely to return a completed questionnaire at three months (79.4% versus 75.3%) and 12 months (72.4% versus 70.3%) than those rated as having poorer health.

Conclusions—Overall there was little evidence of difficulties for patients in completing the Oxford Hip Score or of unreliable data, except in relation to one questionnaire item. The instrument was very responsive to change over time and score changes for the Oxford Hip Score related well to patients' satisfaction with their surgery. The instrument is an appropriate measure in terms of validity, responsiveness, and feasibility for evaluating total hip replacement from the perspective of the patient.

  • total hip replacement surgery
  • Oxford Hip Score
  • questionnaire
  • assessment

Primary total hip replacement surgery is an effective and very commonly performed procedure to reduce pain and improve physical function in patients with arthritis. Nevertheless, it is widely recognised that the quality of evidence to inform surgical choices—for example, with regard to type of prosthesis used—is modest, with few randomised trials or other forms of well designed studies to evaluate variations in practice.12 Two recently published NHS R&D systematic reviews of primary total hip replacement concurred in evaluating the quality of evidence in this field as poor, and specifically argued for the need for greater use of validated patient assessed outcome measures to be incorporated into future evaluations of total hip replacement.34 To date, orthopaedic evidence on effectiveness has relied excessively on indicators such as whether revision of surgery occurs, a potentially misleading approach to outcomes since patients whose hip replacements are judged to have “failed” do not always receive revision surgery.5

A number of patient assessed outcome measures have recently been developed in the form of questionnaires for use in evaluative health research, assessing issues such as functional status and health related quality of life. Several such measures have been applied to total hip replacement.67 Because of the need to measure specific forms of pain and the problems of mobility in older patients who may experience substantial symptoms for other reasons than hip disease, the Oxford Hip Score was developed specifically to assess the patient's perceptions of pain, mobility, and function in relation to problems of the hip.8

An opportunity to examine the usefulness of the Oxford Hip Score was provided by the National Total Hip Replacement Project (NTHRP). This study, coordinated by the Clinical Effectiveness Unit of the Royal College of Surgeons and carried out in three English health regions, is the first attempt to describe in detail current practice in primary total hip replacement in England on a large sample of patients combining evidence from both surgeons and patients. The Oxford Hip Score was selected as the primary method to assess patients' perceptions of their hip problems before surgery and their outcomes at three and 12 months postoperatively.

The emergence of patient based outcome measures has led to concerns that overly simple standardised questionnaires may fail to capture matters of importance to patients and therefore produce misleading evidence of outcomes of health care interventions.9 The need for more elaborate and detailed methods of eliciting patients' experiences has also been expressed specifically in the field of orthopaedics.1011 It has also been argued that patients can have difficulties completing questionnaires if their simple and standardised format fails to address nuances and complexities of personal experience.12 This paper examines the measurement properties of the Oxford Hip Score in the context of a large study of outcomes of primary total hip replacement. Specifically, the paper examines (1) the extent of patients' difficulties in completing the Oxford Hip Score as evidenced by the response rate for the questionnaire and by the frequency of missing values and unreliable data; (2) the sensitivity to change (responsiveness) of the Oxford Hip Score; and (3) some potential biases from this method of assessing outcome.


A total of 390 consultant firms from 143 hospitals in three English health regions agreed to participate in a national study of NHS and private primary total hip replacement, recruitment for which began in September 1996 for NHS patients and in October 1996 for private patients, continuing for one year in both groups. The design of the study required the surgeon to complete a questionnaire for each operation performed for primary total hip replacement. The patient was invited to complete a questionnaire at a point during their hospital stay before surgery. This questionnaire also asked for permission to send a postal follow up questionnaire three months and one year after surgery. Regional research coordinators collated patients' and surgeons' questionnaires and returned them to the Clinical Effectiveness Unit (CEU), Royal College of Surgeons, London. The group of patients whose preoperative questionnaires were incorporated into the central database of the CEU were subsequently followed up by post from the CEU.

The surgeon's questionnaire collected data on details of surgical approach, type of prosthesis, cement, anaesthesia, thromboprophylaxis, and a rating of the patient's overall health status by means of the American Society of Anesthesiologists' classification of physical status (ASA score).13 Questionnaires from both surgeon and patient were obtained for 5038 cases.

The patient's preoperative questionnaire included the 12 item Oxford Hip Score, questions about other major health problems, pain in other joints, and duration of outpatient and inpatient wait. The three and 12 month follow up questionnaires included the Oxford Hip Score and other questions about further admission for problems or complications of their hips, satisfaction with hospital care and with results of their hip operation. For non-respondents at both follow up surveys one reminder was sent at two weeks and a further reminder together with another copy of the questionnaire was sent two weeks later.

The Oxford Hip Score is a 12 item self-completed questionnaire addressing the patient's perceptions of pain and disability arising from their hip. It is intended to be used as a single summed score with the total score reflecting the severity of problems that the respondent has with his or her hip. The range of scores is from 12 to 60 with a high score indicating a poor perceived state of health. From prospective evidence of patients before and six months after hip replacement surgery it has been shown to have very satisfactory reproducibility and has been validated with reference to arthritis-specific and generic health status measures and the surgeon's assessment in terms of the Charnley Hip Score.8
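The summed scoring described above can be sketched as follows. This is a minimal illustration, not the published scoring instructions: it assumes each of the 12 items is coded from 1 to 5 (implied by the stated range of 12 to 60), and the function name `oxford_hip_score` is hypothetical. How incomplete questionnaires are handled is not stated in the text, so the sketch simply rejects them.

```python
def oxford_hip_score(item_responses):
    """Sum 12 item responses into a single total score.

    Assumption: each item is coded 1 (least severe) to 5 (most severe),
    which yields the stated range of 12 to 60. A higher total indicates
    a poorer perceived state of health.
    """
    if len(item_responses) != 12:
        raise ValueError("expected 12 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in item_responses):
        raise ValueError("each item must be coded 1 to 5")
    return sum(item_responses)


# Boundary cases: least severe and most severe response on every item.
best_possible = oxford_hip_score([1] * 12)
worst_possible = oxford_hip_score([5] * 12)
```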

Items in a summed score constituting a scale should be internally consistent, as measured by Cronbach's alpha, with higher values of alpha reflecting higher reliability. The alpha value should normally be above 0.70.14 In the current study internal reliability was assessed for all three administrations by Cronbach's alpha for the score as a whole and when individual items were removed to examine their effect on reliability. Effect sizes for the Oxford Hip Score were calculated to assess responsiveness as the difference between the mean preoperative and follow up scores (three and 12 months) divided by the standard deviation (SD) of the preoperative scores.15
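As an illustration of the reliability analysis, the sketch below computes Cronbach's alpha with the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), and recomputes it with one item removed. The function names and the toy data are hypothetical, and the sketch assumes complete cases only; it is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: one list of k item scores per respondent (complete cases only).
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows, index):
    """Alpha recomputed after dropping the item at position `index`."""
    reduced = [row[:index] + row[index + 1:] for row in rows]
    return cronbach_alpha(reduced)


# Hypothetical toy data: four respondents, three items (not study data).
toy_rows = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4]]
alpha_all = cronbach_alpha(toy_rows)
alpha_without_each = [alpha_if_item_deleted(toy_rows, i) for i in range(3)]
```

An item is a candidate for review when alpha with that item deleted exceeds alpha for the full scale.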


A total of 7151 completed preoperative questionnaires were returned to the CEU. The mean (SD) age of the patients was 67.8 (10.9) years; 4284 (61.4%) of those who reported their sex were women and 4285 (87.5%) of patients whose diagnosis was available had osteoarthritis.

A total of 6174 patients replied to the follow up questionnaire at three months. In order to estimate the three month mortality rate following primary total hip replacement, the vital status of all patients in the baseline sample was ascertained from the National Office of Statistics. This process identified 80 patients as having died before the three month follow up. When these were removed from the denominator a response rate to the questionnaire of 85.2% at three months was obtained. In response to the 12 month follow up 5854 questionnaires were returned. It was not possible to ascertain further deaths so the response rate (80.7%) was calculated from the denominator of those returning preoperative questionnaires with those who died in the first three months again removed.

An analysis was performed of the extent of missing values for items of the Oxford Hip Score for the three administrations (preoperative, three months and 12 months). Most of the questionnaires returned had no missing items from the Oxford Hip Score (91.9% at preoperative assessment, 88.3% at three month follow up, and 87.3% at 12 month follow up). At all three assessments, for respondents whose questionnaires were incomplete, most commonly only one item was left unanswered. When analysed by individual items of the Oxford Hip Score, fewer than 3% of respondents omitted any item. The only exceptions were item 6 (distance can walk before pain severe) which was left blank by 3.2% and 4.1% of respondents at three and 12 months, respectively, and item 9 (limping when walking) which was left incomplete by 3.1% at 12 months (table 1).

Table 1

Missing values for the Oxford Hip Score at three administrations

Internal reliability in terms of Cronbach's alpha for the three administrations was found to be 0.86 (preoperative), 0.90 (three month follow up), and 0.92 (12 month follow up). An analysis was performed to see whether omitting any item improved internal reliability. At preoperative assessment the Cronbach's alpha score improved only when item 6 was omitted from the questionnaire, producing a very modest improvement to 0.88. At the two follow up assessments no omission of an item improved Cronbach's alpha.

The responsiveness of the Oxford Hip Score was examined by the direction and extent of change scores and effect sizes. The mean (SD) preoperative score for patients was 44.5 (7.5). By the three month follow up the score had improved to 25.7 (9.3), an effect size of 2.5. Further improvement from the three month scores was indicated at 12 months with a mean (SD) score of 21.5 (9.0), an effect size of 3.1.
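The effect sizes can be reproduced from the reported means and the preoperative standard deviation, using the definition given in the methods (mean change divided by the SD of the preoperative scores). The function name is hypothetical; because the inputs are the rounded published figures, the second decimal place may differ slightly from values computed on the raw data.

```python
def effect_size(mean_preoperative, mean_follow_up, sd_preoperative):
    """Mean change divided by the SD of the preoperative scores."""
    return (mean_preoperative - mean_follow_up) / sd_preoperative


# Reported values: preoperative 44.5 (SD 7.5), three months 25.7, 12 months 21.5.
effect_size_3_months = effect_size(44.5, 25.7, 7.5)
effect_size_12_months = effect_size(44.5, 21.5, 7.5)

print(round(effect_size_3_months, 1))   # 2.5
print(round(effect_size_12_months, 1))  # 3.1
```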

The responsiveness of the Oxford Hip Score was also examined by relating change scores on the instrument to patients' global satisfaction with their hip replacement expressed at three and 12 months follow up. Change scores for the Oxford Hip Score between preoperative assessment and three month follow up were larger for the 4537 satisfied patients than for the 418 patients who were not satisfied (19.7 (9.5) versus 6.8 (8.8), 12.9 points difference (95% CI 12.0 to 13.8), p<0.001). Similarly, those who at 12 months were satisfied with their hip replacement (n = 4141) had larger change scores than the rest (n = 476) for the difference between preoperative and 12 month assessments (24.0 (8.7) versus 9.9 (9.3), 14.1 points difference (95% CI 13.3 to 15.0), p<0.001). Improvement in the Oxford Hip Score between three and 12 months was also larger for those satisfied at 12 months with their hip replacement (n = 3776) than the other 428 patients (4.1 (6.5) versus 0.05 (8.3), 4.1 points difference (95% CI 3.3 to 4.9), p<0.001).
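The first of these confidence intervals can be approximated from the summary statistics alone. The sketch below assumes a large-sample normal approximation with unpooled variances; the paper does not state which method was used, and the function name is hypothetical.

```python
from math import sqrt


def mean_difference_ci(mean_1, sd_1, n_1, mean_2, sd_2, n_2, z=1.96):
    """Difference in means with an approximate 95% confidence interval.

    Assumption: normal approximation with unpooled standard error.
    """
    difference = mean_1 - mean_2
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return difference, difference - z * standard_error, difference + z * standard_error


# Change scores at three months: satisfied (n = 4537) versus not satisfied (n = 418).
difference, lower, upper = mean_difference_ci(19.7, 9.5, 4537, 6.8, 8.8, 418)
```

With these inputs the interval is close to the reported 12.0 to 13.8; small discrepancies are to be expected because the published means and SDs are rounded.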

Potential sources of bias were examined in relation to the Oxford Hip Score arising from incomplete evidence being provided by older patients or those with poorer general health status. From the denominator of patients who returned a questionnaire, those who completed every item were somewhat younger than those who omitted at least one item (mean (SD) age 67.7 (11.0) versus 69.9 (10.3), difference 2.3 (95% CI 3.4 to 1.1), p<0.001). Also, patients whose general health status was more favourable—that is, fit and healthy with no co-morbidity as rated by the surgeon on the ASA score—were slightly more likely to complete the questionnaire fully than those rated as having minor or severe medical problems, whether preoperatively (2205/2362 (93.4%) versus 2113/2313 (91.4%), difference 2% (95% CI 0.5 to 3.5), p<0.01), at three month follow up (91.2% versus 86.1%, difference 5.1% (95% CI 3.2 to 7.1), p<0.0001), or at 12 month follow up (89.5% versus 85.9%, difference 3.6% (95% CI 1.6 to 5.7), p<0.001).
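The preoperative comparison of full completion rates can be checked from the reported counts. The sketch below assumes a simple Wald interval for a difference between two independent proportions; the method actually used is not stated in the paper, and the function name is hypothetical.

```python
from math import sqrt


def proportion_difference_ci(successes_1, n_1, successes_2, n_2, z=1.96):
    """Difference in proportions with an approximate 95% Wald interval."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    difference = p_1 - p_2
    standard_error = sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    return difference, difference - z * standard_error, difference + z * standard_error


# Fully completed preoperative questionnaires: ASA fit and healthy versus
# minor or severe medical problems.
difference, lower, upper = proportion_difference_ci(2205, 2362, 2113, 2313)

# Express as percentages, rounded to one decimal place.
print(round(100 * difference, 1), round(100 * lower, 1), round(100 * upper, 1))
# 2.0 0.5 3.5
```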

Older patients were somewhat less likely to return a questionnaire at all before surgery; the mean age of those who returned a questionnaire was 67.8 (10.9) years compared with 69.1 (11.5) for those not returning a questionnaire (difference 1.3 (95% CI 0.8 to 1.7), p<0.001). However, the association between older age and non-return of questionnaires was not observed for the three and 12 month follow up assessments. Similarly, patients rated fit and healthy on the ASA score were more likely to return a questionnaire before surgery than those rated as having minor or severe medical problems (2862/4973 (57.6%) versus 2313/4675 (49.5%), difference 8% (95% CI 6.0 to 10)). No significant differences between patients grouped by ASA score were observed for rate of return at three month and 12 month follow up. Age and health status were in turn found to be weakly associated with outcomes on the Oxford Hip Score. Older patients were somewhat more likely to report poorer scores at three month and 12 month follow up (r = 0.06 and 0.11, respectively, both p<0.01). Similarly, at three month follow up those who were rated fit and healthy on the ASA score preoperatively reported a mean (SD) Oxford Hip Score of 25.0 (9.3) compared with 26.5 (9.4) reported by those with minor or severe medical problems (difference –1.5 (95% CI –2.1 to –0.9), p<0.001). At 12 months follow up those rated fit and healthy on the ASA score reported a mean (SD) score of 20.5 (8.8) compared with 22.8 (9.3) for those with minor or severe medical problems (difference –2.3 (95% CI –2.9 to –1.7), p<0.001).


The use of outcome measures focused on patients' perceptions to evaluate health care is still relatively novel and therefore requires careful appraisal. This study provides an assessment of the use of such a measure, the Oxford Hip Score, in a pragmatic survey recruiting patients across a diverse range of 143 hospitals in three NHS regions. The study firstly provides several kinds of evidence of the extent of difficulties that the questionnaire may present. The response rates for the questionnaire in two waves of follow up postal survey at three and 12 months after discharge (85.2% and 80.7%, respectively) are very satisfactory for this method of administration.14 It is not possible to disentangle the effects of the Oxford Hip Score on the response rate from other questionnaire items included in the two postal surveys. Other analyses therefore considered the amount of missing data in the returned questionnaires and the contribution to the reliability of each item of the questionnaire. It is extremely encouraging that between 87% and 92% of returned questionnaires in the three waves of the study were returned with no missing data. Questionnaire items were filled out by at least 97% of respondents for most items; only one item (question 6 about distance respondent can walk without severe pain) produced markedly different levels of missing data, but even for this item only 3% and 4% of respondents failed to complete it in the postal surveys.

The analysis of reliability showed that, across the three administrations, the reliability of the instrument was affected by only one item (item 6), and then only at the preoperative assessment. A qualitative study by McMurray and colleagues suggested that difficulties in response to this item may be produced by a lack of clarity in the response categories.12

There are no universally agreed criteria for assessing responsiveness of instruments, although most approaches in some way assess the degree of intra-individual change over time observed by an instrument in patients expected to experience change.16 The problem with all such approaches is that it is difficult to be precise about the extent of expected change. The Oxford Hip Score proved highly sensitive to change (responsive) in the study; patients reported major improvements in pain and function between preoperative assessment and three month postoperative follow up, with effect sizes comparable to other studies of outcomes of total hip replacement.17 The instrument also provides evidence of the further improvements that are believed to occur in the course of the rest of the first year after surgery. Furthermore, these improvements were consistently and significantly associated with patients' more direct global judgements of satisfaction with their hip replacement.

Evidence was obtained of small potential biases from this approach to assessing outcomes. Older and less healthy patients were somewhat less likely to complete the Oxford Hip Score. This is a potential problem found with patient assessed outcome measures more generally.18 This evidence reinforces the need for short instruments that minimise the burden to patients of assessing outcomes.

The performance of the Oxford Hip Score needs to be compared with available measures such as the widely used SF-36. In a direct comparison of the Oxford Hip Score and SF-36, both completed by the same series of patients, the Oxford Hip Score resulted in a higher completion rate and higher responsiveness.19 This is consistent with other evidence that older respondents have difficulties with the SF-36.20 The SF-36 also assesses broad aspects of pain that may be difficult to relate to hip problems.21

A central aspect of appraising measures such as the Oxford Hip Score is whether they prove useful in detecting differences between patients that are relevant to evaluating health care. Variations in outcome are not expected to occur between patients who have received different forms of total hip replacement (for example, different kinds of prosthesis) until at least five years after surgery.4 It is therefore premature to judge the Oxford Hip Score in this respect in the National Total Hip Replacement Project. However, evidence has already been obtained in other applications of the instrument to indicate that it detects significant differences in the threshold of pain and disability at which private and NHS patients receive total hip replacement surgery and also significant differences in the outcomes of primary compared with revision surgery.1922 The Oxford Hip Score is intended for use in any context based on samples of patients, such as a randomised controlled trial or a well designed observational survey or audit of orthopaedic surgery, where it is possible to take account of potential confounding factors. It is not intended for use in decision making regarding individual patients.

Critiques of the excessive simplicity of patient assessed outcome measures, especially shorter instruments, are effectively emphasising one aspect of their measurement properties—namely, validity. They argue that, given more time and more in depth questioning, patients are capable of providing more detailed information about their health status and perceptions of the benefits of health care interventions. Whilst this may be true, such critiques do not address the need for outcome measures to be adequate in a number of other respects, in particular with regard to responsiveness, acceptability, and feasibility.23 There is a trade off that has to be made between these properties as evaluative instruments for health care interventions such as total hip replacement. It is widely recognised that large sample sizes, almost inevitably from multicentre studies, are required to detect the modest differences between surgical strategies in hip replacement surgery.25 It is not feasible to collect detailed in depth information from patients on this scale.

Qualitative evidence has an important and distinctive role in the evaluation of health care.24 Indeed, in the NTHRP reported here, investigators have collaborated with the College of Health to analyse patients' answers to open ended questions about their experiences. In the context of patient assessed outcomes, qualitative evidence is essential in initially identifying issues of concern to patients that need to be included in outcome measures. McMurray and colleagues also used qualitative evidence to suggest reasons for difficulties respondents may have with an instrument. However, it is less clear how qualitative evidence can contribute to identifying the modest but important benefits that may be associated with different surgical strategies. Health service researchers need to be able to detect such differences in order to improve the quality of total hip replacement surgery.
