Assessment of patients’ tendency to give a positive or negative rating to healthcare
- 1Division of Clinical Epidemiology, University Hospitals of Geneva, and University of Geneva, Geneva, Switzerland
- 2Division of Orthopaedic Surgery, Department of Surgery, University Hospitals of Geneva, and University of Geneva, Geneva, Switzerland
- 3Quality of Care Service, University Hospitals of Geneva, and University of Geneva, Geneva, Switzerland
- Correspondence to Thomas V Perneger, Division of Clinical Epidemiology, University Hospitals of Geneva, 24 Micheli-du-Crest, CH-1211 Geneva, Switzerland;
- Accepted 14 July 2008
Background: Adjustment of patient satisfaction scores for case-mix variables such as age and sex may lead to overadjustment. The patient’s tendency to rate healthcare positively or negatively may be the only variable that should be adjusted to improve the comparability of satisfaction scores between healthcare providers.
Objective: To develop a measure of “rating tendency”, assess its stability over time, explore its distribution across subgroups of patients and its association with patient opinion scores.
Design and Subjects: A scale based on 10 hypothetical scenarios describing hospital care episodes was developed. It was administered both before and after hospitalisation to 203 patients programmed for elective orthopaedic surgery in a Swiss teaching hospital. A problem score regarding the actual hospitalisation was obtained at follow-up.
Results: The rating tendency scale had good internal consistency (Cronbach α 0.85), and factor analysis confirmed that it measured a single underlying concept. However, the correlation between prehospitalisation and posthospitalisation measures was moderate (intraclass correlation coefficient 0.55, p<0.001), as was the correlation with the hospitalisation problem score (Spearman r = −0.22, p = 0.002). The pattern of rating tendency across subgroups of respondents mirrored the pattern of problem scores. Adjusting for the rating tendency had little effect on comparisons of problem scores between subgroups of patients, all of whom were treated at the same hospital.
Conclusions: A patient’s “rating tendency” can be measured using a reliable 10-item scale. The utility of adjusting satisfaction scores for rating tendency when comparing hospitals remains to be tested.
Patient satisfaction surveys are used to monitor quality of care, identify domains for quality improvement and compare the performance of hospitals or health plans.1234 However, comparisons of satisfaction scores between healthcare providers may be confounded by patient characteristics and are therefore often distrusted.56
To produce more comparable results, satisfaction scores are often adjusted for any available patient characteristic, such as sex, age or health status.678910 The underlying assumption is that patient satisfaction is determined by two sets of causes: characteristics of the healthcare received, which are of interest, and patient characteristics, which are undesirable confounders. Statistical adjustment aims to remove the influence of these confounders.
However, adjustment for patient characteristics may also erase the effects of associated characteristics of healthcare and therefore result in overadjustment.11 For example, older patients often report higher levels of satisfaction than younger patients.78910 This age-related difference may be due to younger patients being inherently more critical of the services they receive, which is unrelated to the quality of hospital care, or to hospitals being unable to respond adequately to the expectations of younger generations, which would reflect poor quality care. Similarly, ethnic minorities12 and patients with disabilities13 are more likely to be dissatisfied, either because they have a different way of answering satisfaction questions, or because they experience less appropriate healthcare.14 Statistical adjustment of satisfaction scores for these patient characteristics may erase meaningful differences in healthcare quality.1516
Whether this is problematic or not depends on the purpose for which the data were produced. Suppose that hospital A has lower satisfaction scores than hospital B because hospital A treats more minority patients, who tend to be more dissatisfied, and that if results are adjusted for minority status, the results of the two hospitals are even. Use of adjusted results emphasises comparability of the hospital scores and takes the influence of minority status as a given; the implication is that hospital A cannot do anything about its patient mix or about the relative dissatisfaction of minority patients. If results are not adjusted, the implication is that hospital A should adapt its healthcare processes to the needs of its population, which happens to include more minority patients. Both positions can be defended, but only the latter will lead to healthcare improvement.
Arguably, the only variable that should be adjusted for in patient opinion surveys is the patient’s tendency to give a positive or negative rating, or “rating tendency”, as it is completely out of the provider’s sphere of influence.11 Conceptually, the rating tendency is a type of response set or tendency to favour some kinds of answers across situations. Examples include the acquiescence response set or propensity to agree or disagree with statements regardless of content,1718 the social desirability response set or tendency to present oneself in a favourable light according to perceived social norms,18 and the tendency to use “do not know” options. As defined in this article, the rating tendency concerns only ratings of healthcare and does not necessarily extend to answering other types of questions.
We propose a measurement method for rating tendency based on the assessment of vignettes that describe hypothetical healthcare episodes. Scenarios have been used successfully in health and social research.19202122 Standardised vignettes have been used to make self-reported health measures more comparable across populations.23 In this study, we developed a vignette-based measure of the respondents’ rating tendency, assessed its stability over time among patients hospitalised for elective orthopaedic surgery, explored the distribution of rating tendency scores across patient case-mix variables and assessed their association with satisfaction scores.
Study design and setting
The study was conducted at the orthopaedics department of University Hospitals of Geneva, a public teaching hospital in Geneva, Switzerland. Consecutive adult patients scheduled for elective arthroplasty of the hip or knee from 27 December 2004 to 14 March 2006 were invited to fill in the rating tendency questionnaire twice, before and after hospitalisation. The questionnaires were sent by mail. Because it posed minimal risk to participants, the project was exempted from full review by the hospital research ethics committee.
To measure the respondent’s “rating tendency”, we wrote 12 hypothetical scenarios, each describing an episode of hospitalisation (Appendix A). Each scenario consisted of three to four sentences written in lay language. Ten described hospitalisations that were more or less problematic in terms of technical or interpersonal quality of healthcare, occurrence of adverse events and severity of health outcomes. The first two scenarios described excellent and very poor care, respectively, to check whether participants understood the rating task. The contents of the vignettes were designed to vary the patient’s sex, age, type of healthcare and type of quality problem. After each scenario, participants were invited to rate the healthcare on a numerical scale that ranged from 1 (poor) to 7 (excellent). The items were pretested for ease of understanding among hospitalised patients.
Questionnaires and variables
The first survey package was completed by patients 1 week before their hospitalisation. It included the 12 scenarios, as well as two other health status questionnaires used for orthopaedics evaluation (the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) questionnaire24 and the 12-Item Short-Form Health Survey25). The second survey package was sent 6 weeks after patients’ discharge. It included the same 12 scenarios, along with the 15-item Picker Patient Experience questionnaire (PPE-15).26 The PPE-15 yields a global problem score, ranging from 0 (no problems) to 100 (all items problematic). We also included a single-item rating of the healthcare received on a five-point response scale (excellent to poor).
Additional variables included age, sex, nationality, level of education, having worked in a healthcare setting or being a healthcare professional, length of hospital stay, perceived health status, perceived change in health status since before the hospitalisation and “feeling downhearted and blue in the past 4 weeks”.
To assess the coherence of a participant’s responses, we computed the difference between the two anchoring items. The internal consistency of the remaining 10 items was investigated with Cronbach α coefficients, aiming for values above 0.75.18 An exploratory factor analysis was conducted to test the hypothesis that our scale measured a single underlying concept. After checking that all scale items had comparable variance, we computed the global rating tendency score as the mean of valid answers, whenever fewer than half of the items were missing.
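The scoring rule and the internal consistency check described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the analysis code used in the study; the function names `scale_score` and `cronbach_alpha`, the use of NaN to mark a missing answer, and the restriction of α to complete cases are assumptions made for the example.

```python
import numpy as np

def scale_score(answers, max_missing_fraction=0.5):
    """Mean of valid answers per respondent.

    Returns NaN for a respondent when half or more of the items are missing
    (hypothetical helper; missing answers are coded as NaN).
    """
    answers = np.asarray(answers, dtype=float)
    n_items = answers.shape[1]
    missing = np.isnan(answers).sum(axis=1)
    valid_count = n_items - missing
    totals = np.nansum(answers, axis=1)
    scores = np.full(answers.shape[0], np.nan)
    # Score only respondents with fewer than half of the items missing
    ok = missing < max_missing_fraction * n_items
    scores[ok] = totals[ok] / valid_count[ok]
    return scores

def cronbach_alpha(answers):
    """Cronbach alpha computed on respondents with no missing item (assumption)."""
    answers = np.asarray(answers, dtype=float)
    complete = answers[~np.isnan(answers).any(axis=1)]
    k = complete.shape[1]
    sum_item_variances = complete.var(axis=0, ddof=1).sum()
    total_variance = complete.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_variances / total_variance)
```

With 10 scale items, a respondent with up to four missing answers still receives a score under this rule, while a respondent with five or more does not.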
A paired Wilcoxon test was used to compare measurements made before and after the hospitalisation. The intraclass correlation coefficient (ICC) was used to assess the test-retest reliability of the score. The ICC is a measure of agreement for continuous variables; it corresponds to the κ statistic for dichotomous variables. Unlike the Pearson or Spearman coefficients, the ICC would detect a systematic shift in responses between test and retest. To correct for attenuation by measurement error, we divided the ICC by the square root of the product of internal consistency coefficients of the prehospitalisation and posthospitalisation measures.18 This procedure estimates what the ICC would be if the score measures were perfectly reliable. To assess regression to the mean, we explored the scatter-plot of the differences between the two measures against the baseline measure.27
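The point that the ICC, unlike a correlation coefficient, is sensitive to a systematic shift can be illustrated with a short sketch. The text does not state which form of the ICC was used; the one-way random-effects form, ICC(1,1), is assumed here purely for illustration, and the function name `icc_oneway` and the toy data are hypothetical.

```python
import numpy as np

def icc_oneway(pre, post):
    """One-way random-effects ICC(1,1) for two measurements per respondent.

    Assumed form for illustration; the study does not specify the ICC variant.
    """
    data = np.column_stack([pre, post]).astype(float)
    n, k = data.shape
    subject_means = data.mean(axis=1)
    grand_mean = data.mean()
    # Between-respondent and within-respondent mean squares
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Toy data: every respondent shifts by exactly one point at retest
pre = [1.0, 2.0, 3.0]
post_shifted = [2.0, 3.0, 4.0]

pearson = np.corrcoef(pre, post_shifted)[0, 1]  # ignores the uniform shift
icc = icc_oneway(pre, post_shifted)             # penalised by the shift
```

In this toy example the Pearson coefficient stays at 1 because the rank order and spacing are preserved, whereas the ICC drops to 0.6 because the absolute values disagree.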
We compared means of rating tendency across case-mix variables using ANOVA. We obtained Spearman correlation coefficients of the posthospitalisation rating tendency score with the patient problem scores (PPE-15) and with the one-item global rating of healthcare. Finally, we explored the differences in patient problem scores (PPE-15) across subgroups of patients, unadjusted and after adjustment for rating tendency in general linear models.
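The comparison of unadjusted and adjusted subgroup differences can be sketched with an ordinary least squares fit. This is a minimal illustration, not the models fitted in the study: the function name `group_difference` is hypothetical, the subgroup is reduced to a single 0/1 indicator, and the data are synthetic values constructed so that the answer is known in advance.

```python
import numpy as np

def group_difference(score, group, covariate=None):
    """Difference in mean score between group 1 and group 0.

    With a covariate, returns the group coefficient from a linear model
    score ~ intercept + group + covariate (hypothetical helper).
    """
    score = np.asarray(score, dtype=float)
    group = np.asarray(group, dtype=float)
    columns = [np.ones_like(score), group]
    if covariate is not None:
        columns.append(np.asarray(covariate, dtype=float))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    return coef[1]

# Synthetic data built as: problem = 30 + 5 * group - 3 * rating_tendency
group = [0, 0, 0, 1, 1, 1]
rating_tendency = [3, 4, 5, 2, 3, 4]
problem = [21, 18, 15, 29, 26, 23]

unadjusted = group_difference(problem, group)
adjusted = group_difference(problem, group, covariate=rating_tendency)
```

In this constructed example the unadjusted difference is 8 points and the adjusted difference is 5 points: the group with the more severe rating tendency owes 3 points of its excess problem score to that tendency. A small gap between the two figures, as reported in this study, means that rating tendency accounts for little of the subgroup difference.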
Of the 360 eligible patients programmed for elective orthopaedic surgery, 333 completed the prehospitalisation questionnaire and 214 (64%) also answered the posthospitalisation survey. Eleven patients who returned the questionnaire but rated fewer than 8 of the 12 scenarios were considered non-respondents. This left 203 of 333 respondents (61%) with complete data. The 130 patients with incomplete data did not significantly differ from respondents in age (mean 68.0 vs 68.1 years, p = 0.90), sex (proportion of women 54.6% vs 53.2%, p = 0.45), length of hospital stay (mean 12.2 vs 11.6 days, p = 0.19), or prehospitalisation rating tendency scores (mean 3.82 vs 3.90, p = 0.45).
Respondents were on average 68 years old (SD 10.7, range 25 to 91 years). The mean length of stay was 11.6 days. There were more women than men and a majority were Swiss (table 1, columns 1 and 2). Two-thirds had completed only compulsory school or an apprenticeship and 10% had worked in a healthcare setting or as healthcare professionals. At the time of the posthospitalisation survey, only a few respondents described their health as fair or poor or reported feeling downhearted and blue all or a good part of the time (table 1).
Most patients understood the task of rating vignettes: the difference between the two anchoring items—describing respectively excellent and poor care—was 6, ie, the maximum contrast, for most respondents and >3 for more than 90% of respondents. All 203 respondents were included in the following analysis (table 2).
The 10-item scale had high internal consistency, with Cronbach α coefficients of 0.83 before the hospitalisation and 0.85 after the hospitalisation. The factor analysis yielded a single principal component that explained 40% of total variance at the baseline survey and 45% of variance at follow-up. For all items, scores were well distributed between the two extremes, with no ceiling effect. The variance was comparable for all items (table 2). Five (2.5%) respondents at the baseline survey and 13 (6.9%) at follow-up had up to four missing answers among the 10 scale items.
Stability over time
Rating tendency scores were nearly normally distributed (fig 1) around an average of 3.90 (SD 0.88) on a scale of 1 to 7 before the hospitalisation and 3.81 (SD 0.89) after the hospitalisation. This difference was not statistically significant (paired Wilcoxon test, p = 0.095). Only the first anchoring item and the sixth scale item showed a significant difference (table 2). The test-retest intraclass correlation coefficient was 0.55 (95% CI 0.45 to 0.64). Correction for attenuation brought the ICC to 0.65.
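The corrected value follows directly from the figures reported in this article: the ICC of 0.55 divided by the square root of the product of the two Cronbach α coefficients (0.83 and 0.85). A short check in Python, where the function name `disattenuate` is a hypothetical label for the correction described in the methods:

```python
import math

def disattenuate(icc, alpha_pre, alpha_post):
    """Divide the ICC by the square root of the product of the two
    internal consistency coefficients (correction for attenuation)."""
    return icc / math.sqrt(alpha_pre * alpha_post)

# Values reported in the text: ICC 0.55, alpha 0.83 (before) and 0.85 (after)
corrected = disattenuate(0.55, 0.83, 0.85)
print(round(corrected, 2))  # 0.65
```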
The scatter-plot of the differences between the two measures against the baseline showed a characteristic pattern of regression to the mean, namely respondents who gave extreme ratings in the first survey, either severe or lenient, were more likely, in the second survey, to give less extreme ratings (fig 2).
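Why this pattern is expected whenever a score combines a stable component with occasion-specific fluctuation can be shown with a small simulation. This is a sketch on synthetic data only; the means and standard deviations below are arbitrary assumptions chosen for illustration, not estimates from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical model: observed score = stable trait + occasion-specific noise
trait = rng.normal(3.9, 0.7, n)
pre = trait + rng.normal(0.0, 0.6, n)
post = trait + rng.normal(0.0, 0.6, n)

change = post - pre
# Correlation of baseline with change: negative under regression to the mean
r = np.corrcoef(pre, change)[0, 1]

# Respondents with the most lenient baseline ratings move down on average,
# and those with the most severe baseline ratings move up
top = pre >= np.quantile(pre, 0.9)
bottom = pre <= np.quantile(pre, 0.1)
mean_change_top = change[top].mean()
mean_change_bottom = change[bottom].mean()
```

Under this model the baseline score and the change score share the baseline noise term with opposite signs, so their correlation is negative even though no respondent's underlying trait has changed.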
Rating tendency and case mix
The rating tendency varied across different subgroups of respondents (table 1). Women tended to give significantly more severe ratings, as did respondents who had worked in a healthcare setting and those who felt downhearted and blue all or most of the time. There were no significant differences in rating tendency according to age or perceived health status, but patients who reported little change in their health status following the hospitalisation reported more severe ratings.
The rating tendency scale did not correlate significantly with any of the health status scales administered before surgery (physical and mental summary scales from the 12-Item Short-Form survey and the WOMAC pain and function scales): all correlation coefficients were less than 0.10 and all p values were non-significant.
Rating tendency and problem scores
The reported mean problem score (PPE-15) was 21.5 (SD 22.1, interquartile range 7 to 33). The correlation between the rating tendency and the problem score measured at the same time was modest (Spearman r = −0.22, p = 0.002). The correlation between rating tendency and the single item rating of care (excellent to poor) was slightly higher (Spearman r = −0.29, p<0.001). The problem score varied across subgroups of respondents in a pattern opposite to variations in rating tendency (table 3, column 1). Adjusting the problem score for rating tendency (measured after hospitalisation) had only a small impact on group comparisons (table 3, column 2).
The tendency to give a positive rating to healthcare can be measured by means of a set of short vignettes. The scale we developed was internally consistent (Cronbach α >0.8), and factor analysis confirmed that it measured a single underlying concept. The resulting scores were nearly normally distributed. However, contrary to our expectations, two consecutive measures, one before and one after hospitalisation, were only moderately correlated, suggesting that the rating tendency is not stable over time. There were significant differences in rating tendency between subgroups of respondents, in a pattern that mirrored the mean problem scores. However, adjustment for rating tendency had little effect on subgroup differences in problem scores.
The validation of the scale rests on several arguments. Firstly, the scale measures a single construct. We were not sure of this a priori. Respondents may have reacted differently to the various healthcare problems that were described. Furthermore, respondents may have been influenced by the specific wording of each vignette, especially by emotionally charged words. However, an empirical finding is that the items shared a substantial common variance, which indicates that a single phenomenon, independent of the specific content and wording of each scenario, was being measured. This result also indicates that the scale has good reliability.
The next question is whether this single construct is a rating tendency or something else. Several construct validity tests suggest that our interpretation is reasonable. Firstly, the scale appeared to be specific for patient opinions about healthcare, as it did not correlate with other psychometric scales, notably the health status scales. Thus the rating tendency scale does not measure a non-specific response set.
Secondly, the scale correlated more strongly with a pure rating of healthcare, the single-item “excellent to poor” assessment, than with the PPE-15 problem score, which combines factual reports of in-hospital care with subjective assessments (completely/in part/not at all). This suggests that the scale predominantly captures a subjective component of healthcare evaluation—it is a rating tendency, not a tendency to report certain facts.
Finally, if rating tendency influences satisfaction scores, the pattern of rating tendency scores should parallel the pattern of patient satisfaction scores. Our results support this assumption. As expected, the correlation between rating tendency and the problem score was negative but moderate. Indeed, the main determinant of self-reported problem scores is the patient’s experience of hospitalisation; the rating tendency should be but a minor determinant of patient assessments of healthcare.
Change over time
The correlation between prehospitalisation and posthospitalisation scores was only moderate (0.55). Measurement error is not the only explanation, given the high Cronbach α coefficients. More plausibly, the underlying variable, rating tendency, changed over time. The observed pattern of regression to the mean (fig 2) suggests that rating tendency fluctuates randomly.27 Thus, rating tendency appears to have both a stable and a variable component; it is both a trait and a state.18 The stable component may be considered a personality trait; others have found that patient satisfaction ratings correlate with a measure of agreeableness, one of the “big five” personality traits.28 In addition, the rating tendency may vary from day to day according to the respondent’s mood. We did not assess daily mood in this study, but respondents who reported “feeling downhearted and blue” in the past 4 weeks rated hypothetical scenarios more negatively. Random fluctuations would not be problematic for the purpose of adjusting satisfaction scores, since if the rating tendency and satisfaction score were measured at the same time, both would be affected in the same way.
However, we cannot exclude that part of the change in rating tendency reflects the experience of hospitalisation that took place between the two measures. This experience could have changed the respondents’ interpretation of hypothetical scenarios, since vignettes may yield more valid results when they describe a familiar experience.19 Alternatively, the patients may have projected their own hospital experience onto the vignettes. If so, part of the posthospitalisation rating tendency could be an indirect measure of quality of care.
Adjusting satisfaction scores
We designed our scale to facilitate comparisons of healthcare providers who do not serve the same profile of patients. But in our sample, the effect of a statistical adjustment for rating tendency was very modest. For example, women reported significantly more problems than men. This difference was substantial, as it reached about half of an SD in problem scores. At the same time, women also tended to rate hypothetical scenarios more severely. However, adjusting for rating tendency had little effect on problem scores. This raises the possibility that women indeed experienced lower quality care, or at least care that was less sensitive to their needs or expectations, regardless of rating tendency. Similarly, adjusting for rating tendency had little effect on the comparison of problem scores across age groups or levels of perceived health status. Although these variables are associated with care ratings in many studies, they were only weakly associated with rating tendency in our sample.
This modest effect of an adjustment for rating tendency on problem scores may be due to the homogeneity of our sample: few patients were younger than 50 years and few reported poor health status. All patients underwent elective joint replacement surgery at the same orthopaedics department, and the quality of care therefore varied little. Moreover, most respondents reported few problems with healthcare. Rating tendency may explain problem scores to a greater extent in more heterogeneous samples.
A limitation of our results is that generalisability to other settings is uncertain. For example, hospital stays are longer in Switzerland than in many other countries. However, these limitations should probably not affect the main finding regarding the measurement of rating tendency.
In conclusion, we found that the tendency to give a positive or negative rating to healthcare can be measured using a scale of 10 hypothetical scenarios. Further research should examine the stability of rating tendency over time. The performance and usefulness of this scale should also be confirmed in other contexts, such as in between-hospital comparisons, where rating tendency may be much more important in explaining variability in ratings of care. Finally, as our scale is designed to be administered along with a satisfaction questionnaire, the development of a shorter scale may be desirable.
The authors thank Véronique Kolly, RN, for the assistance with data collection, and Agatta Cleopas, MSc, who died unexpectedly before completion of the project. This study was funded by the Quality of Care Program, University Hospitals of Geneva.
Appendix A: Rating tendency questionnaire
“What do you think about these healthcare situations?”
Each of the following paragraphs describes a healthcare situation at the hospital. All these situations were invented. Please give your opinion about these situations, by circling the number which best describes your opinion. Please answer based on the provided information, even if it is incomplete. There is no right or wrong answer; it is your opinion that matters.
Funding At the time of the study, TA and TVP were with the Quality of Care Service, University Hospitals of Geneva, and University of Geneva, Geneva, Switzerland.
Competing interests None declared.