Development and testing of a questionnaire to measure patient satisfaction with intermediate care
A Wilson1, G Hewitt1, R Matthews1, S H Richards2, S Shepperd3

1 Department of Health Sciences, University of Leicester, Leicester, UK
2 Institute of Health and Social Care Research, Peninsula Medical School, Universities of Exeter and Plymouth, Exeter, UK
3 Department of Public Health, University of Oxford, Oxford, UK

Correspondence to: A Wilson, Department of Health Sciences, University of Leicester, Leicester General Hospital, Leicester LE5 4PW, UK; aw7{at}le.ac.uk

Abstract

Background: Individual trials have suggested high levels of general patient satisfaction with intermediate care, but this topic has not been examined in detail.

Aims: To identify the key elements of patient satisfaction with intermediate care, and to see whether these can be validly measured using a questionnaire.

Method: A questionnaire was developed on the basis of a literature review and piloting with patients and staff on participating schemes (phase I). In phase II, the questionnaire was tested for validity and reliability in a group of patients recently discharged from two “hospital-at-home” intermediate-care schemes. In phase III, a shortened version of the questionnaire was psychometrically tested in five sites taking part in a national evaluation of intermediate care.

Results: 96 patients with an average age of 76.5 years took part in phase II. Test–retest reliability was evaluated by repeating the questionnaire 2 weeks later in a subsample of 42 patients; agreement was “moderate” (κ 0.4–0.6) for 12 questions, “fair” (κ 0.2–0.4) for 6 questions and “poor” (κ 0.1–0.2) for 5 questions. Scores correlated well with the Client Satisfaction Questionnaire (Spearman’s r = 0.75, p<0.001). 843 patients (57% of those eligible) from five intermediate-care schemes took part in phase III. Principal components analysis suggested six factors or subscales: general satisfaction, affective response, cognitive response, timing of discharge, coordination after discharge, and access to pain relief, although the last three factors comprised only one question each. Internal consistency (Cronbach’s α) of the first three subscales ranged from 0.82 to 0.89. Scores for all subscales differed by scheme, suggesting construct validity. Only one question (on general satisfaction) was found to be redundant.

Conclusion: The questionnaire, with some minor amendments to improve performance, could be used as a validated tool for audit and research in intermediate care. An amended version and scoring programme is available from us on request.

  • CSQ, Client Satisfaction Questionnaire
  • PCA, principal components analysis


Attempts to provide community-based alternatives to acute hospital admission in the UK date back to winter bed initiatives in the past decade and the findings of the National Beds Enquiry in 2000.1 “Intermediate care” was coined as a generic term for this type of provision in a health service circular in 2000,2 and targets for developing services were set in the NHS Plan.3 The three main functions of intermediate care are avoidance of admission to hospital, supported discharge and intensive rehabilitation. The health service circular describes a range of service models that can fulfil one or more of these functions. One of these is “hospital-at-home”, defined as “active treatment by healthcare professionals, for a limited time period, in the patient’s usual residence, for a condition that would otherwise require acute hospital in-patient care”.4

Patient satisfaction is an important consideration in service development. To date, most work on satisfaction with intermediate care has been conducted with hospital-at-home schemes. The Cochrane review of the effectiveness of hospital-at-home found evidence that patients prefer treatment at home to treatment at hospital.4 However, the authors noted that none of these trials used a satisfaction measure that had been specifically developed for hospital-at-home schemes, and in several cases questions related to issues such as ward cleanliness and hospital environment had to be removed.

We therefore suggest that there is a need for a questionnaire that covers the key elements of satisfaction with intermediate care and can be used to assess care provided in both residential and non-residential settings. It is important that such measures of satisfaction are theoretically based and assess all elements of care relevant to particular clinical settings.5 Maxwell identified six criteria on which the quality of healthcare services can be assessed: access to services, relevance to need, effectiveness, equity, efficiency and social acceptability.6

There is a consensus among researchers that satisfaction is an attitude combining a cognitive evaluation with an emotional response.7 A consequence of this is that, like other attitudes, it is stable and resistant to change. This means that attitudes to an episode of intermediate care will be highly influenced by previous experiences of healthcare in general. Other problems with satisfaction surveys relevant to this study include lack of variability in response, particularly if, as in this study, those surveyed are likely to be similar in age and health status, and the “halo effect”, whereby one striking impression shapes all other judgements.8 Over-reliance on total scores may also hide what variation does exist,5 and it has been found that asking about specific elements of care discloses problems that would not emerge from more general questions.9 However, there is a trade-off between asking about experiences, some of which may be specific to a particular service, and developing a questionnaire that can be used across services, which may differ in detail in the way care is planned and delivered.

In this paper, we describe the development of a questionnaire to capture patients’ views on the quality of intermediate care provided. Such an instrument would enable schemes to evaluate their own performance and compare results with similar schemes.

METHODS

The questionnaire was developed from the literature. Its construct and criterion validity and test–retest reliability were evaluated by administration in a sample of patients recently discharged from two hospital-at-home schemes; psychometric performance was tested by self-administration in a larger sample of patients included in the five sites taking part in a national evaluation of intermediate care.

Phase I: drafting and pre-piloting of questionnaire

The method followed the basic principles for survey design suggested by de Vaus.10 This requires the conceptual basis of the questionnaire to be defined before the specific questions are designed. A literature search for studies either developing conceptually related measures or discussing the theoretical basis of patient-satisfaction measures was therefore conducted before designing the questionnaire. The following topics, identified in a systematic review of patient satisfaction with medical care,11 were included:

  • overall quality of service;

  • humaneness (warmth, respect, kindness, willingness to listen, interpersonal skills);

  • competence;

  • outcomes;

  • facilities (equipment, etc);

  • continuity of care;

  • access (convenience, hours, availability);

  • informativeness (regarding treatment, procedures or diagnosis);

  • cost;

  • attention to psychological problems.

The first version of the questionnaire to be piloted included one to three questions about each topic, depending on how complex and relevant the topic seemed. For example, five questions about continuity were included because of its importance at admission, during care and at discharge. Conversely, only one question on outcomes was included, as we were primarily interested in the process of care. Where possible, we used verbatim questions that had previously been tested. This version was pre-piloted with 15 patients to establish face validity; they were asked to comment on omissions or irrelevant items. Comments on face validity were also received from nine hospital-at-home staff employed by the two schemes taking part in the subsequent survey. This resulted in some rewording and additional questions, for example about promoting independence. The pilot questionnaire had 23 attitudinal questions, dealing with the following topics: continuity (five questions); access, competence, information and general satisfaction (three each); humaneness and facilities (two each); psychological problems and outcome (one each).

Respondents were asked to rate each statement on a 5-point Likert scale12 (strongly agree, agree, don’t know, disagree, strongly disagree). A “don’t know” or neutral option was included, as we did not expect all participants to be able to offer a view on all aspects of care. Four of the 23 questions were phrased negatively to prevent “response acquiescence”, defined as the tendency to agree rather than disagree,12 although there is debate about this approach, with some evidence that using negatively phrased satisfaction questions in healthcare settings tends to inflate the overall level of satisfaction.13 The questions were arranged to follow patients’ experience of the scheme, starting with admission and ending with discharge.

Questions were scored from 1 to 5. Total scores for the scales and subscales were calculated as a percentage of the maximum possible, excluding any missing responses.
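As an illustration of this scoring rule, a minimal sketch in Python (our own, not the authors’ code) is shown below. It assumes that negatively phrased items are reverse-coded before summing, which the paper does not state explicitly, and that missing responses simply shrink the denominator.

```python
def satisfaction_score(responses, negative_items=None):
    """Percentage score for one respondent.

    responses: item scores in 1..5, with None for missing answers.
    negative_items: indices of negatively phrased items to reverse-code
    (an assumption about how such items are handled).
    """
    negative_items = set(negative_items or [])
    answered = []
    for i, r in enumerate(responses):
        if r is None:          # missing responses are excluded entirely
            continue
        answered.append(6 - r if i in negative_items else r)
    if not answered:
        return None
    # score expressed as a percentage of the maximum possible over answered items
    return 100.0 * sum(answered) / (5 * len(answered))

# Example: an 18-item questionnaire with one missing answer
scores = [5, 4, 4, 5, None, 4, 5, 5, 4, 4, 5, 4, 3, 4, 5, 5, 4, 4]
print(round(satisfaction_score(scores, negative_items=[12]), 1))  # 87.1
```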

Phase II: testing of validity and reliability

Two hospital-at-home schemes agreed to take part in the study. We asked their staff to invite all patients being discharged to agree to an interview with a researcher at home, at which the questionnaire would be completed. A random subset of roughly half the total sample was identified to complete the questionnaire again 2 weeks after the first interview to assess test–retest reliability, and to complete the Client Satisfaction Questionnaire (CSQ)14 at the first interview to assess criterion validity. The CSQ is a well-validated instrument for service evaluation, comprising eight questions about general satisfaction (for example, “How would you rate the quality of service you received?” Excellent, good, fair or poor?). All items are scored on a 4-point Likert scale and responses summed to produce a maximum score of 32. Permission to use this questionnaire was obtained from its authors.

Test–retest reliability was measured using the κ statistic, with linear weights15 for each question. The κ statistic measures the extent of agreement beyond that expected purely by chance. The weighted κ gives credit for partial agreement by assigning diminishing weights between 1 and 0 according to distance from the diagonal; so, for example, the difference between “strongly agree” and “strongly disagree” carries more weight than the difference between “strongly agree” and “agree”.16 As the κ statistic is sensitive to the number of observations made and the distribution of those observations around the diagonal, a very low κ rating can occur even when there is high agreement between raters.17 Therefore, weighted observed agreement is also reported. Criterion validity (defined as correlation with an existing validated instrument18) was measured by calculating the correlation coefficient between the total score on the questionnaire and the total score for the CSQ. We calculated that 61 patients would be needed to assess test–retest reliability with the κ statistic and detect a difference of 0.15 (the bands within which agreement is categorised as “good”, “fair”, etc, are 0.2 wide). For criterion validity, assuming a correlation of approximately 0.7 with the CSQ, the required sample size was 50 (α = 0.05, power = 0.8) to produce a 95% confidence interval of around 0.52 to 0.82. We therefore aimed to test these on at least 60 patients. As no information was available to estimate the sample size for correlation with other factors that might affect satisfaction, we aimed for a total sample of 120 patients, acknowledging that this would be too small for extensive psychometric testing.
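A hedged sketch of this reliability check for a single item, using scikit-learn’s linearly weighted κ (the library choice and the data are ours, purely for illustration):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Responses to one item at the first interview and at the 2-week retest,
# coded 1 = strongly disagree ... 5 = strongly agree (illustrative data only).
test = np.array([5, 4, 5, 5, 3, 4, 5, 4, 4, 5])
retest = np.array([5, 5, 5, 4, 3, 4, 5, 5, 4, 4])

# Linearly weighted kappa: partial credit for near-misses, full penalty
# only for maximally distant categories.
kappa = cohen_kappa_score(test, retest, weights="linear")

# Weighted observed agreement, reported alongside kappa because skewed
# marginals can depress kappa even when raw agreement is high.
max_dist = 4  # distance between the scale extremes (5 - 1)
agreement = 1 - np.abs(test - retest) / max_dist
print(f"weighted kappa = {kappa:.2f}, weighted agreement = {agreement.mean():.1%}")
```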

In general, there is good evidence that older people report higher levels of satisfaction.11 We therefore attempted to assess construct validity (defined as the extent to which the instrument tests the theory it is measuring18) by comparing scores according to age. We also compared the satisfaction score by type of admission, sex and health status (assessed by EuroQol—five dimensions19).

Phase III: testing of psychometrics

The performance of the questionnaire was assessed at five sites as part of a national evaluation of intermediate care.20 As some schemes included care in residential settings, the questionnaire was amended to remove questions related to care in the home. These were “The admission fitted in with my home arrangements”, “I felt as safe receiving treatment at home as in hospital” and “sometimes visits from the team disrupted my home arrangements”. Questions related to nursing and medical care were also dropped, as several schemes did not include these elements of care. These were “I received nursing care whenever I needed it” and “I received care from my doctor whenever I needed it”. The final version of the questionnaire contained 18 questions (table 1).

We asked staff in participating schemes to administer a questionnaire to all patients when they were being discharged from the scheme. A reply-paid envelope was provided for return to the intermediate-care coordinator of the respective primary care trust. Before forwarding completed questionnaires to the research team, intermediate-care coordinators checked questionnaires to ascertain whether there were any urgent complaints that required their attention.

Principal components analysis (PCA)20 was used to identify factors (subscales) on which items loaded. Although the questionnaire was developed from a theoretical base, we did not feel that the literature was sufficiently developed to justify a confirmatory rather than an exploratory analysis. Factors were retained if their eigenvalue was at least equal to the mean eigenvalue (which, for a correlation matrix, is 1). Varimax rotation was used to minimise loadings on more than one factor. An item was considered to load on a factor when the correlation was ⩾0.4. Internal consistency within subscales was measured using Cronbach’s α coefficients. The correlation between subscales was also measured.
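For readers who want to reproduce this style of analysis, the sketch below (our own illustration in Python, not the authors’ code) applies PCA to the item correlation matrix, retains components by the mean-eigenvalue rule and applies a varimax rotation:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser varimax rotation of a (p items x k factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
        rotation = u @ vt
        if s.sum() < var * (1 + tol):   # stop when the criterion no longer improves
            break
        var = s.sum()
    return loadings @ rotation

def pca_subscales(items):
    """items: respondents x items array of 1-5 scores (complete cases only)."""
    corr = np.corrcoef(items, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= eigvals.mean()          # retain factors at or above the mean eigenvalue
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    rotated = varimax(loadings)
    return rotated, (np.abs(rotated) >= 0.4)  # items "load" at |loading| >= 0.4
```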

We found no accepted theoretical basis for determining sample size in questionnaire development using factor analysis. Fayers and Machin21 note that recommendations range between 5 and 10 times the number of variables. The final questionnaire included 18 items about satisfaction, so we required a sample size of about 200, but we encouraged schemes to include all patients so that individual feedback could be given. Although the acceptability of the questionnaire could be examined by reporting response rates, we were unable to link the returned questionnaires with demographic or clinical patient data.

In phases II and III, scores are presented as means and medians, to provide maximum information about their distribution. As scores were not normally distributed, they were compared using non-parametric tests.
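As an illustration of the kind of non-parametric comparisons used (assuming SciPy; these are not the authors’ analysis scripts and the data are invented), comparing scores between two groups or across several schemes might look like this:

```python
from scipy.stats import mannwhitneyu, kruskal

# Illustrative subscale scores (percentage of maximum) for three schemes
scheme_a = [86, 80, 93, 88, 75, 90]
scheme_b = [70, 82, 78, 85, 66, 74]
scheme_c = [92, 88, 95, 90, 84, 87]

# Two-group comparison (e.g. early discharge vs admission avoidance)
u_stat, p_two = mannwhitneyu(scheme_a, scheme_b, alternative="two-sided")

# Comparison across all schemes, as in the between-site analysis
h_stat, p_all = kruskal(scheme_a, scheme_b, scheme_c)

print(f"Mann-Whitney U p = {p_two:.3f}; Kruskal-Wallis p = {p_all:.3f}")
```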

The study was approved by the Trent Multi-centre Research Ethics Committee.

RESULTS

In total, 231 patients (118 and 113 from each scheme) were invited to take part in phase II. Initially 116 agreed, but 20 of these later withdrew, leaving a sample of 96 patients (41.6% of those eligible). Those who participated were slightly younger than those who refused (mean ages 76.5 (standard deviation (SD) 9.96) and 78.7 (SD 9.24) years, respectively; p = 0.08, t test). Participation rates did not differ between schemes or according to whether the admission was for early discharge or to avoid admission to hospital.

The median score was 86, interquartile range 80–93 and range 33–100. Scores did not differ by age, sex or health status of the patient, or whether the admission was for early discharge or to avoid admission to hospital.

Criterion validity was tested by asking 46 patients also to complete the CSQ at the time of the first interview. Correlation was good (Spearman’s r = 0.75, p<0.001). Test–retest reliability was assessed by inviting these 46 patients to repeat the questionnaire 2 weeks later; 42 (91.3%) agreed. As the response “strongly disagree” (or “strongly agree” in negatively worded questions) was so infrequent, it was combined with “disagree” to calculate κ scores. Levels of agreement for individual items were variable, with most showing moderate agreement (κ ranging from 0.41 to <0.6; table 2).15 However, the weighted observed agreements were all >70%, suggesting better agreement. The skewed nature of these data may have affected the κ values. There did not appear to be any pattern to explain items with poor reliability.

Table 1A

Frequencies of test–retest reliability of items (phase II study)

Table 1B

Test–retest reliability of individual items

Table 2

 Responses to patient-satisfaction questionnaire in phase III

Median satisfaction scores were compared using the Mann–Whitney U test. They did not differ by whether the admission was for early discharge or for avoiding admission to hospital (84.3 and 85.6, respectively; p = 0.56), or by the sex of the patient (both 84.35, p = 0.81). Those aged under 80 years had slightly higher scores than older patients (86.1 and 82.6, respectively), but the difference was not significant (p = 0.18). Nor did scores differ by health status; median scores for those with and without a longstanding disability were 83.6 and 86.1 (p = 0.43), respectively. The satisfaction score had a low correlation with the EuroQol—five dimensions (Spearman’s r = 0.22).

In phase III, 843 questionnaires were returned from 1470 completed episodes of care, giving a response rate of 57%. Response rates by site varied from 39% to 68%. Table 1 shows the overall frequency of patient responses to items in the questionnaire. The highest score was seen for question 10, about interpersonal aspects of care. Question 13, about timing of discharge, scored slightly lower than the others. The number of missing values was low, but was greatest for question 14, about coordination of care after discharge.

PCA of questionnaire

The analysis was carried out excluding items on general satisfaction (questions 15–18), as these items will tend to load with most other measures of satisfaction.22 Five factors were identified (table 3). Subscale 1 appeared to include questions eliciting an affective response, such as the way the patient felt about the care received, and subscale 2 included questions eliciting a cognitive response, such as satisfaction with the amount of information received. Three questions loaded on to their own subscale: question 14 about coordination of care after discharge formed subscale 3, question 13 about care finishing too early formed subscale 4 and question 5 about access to pain relief formed subscale 5.

Table 3

 Principal components analysis of questionnaire

We therefore considered that the questionnaire had six subscales: “general”, “affective”, “cognitive”, “coordination after discharge”, “discharge timing” and “access to pain relief”, although the last three comprised only single items. Subscales were correlated with each other (table 4). A high correlation (r>0.7) was found between the general, affective and cognitive scales, suggesting that these were largely measuring the same attitude. Correlations between the single-item scales, and between these and the three multi-item scales, were lower (r 0.101–0.551), suggesting that they were measuring different elements of satisfaction.

Table 4

 Correlation of subscales (804 cases)

We then examined internal consistency within each of the three scales that included more than one question, using Cronbach’s α. All the subscales had high internal consistency, with Cronbach’s α of 0.89 for the affective subscale, 0.87 for the cognitive subscale and 0.82 for the general satisfaction subscale, within the 0.7–0.9 range usually deemed desirable.23 To test for redundant items, the effect of removing individual questions in each scale on Cronbach’s α was calculated. The removal of question 17 (“There are some things the team could have done better”) increased α from 0.82 to 0.84, suggesting that this item was redundant and should be dropped from the questionnaire.
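A brief sketch (our own, with invented data) of Cronbach’s α and the “α if item deleted” check used to flag redundant items:

```python
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items array of scores for one subscale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def alpha_if_item_deleted(items):
    """Alpha of the subscale after dropping each item in turn."""
    items = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(items, j, axis=1))
            for j in range(items.shape[1])]

# Illustrative 5-item subscale for six respondents; an item whose removal
# raises alpha (as question 17 did in the paper) is a candidate for dropping.
data = [[5, 5, 4, 5, 3],
        [4, 4, 4, 4, 2],
        [5, 5, 5, 5, 4],
        [3, 4, 3, 3, 3],
        [5, 4, 5, 5, 2],
        [4, 5, 4, 4, 3]]
print(round(cronbach_alpha(data), 2),
      [round(a, 2) for a in alpha_if_item_deleted(data)])
```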

Calculation of scores for subscales

Scores out of 100 were calculated for each subscale containing more than one question. Mean (SD) scores were as follows: affective 86.8 (11.4), cognitive 85.8 (11.9) and general 82.7 (13.2). Median scale scores were 86.7, 84.0 and 80.0, respectively. Scores for subscales with only one question are presented in table 1. Table 5 shows the comparison of these results for each case-study site. Significant differences were observed between schemes on all scales (p<0.001, Kruskal–Wallis) except for “coordination after discharge”, for which p<0.03.

Table 5

 Comparison of satisfaction scores between case-study sites

DISCUSSION

To our knowledge, this is the only questionnaire developed in the UK or elsewhere to assess satisfaction with intermediate care in detail. It has been shown to be feasible to administer by interview and self-administration, to be reasonably reliable and valid, and to discriminate between different aspects and providers of care.

Results from phase II show that it is feasible to administer the questionnaire to a group of frail elderly people recently discharged from hospital-at-home. The questions seemed to be understood and answerable, with few missing values. The proportion of discharged patients agreeing to take part was less than we had hoped. Although those who refused were demographically similar to those who took part, we cannot exclude the possibility that they were less satisfied, or that schemes tried harder to recruit patients who they felt would give a better account of their experience. This may have contributed to the homogeneity of responses, although it is known that elderly people are reluctant to criticise the care they have received.11 The clustering of positive responses could explain why test–retest reliability was lower than the “good” level (a κ score of ⩾0.6) that one would hope for in questionnaire development. Most of the variance in scores was between “agreeing” and “strongly agreeing” with an item of satisfaction, which is likely to be a less repeatable judgement than agreeing or disagreeing. The 2-week interval between testing may also have reduced the estimate of reliability, as respondents’ judgements may have altered. Homogeneity within patients is a likely explanation for our failure to detect any differences in scores according to age, health status and other characteristics. However, correlation with the CSQ was good, suggesting criterion validity and that the questionnaire does assess general satisfaction with the service.

Results from phase III show that it is feasible to measure patient satisfaction with intermediate care using a self-completed questionnaire issued at the time of discharge. Levels of satisfaction were high and comparable with other surveys of health service provision.8 The aspect of care with the lowest scores was timing of discharge. A limitation of the study design is that we do not know why patients did not respond: whether the service failed to issue the questionnaire or the patient chose not to complete it. The finding that one site achieved an acceptable response rate of 68% suggests that at other sites service factors were responsible for the lower number of returns. Even so, it is possible that those least satisfied with their care did not complete the questionnaire, which may have resulted in some selection bias in these pragmatic settings. The questionnaire was able to show differences between sites, suggesting that it has construct validity. Further use of the instrument in different settings and with more heterogeneous populations will allow further testing of construct validity. More work is also needed to show the responsiveness of the questionnaire to change: if a service makes improvements, does its score improve?

The factor analysis failed to identify the domains of satisfaction we had derived from the literature when designing the questionnaire. This suggests that respondents did not discriminate between these specific elements of care, but made overall judgements of satisfaction on cognitive and emotional levels, with these being highly correlated (table 4). However, the three questions that loaded on to their own factors suggest that these issues (access to pain relief, timing of discharge and coordination of care after discharge) measure different elements of satisfaction and so should be retained.

Results suggest several ways in which the questionnaire could be improved. Firstly, all questions should include a “not applicable” option, to discourage respondents from using the midpoint of the scale when an item does not apply. Secondly, question 17 (“There are some things the team could have done better”) seems redundant and should be omitted. As with any new questionnaire, its use in different settings will accumulate more robust evidence about its construct validity.22

CONCLUSIONS

We consider that our questionnaire, with minor amendments as outlined above, could be used as a tool for benchmarking, audit and research for schemes similar to those taking part in this study. An amended version and scoring programme is available from us on request.

Acknowledgments

Phases I and II of the study were funded by National Health Services Trent. Phase III was conducted as part of a national evaluation of intermediate care, funded by the Medical Research Council and Department of Health. We thank Monika Hare who conducted the pilot study; interviewers Sheila White and Sue Graham; Richard Baker and Carolyn Tarrant for their comments on earlier drafts of this paper; and Nicky Spiers for statistical advice. The study would not have been possible without the cooperation of participating schemes and their patients.

REFERENCES

Footnotes

  • Competing interests: None.
