Background The National Health Service National Patient Survey Programme systematically gathers patients’ experiences about the care they have recently received. Prioritising quality improvement activities in the accident and emergency (A&E) department requires that survey outcomes are meaningful and reliable. We aimed to determine which method of obtaining summary scores for the A&E department questionnaire optimally combined good interpretability with robust psychometric characteristics.
Methods A&E department questionnaire data from 151 hospital trusts were analysed, covering 49 646 patients. Three methods of grouping and summarising items of the questionnaire were compared: principal components analysis (PCA); Department of Health dimensions; sections according to the patient's journey through the A&E department. The patient-level reliability of summary scores was determined by Cronbach's α coefficients (threshold: α>0.70), construct validity by Pearson's correlation coefficients, and the discriminative capacity by intra-class correlation coefficients (ICCs) and reliability of A&E-level mean scores.
Results The PCA provided the best score reliability on six clear and interpretable composites: waiting time; doctors and nurses; your care and treatment; hygiene; information before discharge; overall. The discriminative power of the concepts was comparable for the three methods, with ICCs between 0.010 and 0.061. A&E sample sizes were adequate to obtain good to excellent reliability of A&E-level mean scores.
Conclusions The A&E department questionnaire is a valid and reliable questionnaire to assess patients’ experiences with the A&E. The discriminative power of six summary scores offers a reliable comparison of healthcare performance between A&Es to increase patient centredness and quality of care.
Listening to patients’ views is essential to providing a patient-centred health service. Many studies have underlined the fundamental importance of patient-established criteria for effective quality assessment. The importance of patients’ views and experiences as an essential component of evaluation and improvement in healthcare has been emphasised both in studies and policy.1–6
Surveys are an important way to find out what patients have experienced in accident and emergency departments (A&Es).7 The information provided by patients can be used to encourage and to prioritise local quality improvement activities. To increase the value and usefulness of this information, a meaningful and interpretable measure should be available. Specifically, these measures should avoid simply focusing on broad, vague concepts such as ‘satisfaction’ and should instead ask people to report events that occurred during their care and treatment.8 ,9 Ideally, A&Es and other stakeholders in emergency medicine would be provided with a measure that is applicable for benchmarking and for assessing service improvements, both nationally and internationally. To enable these comparisons the measure needs to be consistent and rigorous.
In England, understanding what patients think about their care and treatment is an important part of the Care Quality Commission's (CQC) duty to assess and report on the quality and safety of services provided by the National Health Service (NHS). One of the ways in which CQC exercises this duty is via a national NHS patient survey programme that systematically gathers the views of patients about the care they have recently received.10 All surveys in the NHS National Patient Survey Programme follow a similar methodology, provide comprehensive documentation, and report results consistently.11 The A&E department questionnaire was developed to assess patient experiences in the A&E department. The questionnaire was cognitively tested with English-speaking patients and was found to be construct valid.12
The effect of the findings of survey data largely depends on the presentation of the outcomes for their users, such as government, commissioners, regulators, policymakers and patients. Summary scores help get an overview of performance and enable identification of the broad areas of strength and weakness. Data reduction, a process whereby items are grouped and summarised, enables more robust comparisons due to enhanced reliability. For the A&E department questionnaire, three different methods of grouping are relevant. Firstly, factor analysis, which is the most common statistical approach to grouping items in surveys. Factor analysis identifies which items are statistically related and refer jointly to an underlying domain (or factor).13 The items can thus be reduced to the smallest possible number of concepts that still explain the largest possible part of the variance. The concepts provide an evidence-based, patient-focused outcome measure.14 ,15
Secondly, the Department of Health constructed five ‘domains’ that are conceptually and thematically similar for all patient experience surveys in the National Patient Survey Programme. Each core questionnaire typically contains around 50–100 experience questions. A subset of these questions has been chosen to represent findings against each of five patient experience dimensions: ‘access and waiting’, ‘safe, high quality, coordinated care’, ‘better information, more choice’, ‘building better relationships’ and ‘clean, comfortable, friendly place to be’. It is possible to use these domains to compare organisations that participate in the same survey. National results have been published in key finding reports and have been used extensively for system level performance management.16–19
A third way to represent the outcomes is according to the patient's journey through the A&E department. The A&E department questionnaire is categorised in sections following the patient's journey from arrival at the A&E department until departure. For healthcare providers, reporting patients’ experiences in the sequence of the patient's journey may be the most interpretable way of summarising a survey.20 The three approaches described above each have their own benefits, but until now little has been done to objectively compare their strengths in relation to the A&E survey. As the A&E survey is due to be repeated nationally in 2012—having previously been run in 2004 and 2008—it is timely to examine the domains emerging from these approaches. To serve the main goal of the A&E survey, which is overall improvement of quality of care, evidence-based outcome measures are preferable. Therefore, to assess the validity and reliability of the questionnaire, its psychometric properties need to be tested. The aim of this study was to determine which method of obtaining summary scores for the A&E department questionnaire optimally combined good interpretability with the most robust psychometric characteristics.
A secondary analysis of data from a cross-sectional survey of A&E department attendees was performed.
Setting and participants
The A&E survey of the National Survey Programme was run in 2008 in 151 hospital trusts in England. For each eligible hospital trust, a systematic sample without replacement of 850 patients was selected from a 1-month sample of A&E attendees. Trusts were able to select one of three months (January, February or March 2008) in case any particular month was ‘atypical’—for example, in the event of large-scale local emergencies that may have placed an unusual burden on the service. Annual emergency department attendances ranged from 11 058 to 306 689 patients. Patients were not eligible for the survey if they were under the age of 16, had attended a Minor Injuries Unit or Walk-in Centre, had been admitted to hospital via Medical or Surgical Admissions Units (and therefore had not visited the emergency department) or had a planned attendance at an outpatient clinic run within the emergency department. The paper questionnaire and covering letter were sent by postal mail up to 3 months after the A&E attendance. Up to two reminders were sent to non-respondents at 2-weekly intervals. The recipients could return the questionnaire in a postage paid envelope.
The A&E department questionnaire consisted of 50 questions divided into 11 different sections: arrival at the emergency department; waiting; doctors and nurses; your care and treatment; tests; pain; hospital environment and facilities; leaving the emergency department; overall; about you; any other comments. This structure was designed to correspond to the usual sequence of a visit to an A&E department, with the aim of making the questionnaire appear logically ordered: this is desirable as it may yield increased response rates.21 The questions used in the analysis are presented in online appendix A. The protocol for the original survey was reviewed and given a favourable ethical opinion by the North West Research Ethics Committee of the National Health Service.
Data screening and pre-analysis
Data from the survey were first analysed to identify item response rates and distributions. Questionnaire items were excluded from further analysis when they had an item non-response of >10% of expected responses (taking into account ‘skip to’ questions); such items remained in the questionnaire but were not used for the summary measures. Questions with high rates of missing data are likely to be more relevant in some NHS trusts than in others, and rates of missing data typically vary between trusts; these questions would have very low base sizes for some trusts, which would make their use in summary measures aimed at all trusts problematic. Items were also excluded when they had an extreme skew of >90% of responses in the same category (ie, a ceiling or floor effect). Such an effect limits an item's usefulness for comparisons, although if a trust is an outlier on that question it should still know about it and act on it. Where items had a negative wording, their scales were reversed to ensure comparability in the analysis. For each item, the response categories were scored from 0 to 100, with intermediate options at equal intervals.
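As an illustration, the screening and scoring rules above can be sketched in Python. This is a minimal sketch, not the survey programme's actual code: the column layout, the function name and the handling of ‘skip to’ routing (here simply treated as missing) are assumptions.

```python
import pandas as pd

def screen_and_score(df, negatively_worded=()):
    """Apply the screening and 0-100 scoring rules described in the text.

    df: one row per respondent, one ordinal item per column, responses
    coded as integers from 0 (worst) upwards, NaN for missing
    (hypothetical layout; 'skip to' routing is treated as missing here).
    """
    kept = {}
    for item in df.columns:
        col = df[item]
        # Exclude items with >10% non-response.
        if col.isna().mean() > 0.10:
            continue
        # Exclude items with an extreme skew (>90% in one category).
        if col.value_counts(normalize=True).max() > 0.90:
            continue
        top = col.max()                   # highest response code
        scored = col / top * 100          # equal intervals from 0 to 100
        if item in negatively_worded:
            scored = 100 - scored         # reverse negatively worded items
        kept[item] = scored
    return pd.DataFrame(kept)
```

A three-category item coded 0/1/2 would thus be scored 0/50/100, matching the equal-interval rule.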
Construction of composites
Composite item sets were readily available for the latter two of the three approaches outlined above, but analysis was undertaken to identify a set of items based on the factor analysis approach. Multiple analyses were performed using principal components analysis (PCA). Items were retained on a factor when their factor loading exceeded 0.40; where an item loaded on more than one factor, it was assigned to the factor with the highest loading. Internal consistency was then calculated: if Cronbach's α increased when an item was left out of a factor, the item was dropped. The factor structure in the final PCA fulfilled these statistical criteria. Nevertheless, to improve the clarity and interpretation of the factors, we decided to break down a large factor that covered multiple quality aspects into three factors, each measuring a single quality aspect.
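The item-to-factor assignment rule (highest loading, threshold 0.40) can be sketched as follows. This is an unrotated-PCA sketch using scikit-learn; the published analysis may have used rotation and different software, and the function and variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def assign_items_to_factors(X, item_names, n_factors, threshold=0.40):
    """Group items by their highest-loading component (unrotated PCA sketch).

    X: (n_respondents, n_items) matrix of item scores.
    Returns {component index: [item names loading >threshold on it]}.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardise items
    pca = PCA(n_components=n_factors).fit(Z)
    # Loadings: correlation-like weights of items on components.
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    groups = {f: [] for f in range(n_factors)}
    for i, name in enumerate(item_names):
        best = np.argmax(np.abs(loadings[i]))
        if abs(loadings[i, best]) > threshold:      # loading threshold 0.40
            groups[best].append(name)
    return groups
```

On simulated data with two clearly separated underlying dimensions, the function recovers the two item groups.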
Summary scores were calculated as the means of the experience scores for the items contributing to each composite after PCA, each Department of Health dimension and each questionnaire section. The concepts and items of the three data reduction methods are presented in table 1. Cronbach's α was calculated to estimate the internal consistency of the concepts under each method; coefficients above 0.70 were regarded as reliable.22 Construct validity was studied by calculating Pearson's correlation coefficients between the concept scores (table 2). Pearson's correlation coefficient expresses the similarity of the underlying constructs of the concepts; a correlation above 0.70 indicated that two concepts partially measured the same construct.
Additionally, the variance per A&E department and the intra-class correlation coefficient (ICC) were calculated. The variance describes the variability of the A&Es, whilst the ICC expresses the discriminative power of the concepts. The discriminative power is a general assessment of differences between healthcare providers; the variance attributable to providers can be tested for significance. The magnitude of the variance between providers may then be expressed as a proportion of the total variance on a scale from 0 to 1.23
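The ICC described here is the proportion of total score variance attributable to providers, σ²between/(σ²between+σ²within). A minimal one-way ANOVA estimator is sketched below; the paper's exact multilevel estimation method may differ, and the function name is an assumption.

```python
import numpy as np

def icc_oneway(groups):
    """One-way random-effects ICC from per-provider score arrays.

    Estimates sigma2_between / (sigma2_between + sigma2_within) from
    the one-way ANOVA mean squares (ANOVA-estimator sketch).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of providers
    n = sum(len(g) for g in groups)          # total respondents
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # Effective group size for unbalanced designs.
    n0 = (n - sum(len(g) ** 2 for g in groups) / n) / (k - 1)
    sigma2_b = max((ms_between - ms_within) / n0, 0.0)
    return sigma2_b / (sigma2_b + ms_within)
```

When providers differ but respondents within a provider agree exactly, the ICC is 1; when all providers have identical score distributions, it is 0.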
Next, the calculations were repeated after adjusting the data for age (eight categories) and gender of the respondents,24 and again after creating a more homogeneous sample. The effect of heterogeneity of the A&Es was investigated with this more homogeneous sample, which was constructed by deleting from the original sample all trusts characterised as multiservice, specialised or ‘unknown’. Decreases in variances and related statistical measures would imply that differences between trusts are partially caused by their characteristics.
Finally, A&E-level reliability, which expresses the proportion of variation in A&E-level mean scores attributable to true variation between A&Es, was estimated using generalisability theory.25 ,26 The essence of generalisability theory is the recognition that in any measurement situation there are multiple sources of error variance due, for instance, to random sampling. The analysis comprises two stages. In the first stage, the G-study, the variance components are used to create G-coefficients, each an extension of the classical reliability coefficient; a G-coefficient expresses the proportion of total variance due to the object of measurement. In the second stage, the D-study, the variances derived from the G-study are used to set the sample sizes needed to obtain a reliability of 0.7, 0.8 or 0.9.
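For a provider mean based on n respondents, the G-coefficient takes the familiar form σ²between/(σ²between+σ²within/n), which can be inverted to give the D-study sample size. A minimal sketch (variance values in the usage below are hypothetical, not the study's estimates):

```python
import math

def provider_level_reliability(sigma2_between, sigma2_within, n):
    """G-coefficient: reliability of a provider mean based on n respondents."""
    return sigma2_between / (sigma2_between + sigma2_within / n)

def respondents_needed(sigma2_between, sigma2_within, target):
    """D-study: smallest n giving at least the target reliability,
    from n = target * sigma2_within / ((1 - target) * sigma2_between)."""
    return math.ceil(target * sigma2_within / ((1 - target) * sigma2_between))
```

For example, with illustrative components σ²between=0.03 and σ²within=0.97 (an ICC of 0.03), a reliability of 0.8 requires 130 respondents per provider.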
Details of the statistical methods used are shown in online appendix B. All analyses were performed using the statistical software SPSS 19.0 and R 2.10.1.
Questionnaires were sent to 128 383 patients and completed questionnaires were received from 49 646 respondents. This represented an adjusted response rate of 40% when undelivered questionnaires, ineligible patients, and deceased patients had been accounted for. Adjusted response rates varied between trusts from 26% to 52%. The mean age of the respondents was 54 years and 45% were men. For gender and age, the differences between respondents and non-respondents were small but significant (p<0.001), as expected given the sample size.
Missing values ranged from 0.3% for the question ‘Was it possible to find a convenient place to park in the hospital car park?’ to 3.6% for ‘Do you think the hospital staff did everything they could to help you control your pain?’. The most skewed question was ‘While you were in the Emergency Department, did you feel bothered or threatened by other patients?’ A total of 90.3% answered ‘no’, and the question was therefore not included in further analyses.
PCA identified four factors that together accounted for 50.7% of the variance: the first factor contained 12 items (31.8% of the variance), the second five items (7.4%), the third three items (6.3%) and the fourth two items (5.3%). The content of the PCA factors showed similarities with the questionnaire sections. The first factor contained most items of the sections ‘doctors and nurses’ and ‘your care and treatment’ and all items of the ‘overall’ section: it was divided into three parts according to these sections to enhance interpretability. The second factor contained the same items as the section ‘leaving the emergency department’. Three out of four items of the section ‘waiting’ formed the third factor. The last factor contained two out of three items of the ‘hospital environment and facilities’ section.
Table 1 shows the concepts after reducing data in three different ways. Cronbach's α described the internal consistency of each subscale. The highest α coefficients overall were for the six composites after PCA, with coefficients ranging from 0.634 to 0.877: only the ‘waiting time’ subscale had a value below 0.7. One question—Q17—was added after PCA to the ‘your care and treatment’ subscale to improve internal consistency. Three out of five α coefficients of the national survey dimensions ‘access and waiting’, ‘safe, high quality, coordinated care’ and ‘better information, more choice’ were below the threshold of α=0.7. The α coefficients of the other two domains were α=0.701 and α=0.805. For the sections based on the patient's journey, the α coefficients of the sections ‘waiting’ and ‘hospital environment and facilities’ were below α=0.7. The other five coefficients ranged between α=0.729 and α=0.841.
Pearson's correlation coefficients and unbiased, corrected correlations are presented in table 2. Of interest were correlations above the threshold of 0.7, which indicate overlap between concepts. The correlations between the concepts ‘doctors and nurses’, ‘your care and treatment’ and ‘overall’ were above this threshold (for both the composites after PCA and the sections). Thus these concepts partly measure the same aspect of healthcare performance in the A&E (and were originally included in a single factor in the PCA). The second and third dimensions showed correlations above the threshold of 0.7 after correction.
The ICC of a concept is the ability of that concept to point out differences in healthcare performance between A&Es. ICCs ranged from 0.010 to 0.061 for the composites after PCA. In other words, a small part of the total variability in experience of healthcare measured by these composites, namely 1.0–6.1%, was attributable to performance differences between A&Es. ICCs of the DH dimensions were 0.011–0.049 (1.1–4.9%), and those of the sections of the questionnaire were 0.010–0.056 (1.0–5.6%). Adjustment for age (eight categories) and gender caused a minimal reduction in ICCs (0 to 0.002). The ICCs calculated for the more homogeneous sample of A&Es were likewise influenced minimally (0 to 0.003). Patients’ characteristics and trust characteristics thus made very little difference to the variability between A&Es. Table 3 shows the estimates for the concepts, including the mean experience score and standard deviation, the variance between A&Es, and the ICC. Furthermore, the reliability (G-coefficient) of the mean value given the actual sample size of the A&Es is presented, and was used to set the sample sizes needed to obtain reliability of 0.7, 0.8 or 0.9. Composites with a high A&E-level reliability (>0.9) may have good value as measures of comparative performance at the sample size available. For a reliability of 0.7, most required sample sizes appeared to be rather small.
In general, data reduction aims to enhance the clarity and comprehensibility of survey data. The focus of this study was to determine a meaningful and reliable presentation of the outcomes of the A&E survey for their users. We studied three data reduction methods for the A&E department questionnaire. First, PCA resulted in six composites, which covered 23 items of the questionnaire. Second, the five dimensions of the national patient experience programme covered 19 items. Last, the patient's journey and questionnaire sections resulted in nine sections, which covered 32 items.
In this study, the PCA exhibited better internal consistency than the other two methods. The content and interpretability of all composites were clear. Variances and ICCs, and therefore the discriminative power of the concepts, were comparable for the three methods. Sample sizes were adequate to obtain good to excellent reliability of the A&E-level mean scores. The DH dimensions showed lower reliability and interpretability compared with the other methods.
Inevitably, data reduction causes a loss of content of the questionnaire. The patient's journey and questionnaire sections might give the broadest representation of the content of the questionnaire. Nevertheless, we advise representing the outcomes of the A&E department questionnaire according to the more reliable six composites after PCA, although the number of items in the composites is lower and some content is lost. Increased reliability (by definition) gives better discrimination between objects of measurement; unreliable measures attenuate relationships and will give less precise estimates of performance.
We decided to break down the ‘original’ first factor of the PCA into three separate composites. In our opinion a composite that measures a single aspect of care is more useful for quality improvement and benchmarking than a larger composite that measures multiple aspects of care, even though the latter might satisfy the statistical criteria. The three composites are easier to interpret, more informative and more specific than the ‘original’ factor. The ‘original’ factor and items were similar to the items of the three sections ‘doctors and nurses’, ‘your care and treatment’ and ‘overall’, which appeared to be reliable and followed a logical sequence, which enhances interpretation of outcomes. We are aware that there was no mathematical reason for breaking down the factor.
Pearson's correlation coefficients of the five DH domains were below the threshold of 0.70 before correcting the correlations for random error of the reliability estimates. Afterwards the DH dimensions were above the 0.7 threshold and higher than the other correlations. The raw Pearson's correlation can be regarded as a lower bound estimate and the ‘true’ correlation may be greater. Hence, the ‘true’ correlation would be somewhere between the raw and adjusted coefficients. The correlations of the ‘doctors and nurses’, ‘your care and treatment’ and ‘overall’ concepts of the other two approaches were above the 0.7 threshold. This implies that these sections partially measure a similar underlying construct of the care provided at the A&E, which was expected for the ‘overall’ concept. The high correlation between ‘doctors and nurses’ and ‘your care and treatment’ was supported by PCA, in which these composites originally formed one factor, together with the ‘overall’ composite.
ICCs of all concepts after data reduction were good compared with other survey data. In other studies on patient experiences the mean ICC was 0.01 for unadjusted data.23 Adjusting the data for age and gender of the patients barely affected the variance or ICC in our study; the largest decrease in ICC was 0.002. In the present study the lowest ICC was 0.01, whilst the highest was 0.061 for the composite ‘hygiene’ after PCA. Thus, patients’ reported experiences can measure differences in healthcare performance between A&E departments. However, interpretation of these numbers showed that only 1–6% of the total score variance was attributable to differences between providers, suggesting that individual variation outweighs variation between trusts.
A&E-level reliability of the mean scores expressed the proportion of variation at the A&E level attributable to true variation between A&Es. The A&E-level reliability of the concepts was good, and for several concepts excellent, which would support their potential for comparative performance assessment. The reliability of the ‘information before discharge’ composite was below the threshold of 0.7. The minimum number of respondents required for all PCA domains to have good reliability at the A&E level is 237. Nevertheless, concepts that show high internal consistency but that are less capable of distinguishing small differences—in other words, those with high reliability but low ICC—should be used with caution. Large sample sizes may be needed to compare organisational performance against these concepts, and larger sample sizes increase the cost of postal surveys. Table 3 shows the reliability and required sample sizes for each concept, obtained using generalisability theory: the same methodology was applied by Lyratzopoulos et al using a different terminology.27
The first main strength of this study was the large database of 49 646 respondents distributed over 151 national acute services trusts. The questionnaire provides the government, commissioners, policy makers and patients with a measure to benchmark best practice and to assess service improvement. Second, the A&E survey is part of the national survey programme of 2012. Whether healthcare performance in the A&E, and quality of care as experienced by patients, has changed over the last 4 years can then be explored, adding value to the programme. Third, this study contributes to the international interest in patient-centred care. Future research should establish the possibility of using this measure for international comparisons of quality of emergency care.
We found some small but significant differences between respondents and non-respondents in age and sex. This was likely due to the large sample size, and therefore statistical power, and not a reflection of meaningful differences between these populations. Patients’ symptoms could have evoked recall bias (eg, due to loss of consciousness): however, the questionnaire was tested via cognitive interviews before use and was found to work well.
We found several similarities between the data reduction methods. First, the ranges of Cronbach's α coefficients, variances, ICCs and A&E-level reliability estimates across the three methods were small. Second, the composite after PCA ‘information before discharge’ and the section ‘leaving the ED’ were similar. The different numbers of composites, dimensions and sections made a comparison between the three methods somewhat arbitrary. The ‘overall’ composite could justifiably be removed since it showed high correlations with two other domains. Only one out of three items of the ‘overall’ composite will be maintained in the A&E survey in 2012.
Data reduction causes loss of content, and consequently the summary scores did not represent all aspects of patients’ journeys through the A&E. From a clinical point of view, it might be preferable to evaluate the quality of care throughout the patient's journey, from arrival to departure, using individual items in addition to the summary scores: this could be particularly useful for locally initiated work aiming to report at a sub-organisational level. However, for organisation-level use it might also be a logical step to create a shortened version of the A&E department questionnaire based on the six reliable composites. A shorter survey would decrease patient burden and might improve response, although previous research has shown mixed evidence on questionnaire length and response rate.28 ,29
The A&E department questionnaire is a valid and reliable questionnaire to assess patients’ experiences with the A&E department. The discriminative power of six summary scores offers a reliable comparison of healthcare performance between A&Es to increase patient centredness and improve quality of care.
Contributors NB and HS conceived of the idea. NB and SS undertook the data analysis. All co-authors co-wrote the first and subsequent drafts, and contributed to editing and reviewing the final version.
Competing interests Two of the authors are employees of Picker Institute Europe, which is contracted to the Care Quality Commission to develop national patient experience surveys, and was involved in developing and coordinating the surveys that provided the data for this study.
Patient consent This study entails secondary analysis of patient survey data. Ethics clearance for the accident and emergency department survey was obtained by the Picker Institute on behalf of the Care Quality Commission prior to the survey commencing, with consent provided via patient responses.
Provenance and peer review Not commissioned; externally peer reviewed.