Objective To assess the psychometric properties of the French-language version of the Hospital Survey on Patient Safety Culture (HSOPSC).
Methods Data were obtained from a staff survey at a Swiss multisite hospital. We computed descriptive statistics and internal consistency coefficients, conducted confirmatory and exploratory factor analyses, and performed construct validity tests.
Results 1171 staff members participated (response rate 74%). The internal consistency coefficients of the 12 dimension scores ranged from 0.57 to 0.86 (median 0.73). Confirmatory factor analysis indicated a reasonable but not perfect fit of the hypothesised measurement model (root mean square error of approximation 0.043, comparative fit index 0.89). Exploratory factor analysis suggested 10 dimensions instead of 12, grouping items from teamwork across hospital units with those of hospital handoffs and transitions, and items from communication openness with those of feedback and communication about error. Otherwise, the loading pattern was clean: 41 of 42 main loadings exceeded 0.40, and only 3 of 378 cross-loadings exceeded 0.30. All 10 process scores were higher among respondents who rated the global safety grade as ‘excellent’ or ‘very good’ rather than ‘acceptable’, ‘poor’ or ‘failing’ (effect sizes 0.41–0.79, all p<0.001), but score differences between those who had and had not reported an incident in the past year were weak or inconsistent with theory.
Discussion The French version of the HSOPSC did not perform as well as the original in standard psychometric analyses.
Assessment of patient safety culture at a hospital usually includes a survey of staff perceptions.1,2 Several instruments exist for this purpose.3 These instruments vary in their conceptualisation of patient safety culture and in the dimensions that they purport to measure, but most address dimensions such as hospital leadership, policies and procedures, communication between hospital staff and between hospital units, adequacy of staffing and reporting of incidents.3 Among these is the Hospital Survey on Patient Safety Culture (HSOPSC), developed under the aegis of the US Agency for Healthcare Research and Quality.4 The questionnaire was developed by Westat on the basis of a broad literature review and an examination of items and dimensions in similar instruments.4 Following cognitive testing and external review by experts, a pilot survey of 79 items grouped into 14 dimensions was created and administered to a sample of employees of 21 US hospitals. Repeated confirmatory factor analysis (CFA) of this dataset guided the deletion of problematic items and led to a reduction of the number of measured dimensions. The final questionnaire consists of 12 dimensions (10 process dimensions and 2 outcome dimensions) measured by 42 items, with 3 or 4 items per dimension.
The HSOPSC has gained importance and visibility in Europe as it has been endorsed by the European Union Network for Patient Safety5 and by the ‘Safety Improvement for Patients in Europe’ project on behalf of the Council of Europe.6 The instrument is also used worldwide and has been translated into several languages.7–18 Because translation may change the meaning of the questions and therefore alter the ability of the instrument to measure the intended quantities, translated instruments require validation in the target language. Furthermore, hospital organisation and functioning may differ across countries (for example, in management style, team organisation or implementation of incident reporting systems), so that the dimensions measured by the instrument, and indeed the underlying model of patient safety culture, may be only partly applicable to the local situation. Finally, an instrument that has been developed and optimised in a given setting may be overfitted to that setting and is likely to perform less well in a different context. For these reasons, it is important to reassess the psychometric properties of the HSOPSC when it is administered in non-US hospitals. Of note, all eight translations or adaptations of the HSOPSC that we reviewed performed less well than the original. For example, internal consistency coefficients of translations are on average 0.09 lower than for the original scores, no CFA yielded a fully adequate fit and no exploratory factor analysis (EFA) assigned all items to the intended dimensions (these results are presented below alongside our own).
In this study, we explored the psychometric properties of a French-language version of the HSOPSC using data collected at a Swiss multisite hospital.
Participants and setting
The study was conducted at a multisite public hospital serving a population of about 170 000 in French-speaking Switzerland.19 The hospital has a total capacity of 450 beds distributed over seven facilities (two major and five smaller ones). The main aim of the survey was to assess the patient safety culture among hospital staff. As the survey was conducted for management purposes and did not involve any patients, it was exempted from review by the research ethics committee per local policy. All employees with at least 6 months of employment, whether directly involved in patient care or members of management, were invited to participate. We distributed 1583 questionnaires; 1221 were returned, of which 50 were excluded because >50% of items were missing, leaving 1171 (74.0%) for analysis. Based on the hospital employment roster, the response rate was 73% for nurses, 92% for nurse managers, 80% for physicians, 56% for nursing aides, 68% for pharmacists and 55% for administrative employees and technicians.
Questionnaire and key variables
We used the HSOPSC developed by the US Agency for Healthcare Research and Quality,4 translated into French by a Belgian team.18 We made no changes to this instrument. The 42 items were answered on a five-point agreement scale (from ‘strongly agree’ to ‘strongly disagree’) or a five-point frequency scale (from ‘always’ to ‘never’). The items measure seven unit-level process dimensions (supervisor expectations and actions, organisational learning, teamwork within hospital units, communication openness, feedback and communication about error, non-punitive response to error and staffing), three hospital-level process dimensions (hospital management support, teamwork across hospital units and hospital handoffs and transitions) and two outcome dimensions (overall perception of safety and frequency of event reporting). In addition, the survey includes two single-item outcome variables that are not used in the computation of the 12 scores: a global safety grade ranging from ‘failing’ to ‘excellent’ (‘Please give your area/unit in this hospital an overall grade on patient safety’) and the number of incidents reported in the past year (‘In the past 12 months, how many event reports have you filled out and submitted?’).
In February 2009, hospital staff were sent a letter announcing the upcoming survey and describing its objectives and the data collection process. The questionnaire was distributed 1 week later with a cover letter. Completed questionnaires could be returned to ballot boxes placed at several points in every building. Reminders were sent after 2 and 4 weeks. The survey also tested the effect of numbering questionnaires on response patterns; these effects were small or absent and have been published previously.19
The survey was intended for hospital employees who worked in contact with patients or who held leadership positions; no sample size was determined for the purpose of psychometric validation. However, the achieved sample size (>1000) is sufficient for a multivariate analysis of 42 items and an expected 12 latent variables: it satisfies both the rule of 10 observations per variable20 (ie, 420 observations for 42 items) and more recent suggestions that a minimum of 300–400 observations is required.21
We reversed the coding of negatively worded items and computed dimension scores for each respondent both as the proportion of positive responses (‘agree/strongly agree’ or ‘most of the time/always’) among the corresponding items and as simple averages of the responses on the original 1–5 scale. The scores of a given dimension computed by these two methods were highly correlated, and because the simple average method uses all available information, we performed the validation analyses using the simple average scores. We report means, SDs, missing values and the correlations between the scores obtained with the two scoring methods.
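The two scoring methods can be illustrated with a minimal Python sketch. It assumes that responses are stored as integers coded 1 (‘strongly disagree’/‘never’) to 5 (‘strongly agree’/‘always’), with None for a missing answer, and that a list of flags marks the negatively worded items; the function names are hypothetical, and averaging over the answered items only is our assumption rather than a documented rule of the survey.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# Assumed coding: 1 = 'strongly disagree'/'never' ... 5 = 'strongly agree'/'always';
# None marks a missing answer.
Response = Optional[int]


def recode(value: Response, negatively_worded: bool) -> Response:
    """Reverse-code negatively worded items (1<->5, 2<->4); keep missing values."""
    if value is None:
        return None
    return 6 - value if negatively_worded else value


def dimension_scores(
    responses: Sequence[Response], negatively_worded: Sequence[bool]
) -> Tuple[Optional[float], Optional[float]]:
    """Score one dimension for one respondent (hypothetical helper).

    Returns (proportion of positive responses, simple average on the 1-5 scale).
    After recoding, 4 or 5 counts as positive. Missing items are skipped
    (an assumption); if every item is missing, both scores are None.
    """
    recoded: List[Response] = [
        recode(v, neg) for v, neg in zip(responses, negatively_worded)
    ]
    answered = [v for v in recoded if v is not None]
    if not answered:
        return None, None
    proportion_positive = sum(1 for v in answered if v >= 4) / len(answered)
    return proportion_positive, float(mean(answered))


# A respondent who agrees (4), disagrees (2) with a negatively worded item,
# and strongly agrees (5): all three answers count as positive after recoding.
print(dimension_scores([4, 2, 5], [False, True, False]))
```

The simple average keeps the distinction between, say, a 4 and a 5, which the proportion of positive responses discards; this is the sense in which it uses more of the available information.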
We examined the similarity of ratings obtained within sectors of activity (eg, medicine, surgery, paediatrics and pharmacy) within hospital sites. For this we used mixed linear models, with each score as the dependent variable and the hospital site and the activity sector nested within hospital site as random factors. We report the percentage of variance attributable to each random factor.
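One way to write the nested random-effects model described above is shown below, using notation of our own (the equation is not given in the text), for respondent k in activity sector j within hospital site i:

```latex
\begin{aligned}
y_{ijk} &= \mu + u_i + v_{j(i)} + \varepsilon_{ijk}, \\
u_i &\sim \mathcal{N}(0, \sigma^2_{\text{site}}), \quad
v_{j(i)} \sim \mathcal{N}(0, \sigma^2_{\text{sector}}), \quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2_{\varepsilon}), \\
\%\,\text{variance}_{\text{site}} &= 100 \times
  \frac{\sigma^2_{\text{site}}}{\sigma^2_{\text{site}} + \sigma^2_{\text{sector}} + \sigma^2_{\varepsilon}}, \qquad
\%\,\text{variance}_{\text{sector}} = 100 \times
  \frac{\sigma^2_{\text{sector}}}{\sigma^2_{\text{site}} + \sigma^2_{\text{sector}} + \sigma^2_{\varepsilon}}.
\end{aligned}
```

Under this reading, the percentages of variance attributed to hospital sites and to activity sectors are each variance component divided by the total variance of the score.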
To examine the internal consistency of the scales, we computed Cronbach α coefficients and compared them with the published coefficients of the original version and of other translated versions of the HSOPSC, summarised by their range and median.
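For reference, the Cronbach α coefficient of a k-item scale compares the sum of the item variances with the variance of the total score. A minimal Python sketch follows; it assumes complete data (respondents with missing items removed beforehand), which is our assumption, since the text does not state how missing items were handled when computing α.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha for k items, each a list of scores over the same respondents.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score).
    Sample variances (n - 1 denominator) are used; the choice cancels in the ratio.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must cover the same respondents")
    totals = [sum(item[r] for item in items) for r in range(n)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Two hypothetical items answered by four respondents (inter-item correlation 0.6).
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # prints 0.75
```

For a given average inter-item correlation, α increases with the number of items, so short scales of three or four items, as in the HSOPSC, tend to yield lower coefficients than longer ones.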
To verify the internal structure of the instrument, we performed a CFA of the 42 items (recoded as needed), assigning items to their intended dimensions and allowing non-null correlations between the dimensions. To examine the fit of the measurement model, we obtained the root mean square error of approximation, the comparative fit index, the non-normed fit index, the standardised root mean square residual and the coefficient of determination. We compared our results with recommended cut-off values22,23 and with other published results.
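Several of these fit indices can be computed from the model and baseline (independence model) χ² statistics. The Python sketch below uses the standard textbook formulas; the function names are ours, the cut-offs in the hypothetical helper `meets_hu_bentler` are the commonly cited values of Hu and Bentler (RMSEA ≤0.06, CFI and NNFI ≥0.95, SRMR ≤0.08), and whether these are exactly the cut-offs applied in table 4 is an assumption. Some programs use N rather than N−1 in the RMSEA denominator.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (some programs use n instead of n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index of model m relative to the independence baseline b."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def nnfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Non-normed fit index (Tucker-Lewis index); can fall outside [0, 1]."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def meets_hu_bentler(rmsea_v: float, cfi_v: float, nnfi_v: float, srmr_v: float) -> dict:
    """Compare indices with commonly cited Hu and Bentler cut-offs (assumed here)."""
    return {
        "RMSEA <= 0.06": rmsea_v <= 0.06,
        "CFI >= 0.95": cfi_v >= 0.95,
        "NNFI >= 0.95": nnfi_v >= 0.95,
        "SRMR <= 0.08": srmr_v <= 0.08,
    }
```

For instance, the RMSEA of 0.043 reported in the abstract passes the RMSEA cut-off while the CFI of 0.89 does not, which matches the mixed picture described for the CFA.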
To examine possible alternative structures, we performed an EFA of the 42 items, retaining factors with an eigenvalue >1. We obtained the Kaiser–Meyer–Olkin measure of sampling adequacy (>0.6 is recommended) and the Bartlett test of sphericity (rejection of the null hypothesis that the items are uncorrelated is recommended), and examined the primary loadings on the intended dimension (higher is better, >0.4 is recommended) as well as any cross-loadings >0.3 (fewer is better).
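The eigenvalue >1 (Kaiser) criterion operates on the eigenvalues of the item correlation matrix. The Python sketch below computes them with the cyclic Jacobi method and reports the number of retained factors and the share of total variance they account for (sum of retained eigenvalues divided by the number of items). It illustrates the criterion only: the extraction and rotation methods used in SPSS are not stated in the text, and variance figures after extraction methods other than principal components can differ.

```python
import math
from typing import List, Sequence, Tuple


def jacobi_eigenvalues(
    matrix: Sequence[Sequence[float]], tol: float = 1e-12, max_sweeps: int = 100
) -> List[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) element becomes zero;
                # t is the smaller root of t^2 + 2*tau*t - 1 = 0 (|angle| <= pi/4).
                tau = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1 + tau * tau))
                c = 1 / math.sqrt(1 + t * t)
                s = t * c
                app, aqq, apq = a[p][p], a[q][q], a[p][q]
                for k in range(n):
                    if k != p and k != q:
                        akp, akq = a[k][p], a[k][q]
                        a[k][p] = a[p][k] = c * akp - s * akq
                        a[k][q] = a[q][k] = s * akp + c * akq
                a[p][p] = c * c * app - 2 * c * s * apq + s * s * aqq
                a[q][q] = s * s * app + 2 * c * s * apq + c * c * aqq
                a[p][q] = a[q][p] = 0.0
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_retained(corr: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Number of eigenvalues > 1 and the share of total variance they account for."""
    retained = [ev for ev in jacobi_eigenvalues(corr) if ev > 1.0]
    return len(retained), sum(retained) / len(corr)


# Three hypothetical items with pairwise correlations of 0.5: eigenvalues 2.0, 0.5, 0.5.
corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(kaiser_retained(corr))
```

In practice a statistical package would be used for the 42×42 matrix; the sketch only makes explicit what ‘eigenvalue >1’ and ‘share of total variance’ refer to.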
To explore construct validity, we examined mean scores of the 10 process dimensions in subgroups of respondents defined by the two single-item outcome variables. Both outcome items were dichotomised: safety grade as ‘excellent/very good’ versus the rest, number of incidents reported as any versus none. The differences between score means were tested by t tests. In all cases, we expected a significant difference, with a higher mean score among those who gave a high safety grade and those who reported at least one incident. In addition, we obtained an effect size for each difference, that is, the difference between means divided by the pooled SD of the scores in the two groups. In all cases, we expected a medium-sized difference,24 that is, approximately 0.5.
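The construct validity comparison can be sketched in Python as follows. The sketch assumes the classic pooled-variance Student t test (the text does not say which variant of the t test was used) and, because the standard library has no t distribution, approximates the two-sided p value with the standard normal distribution, which is adequate only for large samples such as this one. Function names and the grade labels are our assumptions.

```python
import math
from statistics import NormalDist, mean, variance
from typing import List, Optional, Sequence, Tuple

# Assumed labels of the two highest grades (the 'excellent/very good' group).
HIGH_GRADES = {"excellent", "very good"}


def split_by_grade(
    scores: Sequence[float], grades: Sequence[Optional[str]]
) -> Tuple[List[float], List[float]]:
    """Dichotomise respondents by safety grade; respondents with no grade are dropped."""
    high = [s for s, g in zip(scores, grades) if g in HIGH_GRADES]
    rest = [s for s, g in zip(scores, grades) if g is not None and g not in HIGH_GRADES]
    return high, rest


def compare_groups(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (effect size, Student t, approximate two-sided p) for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    diff = mean(a) - mean(b)
    effect_size = diff / pooled_sd  # difference between means / pooled SD
    t = diff / (pooled_sd * math.sqrt(1 / na + 1 / nb))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))  # normal approximation, large df only
    return effect_size, t, p_approx
```

The incident comparison works the same way, with the split defined on the number of reports (any versus none) instead of the grade.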
Finally, we performed an EFA of the 10 process dimension scores. We expected the unit-level and the hospital-level process dimensions to form distinct groups, because each set is influenced by management practices at the corresponding level (unit or hospital), so that the dimensions within each set should covary.
Analyses were conducted using SPSS V.18, except CFA, which was performed with Stata V.12.
A majority of the 1171 respondents were women, and more than half were under 45 years of age (table 1). Nurses were the largest professional group, and about one-tenth of the respondents were physicians. About 90% of the respondents worked directly with patients. The two main hospital facilities represented 76% of the respondents.
Among the respondents, 5.3% (N=60) gave their work unit a global safety grade of ‘excellent’, 46.9% (N=530) said ‘very good’, 39.2% (N=443) said ‘acceptable’, 3.7% (N=42) said ‘poor’ and 4.8% (N=54) said ‘failing’. The majority (61.1%, N=690) of respondents had not reported any event related to patient safety in the previous year; 25.0% (N=282) reported 1 or 2 events, 8.5% (N=96) reported 3–5 events, 3.4% (N=38) reported 6–10 events and 2.1% (N=23) reported 11 or more events.
The mean percentage of positive responses ranged from 28.1% (hospital management support) to 79.4% (teamwork within hospital units) (table 2). When the scores were computed as simple means on a scale from 1 to 5, the pattern of the averages was similar. The summary scores obtained by the two computation methods from the same items were highly correlated: the Pearson correlation coefficients ranged from 0.82 (hospital management support) to 0.90 (feedback and communication about error). Missing score values were low for all scales except frequency of event reporting, for which 11% of respondents had missing values.
For most scores, a greater proportion of variance was attributable to activity sectors (within hospital sites) than to hospital sites (table 2). The dimensions that had the highest levels of agreement within activity sectors were staffing (17.1% of variance) and overall perception of safety (14.2%).
Cronbach α coefficients ranged from 0.57 (organisational learning) to 0.86 (frequency of event reporting) (table 3). The median value was 0.73, and five coefficients were below 0.70. Most coefficients were lower in the French version than in the original US version, but the difference was 0.02 or less for 5 of the 12 dimensions. The largest differences were observed for organisational learning (0.57 vs 0.76) and non-punitive response to error (0.60 vs 0.79). The α coefficients of these two dimensions were also lower than the median of values reported for other translations, whereas the other 10 coefficients were as high as or higher than the median of published results.
Confirmatory factor analysis
We estimated a measurement model for the 12 latent dimensions as specified by the authors of the instrument, in which all dimensions were allowed to have non-zero correlations (table 4). The root mean square error of approximation, the standardised root mean residual and the coefficient of determination achieved the recommended values, but the comparative fit index and the non-normed fit index were both below the recommended thresholds. These results were less favourable than those of the original instrument but in line with the performance of other translations.
Exploratory factor analysis
This analysis of the 42 HSOPSC items yielded 10 factors with an eigenvalue greater than 1 (table 5). These factors captured 58% of the total variance, and the Kaiser–Meyer–Olkin measure of sampling adequacy was 0.88. There were 10 factors rather than 12 because items belonging to the communication openness and the feedback and communication about error scales loaded on the same factor, as did items belonging to teamwork across hospital units and hospital handoffs and transitions. Apart from this, the loading structure was clean: 39 of the 42 primary loadings were ≥0.50, and 41 of 42 were ≥0.40. Of 378 cross-loadings (42 items × 9 secondary factors), only 3 (0.8%) exceeded 0.3 and none exceeded 0.5. Only one cross-loading was higher than the primary loading: the item ‘staff are afraid to ask questions when something does not seem right’, which belongs to the communication openness scale (primary loading 0.36), also loaded on the non-punitive response to error factor (loading 0.47). Two items from the staffing scale had moderate cross-loadings on the overall perception of safety factor.
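The loading-pattern summary above (primary loadings above a threshold, cross-loadings above 0.3, cross-loadings exceeding the primary loading) can be reproduced from any item-by-factor loading matrix. A minimal Python sketch, assuming the matrix is available as nested lists and that each item's intended factor is given as a column index (a hypothetical input format):

```python
from typing import Dict, Sequence


def screen_loadings(
    loadings: Sequence[Sequence[float]],
    intended: Sequence[int],
    primary_min: float = 0.40,
    cross_max: float = 0.30,
) -> Dict[str, int]:
    """Summarise an item-by-factor loading matrix (hypothetical helper).

    loadings[i][f] is the loading of item i on factor f; intended[i] is the column
    of the factor that item i is expected to load on. Every other column counts as
    a cross-loading, so 42 items and 10 factors give 42 * 9 = 378 cross-loadings.
    """
    summary = {"items": 0, "primary_ok": 0, "cross_total": 0,
               "cross_above_max": 0, "cross_above_primary": 0}
    for row, f in zip(loadings, intended):
        primary = abs(row[f])
        summary["items"] += 1
        summary["primary_ok"] += int(primary >= primary_min)
        for g, value in enumerate(row):
            if g == f:
                continue
            summary["cross_total"] += 1
            summary["cross_above_max"] += int(abs(value) > cross_max)
            summary["cross_above_primary"] += int(abs(value) > primary)
    return summary


# Three hypothetical items on two factors; the second mimics a cross-loading
# (0.47) that exceeds its primary loading (0.36).
print(screen_loadings([[0.70, 0.10], [0.36, 0.47], [0.20, 0.60]], [0, 0, 1]))
```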
Associations with single-item outcome variables
All 10 safety culture process dimension scores were significantly higher among respondents who gave a favourable safety grade (table 6). The effect sizes (ie, differences divided by the pooled SDs) ranged from 0.41 to 0.79. In contrast, effect sizes were small and for the most part non-significant when respondents who had reported at least one incident in the past year were compared with those who had not. The only significant differences (for staffing and hospital management support) were in a direction opposite to that postulated by theory: respondents who had reported incidents had lower scores on these dimensions.
EFA of process dimension scores
The analysis of the 10 process scores confirmed the postulated model: the seven unit-level process dimensions formed one factor and the three hospital-level process dimensions formed another (table 7). These two factors captured 50% of the total variance, and the Kaiser–Meyer–Olkin measure of sampling adequacy was 0.82 (ie, higher than the recommended minimum of 0.6).
This study examined the psychometric properties of the French version of HSOPSC. Overall the performance of the instrument was less satisfactory than that of the original US version. Our results, as well as those published by others, raise several questions regarding the HSOPSC and its underlying theoretical model.
The acceptability of the instrument and of the data collection methods was good, as attested by a participation rate of 74%. Missing data were rare for all dimensions except frequency of event reporting, for which over 11% of scores were missing. One possibility is that some respondents were not familiar with incident reporting, especially as a formal incident reporting system was not in place in all parts of this hospital at the time of the survey. Another possible reason is that the questions ask about what ‘is done’ in three hypothetical situations and not about what the respondent would do. Some respondents may have felt unable to say what other people do in a given situation.
The internal consistency of 11 of the 12 scales conformed to the standards set in the original publication,4 where Cronbach α coefficients >0.60 were considered acceptable. However, other sources would demand higher standards—for example, Nunnally and Bernstein recommend a range of 0.7–0.8.25 In our case, five α coefficients were below 0.7. This may reflect shifts in meaning due to the translation. Another possibility is that true variance was lower in our sample, taken from one hospital, than in the sample selected from 21 hospitals used in the original study; if so, reliability coefficients would be lower even if the variance due to measurement error was identical.
α coefficients were low for some scales, such as staffing, in all published versions of the instrument.7–14,26 This may reflect the nature of the scale more than any problem with translation. The staffing scale is not a typical psychometric scale in which all items are caused by the latent variable but are otherwise independent of each other. When staffing is insufficient, an organisation may take measures such as requiring overtime or hiring temporary staff (these are two of the items), but not necessarily both, or it may do nothing, in which case staff work in crisis mode (another item). These three consequences need not covary; in fact, they can be mutually exclusive in practice. A low Cronbach α coefficient therefore does not necessarily reflect poor reliability, but rather a violation of the assumption of conditional independence between item responses. It would be helpful to obtain alternative measures of reliability for the HSOPSC, such as test–retest reliability coefficients.
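The argument can be made concrete with a deliberately extreme toy example (hypothetical data, not from the survey). Suppose each of four units scores 1 on an item if a given consequence of understaffing is present and 0 otherwise: three understaffed units each show exactly one of the three consequences (overtime, temporary staff, crisis mode), and one unit is adequately staffed. Each item reflects understaffing, yet because the consequences are mutually exclusive the items correlate negatively and α is not merely low but negative:

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(values) for values in zip(*items)]  # total score per unit
    return k / (k - 1) * (1 - sum(variance(item) for item in items) / variance(totals))


# Hypothetical units 1-3 are understaffed (one consequence each); unit 4 is not.
overtime = [1, 0, 0, 0]
temporary_staff = [0, 1, 0, 0]
crisis_mode = [0, 0, 1, 0]

print(cronbach_alpha([overtime, temporary_staff, crisis_mode]))  # prints -3.0
```

Real responses on a five-point scale would be far less extreme, but the same mechanism lowers α whenever items are alternative rather than parallel indicators of the latent variable.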
The results of the CFA were satisfactory for some indicators of model fit, less so for others. Here again our results were comparable with other published coefficients obtained for translated versions of the instrument,12–15 whereas the original version performed better.4 This may result from relative overfitting of the instrument to a specific context. The authors of the original instrument started with 79 candidate items and 14 dimensions.4 The reduction to 42 items and 12 dimensions was driven in part by data analysis, specifically the CFA. Inevitably, the resulting instrument will have a satisfactory CFA structure in the dataset in which it was developed. That this model should apply less well to other datasets is to be expected; the same issue of overfitting occurs in the development of other statistical models.27,28 Of note, independent users of the English-language version of the HSOPSC also noted issues with factor structure,14,25 which shows that this problem is not limited to translations.
Despite the imperfect CFA fit in our study, the results of the EFA of the 42 items were in our opinion convincing. The main discrepancy with the postulated model was the merging of two pairs of dimensions. However, these dimensions are conceptually related. Both communication openness and feedback and communication about error deal with communication, and these two dimensions were found to be highly correlated in several studies,7,12,13,17 including an analysis that used the original instrument.29 Similarly, both teamwork across hospital units and hospital handoffs and transitions deal with collaboration between teams and units, and these dimensions too were found to be correlated by others.13,17,29 In contrast with other published results of EFA, we observed no reallocation of items to recomposed dimensions,7,11,13,26 nor did we need to eliminate problematic items, as others have done.14 We saw only one major cross-loading, and even that was intuitively acceptable, since ‘being afraid to ask questions’ is conceptually related to a non-punitive response to error.
In construct validity tests, our observation that all process dimensions were associated with the safety grade provides another argument for validity. This argument is not conclusive, however: the tested instrument and the validation variable were both obtained in the same survey, so a halo effect due to shared methods may explain part of the association. In contrast, validation by past reporting of incidents failed: the HSOPSC scores were either not associated with such behaviour at all, or those who had reported incidents had lower, not higher, HSOPSC scores. However, similar results were obtained by the developers of the original instrument4,29 as well as by others.13,16 This suggests that incident reporting behaviour is not a simple correlate of other dimensions of patient safety culture perceived by the respondents. We suggest that until determinants of incident reporting are better understood, this variable should not be used for construct validation of the other scales.
Finally, the factor analysis of the process dimension scores (table 7) confirmed the conceptual distinction between unit-level and hospital-level process scores.
This study has several limitations. First, to ensure confidentiality, we did not record the hospital unit in which each respondent worked. Therefore, we could not take clustering within units into account in the analysis, as others have done.16,29,30 Agreement regarding the patient safety climate may be higher within units than what we observed within activity sectors, which are defined at a coarser level. Also, we were not able to examine a possible selection bias, as we had no data about the non-respondents. Finally, this study was conducted at one multisite hospital. A broader validation context would strengthen users’ confidence in the psychometric properties of this instrument.
Critical assessment of the HSOPSC
Overall, the French version of the HSOPSC did not pass all tests of reliability and validity. Should the translation therefore be improved and retested? We believe that this may not be an effective solution, because several unresolved issues are inherent in the instrument and not specific to this translation.
The first problem is that the empirical evidence of validity of this instrument is limited. Internal consistency coefficients are low for many dimensions in most translated versions of the instrument, and indeed also in the original. Most factor analyses report results that conflict with the original intended structure. Not even the original instrument met all criteria of a sound covariance structure proposed by Hu and Bentler,23 even though item selection was in part driven by the adequacy of the CFA. Applying standard psychometric criteria, one could argue that the instrument should be partly redesigned. Furthermore, little validation has been done on the basis of external evidence, collected independently of the survey, such as a measured frequency of patient safety events.31,32
Another concern is that the conceptual framework of patient safety climate may not be as solidly established as one might hope.33 This is reflected by the variability in measured dimensions across instruments.3 For example, the Safety Attitudes Questionnaire34 includes dimensions such as ‘stress recognition’ or ‘job satisfaction’, which are absent from the HSOPSC. Furthermore, current models are predominantly expert-driven. For example, the two most popular questionnaires to measure patient safety climate were developed on the basis of older instruments and published theoretical frameworks; neither was based on an open-ended exploration of safety culture among the target population.4,34 Such instruments can determine to what extent experts’ opinions are adhered to, but not necessarily what hospital staff themselves believe about patient safety. Others too have argued that qualitative research into the culture of patient safety is needed.32,35
We found that the French version of the HSOPSC performed less well than the original instrument in standard psychometric analyses, as was true for other translations. Whether this reflects a suboptimal translation process or more general problems with the instrument is unclear.
Contributors All authors were responsible for the design of the study and the interpretation of results, and approved the final version of the manuscript to be published. FK was responsible for data collection. TVP was responsible for data analysis and for writing the manuscript. AS and FK were responsible for critical review of the manuscript.
Funding Hôpital neuchâtelois, no external funding
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Because the original data belong to the hospital management, they are not made available in raw format. The authors will try to accommodate any reasonable request for additional analyses.