The development and psychometric evaluation of a safety climate measure for primary care
C de Wet (1), W Spence (2), R Mash (3), P Johnson (4), P Bowie (1)

  1. National Health Service Education for Scotland, Glasgow, UK
  2. University of Glasgow, Glasgow, UK
  3. Family Medicine and Primary Care, University of Stellenbosch, Cape Town, RSA
  4. Robertson Centre for Biostatistics, Glasgow, UK

Correspondence to Dr Carl de Wet, Associate Adviser, National Health Service Education for Scotland, 2 Central Quay, Glasgow G3 8BW, UK; carl.dewet{at}


Introduction Building a safety culture is an important part of improving patient care. Measuring perceptions of safety climate among healthcare teams and organisations is a key element of this process. Existing measurement instruments are largely developed for secondary care settings in North America and many lack adequate psychometric testing. Our aim was to develop and test an instrument to measure perceptions of safety climate among primary care teams in National Health Service for Scotland.

Method Questionnaire development was facilitated through a steering group, literature review, semistructured interviews with primary care team members, a modified Delphi and completion of a content validity index by experts. A cross-sectional postal survey utilising the questionnaire was undertaken in a random sample of west of Scotland general practices to facilitate psychometric evaluation. Statistical analyses included exploratory and confirmatory factor analysis and the calculation of Cronbach and Raykov reliability coefficients.

Results Of the 667 primary care team members based in 49 general practices surveyed, 563 returned completed questionnaires (84.4%). Psychometric evaluation resulted in the development of a 30-item questionnaire with five safety climate factors: leadership, teamwork, communication, workload and safety systems. Retained items have strong factor loadings to only one factor. Reliability coefficients were satisfactory (α=0.94 and ρ=0.93).

Discussion This study is the first stage in the development of an appropriately valid and reliable safety climate measure for primary care. Measuring safety climate perceptions has the potential to help primary care organisations and teams focus attention on safety-related issues and target improvement through educational interventions. Further research is required to explore acceptability and feasibility issues for primary care teams and the potential for organisational benchmarking.

  • Patient safety
  • general practice
  • safety culture

Improving the safety of patient care is a major policy concern in modern healthcare systems. In the UK, it is accepted that safety concerns should be tackled through a multimethod approach to quality improvement. Building a culture of safety in healthcare organisations is one such method.1–5 Measuring the safety culture or climate among the healthcare workforce is a key part of this process. This is important because organisations with a positive safety culture are more likely to learn openly and effectively from failure and adapt their working practices appropriately. The converse is true for a weak safety culture, which has been implicated as a causal factor in many organisational failures, including high-profile healthcare incidents.6–8

Safety “culture” and “climate” are interlinked concepts, and the terms are often used interchangeably. There is ongoing debate about their differences and similarities.9–11 Safety culture has been defined simply as “the way we do things around here” and is thought to help shape the discretionary behaviour of healthcare workers.13 Safety climate is considered to be the measurable, surface components that provide a “snapshot” of the underlying safety culture. It has been defined as the shared perceptions of safety policies, procedures and practices held by a work group.14–17

The theory and practice of safety climate measurement originated in high-reliability organisations, which are typically found in the aviation, nuclear energy and offshore oil-drilling industries, among others. They operate in high-risk, potentially hazardous environments where safety-critical resources, systems, attitudes and behaviours are of prime importance. Experiences of safety management methods in these industries are thought to be relevant and transferable to healthcare systems because of these shared safety-critical features.13 ,16 ,18–21

Many healthcare organisations have begun to look critically at ways to improve their safety climate. Various instruments have been developed for this purpose, but mainly in secondary care settings in North America.13 ,16 ,18–21 Although a qualitative approach to exploring safety culture can be taken, most available instruments are quantitative in design.20 ,22 This typically requires the healthcare workforce to complete self-report questionnaires anonymously on a periodic basis. Survey scores are aggregated to provide a measure of those “factors” or “dimensions” known to be important markers of safety climate strength (eg, the perceived effectiveness of teamwork or communication).

Measuring and examining safety climate have potentially important benefits. It enables the benchmarking of scores for important safety climate factors so that healthcare organisations and teams can monitor, compare and influence these over time. In this respect, it acts as a diagnostic, learning tool assisting the organisation or team to identify perceived areas of safety climate weakness. These can then be targeted through, for example, educational interventions as a means of facilitating potential improvement. The predictive validity of climate measures may also become apparent. Emerging evidence from secondary care suggests that safety climate scores are positively associated with clinical outcomes and the safety behaviour and attitudes of healthcare professionals.16 ,23 ,24

For these benefits to be fully realised, a high standard of safety climate measurement is required so that teams and organisations can confidently consider the collected data as a reliable and trustworthy marker of their safety climate. However, recent studies have questioned the adequacy of the development, testing and reporting of psychometric properties underpinning many safety climate questionnaires.16 ,17 ,19 ,22–24 Consequently, Flin et al have described a core set of safety factors and psychometric attributes that instruments should adhere to and possess.16 ,17 ,19 ,25

The focus of the patient safety agenda and related research in the UK is largely confined to secondary care, although the UK Royal College of General Practitioners recognises the importance of this topic in primary care.26 In National Health Service for Scotland (NHS Scotland), the nascent Scottish Patient Safety Programme has set out a series of steps to make hospital patients safer in the next 5 years.27 Work on developing similar aims for all other health sectors is underway. Improving the safety culture of all NHS Scotland organisations is a common goal. To support the programme, there is a strong requirement for rigorously tested safety management tools and techniques, which are acceptable to the NHS workforce and feasible to apply in the workplace.

The study of safety climate in primary care settings worldwide is limited to a small number of research studies.11 ,28 ,29 In the UK, a qualitative, typological instrument has been developed for primary care, to help teams conceptualise safety culture, illuminate differences between professional groups and act as an educational tool.30 Another UK study applied an adapted survey instrument to measure safety climate perceptions among primary and secondary care teams, but concluded that their factor analysis model fitted secondary care data “substantially better” than the primary care data.31 However, a relevant survey-based quantitative instrument that has been subjected to adequate psychometric evaluation appears to be lacking specifically for primary care. Given the potential benefits for improving safety culture and patient care, we aimed to develop and test a safety climate measure for primary care in NHS Scotland using appropriate scientific methods.


Methods

Theoretical consideration

High reliability theory was initially considered to guide questionnaire development. It maintains that organisations can make significant contributions to prevent and minimise the chances of accidents and untoward incidents. However, various other theories also attempt to describe a link between safety climate, safety behaviour and safety outcomes. Additional consideration was therefore given to attribution theory and the models described by Zohar, Gershon and Flin.32–36 ,15 ,19

Instrument development

Literature review and initial questionnaire design

A steering group of three patient safety specialists and three general practitioners (GPs) coordinated the development phase. A search of electronic databases (Medline, CINAHL, EMBASE and PsychInfo) for the period 1997–2007 identified 65 articles and 13 safety climate and culture instruments of relevance. The steering group reviewed these studies, jointly agreed on 13 factors and 61 items appropriate to primary care, and developed a draft questionnaire.

Primary care team input

A convenience sample of six local general practices was recruited to assist questionnaire development. Of 118 clinicians and staff, 78 (66%) completed a content validity index (rating scale range: 1=not relevant to 4=highly relevant) for each questionnaire factor and item. Semistructured interviews (n=46) also explored questionnaire acceptability, clarity and validity. Acting on the feedback, the steering group redrafted a 48-item, eight-factor questionnaire.

Content validation

A modified Delphi group (n=11) of UK primary care patient safety “experts” helped in the refinement (rewording) of questionnaire items and agreement of a final version. “Expertise” was accorded based on relevant peer-reviewed journal publications and/or NHS occupation (eg, patient safety manager). The group also completed a content validity index for each questionnaire item and factor. A minimum of 8/11 experts was required to endorse each item by assigning a rating of at least 3/4 on the scale to establish content validity beyond the 0.05 level of significance.37 All eight factors and 48 items were retained. A 7-point rating scale (ranging from 1=“not at all” to 7=“to a very great extent”) was chosen to enhance discriminant reliability.
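
The endorsement rule described above can be illustrated with a minimal sketch. The ratings below are hypothetical, and averaging the item-level indices into a scale-level index is an assumption made for illustration; the text does not state how the aggregate score was formed.

```python
from typing import Dict, List


def item_cvi(ratings: List[int], relevant_from: int = 3) -> float:
    """Proportion of experts rating the item at or above `relevant_from`."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)


def endorsed(ratings: List[int], min_experts: int = 8, relevant_from: int = 3) -> bool:
    """True when at least `min_experts` experts rate the item as relevant."""
    return sum(r >= relevant_from for r in ratings) >= min_experts


# Hypothetical ratings from 11 experts on the 1 (not relevant) to 4 (highly relevant) scale.
ratings: Dict[str, List[int]] = {
    "item_a": [4, 4, 3, 4, 3, 4, 4, 3, 4, 4, 2],
    "item_b": [2, 3, 1, 4, 2, 3, 2, 1, 3, 2, 2],
}

item_scores = {name: item_cvi(r) for name, r in ratings.items()}
# Assumption: scale-level index taken as the mean of the item-level indices.
scale_cvi = sum(item_scores.values()) / len(item_scores)

for name, r in ratings.items():
    print(name, round(item_scores[name], 2), "endorsed" if endorsed(r) else "not endorsed")
```

In this sketch, `item_a` is rated 3 or 4 by 10 of the 11 experts and so meets the 8/11 threshold, whereas `item_b` does not.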

Instrument testing

Sample size, setting and participants

To achieve sampling adequacy, we randomly selected 200/463 group general practices in the west of Scotland from the NHS Education for Scotland organisational database, which is used to monitor GP appraisal activity. Sample size was calculated in consultation with a statistician and by using the “rule of 10”, which indicated that for a 48-item questionnaire a minimum sample of 480 primary care team members was required.38 We used the Kaiser–Meyer–Olkin coefficient as a measure of our sampling adequacy, which varies from 0 to 1 with ≥0.6 necessary for factor analysis.38

Data collection

Practices were invited by letter to participate in the study. Sufficient questionnaires were then forwarded for all employed and attached team members. Questionnaires were completed anonymously, collated and returned to NHS Education for Scotland in prepaid envelopes. A questionnaire was excluded if three or more items were unanswered, or if all items were scored as 1 or 7, as such responses decrease discriminating power.39

Data analysis

Data were coded and entered into a Microsoft Excel spreadsheet. Negatively phrased items were reversed for consistency. All items were considered to have equal weighting. Anonymity meant that non-respondents could not be identified and so were not accounted for by weighting. Data were exported and analyses were performed in SPSS V.14.0 and in the statistical software R V.2.6.0 (libraries psy and sem).40
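
The data preparation steps (excluding unusable questionnaires and reversing negatively phrased items) might look like the following sketch. The function names and the example responses are hypothetical. The reading of "all items scored as 1 or 7" as "every answered item sits at a scale extreme" is an assumption.

```python
from typing import List, Optional, Set

SCALE_MIN, SCALE_MAX = 1, 7  # 7-point rating scale used by the questionnaire


def reverse_score(value: int) -> int:
    """Mirror a response on the 1-7 scale (1 becomes 7, 2 becomes 6, and so on)."""
    return SCALE_MIN + SCALE_MAX - value


def is_excluded(responses: List[Optional[int]], max_missing: int = 2) -> bool:
    """Apply the exclusion rule: three or more unanswered items, or only extreme scores."""
    answered = [r for r in responses if r is not None]
    if len(responses) - len(answered) > max_missing:
        return True
    # Assumption: "all items scored as 1 or 7" means every answered item is an extreme.
    return bool(answered) and all(r in (SCALE_MIN, SCALE_MAX) for r in answered)


def recode(responses: List[Optional[int]], negative_items: Set[int]) -> List[Optional[int]]:
    """Reverse the negatively phrased items (given by index); leave missing values alone."""
    return [
        reverse_score(r) if (r is not None and i in negative_items) else r
        for i, r in enumerate(responses)
    ]


# Hypothetical questionnaire with five items, where items 0 and 2 are negatively phrased.
print(recode([2, None, 7, 5, 4], negative_items={0, 2}))
print(is_excluded([None, None, None, 5, 6]))
```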

Factor analysis

Factor analysis is commonly used as a measure of the construct validity of a questionnaire.31 A questionnaire typically measures only some of the various possible factors (or aspects or dimensions) of a given subject. Exploratory factor analysis (EFA) extracts original factors from a dataset. This “factor solution” gives an indication of the number of factors that the questionnaire appears to measure of its intended subject.38

EFA also indicates which items appear to measure which extracted factors. In other words, EFA reduces the dimensionality of the raw questionnaire data by grouping together those individual items that appear to measure the same factor.38 A “factor loading” is calculated for each questionnaire item ranging from 0 to 1. The closer an item's factor loading is to 1, the stronger the proposed link to that specific factor.31 ,38 Items that load strongly to only one extracted factor are useful for their discriminating potential and should be retained. All safety climate factors were assumed to be correlated. We therefore used maximum likelihood factor analysis with a promax rotation to extract factors and calculate factor loadings.

Although the EFA extracts factors, the “right” number of factors to retain in the questionnaire is chosen by the research team based on their defined criteria. We used parallel analysis as it allows the number of non-random factors to be selected,41 ,42 a benefit over conventional criteria.43 ,44 In parallel analysis, many simulated, uncorrelated datasets with the same number of rows and columns as the original are computer-generated and factor analysed. Their mean eigenvalues are then combined and displayed graphically on a scree plot.45–47 (An eigenvalue relates to the proportion of the additional variance explained by each additional factor.) The original factors exceeding the 95% CI of the simulated eigenvalues are retained (see figure 1).

Figure 1

A scree plot showing the eigenvalues from the 30 questionnaire items (solid circles). The grey line represents the 95% confidence band for eigenvalues from 100 simulated, random and uncorrelated datasets. The number of significant factors can be inferred from the number of eigenvalues exceeding the random eigenvalues (ie, above the solid grey line).
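
A minimal sketch of parallel analysis is given below, assuming NumPy is available. It is an illustration rather than the procedure used in the study: it works on eigenvalues of the item correlation matrix, simulates normally distributed uncorrelated data, and uses the 95th percentile of the simulated eigenvalues as the retention threshold. The demonstration dataset is synthetic, with two built-in factors.

```python
import numpy as np


def sorted_eigenvalues(data: np.ndarray) -> np.ndarray:
    """Eigenvalues of the column correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data: np.ndarray, n_sims: int = 100,
                      percentile: float = 95.0, seed: int = 0):
    """Return (observed eigenvalues, simulated thresholds, number of factors to retain)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = data.shape
    observed = sorted_eigenvalues(data)

    # Eigenvalues from uncorrelated random datasets with the same shape as the original.
    simulated = np.empty((n_sims, n_cols))
    for s in range(n_sims):
        simulated[s] = sorted_eigenvalues(rng.standard_normal((n_rows, n_cols)))
    threshold = np.percentile(simulated, percentile, axis=0)

    # Retain leading factors until an observed eigenvalue falls below its threshold.
    exceeds = observed > threshold
    n_retain = n_cols if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain


# Synthetic example: 300 respondents, 8 items, two uncorrelated latent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8   # items 0-3 load on the first factor
loadings[1, 4:] = 0.8   # items 4-7 load on the second factor
demo = factors @ loadings + 0.6 * rng.standard_normal((300, 8))

observed, threshold, n_retain = parallel_analysis(demo)
print("factors retained:", n_retain)
```

Plotting `observed` and `threshold` against the factor number would give a scree plot of the kind shown in figure 1.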

Further validation of the questionnaire was undertaken through confirmatory factor analysis, a statistical method that tests the hypothesis that a specific number of factors “fit” the data well and/or better than a different number would have.38 A minimum of three different goodness-of-fit coefficients is calculated and compared for each likely factor solution.48 Coefficients were calculated for models with four, five and eight factors. The four-factor model was included to control for the tendency of confirmatory factor analysis to favour models with fewer factors.38 Coefficients commonly used in validation studies were chosen, namely the standardised root mean square residual, the Bentler comparative fit index and the root mean square error of approximation.
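
Two of these coefficients can be computed directly from model chi-square statistics, as in the sketch below. The formulas are the standard textbook definitions (some software divides by N rather than N − 1 in the root mean square error of approximation), and the chi-square values are hypothetical rather than results from this study. The standardised root mean square residual is omitted because it requires the full observed and model-implied correlation matrices.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from the model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Bentler comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Hypothetical chi-square values for a 30-item, five-factor model fitted to 563 responses.
# Degrees of freedom: 465 unique variances/covariances minus 70 free parameters = 395;
# the baseline model estimates only the 30 item variances, giving 435.
print(round(rmsea(800.0, 395, 563), 3))
print(round(cfi(800.0, 395, 9000.0, 435), 3))
```

Lower values of the root mean square error of approximation and values of the comparative fit index closer to 1 indicate better fit.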

Reliability testing

To measure internal reliability, we calculated the Cronbach α coefficient for the questionnaire and for each factor. A minimum score of 0.7 was required for adequate reliability.31 ,49 ,50 Cronbach α is the most commonly reported measure of scale reliability but may overestimate reliability for social and complex phenomena such as safety climate.51 To control for this potential bias, we also calculated and report the Raykov coefficient for the questionnaire and for each factor.
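
For illustration, Cronbach α can be computed from the item variances and the variance of the total score, using the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance). The sketch below uses hypothetical responses. The Raykov coefficient is not shown because it requires a fitted factor model.

```python
from statistics import variance
from typing import List


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Cronbach alpha for complete responses (one row per respondent, one column per item)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from five team members to three items of one factor.
responses = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), "adequate" if alpha >= 0.7 else "below the 0.7 threshold")
```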

Item deletion

Selected questionnaire items were deleted in a stepwise manner to ensure retention of only those that improve validity, reliability and feasibility. Factor loadings were used to remove items with “weak” loadings, or with equal loadings to more than one factor.38 Subsequently, deleted-item reliability coefficients were calculated, which allowed identification and deletion of items that reduced instrument reliability.52 Continuous analyses were performed to ensure that the original results and factor structure remained unaffected.
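
The two screening steps can be sketched as follows. The 0.4 cut-off for a weak loading is the one reported for this study; the tolerance used to treat two loadings as "equal" and all of the data are hypothetical choices made for illustration.

```python
from statistics import variance
from typing import List


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Cronbach alpha for complete responses (rows are respondents, columns are items)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def weak_or_cross_loading(loadings: List[float], min_loading: float = 0.4,
                          tol: float = 0.05) -> bool:
    """Flag an item whose best loading is weak, or whose two best loadings are near-equal."""
    ranked = sorted((abs(x) for x in loadings), reverse=True)
    if ranked[0] < min_loading:
        return True
    # Assumption: loadings within `tol` of each other are treated as "equal".
    return len(ranked) > 1 and ranked[0] - ranked[1] <= tol


def alpha_if_deleted(rows: List[List[float]]) -> List[float]:
    """Cronbach alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in rows])
        for i in range(k)
    ]


# Hypothetical data: the fourth item is unrelated to the other three.
rows = [
    [5, 6, 5, 2],
    [3, 3, 4, 6],
    [6, 7, 6, 1],
    [2, 3, 2, 5],
    [4, 4, 5, 7],
]
baseline = cronbach_alpha(rows)
for index, value in enumerate(alpha_if_deleted(rows)):
    if value > baseline:
        print(f"item {index}: deleting it raises alpha to {value:.2f}")
```

An item is a candidate for deletion when removing it raises the coefficient above the value for the full scale, as happens for the fourth item in this example.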


Results

Instrument development

Content validity

A minimum of 8 out of 11 experts endorsed each questionnaire item and safety climate factor. Their aggregated scores gave an overall instrument Content Validity Index (CVI) of 0.94 with all factors and items scoring >0.8.

Instrument testing

Response rate and practice demographics

Forty-nine of the 200 invited practices (24.6%) participated in the study. A total of 563 questionnaires was completed and returned by primary care team members from a potential study population of 667 (84.4%), giving a Kaiser–Meyer–Olkin of 0.94. Responses ranged from 4 to 27 participants per practice. Relevant demographic data are outlined in table 1.

Table 1

Study population

Factor analysis, reliability and item reduction

The original questionnaire was reduced from 48 items to 30 items with no alteration in the safety climate scores. Items were removed because their factor loadings were <0.4 (n=11), because they decreased instrument reliability (n=5), or because they did not improve instrument reliability, had relatively low factor loadings and were grouped with four other items with stronger loadings (n=2). The factor loadings of the retained items are shown in table 2.

Table 2

PC-SafeQuest: factor loadings, reliability coefficients, mean scores and SD of the retained questionnaire items and safety climate factors

Parallel analysis identified five significant safety climate factors (workload, communication, leadership, teamwork and safety systems). Confirmatory factor analysis indicated that this five-factor solution provided the best fit to the original dataset, with satisfactory goodness-of-fit coefficients (table 3). The “teamwork” factor is more strongly correlated (interdependent) with the other factors than they are with one another. Correlations between all five safety climate factors are shown in table 4. Overall instrument reliability was good, with comparable Cronbach α and Raykov coefficients of 0.94 and 0.93, respectively. All factors had coefficient scores of ≥0.72 (table 2).

Table 3

Goodness-of-fit coefficients for a model with four, five or eight factors

Table 4

Correlation matrix of the mean factor scores from the 30 questionnaire items


Discussion

This study outlines the first steps of development and testing of a safety climate measure for primary care in NHS Scotland, which we have called PC-SafeQuest. The final questionnaire contains 30 items (variables) that measure five safety climate factors. Initial development involved contributions from the intended user groups while drawing on existing theory and worldwide safety-related literature. Instrument validity and reliability were good, with all retained items loading strongly to only one factor and reliability coefficients being satisfactory and comparable.

As explained above, the same dataset can be analysed using various methods, which can result in different numbers of factors and items being retained in a questionnaire. This is evident in practice: available instruments all differ in the number and type of items used to measure diverse safety climate factors. However, it has been argued that a set of universal or “core” factors underpins safety climate across different settings and industries, and that these are then complemented by sector-specific factors.19 It would be reasonable to expect a newly developed instrument to contain at least these core dimensions. Research in high-reliability industries has identified and described the following as core factors: management and supervisor commitment to safety, workload, safety systems (training, employee satisfaction with safety systems, etc) and procedures/rules. Healthcare-related research identified similar core factors with the exception of procedures/rules.19

Our findings closely mirror the healthcare core dimensions—safety systems, workload and leadership. (We combined the terms “management” and “supervisor” into “leadership”, as these roles are often fulfilled by the same person in general practices.) We identified two additional safety climate factors in the primary care setting: teamwork and communication. These factors were also identified during the development of the Manchester Tool (MaPSaF) for primary care.30 The relatively high correlation measure reported between teamwork and the other factors may arguably reflect the team-centred focus necessary for the delivery of a high-quality service in primary care.

Strengths and limitations

We involved a wide range of primary care team members and expert contributors in preference to relying solely on a combination of theory, existing instruments and our own ideas while developing questionnaire content. Psychometric testing and evaluation were performed according to a recommended standard, which ensures a rigorous scientific approach.17 ,22

The use of questionnaires to explore perceptions of safety climate has been called “quick and dirty”, as this may provide a simplified, superficial description of the underlying safety culture. From this perspective, it only measures the surface components by eliciting the transient attitudes of individuals at a given point in time.9 ,10 ,13 ,53 For a more in-depth assessment of safety culture, a qualitative approach is recommended and one such method has been developed for primary care in the UK.20 We would argue, however, that both approaches have merit and may even be complementary.

The qualitative approach appears reliant on the whole team having protected time, which is resource intensive and may not, therefore, be widely feasible. The speed with which the questionnaire survey can be administered, analysed and reported is potentially a key strength, offering a time-pressured workforce greater flexibility and convenience. Overall, it may be a more efficient, feasible and acceptable strategy for improving quality and safety in a health sector where the management of patient care is often complex and uncertain. In addition, the reality of a target-driven, financially incentivised primary care service characterised by a high volume of patient throughput and limited opportunities for collective learning and development arguably makes questionnaire use a more desirable option.

Single-handed practices were excluded as respondent anonymity could not be guaranteed and because the larger sample sizes required to draw meaningful conclusions from the data may not be feasible on an individual practice level.54 The practical significance of aggregated, numerical scores has also yet to be clarified and will require further study.23

Context and implications of findings

A direct correlation between safety climate and safety outcome has not conclusively been proven in healthcare.11 ,14 ,22 ,28 ,54 In terms of establishing the predictive validity of the instrument, a difficulty for primary care is correlating safety climate scores with a clinical or services outcome measure as a means of assessing its impact on safety. In secondary care, this may be possible by making use of various morbidity and mortality indicators. A similar, convenient measure has not been clearly defined in primary care, as linking safety climate scores with mortality and morbidity levels is not feasible, nor is assessing rates of behaviours (such as non-compliance with protocols or risk taking).

Possible independent outcome measures to consider may include the practice teams' global patient satisfaction survey score, the level of adverse incident reporting and rates of sickness/absence among the team. It is unclear if work-related staff injuries occur in sufficient numbers to make comparisons worthwhile. Arguably, the key indicator may be adverse events. If patient harm levels could be measured reliably in primary care, then it may be possible to examine the relationship between safety climate, safety behaviour and safety outcome. Growing interest in “trigger methods” for reliably quantifying and serially monitoring error and harm levels through the structured review of patient records may have a key role to play as a future outcome measure. An alternative strategy is to use self-reported outcome measures (eg, perceived decrease in the number of informal complaints), but this is not recommended as good practice. Overall, it is clear that much greater consideration is required in determining and defining appropriate outcome measures in primary care and the potential mechanisms underlying the associations between these measures and safety climate.13 ,17

Safety climate can be investigated and measured at multiple levels within a primary care organisation.13 ,15 ,17 ,54 The developed questionnaire is intended for use initially at the team level. In general practice, the team is defined as those directly employed by the practice (eg, receptionists, practice nurses, salaried doctors and managers) and those NHS-employed staff who are attached to the practice (eg, health visitors, district nurses and pharmacists) but who nonetheless are integral members of the GP team. The remaining group is the independently practising doctors or “partners” who collectively own, and are directly accountable for, managing the business and its development. It is assumed that the comparatively small size of the GP team improves teamwork, but there is a clear workforce hierarchy similar to that in most other industries. There is now a growing consensus and recognition that of all safety climate factors, management commitment to safety is the key issue of concern.19 ,25

A further possible use is at the organisational level. Aggregating data across a geographical or health authority region could inform clinical governance, safety and educational priorities. Wider dissemination of safety climate measurement could over time allow for benchmarking, help to bridge the current knowledge gap, inform the development of educational solutions for individual practices, demonstrate safety climate improvement over time and link this to improved patient safety.13 ,21 ,30 ,36


Conclusion

PC-SafeQuest has the potential to help primary care organisations and teams build a culture of safety by focusing attention on safety-related issues and facilitating improvement through educational interventions. Future research is needed into the processes that effectively create and sustain a culture change and into ways in which climate assessment can be combined with other safety improvement methodologies.13 ,17 We also need to understand the complex manner in which safety climate interacts with other determinants of behaviour, such as internal and external incentives and the motivational links influencing behaviour between patient safety and worker safety.11



  • Funding NHS Education Scotland.

  • Competing interests None.

  • Ethics approval This research project was considered by the Multi-Research Ethics Committee (A) based in Lothian National Health Service board, but was judged not to require ethical approval.

  • Provenance and peer review Not commissioned; externally peer reviewed.