
How do stakeholder groups vary in a Delphi technique about primary mental health care and what factors influence their ratings?
S M Campbell, T Shield, A Rogers, L Gask
National Primary Care Research and Development Centre, University of Manchester, Williamson Building, Manchester M13 9PL, UK
Correspondence to: Dr S Campbell, National Primary Care Research and Development Centre, University of Manchester, Williamson Building, Oxford Road, Manchester M13 9PL, UK


Background: While mental health is a core part of primary care, there are few validated quality measures and little relevant internationally published research. Consensus panel methods are a useful means of developing quality measures where evidence is sparse and/or opinions are diverse. However, little is known about the dynamics of consensus techniques and the factors that influence the judgements and ratings of panels and individual panellists.

Objectives: (1) To describe differences in panel ratings on the quality of primary mental health care services by patient, carer, professional and managerial panels within a Delphi procedure; and (2) to explore why different panels and panellists rate quality indicators of primary mental health care differently.

Design: Two round postal Delphi technique and exploratory semi-structured interviews.

Participants: 115 panellists across 11 panels. Eleven panellists were subsequently interviewed.

Results: 87 of 334 indicators (26%) were rated face valid by all 11 panels. There was little disagreement within panel ratings but significant differences between panels. The GP panel rated the fewest indicators valid (n = 138, 41%) and the carer panel the most (n = 304, 91%). The way in which panellists interpreted and conceptualised the indicators, and their definition of quality of mental health care, affected how they made their ratings.

Conclusions: Stakeholders in primary mental health care have diverse views of quality of care and these differences translate into how they rate quality indicators. Exploratory interviews suggest that ratings are influenced by past experience, expectations, definitions of quality of care, and perceived power relationships between stakeholders.

  • quality indicators
  • consensus techniques
  • mental health
  • Delphi
  • primary care
  • qualitative


Mental health is a core part of primary care. However, while attempts have been made to develop quality indicators for primary mental health services in the UK,1–3 there are few validated quality measures.4,5 Those that are available tend to focus on clinical aspects of mental health care, such as counselling6 and guidelines,7 rather than the organisational delivery of services.4 There is also little published international research in this area.8,9,10

The various stakeholders in health care (patients, professionals and managers) define quality of care differently, both generally11,12 and in relation to mental health.13–16 This has implications for quality improvement: the legitimacy of quality measures depends on their acceptability to the stakeholders they affect, so the measures must reflect this diversity. The UK government has also advocated involving service users and carers in health and social care decision making.17

The lack of evidence and the differences in stakeholders’ perspectives make primary mental health care ideal for developing quality measures using consensus techniques,18 which combine expert opinion with available evidence and are an increasingly prominent means of developing health service quality indicators.18,19 Consensus methods include the RAND Appropriateness Method20,21 and the Delphi technique.22,23 However, little is known about the dynamics of consensus techniques, how panels reach rating decisions, and the determinants of ratings by individual panellists. Moreover, it has been recommended that qualitative research should supplement quantitative analyses to understand participants’ underlying thought processes within consensus techniques.24 This is especially important where there are multiple stakeholders.

This paper describes quantitative differences within and between panel ratings in a two round postal Delphi technique that has been described previously,1 which developed a generic set of quality indicators for the organisation and delivery of primary mental health care from a multiple stakeholder perspective (including patient, professional and manager panels). It also explores qualitatively why different panellists within the Delphi procedure rated quality indicators differently, and considers the implications of differences in ratings and opinions about quality of primary mental health care for policy and practice.

Methods

Delphi technique

The Delphi technique is a consensus method which involves the administration of two or more rounds of questionnaires.18 Panellists are selected according to their relevant expertise, statements on a given issue are developed either by the panel members or researchers, and the panellists are asked to rate the statements by questionnaire with results fed back between rounds. A large group can be consulted from a geographically dispersed population. Most applications of the Delphi technique involve a postal questionnaire, ensuring anonymity from other panel members because panellists usually do not meet or discuss issues face to face.

The credibility of a consensus technique depends heavily upon panel composition.18 For example, panels composed of different stakeholders rating the same statements produce different ratings.25,26 So how is an expert defined? The answer relates to the research objectives. It is crucial to identify relevant stakeholders before conducting consensus techniques, as a panel must reflect the constituency of stakeholders it is intended to represent. An expert could therefore be any individual with relevant knowledge or perspective, or someone with an extensive publication record in the area.

Primary mental health care Delphi process

A set of indicators was constructed from previously published guideline statements, indicators, and standards relating to primary mental health care (including 22 indicators based on the standards of the National Service Framework (NSF) for Mental Health). Published and grey literature sources were also examined, and patient focus groups were conducted to identify aspects of care (and thus indicators) important to users.1

Eleven panels were convened (box 1), each composed of 9–12 representatives. Details of the composition of the panels have been described previously;1 briefly, panellists in professional panels reflected a combination of active practitioners and individuals selected on the basis of their national reputation (publication record, nominees of professional organisations, or involvement in specialist societies). Similarly, patient and carer panels contained known patient/carer advocates and those involved in patient/carer organisations as well as individual patients and carers. All panels reflected a geographical spread across Great Britain.

Box 1 The 11 panels

  • Eight professional panels
    • Clinical psychologist
    • Community psychiatric nurse
    • Counsellor
    • General practitioner
    • Health and social care commissioner
    • Nurse (practice, district, health visitor)
    • Psychiatrist
    • Social worker
  • Carer panel
  • Patient panel
  • Voluntary organisations panel

A two round postal Delphi survey involving these 11 panels was carried out by the authors between March and July 2000,1 with panellists asked to rate each indicator on an integer scale from 1 to 9.

Quantitative data analysis

The validity of an indicator was defined as the extent to which it related to an aspect of care necessary for providing high quality primary mental health care. Indicators were rated valid on a panel-by-panel basis if they had an overall panel median score of 8 or 9 “with agreement”.1,21 Agreement existed where 75% or more of ratings within a panel fell within the top tertile (7–9), and disagreement where at least 30% of ratings within a panel fell in the lower tertile (1–3) and at least 30% in the upper tertile (7–9).1,21 χ² tests were used to test for differences in the types of indicators rated valid by the panels. Kappa values were calculated to assess the level of agreement between panels.27 All analyses are based on round 2 data.
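The panel-level validity rule above can be expressed as a short function. This is a minimal illustrative sketch, not the study’s analysis code: the function name `classify_indicator`, the reading of the 30% threshold as “at least 30% in each extreme tertile”, and the example ratings are all assumptions.

```python
from statistics import median

def classify_indicator(ratings):
    """Classify one indicator for one panel from its round 2 ratings.

    Hypothetical helper: valid = panel median of 8 or 9 with agreement,
    where agreement = at least 75% of ratings in the top tertile (7-9).
    Disagreement (assumed reading) = at least 30% of ratings in the bottom
    tertile (1-3) and at least 30% in the top tertile (7-9).
    """
    if not ratings or any(r not in range(1, 10) for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers 1-9")
    n = len(ratings)
    share_low = sum(1 for r in ratings if r <= 3) / n
    share_high = sum(1 for r in ratings if r >= 7) / n
    panel_median = median(ratings)
    agreement = share_high >= 0.75
    return {
        "median": panel_median,
        "agreement": agreement,
        "disagreement": share_low >= 0.30 and share_high >= 0.30,
        "valid": panel_median >= 8 and agreement,
    }

# Hypothetical ratings from a 10-member panel for two indicators.
consensual = [9, 8, 8, 9, 7, 8, 9, 8, 6, 8]
polarised = [9, 2, 8, 1, 7, 3, 9, 5, 8, 2]
print(classify_indicator(consensual))  # valid, with agreement
print(classify_indicator(polarised))   # not valid, flagged as disagreement
```

Note that high agreement does not by itself make an indicator valid: a panel rating everything 7 would show agreement but a median below 8.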

Qualitative interviews and data analysis

Semi-structured interviews were conducted to explore the factors that influenced the ratings of panellists. Panellists chosen for interview (n = 11) were a purposive sample selected to reflect the differences in the quantitative ratings of the 11 panels. For resource reasons it was not possible to interview participants from all panels; participants were therefore selected to reflect the most polarised panel ratings in the quantitative analyses, focusing on patients, carers, and service providers (general practitioners, psychiatrists, and commissioners). The type of panellists interviewed and the reason for their selection are shown in table 1.

Table 1

 Types of panellists interviewed and reason for their selection

The interview schedule covered why and how participants had rated the quality indicators in the Delphi technique, their experience of taking part in the Delphi process, their views of what constitutes quality within primary mental health care, and whether the indicators had reflected these views.

Informed consent was obtained from participants and all interviews were taped and fully transcribed. All transcripts were checked and analysed by the interviewer. Themes emerging from the data were categorised, classified,28 and compared across categories using the constant comparative method of qualitative data analysis.29 The content and meaning of the data were discussed by TS and SC, and interpretation of the themes was agreed between the authors.

Results

Quantitative differences in ratings within and between panels

The response rate was 90% in round 1 and 89% in round 2. There were no missing data (all indicators were rated by all panellists). 367 indicators were included in round 1 and 334 in round 2.1 The majority (97%, n = 324) of indicators were rated valid by at least one panel and 87 (26%) were rated valid by all 11 panels1 (full details of these 87 indicators are available at

The level of disagreement within individual panels was low. The carer and psychiatrist panels were the only ones that demonstrated disagreement within their panel, in both cases for less than 1% of the indicators. There were, however, differences between panels in the total number of indicators rated valid (table 2) and in the level of agreement between panels (table 3). Agreement between panels was modest: no Cohen κ coefficient exceeded 0.6, the level that would represent good agreement.25 The most divergent ratings were between the carer and general practitioner panels (κ = 0.067) and the closest agreement was between the commissioner and nurse panels (κ = 0.587). The patient and carer panels showed poor agreement (κ = 0.344).

Table 2

 Percentage of indicators rated valid by each panel after Delphi process

Table 3

 Calculated κ values illustrating the level of agreement between panel groups in defining quality of care
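The between-panel κ values in table 3 compare two panels’ judgements across the same indicators. Assuming (the text does not say) that κ was computed on each panel’s binary valid/not valid classification of the round 2 indicators, a minimal pure-Python sketch of Cohen’s κ looks like this; the function name and the ten-indicator example are hypothetical.

```python
def cohens_kappa(panel_a, panel_b):
    """Cohen's kappa for two panels' binary judgements on the same indicators.

    Each argument is a sequence of booleans (True = rated valid), aligned so
    that position i refers to the same indicator in both panels.
    """
    if len(panel_a) != len(panel_b) or not panel_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(panel_a)
    observed = sum(a == b for a, b in zip(panel_a, panel_b)) / n
    p_a = sum(panel_a) / n  # proportion rated valid by panel A
    p_b = sum(panel_b) / n  # proportion rated valid by panel B
    # Agreement expected by chance if the two panels judged independently.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        # Both panels gave every indicator the same single label: kappa is
        # undefined, and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical judgements on ten indicators: a lenient and a strict panel.
lenient = [True] * 8 + [False] * 2
strict = [True] * 3 + [False] * 7
print(round(cohens_kappa(lenient, strict), 3))  # 0.194
```

The two panels agree on half the indicators, but because the lenient panel rates most indicators valid, much of that agreement is expected by chance, which is why κ ends up low.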

The general practitioner, psychiatrist, commissioner, and counsellor panels did not consider it important to have protocols agreed at the health authority/primary care trust level whereas the carer, patient, community psychiatric nurse, and voluntary organisation panels did. Similarly, the patient, carer, voluntary organisation, clinical psychologist, counsellor, and social worker panels agreed that a quality service requires general practices to provide particular services on site (e.g. social worker, complementary therapies) whereas the remaining panels agreed that services should be available on an off site basis only.

The general practitioner panel rated significantly more indicators valid at the practice level than at the higher (primary care group/health authority) level (p<0.01). By contrast, the carer, clinical psychologist, community psychiatric nurse, nurse, and voluntary organisation panels rated significantly more indicators valid at the higher level (p<0.05).
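Comparisons such as the practice level versus higher level contrast for a single panel can be framed as a 2×2 χ² test. The sketch below is an illustration under assumptions: the text gives neither the underlying counts nor whether a continuity correction was applied, so it uses an uncorrected Pearson χ² with made-up counts, and `chi_square_2x2` is a hypothetical helper. For 1 degree of freedom the upper-tail probability equals erfc(√(χ²/2)), which keeps the example free of third-party libraries.

```python
import math

def chi_square_2x2(table):
    """Uncorrected Pearson chi-square test for a 2x2 contingency table.

    table = [[valid_practice, not_valid_practice],
             [valid_higher,   not_valid_higher]]
    Returns (statistic, p_value) on 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts for one panel: 60 of 100 practice level indicators
# rated valid versus 40 of 100 higher level indicators.
stat, p = chi_square_2x2([[60, 40], [40, 60]])
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # chi2 = 8.00, p = 0.0047
```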

The patient, carer, and voluntary organisation panels agreed that a practice nurse should be involved in the care of mental health problems whereas the service providers did not. The carer, patient, and voluntary organisation panels also disagreed with the service providers about whether patients in crisis should have access to counselling services on an emergency basis. Unlike the service providers, they also rated as valid an indicator stating that general practitioners and other members of the primary health care team should make home visits, irrespective of the time of day, for people registered with the practice who are experiencing a mental health crisis.

Panels also differed in the extent to which they used the full range of the validity scale (table 4). For example, carers were the least likely to use the bottom end of the scale whereas psychiatrists were the most likely to do so.

Table 4

 Rating process: extent to which the 11 different panels used the full range of the validity scale*
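Table 4 summarises how much of the 1–9 scale each panel used. One simple way to describe that, assumed here rather than taken from the study, is the share of a panel’s ratings falling in each tertile; the helpers `tertile` and `scale_usage` and the example ratings are hypothetical.

```python
from collections import Counter

BANDS = ("low (1-3)", "middle (4-6)", "high (7-9)")

def tertile(rating):
    """Map a 1-9 rating to its tertile of the validity scale."""
    if rating <= 3:
        return BANDS[0]
    if rating <= 6:
        return BANDS[1]
    return BANDS[2]

def scale_usage(ratings):
    """Share of a panel's ratings that fall in each tertile."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    counts = Counter(tertile(r) for r in ratings)
    return {band: counts[band] / len(ratings) for band in BANDS}

# Hypothetical pooled ratings: one panel rarely uses the bottom of the scale,
# the other spreads its ratings more widely.
narrow = scale_usage([9, 9, 8, 8, 7, 9, 6, 8, 9, 7])
wide = scale_usage([9, 2, 1, 5, 7, 3, 8, 2, 6, 9])
print(narrow)
print(wide)
```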

Reasons for differences between panellists’ ratings

Rating and interpretation of indicators: qualitative data

Participants adopted a variety of perspectives when rating the indicators, linked to the role, relationship, and experience they had in relation to primary care and mental health services. A commissioner stated that they “just applied in my mind how I thought it would impact on the patient experience” and a patient “compared it to my experience as a user and also to other users’ experiences”.

Interpretation of the meaning and significance of quality indicators in the contemporary policy and practice context was also influential:

“Clinical governance is a sort of new term and really I did not even know what it meant. I still have that problem I think, especially in primary care where it has been a sort of minefield of problems of how you establish these sort of clinical governance principles.” (Psychiatrist 1)

“At the time I did this is I was in a world of trying to develop best practice at a practical level but using things like indicators and performance mechanisms because that is part of my job… I was rating them as indicators alongside my experience of what indicators like this tend to look like as well their practicability.” (Service Development Officer)

Five main themes emerged from the interviews as explanations of participants’ ratings in the Delphi process, two relating to the rating process and three to the way participants conceptualised quality of care (fig 1).

Figure 1

 Potential factors influencing the ratings of stakeholders within the Delphi survey.

The use of the validity rating scale related to how participants prioritised the importance of indicators rather than simply the ease of filling in the scale:

“I think everything was important …, it gives you more to work with.” (Carer 1)

[Int: “So what would a 9 mean to you?”] Psychiatrist: “Well I think what you would call human rights type of issues, treat people with respect, courtesy etc., or what you might think was a key clinical issue, which really if it wasn’t done, you were not doing your job terribly well.” (Psychiatrist 1)

However, participant “fatigue” was acknowledged as a contributing factor in how participants rated indicators, and panellists may not have been making judgements in the same way at the end of the questionnaire as at the beginning.

“I think it got to the point that I felt really fed up with it. But then I thought no, I am determined to finish it..... so I might not have been making judgements in the same way as I was at the beginning. At the beginning I was obviously enthusiastic. I think towards the end I wanted to get it finished.” (Patient 1)

Conceptualising quality of care

Perceived control over mental health service provision appeared to influence participants’ ratings. For patients and carers this concerned which aspects of service provision they believed they controlled; for health practitioners it concerned the extent to which they were in control of the patient’s health status and health related quality of life.

A tension was identified between what professionals felt should be provided (that is, ideal provision to maximise care for individuals) and what they were actually able to provide (pragmatically recognising the limitations of the healthcare system):

“I think the Delphi is a useful method. The difficulty is moving between what you know is possible given the constraints and what you would like to be possible in an ideal world.” (GP 1)

By comparison, while patients and carers rated more indicators valid, they expressed feelings of powerlessness to influence both their care and health professional behaviour in negotiating care. Patients and carers also felt that professionals were not accountable for the decisions they took and this was reflected in their advocacy of the use of guidelines and protocols.

“The providers, the doctors, the consultants are certainly not right about everything. But some of them act like God. And it is not on. There’s too much power not shared responsibility. It’s terrible.” (Patient 1)

“I think the system has got worse, It isn’t about money either, it is about getting a proper protocol in to the system. For instance, my son walks out of hospital and he’s got no money, he’s got no food and no-one has checked up on him. So that means I am going to give somebody else a load of grief, because I am going to write and I am going to say, you know, why didn’t anyone check on him? It could all be avoided if they had a proper protocol for when this happens. Where is the protocol?” (Carer 1)

Past experiences and future expectations also influenced the ratings of both providers and patients. The availability of appropriate services was a key issue for the general practitioners and psychiatrists, linked to a recognition of the limitations of the healthcare system, and also for carers and patients:

“…we have had poor secondary care services and we have not had a regular consultant for the last 5 years…” (GP 2)

“Say I had a crisis just now and I phoned up the practice, I would probably be very lucky to find someone there because they are not there are they.” (Carer 1)

Patients and carers advocated the importance of non-clinical outcomes and the need for professionals to understand and recognise what it is like to live with a mental health problem (treating individuals as persons not as a diagnosis). While a commissioner of services stressed that “people react to these things in very different ways based on their life experience and based on what their jobs are and so on” (service development officer), the health service professionals interviewed emphasised a strategic definition of quality of care, encompassing how mental health services should be organised and delivered rather than what is important at the individual level of the consultation.

“I think psychiatrists might see it (holistic approach to mental health, taking into account cultural and spiritual aspects) as being wishy-washy. For some people I would recognise that it was important but not as a general concept. It is also probably to do with the process of dealing with serious mental illness and people who can’t even look after themselves properly who have got very basic needs. You reflect that the mind set is about those very basic needs.” (Psychiatrist 2)

“The idea that you can identify someone who is having a specific need which requires little silos of care is not appropriate in my eyes. I see a quality service as being involved in primary and community settings and accessing specialist services from a variety of sorts at different points in their care pathway. The primary and community setting and that level of care is critical for the vast majority of people with mental health problems.” (GP 2)

However, patient and carer participants viewed a quality service in relation to how it met individual needs and advocated an holistic approach, highlighting psychological and social aspects of care such as friends and family support, social care support, psychological therapies, and activities producing a sense of self-esteem.

Discussion

While the level of agreement within panels was high, there were differences between panels, particularly between the patient/carer panels and the GP/psychiatrist panels. The exploratory interviews provide some qualitative insights into factors that influence individual panellists’ ratings in a consensus technique in relation to primary mental health care. These include differing value systems, experiences, expectations and power relationships. Panellists’ ratings were influenced by how the indicators were interpreted and by how quality of care was conceptualised; for example, in meeting individual needs and the roles and relationships panellists have in relation to service provision.


While this study provides some quantitative and qualitative insights into the determinants of panellists’ ratings in a Delphi technique, it has a number of limitations. Firstly, the indicators intentionally related to non-clinical aspects of organisation1 and were not diagnostically specific (e.g. to depression). They therefore cannot be sensitive to issues specific to severe and enduring mental illness (such as psychosis or schizophrenia) or to more common illnesses. Secondly, the interviews were exploratory, were undertaken with only a limited number of panellists for resource reasons, and did not cover all 11 panels. The panellists who were interviewed were sampled purposively to reflect quantitative differences in panel ratings. While the views expressed by the participants were consistent with their panel’s rating behaviour in terms of high or low ratings of particular indicators, it is not possible to know whether these views are generalisable to other panellists involved in the Delphi process or to wider stakeholder perspectives generally (patients, psychiatrists, etc). The results must therefore be treated as suggestive insights into the rating behaviour of individual panellists rather than definitive determinants, and they require replication in a larger sample. Alternative qualitative approaches could include an ethnographic approach observing how panels functioned. Finally, while a panel composition of 9–12 members is normal for consensus techniques,18 the number of panellists per panel is small. However, disagreement within panels was low and using a median cut off of 8 or 9, as in this study, has been shown to increase the reproducibility of ratings.30

Implications for those undertaking consensus techniques

While agreement between the 11 panels was obtained in terms of a core set of indicators,1 this does not mean that there was consensus on what constitutes quality of care. Some panels (e.g. patients and carers) rated a greater number of indicators valid than others (e.g. GPs and psychiatrists). This has implications for multiple panel consensus techniques. A key characteristic of good consensus methods is that ratings are democratic.31 All panellists within a panel had equal weight in influencing the panel’s ratings. However, because we only included indicators rated valid by all 11 panels in the core set,1 the panels rating the fewest indicators valid (GPs, psychiatrists) had a greater influence on the final set than those rating high numbers of indicators valid (e.g. patients and carers). If the Delphi technique had been restricted to GPs’ ratings alone, the final set would not have been vastly different; the same cannot be said for patients.
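The asymmetry described above follows from defining the core set as an intersection: an indicator enters only if every panel rated it valid, so the core set can never be larger than the most restrictive panel’s valid set. A minimal sketch with hypothetical panels and indicator IDs (not the study’s data); `core_set` is an illustrative helper name.

```python
def core_set(valid_by_panel):
    """Indicators rated valid by every panel (the intersection of their sets)."""
    panels = list(valid_by_panel.values())
    return set.intersection(*panels) if panels else set()

# Hypothetical valid sets over ten indicators numbered 1-10.
valid_by_panel = {
    "GP": {1, 2, 3, 4},
    "patient": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "carer": set(range(1, 11)),
}
core = core_set(valid_by_panel)
print(sorted(core))  # [1, 2, 3, 4]

# Restricting the exercise to the most restrictive panel reproduces the core
# set exactly; restricting it to a permissive panel gives a much larger set.
print(sorted(core_set({"GP": valid_by_panel["GP"]})))            # [1, 2, 3, 4]
print(sorted(core_set({"patient": valid_by_panel["patient"]})))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```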

The differences in the ratings of the 11 panels emphasise the importance of involving all relevant stakeholders and the need to have specific objectives for conducting the procedure—in this case, to identify a core set of indicators rated valid by 11 different stakeholder groups. The type of feedback received by panellists can also influence their ratings.25 If the panellists on all 11 panels in this study had received collective feedback rather than panel specific feedback, the core set of indicators would have been different.
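The effect of the feedback design can be illustrated for a single indicator. The sketch assumes, purely for illustration, that each panel is fed back a median; the study’s actual feedback format is not described in this section, and `round_two_feedback` and the ratings are hypothetical. Panel-specific feedback returns each panel’s own median, whereas collective feedback returns the same pooled median to everyone.

```python
from statistics import median

def round_two_feedback(ratings_by_panel, pooled=False):
    """Median fed back to each panel for one indicator after round 1."""
    if pooled:
        everyone = [r for ratings in ratings_by_panel.values() for r in ratings]
        pooled_median = median(everyone)
        return {panel: pooled_median for panel in ratings_by_panel}
    return {panel: median(ratings) for panel, ratings in ratings_by_panel.items()}

# Hypothetical round 1 ratings for one indicator from two panels.
ratings = {"GP": [5, 6, 6, 7, 5], "carer": [9, 9, 8, 9, 8]}
print(round_two_feedback(ratings))               # {'GP': 6, 'carer': 9}
print(round_two_feedback(ratings, pooled=True))  # {'GP': 7.5, 'carer': 7.5}
```

Under pooled feedback the GP panel would see a higher reference point and the carer panel a lower one, which is one way the round 2 ratings, and hence the core set, could have shifted.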

Narratives from the interviews would suggest that the Delphi technique might be likened to a social process where outcomes are the product of the current situation in which panellists find themselves. The processes involved in interpreting a question and formulating an answer are complex and can be affected by a range of social and cultural factors including people’s understanding of the process and subjective interpretation of indicators.32 Moreover, panellists often had different reasons for rating the same indicator valid. The social process might be even more important when panellists meet face to face as in a nominal group technique or the RAND appropriateness method.18

Implications for policy and practice

Although the patient and carer participants advocated indicators reflecting a therapeutic relationship in terms of their own individual circumstances, they also advocated protocol adherence by professionals. These views incorporate both evidence based medicine and patient centred care.33 These can seem opposing paradigms, requiring professionals to be both patient centred (for example, providing care based on individual circumstances) and evidence based (for example, following protocols). However, they are not mutually exclusive, and good doctor communication is required to integrate them.34

While the wide range of factors shown in fig 1 reflects the complexity surrounding primary mental health care, similar findings have been identified in relation to quality of care generally. These include contextual/environmental factors, need, expectations and past experiences of care and the prevailing organisational culture of the healthcare system.35,36 Ratings of quality of care may therefore also be seen as a social construct shaped by values and expectations.

The indicators developed from the Delphi procedure1 may help policymakers, professionals, and patients to reflect on a set of quality indicators demonstrating consensus across multiple stakeholders. However, the Delphi procedure also showed the diversity of stakeholder opinions within primary mental health care and the polarised views of patients and professionals. For care to be patient centred and to reflect the needs/opinions of patients, it must address the higher number of indicators/issues rated valid by the carer and patient panels.

While mental health has been identified as a priority for clinical governance by many primary care trusts in England, many are unsure about how to address the mental health quality improvement agenda including the NSF.37,38 The application of quality indicators demonstrating consensus across different stakeholders is one way of addressing this agenda. Moreover, this study has also shown that one way of addressing the agenda of including users in health care39 is their involvement in consensus techniques.

Conclusions

This study has shown significant differences in ratings about what constitutes quality of primary mental health care across multiple stakeholders. These findings highlight a wide range of potential factors which may influence individual panellists’ ratings in a consensus technique, including the way in which indicators are interpreted and the way in which quality of care is conceptualised.

Key messages

  • Stakeholders in primary mental health care (that is, patients, carers, professionals and managers) define quality of care differently.

  • While mental health is a core part of primary care, there are few validated quality measures and little relevant internationally published research. Consensus panel methods are a useful means of developing quality measures where evidence is sparse and/or opinions are diverse.

  • Little is known about the dynamics of consensus techniques and the factors that influence the judgements and ratings of panels and individual panellists.

  • In an 11 panel Delphi procedure including patients, carers, professionals and managers, agreement within panels was high but there were differences between panels, particularly between the patient/carer panels and the GP/psychiatrist panels.

  • Individual panellists’ ratings were influenced by their value systems, experiences, expectations, and power relationships.

Acknowledgements

The authors thank all the panellists who took part in the Delphi rating process and all individuals and organisations who contributed to the study; Delphi panellists who agreed to be interviewed; and Dr Mark Hann (NPCRDC, University of Manchester) for statistical advice.